snap-stanford / relbench

RelBench: Relational Deep Learning Benchmark
https://relbench.stanford.edu
MIT License

Neighbor Sampler in train_model tutorial #254

Closed meyerjoe-R closed 3 months ago

meyerjoe-R commented 4 months ago

Discussed in https://github.com/snap-stanford/relbench/discussions/253

Originally posted by **meyerjoe-R** August 1, 2024

I'm running into an issue with the neighbor sampler in the train_model.ipynb tutorial. Any insight would be appreciated.

I'm working with my own data. I attempted to install packages in the order found in https://github.com/snap-stanford/relbench/blob/main/tutorials/train_model.ipynb after getting errors similar to those in https://github.com/pyg-team/pytorch_geometric/discussions/9143 and https://github.com/pyg-team/pytorch_geometric/discussions/7866

`AttributeError: module 'torch_geometric' has no attribute 'typing'
File , line 13
      8 table_input = get_node_train_table_input(
      9     table=table,
     10     task=task,
     11 )
     12 entity_table = table_input.nodes[0]
---> 13 loader_dict[split] = NeighborLoader(
     14     graph,
     15     num_neighbors=[
     16         128 for i in range(2)
     17     ],  # we sample subgraphs of depth 2, 128 neighbors per node.
     18     # time_attr="CREATIONDATE",
     19     input_nodes=table_input.nodes,
     20     # input_time=table_input.time,
     21     transform=table_input.transform,
     22     batch_size=8,
     23     # temporal_strategy="uniform",
     24     shuffle=split == "train",
     25     num_workers=0,
     26     persistent_workers=False,
     27 )`

`NeighborLoader.__init__(self, data, num_neighbors, input_nodes, input_time, replace, subgraph_type, disjoint, temporal_strategy, time_attr, weight_attr, transform, transform_sampler_output, is_sorted, filter_per_worker, neighbor_sampler, directed, **kwargs)
    224             raise ValueError("Received conflicting 'input_time' and "
    225                              "'time_attr' arguments: 'input_time' is set "
    226                              "while 'time_attr' is not set.")
    228         if neighbor_sampler is None:
--> 229             neighbor_sampler = NeighborSampler(
    230                 data,
    231                 num_neighbors=num_neighbors,
    232                 replace=replace,
    233                 subgraph_type=subgraph_type,
    234                 disjoint=disjoint,
    235                 temporal_strategy=temporal_strategy,
    236                 time_attr=time_attr,
    237                 weight_attr=weight_attr,
    238                 is_sorted=is_sorted,
    239                 share_memory=kwargs.get('num_workers', 0) > 0,
    240                 directed=directed,
    241             )
    243         super().__init__(
    244             data=data,
    245             node_sampler=neighbor_sampler,
   (...)
    251             **kwargs,
    252         )

L/lib/python3.10/site-packages/torch_geometric/sampler/neighbor_sampler.py:59, in NeighborSampler.__init__(self, data, num_neighbors, subgraph_type, replace, disjoint, temporal_strategy, time_attr, weight_attr, is_sorted, share_memory, directed)
     54             subgraph_type = SubgraphType.induced
     55             warnings.warn(f"The usage of the 'directed' argument in "
     56                           f"'{self.__class__.__name__}' is deprecated. Use "
     57                           f"`subgraph_type='induced'` instead.")
---> 59         if (not torch_geometric.typing.WITH_PYG_LIB and sys.platform == 'linux'
     60                 and subgraph_type != SubgraphType.induced):
     61             warnings.warn(f"Using '{self.__class__.__name__}' without a "
     62                           f"'pyg-lib' installation is deprecated and will be "
     63                           f"removed soon. Please install 'pyg-lib' for "
     64                           f"accelerated neighborhood sampling")
     66         self.data_type = DataType.from_data(data)`

Additionally, importing get_node_train_table_input and make_pkey_fkey_graph also raised errors:

`[autoreload of torch_frame.datasets.titanic failed: Traceback (most recent call last):
  File "/databricks/python/lib/python3.10/site-packages/IPython/extensions/autoreload.py", line 276, in check
    superreload(m, reload, self.old_objects)
  File "/databricks/python/lib/python3.10/site-packages/IPython/extensions/autoreload.py", line 475, in superreload
    module = reload(module)
  File "/usr/lib/python3.10/importlib/__init__.py", line 169, in reload
    _bootstrap._exec(spec, module)
  File "", line 619, in _exec
  File "", line 883, in exec_module
  File "", line 241, in _call_with_frames_removed
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-379e0c14-fbb0-47b8-ba59-00a10c075a29/lib/python3.10/site-packages/torch_frame/datasets/titanic.py", line 6, in
    class Titanic(torch_frame.data.Dataset):
AttributeError: module 'torch_frame' has no attribute 'data'
]`

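Both tracebacks fail on a missing submodule attribute (`torch_geometric.typing` and `torch_frame.data`), so one way to narrow this down might be to import those submodules explicitly in a fresh kernel and see which ones fail. The sketch below is only a diagnostic aid; `check_submodules` is a hypothetical helper name, not part of RelBench, PyG, or PyTorch Frame.

```python
import importlib


def check_submodules(names):
    """Try to import each dotted module name; map it to "ok" or the error text."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except Exception as exc:  # broken installs can raise more than ImportError
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # The two submodules named in the tracebacks above.
    report = check_submodules(["torch_geometric.typing", "torch_frame.data"])
    for name, status in report.items():
        print(f"{name}: {status}")
```

If both lines print `ok` in a fresh kernel but the notebook still fails, the problem may lie in the notebook session (for example, autoreload reloading partially initialized modules) rather than in the installation itself.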
meyerjoe-R commented 3 months ago

This install order seemed to work:

pip install torch==2.3.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
pip install torch_scatter torch_sparse torch_cluster torch_spline_conv pyg_lib -f https://data.pyg.org/whl/torch-2.3.0+cpu.html
pip install git+https://github.com/pyg-team/pytorch_geometric.git
pip install pytorch_frame[full] relbench[full] sentence-transformers
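
After an install like the one above, it can help to confirm which versions actually ended up in the environment, since the extension wheels are built against a specific torch version. This is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper, and the distribution names in the list are my assumption about how these packages are registered, so adjust them if your environment reports different names.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for dist in distributions:
        try:
            versions[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            versions[dist] = None
    return versions


if __name__ == "__main__":
    # Assumed distribution names; not taken from the issue itself.
    names = [
        "torch",
        "torch_geometric",
        "pyg_lib",
        "torch_scatter",
        "torch_sparse",
        "pytorch_frame",
        "relbench",
    ]
    for dist, version in installed_versions(names).items():
        print(f"{dist}: {version or 'not installed'}")
```

With the commands above, the torch entry would be expected to show the 2.3.0 CPU build that the `-f https://data.pyg.org/whl/torch-2.3.0+cpu.html` wheels target; a different torch version here would suggest that a later `pip install` replaced it.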