JIESUN233 opened this issue 12 months ago (status: Open)
Hi there. Thanks for your question. It's not immediately obvious to me why this isn't working, but it may be because you are trying to put the edges in HOST_MEMORY. Can you try the following for your storage config:
```yaml
device_type: cuda
dataset_dir: products_example/
edges:
  type: FLAT_FILE
nodes:
  type: HOST_MEMORY
features:
  type: PARTITION_BUFFER
  options:
    num_partitions: 32
    buffer_capacity: 5
    prefetching: true
    fine_to_coarse_ratio: 1
    num_cache_partitions: 0
    node_partition_ordering: DISPERSED
```
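To give an intuition for what `num_partitions` and `buffer_capacity` control, here is a minimal sketch of a partition buffer as an LRU cache. This is purely illustrative (my own simplification, not Marius's actual implementation): node features are split into `num_partitions` blocks on disk, and at most `buffer_capacity` of them are resident in memory at once.

```python
from collections import OrderedDict

class PartitionBuffer:
    """Illustrative LRU-style partition buffer (not Marius internals)."""

    def __init__(self, num_partitions, buffer_capacity):
        self.num_partitions = num_partitions
        self.capacity = buffer_capacity
        self.buffer = OrderedDict()  # partition_id -> feature block
        self.evictions = 0

    def load(self, partition_id):
        # Stands in for a disk read of one feature partition.
        return f"features_of_partition_{partition_id}"

    def get(self, partition_id):
        if partition_id in self.buffer:
            self.buffer.move_to_end(partition_id)  # mark as recently used
        else:
            if len(self.buffer) >= self.capacity:
                self.buffer.popitem(last=False)    # evict least-recently-used
                self.evictions += 1
            self.buffer[partition_id] = self.load(partition_id)
        return self.buffer[partition_id]

buf = PartitionBuffer(num_partitions=32, buffer_capacity=5)
for pid in [0, 1, 2, 3, 4, 0, 5]:  # sixth access is a hit, seventh evicts
    buf.get(pid)
print(len(buf.buffer), buf.evictions)  # at most 5 partitions resident
```

With `buffer_capacity: 5` and `num_partitions: 32`, at most 5/32 of the node features occupy memory at any time; the `node_partition_ordering` setting then determines the order in which partitions are swapped in.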
I looked into this issue a bit more, and it turns out there were some bugs in the code that surfaced only infrequently, but more often when running disk-based training. I have fixed those issues in PR #147 and merged the changes into main.
Can you try running your config again?
With the updates, I did not have any issues running the following.
Preprocessing command:
```
marius_preprocess --dataset ogbn_arxiv --output_dir datasets/ogbn_arxiv --num_partitions 32
```
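As a rough sanity check on what this partitioning produces (my own back-of-envelope arithmetic, not tool output): ogbn-arxiv has 169,343 nodes and 128-dimensional features, so with 32 partitions each partition holds about 5,292 nodes and, assuming float32 features, roughly 2.7 MB:

```python
import math

num_nodes = 169343       # from the dataset stats in the config below
num_partitions = 32
feature_dim = 128
bytes_per_float = 4      # assumption: float32 features

nodes_per_partition = math.ceil(num_nodes / num_partitions)
bytes_per_partition = nodes_per_partition * feature_dim * bytes_per_float
print(nodes_per_partition, bytes_per_partition)  # 5292 nodes, ~2.7 MB
```

This is why a `buffer_capacity` of 3–5 is comfortable for a small graph like ogbn-arxiv; for larger graphs the same knobs bound the in-memory feature footprint.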
Config:
```yaml
model:
  learning_task: NODE_CLASSIFICATION
  encoder:
    use_incoming_nbrs: true
    use_outgoing_nbrs: true
    train_neighbor_sampling:
      - type: UNIFORM
        options:
          max_neighbors: 15
          use_hashmap_sets: true
      - type: UNIFORM
        options:
          max_neighbors: 10
      - type: UNIFORM
        options:
          max_neighbors: 5
    eval_neighbor_sampling:
      - type: UNIFORM
        options:
          max_neighbors: 15
          use_hashmap_sets: true
      - type: UNIFORM
        options:
          max_neighbors: 10
      - type: UNIFORM
        options:
          max_neighbors: 5
    layers:
      - - type: FEATURE
          output_dim: 128
          bias: false
          activation: NONE
      - - type: GNN
          options:
            type: GRAPH_SAGE
            aggregator: MEAN
          init:
            type: GLOROT_NORMAL
          input_dim: 128
          output_dim: 128
          bias: true
          bias_init:
            type: ZEROS
          activation: RELU
      - - type: GNN
          options:
            type: GRAPH_SAGE
            aggregator: MEAN
          init:
            type: GLOROT_NORMAL
          input_dim: 128
          output_dim: 128
          bias: true
          bias_init:
            type: ZEROS
          activation: RELU
      - - type: GNN
          options:
            type: GRAPH_SAGE
            aggregator: MEAN
          init:
            type: GLOROT_NORMAL
          input_dim: 128
          output_dim: 40
          bias: true
          bias_init:
            type: ZEROS
          activation: NONE
  decoder:
    type: NODE
  loss:
    type: CROSS_ENTROPY
    options:
      reduction: MEAN
  dense_optimizer:
    type: ADAM
    options:
      learning_rate: 0.003
storage:
  device_type: cuda
  dataset:
    dataset_dir: datasets/ogbn_arxiv/
    num_edges: 1166243
    num_nodes: 169343
    num_relations: 1
    num_train: 90941
    num_valid: 29799
    num_test: 48603
    feature_dim: 128
    num_classes: 40
  edges:
    type: FLAT_FILE
  nodes:
    type: HOST_MEMORY
  features:
    type: PARTITION_BUFFER
    options:
      num_partitions: 32
      buffer_capacity: 3
      prefetching: true
      fine_to_coarse_ratio: 1
      num_cache_partitions: 0
      node_partition_ordering: DISPERSED
  prefetch: true
  shuffle_input: true
  full_graph_evaluation: true
  train_edges_pre_sorted: false
training:
  batch_size: 1000
  num_epochs: 5
  pipeline:
    sync: true
  epochs_per_shuffle: 1
  logs_per_epoch: 10
evaluation:
  batch_size: 1000
  pipeline:
    sync: true
  epochs_per_eval: 1
```
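For scale, a quick upper bound (my own arithmetic, not a Marius API) on how many node slots one training batch can touch with the three UNIFORM sampling layers above (fanouts 15, 10, 5) and `batch_size: 1000`, assuming the per-layer `max_neighbors` values simply multiply:

```python
batch_size = 1000
fanouts = [15, 10, 5]  # max_neighbors per sampling layer

nodes_per_seed = 1     # the seed node itself
frontier = 1
for f in fanouts:
    frontier *= f      # worst case: every sampled node expands fully
    nodes_per_seed += frontier

print(nodes_per_seed * batch_size)  # -> 916000 node slots per batch
```

In practice the real count is far smaller: ogbn-arxiv only has 169,343 nodes, and deduplication (cf. `use_hashmap_sets: true`) collapses repeated neighbors, but the bound shows why the partition buffer sees many partitions per batch under DISPERSED ordering.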
Hi, I'd like to run the disk-based version of MariusGNN. I found that when I set the features' storage type to PARTITION_BUFFER, I hit a segmentation fault:
(If I set the storage type to HOST_MEMORY instead, the training procedure runs successfully.)
Specifically, I downloaded the master branch of Marius, built a Docker image, and ran my experiments inside the container. I followed these instructions to install Marius.
Here is my training config file:
And this is how I generated the dataset:
This is the dataset directory:
I would greatly appreciate it if you could help resolve my issue~