marius-team / marius

Large scale graph learning on a single machine.
https://marius-project.org
Apache License 2.0

Segmentation Fault in Disk Mode Despite Fix in PR 147 #157

Open griii opened 5 months ago

griii commented 5 months ago

I am reporting an issue with marius_train, specifically in disk mode. Although the segmentation fault was addressed in PR 147 (https://github.com/marius-team/marius/pull/147), I still hit the same problem after updating to the latest version of the project.

Marius runs normally in memory mode. Below is my configuration for Marius preprocessing. [image]

The following is my disk-mode YAML configuration for marius_train.

# examples/configuration/ogbn_paper100m_disk.yaml
model:
  learning_task: NODE_CLASSIFICATION
  encoder:
    train_neighbor_sampling:
      - type: UNIFORM
        options:
          max_neighbors: 10
      - type: UNIFORM
        options:
          max_neighbors: 10
      - type: UNIFORM
        options:
          max_neighbors: 10
    layers:
      - - type: FEATURE
          output_dim: 100
      - - type: GNN
          options:
            type: GRAPH_SAGE
            aggregator: MEAN
          input_dim: 100
          output_dim: 256
          bias: true
      - - type: GNN
          options:
            type: GRAPH_SAGE
            aggregator: MEAN
          input_dim: 256
          output_dim: 256
          bias: true
      - - type: GNN
          options:
            type: GRAPH_SAGE
            aggregator: MEAN
          input_dim: 256
          output_dim: 150
          bias: true
  decoder:
    type: NODE
  loss:
    type: CROSS_ENTROPY
    options:
      reduction: SUM
  dense_optimizer:
    type: ADAM
    options:
      learning_rate: 0.01
storage:
  device_type: cuda
  dataset:
    dataset_dir: /data/wb2001/
  edges:
    type: FLAT_FILE
  nodes:
    type: HOST_MEMORY
  features:
    type: PARTITION_BUFFER
    options:
      num_partitions: 16
      buffer_capacity: 5
      prefetching: true
      fine_to_coarse_ratio: 1
      num_cache_partitions: 0
      node_partition_ordering: DISPERSED
  # prefetch: false
  # shuffle_input: true
  # full_graph_evaluation: true
training:
  batch_size: 1000
  num_epochs: 3
  pipeline:
    sync: true
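
Before digging into the crash itself, it may help to rule out a plain configuration mistake. The sketch below is not part of Marius: it copies a few values from the YAML above into Python literals and runs three hypothetical checks. The helper names (`check_layer_dims`, `check_sampling`, `check_buffer`) are made up for illustration, and the rules they encode (one neighbor sampling entry per GNN layer, `buffer_capacity` no larger than `num_partitions`) are assumptions about what a consistent configuration looks like, not documented Marius requirements.

```python
# Values copied by hand from the YAML configuration above.
layers = [
    {"type": "FEATURE", "output_dim": 100},
    {"type": "GNN", "input_dim": 100, "output_dim": 256},
    {"type": "GNN", "input_dim": 256, "output_dim": 256},
    {"type": "GNN", "input_dim": 256, "output_dim": 150},
]
num_sampling_layers = 3  # entries under train_neighbor_sampling
features_options = {"num_partitions": 16, "buffer_capacity": 5}


def check_layer_dims(layers):
    """Report layers whose input_dim differs from the previous output_dim."""
    problems = []
    for i in range(1, len(layers)):
        prev_out = layers[i - 1]["output_dim"]
        cur_in = layers[i].get("input_dim")
        if cur_in is not None and cur_in != prev_out:
            problems.append(
                f"layer {i}: input_dim {cur_in} != previous output_dim {prev_out}"
            )
    return problems


def check_sampling(layers, num_sampling_layers):
    """Assumption: one neighbor sampling entry is expected per GNN layer."""
    num_gnn = sum(1 for layer in layers if layer["type"] == "GNN")
    if num_gnn != num_sampling_layers:
        return [f"{num_gnn} GNN layers but {num_sampling_layers} sampling entries"]
    return []


def check_buffer(options):
    """Assumption: the buffer should not hold more partitions than exist."""
    if options["buffer_capacity"] > options["num_partitions"]:
        return ["buffer_capacity exceeds num_partitions"]
    return []


problems = (
    check_layer_dims(layers)
    + check_sampling(layers, num_sampling_layers)
    + check_buffer(features_options)
)
print(problems if problems else "no inconsistencies found by these checks")
```

Under these assumptions the posted configuration passes all three checks, which suggests the segmentation fault is not caused by a dimension or partition-count mismatch in the YAML.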

zosong commented 1 month ago

I have the same problem with the same storage setup. Whenever I use PARTITION_BUFFER, training crashes with a segmentation fault.