williamleif / GraphSAGE

Representation learning on large graphs using stochastic graph convolutions.

Fatal Python error: Segmentation fault #172

Open joelongLin opened 3 years ago

joelongLin commented 3 years ago

I want to train on a dataset in unsupervised mode with TensorFlow 1.15 (I just want to be able to train with multiple GPUs). The dataset has about 42,000,000+ random-walk edge pairs and 7,000,000 nodes. With the original code, the error `ValueError: Tried to convert 'value' to a tensor and failed. Error: Cannot create a tensor proto whose content is larger than 2GB.` occurred, so I changed the code as follows:

```python
import tensorflow as tf  # TF 1.15

# define placeholders: the large adjacency arrays are fed at run time
# instead of being embedded in the graph as constants (2GB proto limit)
adj_info_ph = tf.placeholder(tf.int32, shape=minibatch.adj.shape, name="adj_info_ph")
test_adj_info_ph = tf.placeholder(tf.int32, shape=minibatch.test_adj.shape, name="test_adj_info_ph")

# define variable (its initializer reads adj_info_ph, so that must be fed)
adj_info = tf.Variable(adj_info_ph, trainable=False, name="adj_info")

# assign with placeholder instead of minibatch.adj or minibatch.test_adj
train_adj_info = tf.assign(adj_info, adj_info_ph)
val_adj_info = tf.assign(adj_info, test_adj_info_ph)
```
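A minimal, self-contained sketch of how a placeholder-backed variable like the one above is driven, assuming the TF1 graph API through `tf.compat.v1` (available in TF 1.15 and 2.x). The variable's initializer and both assign ops read a placeholder, so every `sess.run` on them has to feed the matching array. The small `train_adj`/`test_adj` arrays and the `run_demo` helper are hypothetical stand-ins for `minibatch.adj`/`minibatch.test_adj` and the training loop. This is not the GraphSAGE script itself.

```python
import numpy as np
import tensorflow as tf

# TF1-style graph mode; tf.compat.v1 exists in TF 1.15 and TF 2.x.
tf.compat.v1.disable_eager_execution()

# Hypothetical small stand-ins for minibatch.adj / minibatch.test_adj.
# Both have the same shape, so one variable can hold either of them.
train_adj = np.arange(12, dtype=np.int32).reshape(4, 3)
test_adj = -np.arange(12, dtype=np.int32).reshape(4, 3)

adj_info_ph = tf.compat.v1.placeholder(tf.int32, shape=train_adj.shape, name="adj_info_ph")
test_adj_info_ph = tf.compat.v1.placeholder(tf.int32, shape=test_adj.shape, name="test_adj_info_ph")

# The initial value comes from a placeholder, so no array is baked into the graph.
adj_info = tf.compat.v1.Variable(adj_info_ph, trainable=False, name="adj_info")
train_adj_info = tf.compat.v1.assign(adj_info, adj_info_ph)
val_adj_info = tf.compat.v1.assign(adj_info, test_adj_info_ph)


def run_demo():
    """Initialize, swap to the test adjacency, then swap back (hypothetical helper)."""
    with tf.compat.v1.Session() as sess:
        # The initializer reads adj_info_ph, so it must be fed here.
        sess.run(tf.compat.v1.global_variables_initializer(),
                 feed_dict={adj_info_ph: train_adj})
        after_init = sess.run(adj_info)
        # Before validation: load the test adjacency into the same variable.
        sess.run(val_adj_info, feed_dict={test_adj_info_ph: test_adj})
        after_val = sess.run(adj_info)
        # After validation: restore the training adjacency.
        sess.run(train_adj_info, feed_dict={adj_info_ph: train_adj})
        after_train = sess.run(adj_info)
    return after_init, after_val, after_train


if __name__ == "__main__":
    init_vals, val_vals, train_vals = run_demo()
    print((val_vals == test_adj).all())
```

Forgetting a `feed_dict` on any of these three `sess.run` calls raises an error about an unfed placeholder rather than silently using stale data.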

It works. That's my change to the original code.
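As a rough sanity check on why the original code hit the 2GB limit: passing a NumPy array directly to `tf.assign` embeds it in the graph as a constant, and TensorFlow refuses to build a single tensor proto of 2GB or more. The sketch below assumes the adjacency array has shape `(num_nodes + 1, max_degree)` with 4-byte int32 entries (matching the placeholder dtype above). The `max_degree` value of 100 is hypothetical, and `adj_nbytes` / `max_degree_that_fits` are illustrative helpers.

```python
INT32_BYTES = 4
PROTO_LIMIT_BYTES = 2 ** 31  # a tensor proto must stay below 2GB


def adj_nbytes(num_nodes: int, max_degree: int, itemsize: int = INT32_BYTES) -> int:
    """Bytes in an adjacency array of shape (num_nodes + 1, max_degree)."""
    return (num_nodes + 1) * max_degree * itemsize


def max_degree_that_fits(num_nodes: int, itemsize: int = INT32_BYTES) -> int:
    """Largest max_degree whose adjacency array stays under the limit."""
    bytes_per_column = (num_nodes + 1) * itemsize
    return (PROTO_LIMIT_BYTES - 1) // bytes_per_column


size = adj_nbytes(7_000_000, 100)             # hypothetical max_degree = 100
print(size, size >= PROTO_LIMIT_BYTES)        # 2800000400 True
print(max_degree_that_fits(7_000_000))        # 76
```

With 7,000,000 nodes, the adjacency array alone exceeds the limit unless `max_degree` is 76 or lower. This is why feeding it through a placeholder, as above, is needed at this scale.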

But then I ran into another problem. The error log is:

```
Fatal Python error: Segmentation fault
Thread 0x00007f96d37fe700 (most recent call first):
  File "/usr/lib64/python3.6/threading.py", line 295 in wait
  File "/usr/lib64/python3.6/queue.py", line 164 in get
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/summary/writer/event_file_writer.py", line 159 in run
  File "/usr/lib64/python3.6/threading.py", line 916 in _bootstrap_inner
  File "/usr/lib64/python3.6/threading.py", line 884 in _bootstrap
Thread 0x00007fa5e4e6e740 (most recent call first):
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443 in _call_tf_sessionrun
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350 in _run_fn
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365 in _do_call
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359 in _do_run
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180 in _run
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956 in run
  File "${MY_LOCAL_PATH}/GraphSAGE/graphsage/unsupervised_train_tf1.15.py", line 302 in train
  File "${MY_LOCAL_PATH}/GraphSAGE/graphsage/unsupervised_train_tf1.15.py", line 410 in main
  File "/usr/local/lib/python3.6/site-packages/absl/app.py", line 250 in _run_main
  File "/usr/local/lib/python3.6/site-packages/absl/app.py", line 299 in run
  File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/platform/app.py", line 40 in run
  File "${MY_LOCAL_PATH}/GraphSAGE/graphsage/unsupervised_train_tf1.15.py", line 415 in <module>
```

I couldn't find many related solutions online. Do you have any advice? What I actually want is to use multiple GPUs in parallel to speed up GraphSAGE. Thanks a lot!