I want to train on a dataset in an unsupervised manner with TensorFlow 1.15 (I just want to train with multiple GPUs). It has more than 42,000,000 random-walk edge pairs and about 7,000,000 nodes.
Then this error occurred:
ValueError: Tried to convert 'value' to a tensor and failed. Error: Cannot create a tensor proto whose content is larger than 2GB.
So I changed the code, and that part works now.
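For context, a rough size check shows why this error can appear at this scale. The sketch below is only a back-of-envelope calculation, not the change made in the original code. It assumes that TensorFlow refuses to build a constant whose raw array size reaches 2**31 bytes, that the edge pairs are stored as int64, and that node features are float32 with a hypothetical dimension of 128 (the post does not state the feature size).

```python
# Assumption: TensorFlow refuses to embed an array as a graph constant once its
# raw content reaches about 2 GB (2**31 bytes), the protocol buffer size cap.
PROTO_LIMIT_BYTES = 2**31


def tensor_bytes(shape, bytes_per_element):
    """Return the raw size in bytes of a dense tensor with the given shape."""
    size = bytes_per_element
    for dim in shape:
        size *= dim
    return size


def exceeds_proto_limit(shape, bytes_per_element=4):
    """True if embedding this tensor as a graph constant would hit the 2 GB cap."""
    return tensor_bytes(shape, bytes_per_element) >= PROTO_LIMIT_BYTES


NUM_NODES = 7_000_000        # from the post
NUM_EDGE_PAIRS = 42_000_000  # from the post (a lower bound: "42,000,000+")
FEATURE_DIM = 128            # hypothetical: the post does not give the feature size

# int64 edge pairs: 42e6 * 2 * 8 bytes, well under the cap.
print(tensor_bytes((NUM_EDGE_PAIRS, 2), 8))  # 672000000
# float32 node features: 7e6 * 128 * 4 bytes, over the cap.
print(tensor_bytes((NUM_NODES, FEATURE_DIM), 4),
      exceeds_proto_limit((NUM_NODES, FEATURE_DIM)))  # 3584000000 True
```

Under these assumptions, the edge pairs alone stay under the limit, while a 7,000,000-node float32 feature matrix crosses it once the feature dimension reaches 77.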
But then I ran into another problem. The error log is:
Fatal Python error: Segmentation fault
Thread 0x00007f96d37fe700 (most recent call first):
File "/usr/lib64/python3.6/threading.py", line 295 in wait
File "/usr/lib64/python3.6/queue.py", line 164 in get
File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/summary/writer/event_file_writer.py", line 159 in run
File "/usr/lib64/python3.6/threading.py", line 916 in _bootstrap_inner
File "/usr/lib64/python3.6/threading.py", line 884 in _bootstrap
Thread 0x00007fa5e4e6e740 (most recent call first):
File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443 in _call_tf_sessionrun
File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350 in _run_fn
File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365 in _do_call
File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359 in _do_run
File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180 in _run
File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956 in run
File "${MY_LOCAL_PATH}/GraphSAGE/graphsage/unsupervised_train_tf1.15.py", line 302 in train
File "${MY_LOCAL_PATH}/GraphSAGE/graphsage/unsupervised_train_tf1.15.py", line 410 in main
File "/usr/local/lib/python3.6/site-packages/absl/app.py", line 250 in _run_main
File "/usr/local/lib/python3.6/site-packages/absl/app.py", line 299 in run
File "/usr/local/lib64/python3.6/site-packages/tensorflow_core/python/platform/app.py", line 40 in run
File "${MY_LOCAL_PATH}/GraphSAGE/graphsage/unsupervised_train_tf1.15.py", line 415 in <module>
There are few related solutions online. Do you have any advice? Thanks a lot!
My goal is actually to use multi-GPU parallel computing to accelerate GraphSAGE training.