microsoft / Graphormer

Graphormer is a general-purpose deep learning backbone for molecular modeling.
MIT License

Example script hangs without output #111

Open ayushnoori opened 2 years ago

ayushnoori commented 2 years ago

Hi Graphormer Team - thanks for this excellent codebase! I am opening a new issue because two similar issues have been previously closed. They are https://github.com/microsoft/Graphormer/issues/100 by @skye95git and https://github.com/microsoft/Graphormer/issues/99 by @mswzeus.

Bug

In short, despite carefully following the Graphormer installation instructions, running bash zinc.sh produces no output at the console. Interrupting the script produces a File "<frozen importlib._bootstrap>", line 107, in acquire error message.
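Since the interrupt lands inside the import machinery, one way to narrow down a hang like this is to time-box a single import and dump every thread's stack if it stalls. The following is only a minimal sketch using the standard-library faulthandler module; the helper name import_with_watchdog and its parameters are hypothetical and not part of Graphormer or fairseq.

```python
import faulthandler
import importlib
import sys


def import_with_watchdog(module_name, timeout_s=60.0, log_file=None):
    """Import module_name; dump all thread stacks if it takes longer than timeout_s."""
    # faulthandler needs a real file descriptor, so default to the process stderr.
    target = sys.stderr if log_file is None else log_file
    # Schedule a one-off stack dump that fires only if the import is still running.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=target)
    try:
        return importlib.import_module(module_name)
    finally:
        # The import finished (or raised), so the pending dump is no longer needed.
        faulthandler.cancel_dump_traceback_later()
```

Calling import_with_watchdog("fairseq", timeout_s=60.0) in the affected environment would print the stack of the stalled import after one minute instead of waiting for a manual interrupt. Running python -X importtime -c "import fairseq" is another standard option that shows which submodule the import reaches before stalling.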

Failed Solutions

In https://github.com/microsoft/Graphormer/issues/99, it was suggested that pip uninstall setuptools would fix the issue; however, this was not the case. Rather, running bash zinc.sh now throws terminate called after throwing an instance of 'std::bad_alloc' (which is fixed by reinstalling setuptools).

From @zhengsx in https://github.com/microsoft/Graphormer/issues/100:

We notice that similar problems have occurred recently, and a guess is that some recent library upgrades lead to this problem. We're working on this problem, and maybe you can try to modify the version of cuda/torch; cu110 is recommended.

Unfortunately, downgrading to torch.version.cuda==10.2 did not fix the issue either (i.e., bash zinc.sh still hangs). Should I find a solution, I will follow up on this thread for future reference.

skye95git commented 2 years ago

The problem is likely caused by a mismatch between the CUDA, PyTorch, and PyTorch Geometric versions. Install PyTorch Geometric according to the tutorial on the official website: https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html. I installed CUDA and PyTorch Geometric several times following other tutorials and got either no response or errors. Only the installation from the official website succeeded. My installation steps are as follows:

conda create -n graphormer python=3.9
conda activate graphormer

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

pip install torch-scatter -f https://data.pyg.org/whl/torch-1.11.0+cu113.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-1.11.0+cu113.html
pip install torch-geometric

pip install tensorboardX
pip install ogb
pip install rdkit-pypi

pip install dgl-cu113 dglgo -f https://data.dgl.ai/wheels/repo.html
pip install lmdb

Before running bash zinc.sh:

cd fairseq
pip install . --use-feature=in-tree-build
python setup.py build_ext --inplace
pip uninstall setuptools
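
The torch-scatter and torch-sparse wheel index in the steps above encodes both the PyTorch version and the CUDA version, so a mismatch can be avoided by deriving the URL from the installed build rather than copying it. This is a rough sketch that assumes the URL pattern shown in the pip commands above holds for other version pairs; the function name pyg_wheel_index is hypothetical, and the "cpu" tag for CPU-only builds is also an assumption.

```python
import re


def pyg_wheel_index(torch_version, cuda_version=None):
    """Build a data.pyg.org wheel index URL for a torch/CUDA version pair."""
    # Keep only major.minor.patch, dropping any local suffix such as "+cu113".
    match = re.match(r"(\d+\.\d+\.\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized torch version: {torch_version!r}")
    # "11.3" becomes "cu113"; None is treated as a CPU-only build (assumption).
    tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    return f"https://data.pyg.org/whl/torch-{match.group(1)}+{tag}.html"


# The version pair used in the steps above.
print(pyg_wheel_index("1.11.0", "11.3"))
```

With PyTorch installed, pyg_wheel_index(torch.__version__, torch.version.cuda) gives the index matching the running build, which can then be passed to pip install torch-scatter -f.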

zhengsx commented 2 years ago

Hi @skye95git, thanks so much for providing the information. We will look into this by following your hint.

ayushnoori commented 2 years ago

Hi @skye95git @zhengsx, I tried exactly these steps (up to pip uninstall setuptools) but encountered the following error:

(graphormer) [an252@compute-g-16-254 property_prediction]$ bash zinc.sh
Error processing line 1 of /home/an252/.conda/envs/graphormer_3/lib/python3.9/site-packages/distutils-precedence.pth:

  Traceback (most recent call last):
    File "/home/an252/.conda/envs/graphormer_3/lib/python3.9/site.py", line 169, in addpackage
      exec(line)
    File "<string>", line 1, in <module>
  ModuleNotFoundError: No module named '_distutils_hack'
The full error message is below.

```
Remainder of file ignored
Traceback (most recent call last):
  File "/home/an252/.conda/envs/graphormer_3/bin/fairseq-train", line 5, in <module>
    from fairseq_cli.train import cli_main
  File "/home/an252/.conda/envs/graphormer_3/lib/python3.9/site-packages/fairseq_cli/train.py", line 30, in <module>
    from fairseq import checkpoint_utils, options, quantization_utils, tasks, utils
  File "/home/an252/.conda/envs/graphormer_3/lib/python3.9/site-packages/fairseq/__init__.py", line 33, in <module>
    import fairseq.criterions # noqa
  File "/home/an252/.conda/envs/graphormer_3/lib/python3.9/site-packages/fairseq/criterions/__init__.py", line 36, in <module>
    importlib.import_module("fairseq.criterions." + file_name)
  File "/home/an252/.conda/envs/graphormer_3/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/an252/.conda/envs/graphormer_3/lib/python3.9/site-packages/fairseq/criterions/ctc.py", line 19, in <module>
    from fairseq.tasks import FairseqTask
  File "/home/an252/.conda/envs/graphormer_3/lib/python3.9/site-packages/fairseq/tasks/__init__.py", line 136, in <module>
    import_tasks(tasks_dir, "fairseq.tasks")
  File "/home/an252/.conda/envs/graphormer_3/lib/python3.9/site-packages/fairseq/tasks/__init__.py", line 117, in import_tasks
    importlib.import_module(namespace + "." + task_name)
  File "/home/an252/.conda/envs/graphormer_3/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/an252/.conda/envs/graphormer_3/lib/python3.9/site-packages/fairseq/tasks/multilingual_translation.py", line 20, in <module>
    from fairseq.models import FairseqMultiModel
  File "/home/an252/.conda/envs/graphormer_3/lib/python3.9/site-packages/fairseq/models/__init__.py", line 234, in <module>
    import_models(models_dir, "fairseq.models")
  File "/home/an252/.conda/envs/graphormer_3/lib/python3.9/site-packages/fairseq/models/__init__.py", line 216, in import_models
    importlib.import_module(namespace + "." + model_name)
  File "/home/an252/.conda/envs/graphormer_3/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/an252/.conda/envs/graphormer_3/lib/python3.9/site-packages/fairseq/models/model_utils.py", line 13, in <module>
    def script_skip_tensor_list(x: List[Tensor], mask):
  File "/home/an252/.conda/envs/graphormer_3/lib/python3.9/site-packages/torch/jit/_script.py", line 1318, in script
    fn = torch._C._jit_script_compile(
MemoryError: std::bad_alloc
```
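
The first error in the output above (No module named '_distutils_hack') is raised while Python processes distutils-precedence.pth at startup. Python reports it and carries on, hence "Remainder of file ignored". A likely explanation, stated here as an assumption, is that pip uninstall setuptools removed the _distutils_hack package but left the .pth file behind. A small sketch to check for that state follows; the function name has_orphaned_distutils_pth is hypothetical.

```python
import sysconfig
from pathlib import Path


def has_orphaned_distutils_pth(site_packages):
    """Return True if distutils-precedence.pth exists but _distutils_hack does not."""
    root = Path(site_packages)
    pth_file = root / "distutils-precedence.pth"
    hack_package = root / "_distutils_hack"
    # The .pth file imports _distutils_hack, so it fails when the package is gone.
    return pth_file.is_file() and not hack_package.exists()


# Check the site-packages directory of the interpreter running this script.
print(has_orphaned_distutils_pth(sysconfig.get_paths()["purelib"]))
```

If this prints True, the startup message is probably cosmetic and separate from the MemoryError: std::bad_alloc at the end of the traceback, which originates in torch.jit.script.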

I also discovered that GCC >= 5.0 is required; otherwise, running python setup.py build_ext --inplace returns the errors Your compiler (g++ 4.8.5) may be ABI-incompatible with PyTorch! and gcc: error: unrecognized command line option ‘-std=c++14’. Thus, I ran module load gcc/9.2.0 to fix this issue (see here).
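
The compiler requirement can be checked before starting the build rather than discovered from the build error. This is a minimal sketch that parses the output of g++ --version; the function names and the exact (5, 0, 0) threshold are illustrative assumptions based on the "GCC >= 5.0" observation above, not an official requirement taken from the Graphormer or PyTorch documentation.

```python
import re
import shutil
import subprocess


def parse_gxx_version(version_output):
    """Extract (major, minor, patch) from `g++ --version` output, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def gxx_meets_minimum(version_output, minimum=(5, 0, 0)):
    """Return True if the parsed version is at least `minimum`."""
    version = parse_gxx_version(version_output)
    return version is not None and version >= minimum


def installed_gxx_version_output():
    """Return the output of `g++ --version`, or None if g++ is not on PATH."""
    gxx = shutil.which("g++")
    if gxx is None:
        return None
    result = subprocess.run([gxx, "--version"], capture_output=True, text=True, check=False)
    return result.stdout


# Illustrative input using the version number from the error message above.
print(gxx_meets_minimum("g++ (GCC) 4.8.5"))  # prints False
```

On a cluster that uses environment modules, running this after module load gcc/9.2.0 confirms that the newer compiler is the one on PATH before python setup.py build_ext --inplace is invoked.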

I would be grateful for your advice on how to resolve this.

skye95git commented 2 years ago

  1. After pip uninstall setuptools, I do get the error No module named '_distutils_hack', but the code still runs.
  2. Maybe the versions I installed are not suitable for you. You should consider your GCC version and your driver version. My version information is as follows:
    GCC 10.3.0
    Driver Version: 450.80.02