Error with customized dataset

aofchas commented 1 year ago

Hello everyone, I wanted to train on a customized dataset. Following the instructions at https://graphormer.readthedocs.io/en/latest/Datasets.html#id5, I added my code before the create_customized_dataset function to build a DGL dataset class. Then I wrote a shell script to run the training process, but I got a ModuleNotFoundError when I started the training. Here is the error message:

Traceback (most recent call last):
  File "/root/anaconda3/envs/graphormer/bin/fairseq-train", line 8, in <module>
    sys.exit(cli_main())
  File "/root/anaconda3/envs/graphormer/lib/python3.9/site-packages/fairseq_cli/train.py", line 528, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/root/anaconda3/envs/graphormer/lib/python3.9/site-packages/fairseq/distributed/utils.py", line 369, in call_main
    main(cfg, **kwargs)
  File "/root/anaconda3/envs/graphormer/lib/python3.9/site-packages/fairseq_cli/train.py", line 85, in main
    task = tasks.setup_task(cfg.task)
  File "/root/anaconda3/envs/graphormer/lib/python3.9/site-packages/fairseq/tasks/__init__.py", line 46, in setup_task
    return task.setup_task(cfg, **kwargs)
  File "/workspace/Graphormer/graphormer/tasks/graph_prediction.py", line 179, in setup_task
    return cls(cfg)
  File "/workspace/Graphormer/graphormer/tasks/graph_prediction.py", line 142, in __init__
    self.__import_user_defined_datasets(cfg.user_data_dir)
  File "/workspace/Graphormer/graphormer/tasks/graph_prediction.py", line 165, in __import_user_defined_datasets
    importlib.import_module(module_name)
  File "/root/anaconda3/envs/graphormer/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'customized_dataset'

Here is the shell script:

#!/usr/bin/env bash

CUDA_VISIBLE_DEVICES=0 fairseq-train \
--user-dir ../../graphormer \
--user-data-dir /workspace/Graphormer/examples/customized_dataset \
--num-workers 16 \
--ddp-backend=legacy_ddp \
--dataset-name MonomerTg_dataset \
--task graph_prediction \
--criterion l1_loss \
--arch graphormer_slim \
--num-classes 1 \
--attention-dropout 0.1 --act-dropout 0.1 --dropout 0.0 \
--optimizer adam --adam-betas '(0.9, 0.999)' --adam-eps 1e-8 --clip-norm 5.0 --weight-decay 0.01 \
--lr-scheduler polynomial_decay --power 1 --warmup-updates 60000 --total-num-update 400000 \
--lr 2e-4 --end-learning-rate 1e-9 \
--batch-size 64 \
--fp16 \
--data-buffer-size 20 \
--encoder-layers 12 \
--encoder-embed-dim 80 \
--encoder-ffn-embed-dim 80 \
--encoder-attention-heads 8 \
--max-epoch 10000 \
--save-dir ./ckpts

Does anyone know why this happens, and is there maybe a solution for it? Thank you very much!
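
For readers hitting the same error: the last frames of the traceback show importlib.import_module(module_name) failing on the bare name 'customized_dataset', which suggests the loader splits --user-data-dir into a parent directory and a module name and puts the parent on sys.path. If the loader strips slashes from both ends of the path first (an assumption about graph_prediction.py, not verified here), an absolute path like the one in the script loses its leading "/" and the parent becomes a relative path that only resolves when the working directory is "/". The sketch below illustrates that failure mode and one possible workaround; `split_like_strip` and `import_from_dir` are hypothetical helper names, not Graphormer functions.

```python
import importlib
import os
import sys


def split_like_strip(dataset_dir):
    """Show what happens if slashes are stripped from BOTH ends (assumed behavior)."""
    return os.path.split(dataset_dir.strip("/"))


def import_from_dir(dataset_dir):
    """Import the directory `dataset_dir` as a package (hypothetical helper).

    Only the trailing slash is removed, and the path is made absolute before
    its parent goes on sys.path, so the import does not depend on the
    current working directory.
    """
    dataset_dir = os.path.abspath(dataset_dir.rstrip("/"))
    module_parent, module_name = os.path.split(dataset_dir)
    if module_parent not in sys.path:
        sys.path.insert(0, module_parent)
    importlib.invalidate_caches()  # make sure freshly created directories are seen
    return importlib.import_module(module_name)


# The path from the shell script: the parent comes back without its leading "/".
print(split_like_strip("/workspace/Graphormer/examples/customized_dataset"))
```

If this is indeed the cause, a workaround that needs no code change might be to pass a relative --user-data-dir (for example customized_dataset when launching from the examples directory), since a relative path has no leading slash to lose.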

Jevon-Du commented 1 year ago

@zhengsx

aofchas commented 1 year ago

A little update from my side. I modified the following part of the code to load my own dataset directly: https://github.com/microsoft/Graphormer/blob/77f436db46fb9013121289db670d1a763f264153/graphormer/tasks/graph_prediction.py#L161-L165 With that change, my dataset loads and training runs, but the loss seems to stay constant. I do not know whether the problem is related to the code that I changed. I am still working on it.
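
One thing that may be worth ruling out for the constant loss, independent of the modified import code: the shell script sets --warmup-updates 60000 with --lr 2e-4, so the learning rate is very small for the first several thousand updates. The sketch below is a rough re-implementation written for illustration, assuming linear warmup followed by polynomial decay; it is not fairseq's code, and `polynomial_decay_lr` is a hypothetical name. The default arguments are copied from the script above.

```python
def polynomial_decay_lr(step, peak_lr=2e-4, end_lr=1e-9,
                        warmup=60000, total=400000, power=1.0):
    """Approximate learning rate at a given update (assumed schedule shape)."""
    if warmup > 0 and step <= warmup:
        # Linear warmup from 0 to peak_lr.
        return peak_lr * step / warmup
    if step >= total:
        return end_lr
    # Polynomial decay from peak_lr down to end_lr.
    remaining = 1 - (step - warmup) / (total - warmup)
    return (peak_lr - end_lr) * remaining ** power + end_lr


# Print the approximate learning rate at a few early updates.
for step in (100, 1000, 10000, 60000):
    print(step, polynomial_decay_lr(step))
```

Under these assumptions the rate after 600 updates is about 2e-6, one hundredth of the peak. If the dataset is small, so that an epoch at batch size 64 is only a handful of updates, the loss could look flat for many epochs. Whether this explains the behavior here is not confirmed; lowering --warmup-updates for a quick test run would be one way to check.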

Error with customized dataset #147