shamim-hussain / tgt

Triplet Graph Transformer
MIT License
26 stars 2 forks

ModuleNotFoundError: No module named 'lib.training_schemes.' #2

Closed ZGLCHN closed 5 months ago

ZGLCHN commented 6 months ago
[screenshot]
shamim-hussain commented 6 months ago

Can you please share the command you are running and the full traceback?

ZGLCHN commented 6 months ago

```
"/Users/zhengguolin/Library/CloudStorage/OneDrive-个人/Project/Paper Code/tgt-master/make_predictions.py"
Traceback (most recent call last):
  File "/Users/zhengguolin/Library/CloudStorage/OneDrive-个人/Project/Paper Code/tgt-master/make_predictions.py", line 6, in <module>
    execute('predict', config)
  File "/Users/zhengguolin/Library/CloudStorage/OneDrive-个人/Project/Paper Code/tgt-master/lib/training/execute.py", line 138, in execute
    scheme_class = import_scheme(config.get(KEY_SCHEME, SCHEME_DEFAULT))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zhengguolin/Library/CloudStorage/OneDrive-个人/Project/Paper Code/tgt-master/lib/training/execute.py", line 57, in import_scheme
    imported_module = importlib.import_module(module_name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Applications/anaconda3/envs/my-rdkit-env/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1324, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'lib.training_schemes.'
```

[screenshot]

As shown in the screenshot, I clicked the red box to execute the Python code and encountered these errors.
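For context on the error itself: the module path ends in a dot (`lib.training_schemes.`), which means the scheme name appended to it was empty, typically because no scheme was supplied in the config. The same failure can be reproduced with any installed package; here `importlib` merely stands in for `lib.training_schemes`:

```python
import importlib

# A dotted module path ending in '.' has an empty final component.
# importlib imports the parent package fine, then fails to find the
# '' submodule, raising the same error seen in the traceback above.
try:
    importlib.import_module("importlib." + "")  # empty scheme name
except ModuleNotFoundError as e:
    print(e)  # No module named 'importlib.'
```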

shamim-hussain commented 6 months ago

You cannot simply run it as a script; you have to pass in command-line arguments. For that, use the console/terminal of your OS (PowerShell on Windows, Bash on Linux, etc.).

  1. Download the data and model weights as described in the README.
  2. Make distance predictions (on the training and validation sets by default). Run the following command from the terminal:
    python make_predictions.py configs/pcqm/tgt_at_200m/pcqm_dist_pred/tgt_at_100m_rdkit.yaml

    This will create a predictions directory (e.g. bins50) in the model directory, containing the predictions for the training and validation sets. To reduce the number of distance samples (and thus save time and disk space), add the argument 'prediction_samples: 10' (we used 50 samples; you can increase it during final inference for better results).

  3. Final evaluation. Run the following command from the terminal:
    python do_evaluations.py configs/pcqm/tgt_at_200m/pcqm_gap_pred/tgt_at_100m_rdkit.yaml

    The results will be printed to the console and also saved in the predictions directory.
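For reference on how the extra arguments in this thread work: each 'key: value' string after the config path acts as an inline config override. The repo presumably parses these as YAML fragments; the `parse_override` helper below is a hypothetical, stdlib-only approximation of that idea, not the repo's actual code:

```python
import ast

def parse_override(arg):
    # Split an inline override such as "prediction_samples: 10" or
    # 'predict_on: ["val"]' into a key and a typed value.
    key, _, value = arg.partition(':')
    value = value.strip()
    try:
        value = ast.literal_eval(value)  # numbers, lists, quoted strings
    except (ValueError, SyntaxError):
        # YAML-style booleans are not Python literals; map the common ones.
        value = {'true': True, 'false': False}.get(value.lower(), value)
    return key.strip(), value

config = {}
for arg in ['predict_on: ["val"]', 'distributed: false', 'prediction_samples: 10']:
    key, value = parse_override(arg)
    config[key] = value
print(config)  # {'predict_on': ['val'], 'distributed': False, 'prediction_samples': 10}
```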

ZGLCHN commented 6 months ago

Thank you for your previous reply. Following the README, I have placed the data in the data/PCQM folder.

However, I don't know where the weight file should be placed.

First, I obtained the weight file from Hugging Face (models/pcqm/tgt_at_200m/dist_pred/tgt_at_dp_nordkit/checkpoint).

[screenshot]

Second, in the tgt-master project I did not see pcqm_dist_pred/tgt_at_100m_rdkit.yaml, so I changed the path in the command you provided to dist_pred/tgt_at_dp_nordkit.yaml.

[screenshot]

As you can see, I placed the model_state.pt file in the models folder and executed the command: python make_predictions.py configs/pcqm/tgt_at_200m/dist_pred/tgt_at_dp_nordkit.yaml. However, no new "predictions" folder was generated under the models folder.

[screenshot]

If I place the model_state.pt file directly under the tgt-master folder, the result is the same.

[screenshot]
shamim-hussain commented 6 months ago

You are right. There were errors in the instructions which have been corrected and the README has been updated. Please refer to this notebook https://github.com/shamim-hussain/tgt/blob/master/inference_example.ipynb where you can find the full workflow for inference.

Alternatively, you may follow these instructions:

  1. Download the data.
  2. Download the model weights. Maintain the same directory structure as provided in the huggingface repo, e.g., models/pcqm/tgt_at_200m/dist_pred/tgt_at_dp_rdkit/checkpoint/model_state.pt. (Alternatively, you can use the huggingface-cli tool to download the model weights as described in this notebook: https://github.com/shamim-hussain/tgt/blob/master/inference_example.ipynb.)
  3. Make distance predictions on the validation set:
    python make_predictions.py configs/pcqm/tgt_at_200m/dist_pred/tgt_at_dp_rdkit.yaml 'predict_on: ["val"]'

    This will create a predictions directory (e.g. bins50) in the model directory, containing the predictions for the validation set. To reduce the number of distance samples (and thus save time and disk space), add the argument 'prediction_samples: 10' (we used 50 samples; you can increase it during final inference for better results).

  4. Final evaluation (on the validation set):
    python do_evaluations.py configs/pcqm/tgt_at_200m/gap_pred/tgt_at_tp_rdkit.yaml 'predict_on: ["val"]'

    The results will be printed to the console and also saved in the predictions directory.
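Since checkpoint placement is the easy step to get wrong here, a quick stdlib-only sanity check can confirm the layout before launching predictions. The path is copied verbatim from step 2 and assumes you run this from the repository root:

```python
from pathlib import Path

# Expected checkpoint location per step 2 of the instructions above.
ckpt = Path("models/pcqm/tgt_at_200m/dist_pred/tgt_at_dp_rdkit/checkpoint/model_state.pt")

if ckpt.is_file():
    print("checkpoint found:", ckpt)
else:
    print("checkpoint missing; expected it at:", ckpt)
```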

ZGLCHN commented 6 months ago

I have downloaded the model weights to the correct local directory following your tutorial, and the data files are also placed correctly. However, when I followed the tutorial's "Run Evaluation (Inference) Only on Validation Set" section, I did not get the expected output, as shown in the figures below.

[screenshot]

[screenshot]

shamim-hussain commented 6 months ago

Can you please try running

python make_predictions.py configs/pcqm/tgt_at_200m/dist_pred/tgt_at_dp_rdkit.yaml 'predict_on: ["val"]' 'distributed: false' 'dataloader_workers: 0' 'prediction_samples: 4'

And then

python do_evaluations.py configs/pcqm/tgt_at_200m/gap_pred/tgt_at_tp_rdkit.yaml 'predict_on: ["val"]' 'distributed: false' 'dataloader_workers: 0' 'prediction_samples: 4' 'bins_input_path: models/pcqm/tgt_at_200m/dist_pred/tgt_at_dp_rdkit/predictions/bins4'

What I've done here is:

  1. Turned off distributed training, which is only useful when you have multiple GPUs.
  2. Turned off data loader workers. These spawn child processes, which sometimes causes problems on Windows.
  3. Set the number of samples to 4. This reduces inference cost (time and memory). You won't get the best possible result (we use 50 samples), but it is good for a preliminary test.

You can also take a look at this Google Colab notebook, where I have tested this: https://colab.research.google.com/drive/1UJOXdt5VlP2QhiYpluGYnZukGFD0Wmoe?usp=sharing
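On point 3: the distance predictor is sampled stochastically and the samples are aggregated, so fewer samples give a noisier but much cheaper estimate. A toy illustration of that trade-off, with made-up numbers and hypothetical names (not the repo's code):

```python
import random

TRUE_DISTANCE = 1.5  # made-up ground truth for this toy example

def sample_distance_once(rng):
    # Stand-in for one stochastic forward pass of a distance predictor.
    return TRUE_DISTANCE + rng.gauss(0.0, 0.1)

def predict_distance(n_samples, seed=0):
    # Averaging n_samples draws shrinks the variance of the estimate,
    # at the cost of n_samples forward passes (time and memory).
    rng = random.Random(seed)
    return sum(sample_distance_once(rng) for _ in range(n_samples)) / n_samples

print(round(predict_distance(4), 3))   # cheap preliminary estimate
print(round(predict_distance(50), 3))  # closer to the 50-sample setting
```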

ZGLCHN commented 6 months ago

Thank you very much for your patient reply. I can now get the expected results by running the latest command you provided:

python make_predictions.py configs/pcqm/tgt_at_200m/dist_pred/tgt_at_dp_rdkit.yaml 'predict_on: ["val"]' 'distributed: false' 'dataloader_workers: 0' 'prediction_samples: 4'

I tried running this on both Windows 11 and macOS, and it succeeded on both.

Running on Mac:

[screenshot]

Running on Windows: [screenshot]