Closed. ZGLCHN closed this issue 5 months ago.
Can you please share the command you are running and the full traceback?
ode/tgt-master/make_predictions.py"
Traceback (most recent call last):
File "/Users/zhengguolin/Library/CloudStorage/OneDrive-个人/Project/Paper Code/tgt-master/m
File "/Users/zhengguolin/Library/CloudStorage/OneDrive-个人/Project/Paper Code/tgt-master/make_predictions.py", line 6, in
As shown in the figure, I clicked on the red box to execute Python code and encountered these errors.
You cannot simply run it as a script. You have to pass in command-line arguments. For that, you should use the console/terminal of your OS (PowerShell on Windows, bash on Linux, etc.).
- Download the data and model weights as described in the README.
- Make distance predictions (on the training and validation sets by default). Run the following command from the terminal:
python make_predictions.py configs/pcqm/tgt_at_200m/pcqm_dist_pred/tgt_at_100m_rdkit.yaml
This will create a predictions directory (e.g. bins50) in the model directory, containing the predictions for the training and validation sets. To reduce the number of distance samples (and thus save time and disk space), add the following argument: 'prediction_samples: 10' (we used 50 samples; you can increase it during the final inference to get better results).
- Final evaluation. Run the following command from the terminal:
python do_evaluations.py configs/pcqm/tgt_at_200m/pcqm_gap_pred/tgt_at_100m_rdkit.yaml
The results will be printed to the console and also saved in the predictions directory.
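The 'prediction_samples: 10' argument is a "key: value" string passed after the config path. As a rough illustration of how such a string could be turned into a config value, here is a stdlib-only sketch. It is an assumption for illustration: `parse_override` is a hypothetical helper, and the repository's own parsing may work differently.

```python
import json


def parse_override(arg: str):
    """Split a 'key: value' string into (key, parsed_value).

    Hypothetical helper, not taken from the tgt code base.
    """
    key, sep, raw = arg.partition(":")
    if not sep:
        raise ValueError(f"expected 'key: value', got {arg!r}")
    raw = raw.strip()
    try:
        # JSON covers numbers, true/false, and lists such as ["val"].
        value = json.loads(raw)
    except json.JSONDecodeError:
        # Anything else (for example a file path) is kept as a plain string.
        value = raw
    return key.strip(), value


overrides = dict(parse_override(a) for a in ["prediction_samples: 10"])
print(overrides)
```

The point of the sketch is that each override must reach the script as one intact argument, which is why the quotes around it matter on the command line.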
Thank you for your previous reply. According to README, I have correctly placed the data in the folder data/PCQM.
First, I obtained the weight file from Hugging Face (models/pcqm/tgt_at_200m/dist_pred/tgt_at_dp_nordkit/checkpoint).
Second, I did not find pcqm_dist_pred/tgt_at_100m_rdkit.yaml in the tgt-master project, so I changed the config path you provided to dist_pred/tgt_at_dp_nordkit.yaml.
As you can see, I placed the model_state.pt file in the model folder and executed the command:
python make_predictions.py configs/pcqm/tgt_at_200m/dist_pred/tgt_at_dp_nordkit.yaml
However, no new "predictions" folder was generated under the models folder.
If I place the model_state.pt file under the tgt-master folder instead, the result is the same.
You are right. There were errors in the instructions, which have been corrected, and the README has been updated. Please refer to this notebook, where you can find the full workflow for inference: https://github.com/shamim-hussain/tgt/blob/master/inference_example.ipynb
Alternatively, you may also follow these instructions:
- Download the data.
- Download the model weights. Maintain the same directory structure as provided in the huggingface repo, e.g.,
models/pcqm/tgt_at_200m/dist_pred/tgt_at_dp_rdkit/checkpoint/model_state.pt
(Alternatively, you can use the huggingface-cli tool to download the model weights as described in this notebook: https://github.com/shamim-hussain/tgt/blob/master/inference_example.ipynb .)
- Make distance predictions on the validation set:
python make_predictions.py configs/pcqm/tgt_at_200m/dist_pred/tgt_at_dp_rdkit.yaml 'predict_on: ["val"]'
This will create a predictions directory (e.g. bins50) in the model directory, containing the predictions for the training and validation sets. To reduce the number of distance samples (and thus save time and disk space), add the following argument: 'prediction_samples: 10' (we used 50 samples; you can increase it during the final inference to get better results).
- Final evaluation (on the validation set):
python do_evaluations.py configs/pcqm/tgt_at_200m/gap_pred/tgt_at_tp_rdkit.yaml 'predict_on: ["val"]'
The results will be printed to the console and also saved in the predictions directory.
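Before running the commands, it can help to confirm that the checkpoint is where the scripts are likely to look for it. The sketch below assumes that a config at configs/<...>/<name>.yaml corresponds to a model directory at models/<...>/<name>, which matches the example paths in this thread but is not confirmed as a general rule. `model_dir_for` is a hypothetical helper, and the location of the predictions directory is likewise an assumption.

```python
from pathlib import Path


def model_dir_for(config_path: str) -> Path:
    """Map configs/<...>/<name>.yaml to models/<...>/<name> (assumed layout)."""
    parts = Path(config_path).with_suffix("").parts
    if not parts or parts[0] != "configs":
        raise ValueError("expected a path starting with 'configs/'")
    return Path("models", *parts[1:])


config = "configs/pcqm/tgt_at_200m/dist_pred/tgt_at_dp_rdkit.yaml"
model_dir = model_dir_for(config)
checkpoint = model_dir / "checkpoint" / "model_state.pt"
predictions = model_dir / "predictions"  # assumed to sit next to checkpoint/

# Run this from the repository root so the relative paths resolve.
print(f"checkpoint:  {checkpoint} (exists: {checkpoint.exists()})")
print(f"predictions: {predictions} (exists: {predictions.exists()})")
```

If the checkpoint line reports that the file does not exist, the weights are probably in a different folder than the one the config implies.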
I have downloaded the model weights to the local directory according to your tutorial, and the data files have also been placed correctly. However, when I followed the tutorial section "Run Evaluation (Inference) Only on Validation Set", I did not get the expected output, as shown in the following figure.
Can you please try running
python make_predictions.py configs/pcqm/tgt_at_200m/dist_pred/tgt_at_dp_rdkit.yaml 'predict_on: ["val"]' 'distributed: false' 'dataloader_workers: 0' 'prediction_samples: 4'
And then
python do_evaluations.py configs/pcqm/tgt_at_200m/gap_pred/tgt_at_tp_rdkit.yaml 'predict_on: ["val"]' 'distributed: false' 'dataloader_workers: 0' 'prediction_samples: 4' 'bins_input_path: models/pcqm/tgt_at_200m/dist_pred/tgt_at_dp_rdkit/predictions/bins4'
What I've done here is:
- Turned off distributed training, which is only useful when you have multiple GPUs.
- Turned off data loader workers. Workers run in child processes created by forking, which sometimes causes problems on Windows.
- Set the number of samples to 4. This reduces inference cost (time and memory). Although you won't get the best possible result (we use 50 samples), this is good for a preliminary test.
You can also take a look at this Google Colab Notebook where I have tested this: https://colab.research.google.com/drive/1UJOXdt5VlP2QhiYpluGYnZukGFD0Wmoe?usp=sharing
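The single-quoted arguments with embedded double quotes may be handled differently by bash, PowerShell, and cmd. One way to sidestep shell quoting is to launch the two scripts from Python with an argument list, so that each override reaches the script as exactly one argument. This is a minimal sketch, not part of the repository: `build_command` and `run_pipeline` are hypothetical names, and the paths and overrides are copied from the commands above.

```python
import subprocess
import sys


def build_command(script, config, overrides):
    """Return an argument list; each override stays a single argument."""
    return [sys.executable, script, config, *overrides]


common = [
    'predict_on: ["val"]',
    "distributed: false",
    "dataloader_workers: 0",
    "prediction_samples: 4",
]
predict_cmd = build_command(
    "make_predictions.py",
    "configs/pcqm/tgt_at_200m/dist_pred/tgt_at_dp_rdkit.yaml",
    common,
)
evaluate_cmd = build_command(
    "do_evaluations.py",
    "configs/pcqm/tgt_at_200m/gap_pred/tgt_at_tp_rdkit.yaml",
    common
    + ["bins_input_path: models/pcqm/tgt_at_200m/dist_pred/tgt_at_dp_rdkit/predictions/bins4"],
)


def run_pipeline():
    """Run prediction, then evaluation; stop at the first failure."""
    for cmd in (predict_cmd, evaluate_cmd):
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` from the repository root would run both steps in order. Because no shell is involved, the quotes inside 'predict_on: ["val"]' are passed through unchanged.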
Thank you very much for your patient reply. I am now able to achieve the expected results by running the latest code you provided:
python make_predictions.py configs/pcqm/tgt_at_200m/dist_pred/tgt_at_dp_rdkit.yaml 'predict_on: ["val"]' 'distributed: false' 'dataloader_workers: 0' 'prediction_samples: 4'
I attempted to run this code on both Windows 11 and Mac systems, and both were successful.
Running on Mac:
Running on Windows: