[Instructions of Inference] Could you provide complete instructions for reproducing the results on the OGB-LSC leaderboard?

gxglxy commented 8 months ago

Hi authors,

Thank you for open-sourcing this cool work! I would like to try your models with the provided checkpoints. Could you provide complete instructions for reproducing the OGB-LSC leaderboard results (e.g., an MAE of 0.0671 eV on the validation set) with the available checkpoints?

Looking forward to your reply!

Best

shamim-hussain commented 8 months ago

Hi,

We will soon add a notebook explaining this and the data preparation. For now, here is a set of steps you can follow.

The preprocessed data is available at https://huggingface.co/datasets/shamim-hussain/pcqm. The parquet and npz files must be put in the data/PCQM directory. You can also download them by running the following command:

bash download_data.sh
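As a quick sanity check after the download, the sketch below lists the parquet and npz files found in the data directory. The helper name `list_pcqm_files` is hypothetical and not part of the repository; only the data/PCQM location and the two file types come from the steps above.

```python
from pathlib import Path


def list_pcqm_files(data_dir="data/PCQM"):
    """Return (parquet_names, npz_names) found directly inside data_dir.

    Hypothetical helper for illustration; the repository may organise
    or validate its data differently.
    """
    root = Path(data_dir)
    parquet = sorted(p.name for p in root.glob("*.parquet"))
    npz = sorted(p.name for p in root.glob("*.npz"))
    return parquet, npz


if __name__ == "__main__":
    parquet, npz = list_pcqm_files()
    print(f"parquet files: {parquet}")
    print(f"npz files: {npz}")
```

If either list comes back empty, the files are probably not in data/PCQM yet.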

The model weights are available at https://huggingface.co/shamim-hussain/tgt. You may directly copy the models directory from the huggingface repository. The raw weights are contained in the model_state.pt files in the checkpoint directories.
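To confirm the weights were copied correctly, a small sketch like the following can locate every model_state.pt under the models directory. The function name `find_checkpoints` and the default path `models` are assumptions made for illustration; only the file name model_state.pt comes from the description above.

```python
from pathlib import Path


def find_checkpoints(models_dir="models"):
    """Return sorted paths of all model_state.pt files under models_dir.

    Hypothetical helper; it only searches the directory tree and does
    not load or validate the weights.
    """
    return sorted(Path(models_dir).rglob("model_state.pt"))


if __name__ == "__main__":
    for path in find_checkpoints():
        print(path)
        # With PyTorch installed, the raw weights could be inspected with
        # torch.load(path, map_location="cpu"); this is left as a comment
        # so that the sketch runs without PyTorch.
```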
Make distance predictions (on the training and validation sets by default):
```
python make_predictions.py configs/pcqm/tgt_at_200m/pcqm_dist_pred/tgt_at_100m_rdkit.yaml
```
This creates a predictions directory (e.g. bins50) inside the model directory, containing the predictions for the training and validation sets. To reduce the number of distance samples, and thus save time and disk space, add the argument 'prediction_samples: 10'. We used 50 samples; you can increase the number during the final inference to get better results.
Final evaluation:
```
python do_evaluations.py configs/pcqm/tgt_at_200m/pcqm_gap_pred/tgt_at_100m_rdkit.yaml
```
The results will be printed to the console and also saved in the predictions directory.
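The two commands above can also be chained from a small driver script, sketched here under the assumption that it is run from the repository root. The names `build_commands` and `run_pipeline` are hypothetical; the script names and config paths are copied verbatim from the commands above.

```python
import subprocess
import sys

# Config paths copied verbatim from the commands above.
DIST_CONFIG = "configs/pcqm/tgt_at_200m/pcqm_dist_pred/tgt_at_100m_rdkit.yaml"
GAP_CONFIG = "configs/pcqm/tgt_at_200m/pcqm_gap_pred/tgt_at_100m_rdkit.yaml"


def build_commands(dist_config=DIST_CONFIG, gap_config=GAP_CONFIG):
    """Return the two steps, in order, as argument lists."""
    return [
        [sys.executable, "make_predictions.py", dist_config],  # distance predictions
        [sys.executable, "do_evaluations.py", gap_config],     # final evaluation
    ]


def run_pipeline(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline(build_commands())` runs the distance prediction step first and the evaluation second; `check=True` stops the run if the first step fails, so the evaluation never starts on missing predictions.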

shamim-hussain commented 7 months ago

We have now added instructions for data preparation and inference. Please refer to the README for details.

shamim-hussain / tgt