swansonk14 / SyntheMol

Combinatorial antibiotic generation
MIT License

Question: how to use chemprop-RDKit models? #12

Closed Evert-Homan closed 3 months ago

Evert-Homan commented 5 months ago

Hi, I suspect the error below is related to the flags I set when training the chemprop models:

chemprop_predict --test_path "$(python -c 'import synthemol; print(str(synthemol.constants.BUILDING_BLOCKS_PATH))')" --preds_path models/chemprop/building_blocks.csv --checkpoint_dir ~/Data/CompChem/Python/chemprop_v1/Models/MTH1_pIC50/MTH1_pIC50_cv10
Loading training args
Traceback (most recent call last):
  File "/home/evehom/miniforge3/envs/synthemol/bin/chemprop_predict", line 8, in <module>
    sys.exit(chemprop_predict())
  File "/home/evehom/miniforge3/envs/synthemol/lib/python3.10/site-packages/chemprop/train/make_predictions.py", line 497, in chemprop_predict
    make_predictions(args=PredictArgs().parse_args())
  File "/home/evehom/miniforge3/envs/synthemol/lib/python3.10/site-packages/chemprop/utils.py", line 540, in wrap
    result = func(*args, **kwargs)
  File "/home/evehom/miniforge3/envs/synthemol/lib/python3.10/site-packages/chemprop/train/make_predictions.py", line 384, in make_predictions
    ) = load_model(args, generator=True)
  File "/home/evehom/miniforge3/envs/synthemol/lib/python3.10/site-packages/chemprop/train/make_predictions.py", line 30, in load_model
    update_prediction_args(predict_args=args, train_args=train_args)
  File "/home/evehom/miniforge3/envs/synthemol/lib/python3.10/site-packages/chemprop/utils.py", line 741, in update_prediction_args
    raise ValueError(
ValueError: If scaling of the additional features was done during training, the same must be done during prediction.

During chemprop training I set the following flags:

--features_generator rdkit_2d_normalized --no_features_scaling --aggregation sum --epochs 200 --split_type cv --num_folds 10

What flags should I invoke when running chemprop_predict on the building blocks when I use trained chemprop-RDKit models?

Thanks/Evert

swansonk14 commented 4 months ago

Hi @Evert-Homan,

Thanks for raising the issue and apologies for the delayed response! The error you're getting is a Chemprop error related to how features are provided to the model. Chemprop unfortunately has a somewhat strict requirement (to try to avoid user errors) that features must be provided in the same manner during training as they are during inference. This means that when running chemprop_train and chemprop_predict, you must either use --features_path both times (with pre-computed features) or you must use --features_generator both times. I tend to prefer using --features_path since pre-computing the features is a bit faster, but either method can work.
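As a rough illustration of "same feature options both times", here is a minimal Python sketch that picks the feature-related flags out of a training flag string so they can be appended to the chemprop_predict call. The helper name `feature_flags_to_repeat` and the `FEATURE_FLAGS` set are hypothetical, not part of Chemprop or SyntheMol. Including `--no_features_scaling` in that set is an assumption based on the wording of the ValueError in the traceback; the thread does not confirm that repeating it resolves the error.

```python
import shlex

# Assumption: these are the feature-related options that should match between
# chemprop_train and chemprop_predict. --features_generator and --features_path
# come from the reply above; --no_features_scaling is inferred from the error text.
FEATURE_FLAGS = {"--features_generator", "--features_path", "--no_features_scaling"}


def feature_flags_to_repeat(train_flags: str) -> list[str]:
    """Return the feature-related flags (with their values) from a training flag string."""
    kept = []
    keeping = False
    for token in shlex.split(train_flags):
        # A token starting with "--" opens a new flag; anything else is a value
        # belonging to the most recent flag.
        if token.startswith("--"):
            keeping = token in FEATURE_FLAGS
        if keeping:
            kept.append(token)
    return kept


# Training flags as quoted in the question.
train_flags = (
    "--features_generator rdkit_2d_normalized --no_features_scaling "
    "--aggregation sum --epochs 200 --split_type cv --num_folds 10"
)

# Print the flags to append to the chemprop_predict command.
print(" ".join(feature_flags_to_repeat(train_flags)))
```

The printed flags would be appended to the chemprop_predict command from the question, after `--checkpoint_dir`. Training-only options such as `--epochs` or `--num_folds` are deliberately dropped.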

Please let me know if this works or if you're still having any problems!

Best, Kyle