Closed 7 months ago
We hypothesize that transfer learning occurs through the model's understanding of molecular geometry. So, in our molecular property prediction experiments (e.g., on MOLPCBA), we transfer only the pairwise distance predictor and train the task-specific predictor (e.g., a solubility predictor) from scratch.
(We found that fine-tuning the gap predictor is relevant only when targeting other related quantum chemical properties (e.g., the tasks on QM9). Solubility prediction, on the other hand, is not directly related to HOMO-LUMO gap prediction, so we should not expect a positive transfer of knowledge there. Previous models probably benefited from pretraining on PCQM4Mv2 through an indirect understanding of molecular geometry; in our case, the distance predictor learns that directly.)
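To make the setup concrete, here is a minimal PyTorch sketch of transferring only the distance predictor while training the task head from scratch. The class and checkpoint names are hypothetical placeholders, not the actual identifiers from this repo:

```python
import torch
import torch.nn as nn

class DistancePredictor(nn.Module):
    """Stand-in for the pretrained pairwise-distance predictor (names are illustrative)."""
    def __init__(self, dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Dropout(0.1))
        self.dist_head = nn.Linear(dim, 1)

    def forward(self, x):
        return self.dist_head(self.encoder(x))

# 1) Load pretrained distance-predictor weights (path is hypothetical):
dist_predictor = DistancePredictor()
# dist_predictor.load_state_dict(torch.load("pretrained_distance_predictor.pt"))

# 2) The task-specific head (e.g., solubility) is initialized and trained from scratch.
task_head = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))

# Only the fresh task head receives gradient updates in this setup.
optimizer = torch.optim.AdamW(task_head.parameters(), lr=1e-4)
```

The key point is which parameters go into the optimizer: the pretrained predictor is reused, while the downstream head starts from random initialization.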
The pairwise distances predicted by our predictor are fed as input features to the next stage (in your case, the solubility predictor). We draw multiple distance samples from the distance predictor with dropout turned on, as a form of data augmentation.
In our experiments, we do not fine-tune the distance predictor but rather use it as a frozen feature extractor. These predicted distances result in better performance on downstream tasks than even RDKit-generated coordinates.
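A small sketch of how these two points combine in PyTorch (shapes and module layout are assumed for illustration): the predictor's parameters are frozen, but its dropout layers stay active so repeated forward passes yield different distance samples.

```python
import torch
import torch.nn as nn

# Toy stand-in for the distance predictor; the real architecture differs.
dist_predictor = nn.Sequential(
    nn.Linear(8, 8), nn.ReLU(), nn.Dropout(0.2), nn.Linear(8, 1)
)

for p in dist_predictor.parameters():
    p.requires_grad_(False)  # frozen feature extractor: no fine-tuning

dist_predictor.train()       # .train() keeps Dropout stochastic despite frozen weights

x = torch.randn(4, 8)        # toy node-pair features
with torch.no_grad():
    # Each pass uses a different dropout mask, giving 5 distinct distance samples.
    samples = torch.stack([dist_predictor(x) for _ in range(5)])

print(samples.shape)  # torch.Size([5, 4, 1])
```

Calling `.train()` on a frozen module may look odd, but it is what keeps dropout on; `requires_grad_(False)` alone controls gradients, not dropout behavior.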
For more details, please refer to our paper (arXiv:2402.04538).
We will add an implementation of fine-tuning on MOLPCBA soon, but for now, here is what you could look into -
Hi, I want to ask how I should fine-tune this model so that I can use it for other tasks. For example, instead of using it as a gap predictor, I want to predict solubility. What is the training process for that? Should I start by pretraining the predictor, or do I just need to do the fine-tuning stage?