robinlingwood / BIMODAL


Fine-tuning #5

Open albertma-evotec opened 4 years ago

albertma-evotec commented 4 years ago

Hi,

I have fine-tuned the provided BIMODAL_fixed_512 model (model_fold_1_epochs_9.dat) with my own dataset (~2000 compounds) and collected statistics on the samples drawn in each epoch. I found that the validity% was only ~60% for the first couple of epochs. Although it gradually increases to 80–90% in later epochs, it seems to me that the model has "forgotten" what it learnt from pre-training at the beginning. The same thing happened with another of my datasets (~1400 compounds).

Then I used a random subset of 2000 molecules from the provided ChEMBL set (SMILES_BIMODAL_FBRNN_fixed.csv) for fine-tuning, but it still gave only 57% validity in the first epoch. I expected a high validity% here, because these 2000 molecules are simply a subset of the ChEMBL dataset that was used to pre-train the model.
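For anyone reproducing these numbers, the per-epoch validity statistic can be computed by parsing each sampled SMILES. A real check would use RDKit's `Chem.MolFromSmiles`; as a dependency-free stand-in, the sketch below only screens for balanced brackets and paired ring-closure digits, so it over-estimates validity. All names here are illustrative, not part of the BIMODAL code.

```python
# Minimal sketch of per-epoch validity tracking for sampled SMILES.
# NOTE: is_probably_valid is a cheap syntactic screen, not a full
# chemistry check -- swap in RDKit's Chem.MolFromSmiles for real use.

def is_probably_valid(smiles: str) -> bool:
    """Screen for balanced (), [] and evenly paired ring-closure digits."""
    pairs = {")": "(", "]": "["}
    stack = []
    ring_digits = {}
    for ch in smiles:
        if ch in "([":
            stack.append(ch)
        elif ch in ")]":
            if not stack or stack.pop() != pairs[ch]:
                return False
        elif ch.isdigit():
            ring_digits[ch] = ring_digits.get(ch, 0) + 1
    # every ring-closure digit must open and close, i.e. appear an even number of times
    return not stack and all(n % 2 == 0 for n in ring_digits.values())

def validity_percent(samples):
    """Percentage of sampled strings passing the screen."""
    if not samples:
        return 0.0
    return 100.0 * sum(is_probably_valid(s) for s in samples) / len(samples)
```

For example, `validity_percent(["c1ccccc1", "C1CC"])` returns 50.0, since the second string leaves ring-closure `1` unpaired.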

Does this look normal to you? Thanks, Albert

robinlingwood commented 4 years ago

Hi Albert,

Yes, we made the same observation: validity decreases during fine-tuning. Tuning hyperparameters such as the learning rate may reduce this effect.
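One common way to act on that suggestion is to fine-tune with a reduced base learning rate and a short linear warmup, so the first updates do not overwrite the pre-trained weights too aggressively. The sketch below shows such a schedule; the function name, base rate, and warmup length are illustrative assumptions, not BIMODAL's actual settings.

```python
# Hedged sketch: linear learning-rate warmup for fine-tuning.
# Feed the returned value into the optimizer at each training step.

def warmup_lr(step: int, base_lr: float = 1e-4, warmup_steps: int = 500) -> float:
    """Linearly ramp the learning rate from 0 to base_lr over warmup_steps."""
    if step >= warmup_steps:
        return base_lr
    return base_lr * step / warmup_steps
```

With the defaults, step 0 trains at rate 0 and step 250 at half of `base_lr`, so the earliest fine-tuning updates are the gentlest, which is where the validity drop is worst.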