Open HaotianZhangAI4Science opened 1 year ago
Yes, that sounds right! Note that this compute cost was specific to my dataset of ~1M molecules. Depending on the types of molecules you're interested in (number of atoms, number of rotatable bonds, etc.), you could potentially need far fewer molecules to train the network, and the cost of data generation would scale down linearly in that case.
It's also completely possible that I did not need all 1M molecules to train my published model; I did not analyze the model's performance degradation with decreasing dataset size.
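As a rough back-of-the-envelope sketch of that linear scaling: the snippet below takes the figures quoted in this thread (15 Step 6 scripts, about 3 days each, ~1M molecules) and scales the total to smaller datasets. It assumes the scripts run one after another on the same 24-core machine and that cost is strictly proportional to molecule count. The function name `estimate_generation_days` is made up for illustration and is not part of the SQUID codebase.

```python
def estimate_generation_days(n_molecules,
                             ref_molecules=1_000_000,
                             n_scripts=15,
                             days_per_script=3.0):
    """Scale the reference Step 6 cost linearly with dataset size.

    Assumptions (not confirmed by the repository): scripts run
    sequentially, and runtime is proportional to the molecule count.
    """
    ref_days = n_scripts * days_per_script  # about 45 days for ~1M molecules
    return ref_days * n_molecules / ref_molecules


# Estimated wall-clock time for a few hypothetical dataset sizes
for n in (1_000_000, 250_000, 100_000):
    print(f"{n:>9,d} molecules -> ~{estimate_generation_days(n):.1f} days")
```

Actual runtimes will also depend on molecule size and flexibility, so treat these numbers as an order-of-magnitude guide only.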
Thanks for your kind response. I will try retraining SQUID on a smaller dataset.
Nice work! I want to retrain this on my own datasets. In dataset_generation, you mention that each script in Step 6 takes approximately 3 days to complete. Do you mean that running Step 6, which contains 15 iterations, took about 45 days on 24 CPU cores?