keiradams / SQUID

Official implementation of "Equivariant Shape-Conditioned Generation of 3D Molecules for Ligand-Based Drug Design"
MIT License

Issues with data construction #4

Open HaotianZhangAI4Science opened 1 year ago

HaotianZhangAI4Science commented 1 year ago

Nice work! I want to retrain this on my own datasets. In dataset_generation, you mention that "each script in Step 6 takes approximately 3 days to complete." Does that mean you ran Step 6, which contains 15 iterations, for about 45 days on 24 CPU cores?

keiradams commented 1 year ago

Yes, that sounds right! Note that this compute cost was specific to my particular dataset of ~1M molecules. Depending on the types of molecules you're interested in (# of atoms, # of rotatable bonds, etc.), you could potentially need far fewer molecules to train the network, in which case the cost of this data generation would scale down linearly.

It's also completely possible that I did not need all 1M molecules to train my published model; I did not analyze the model's performance degradation with decreasing dataset size.
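As a rough illustration of the linear scaling described above, here is a minimal sketch that extrapolates the Step 6 cost from the reference point in this thread (15 scripts at about 3 days each for ~1M molecules on 24 CPU cores). The function name `estimate_step6_days` and its parameters are hypothetical and not part of SQUID. The sketch assumes the scripts run one after another on the same hardware and that cost depends only on molecule count, which is only an approximation since molecule size and flexibility also matter.

```python
def estimate_step6_days(n_molecules,
                        ref_molecules=1_000_000,
                        n_scripts=15,
                        days_per_script=3.0):
    """Back-of-the-envelope Step 6 runtime in days (hypothetical helper).

    Assumes cost scales linearly with the number of molecules, relative to
    the reference run quoted in this thread (~1M molecules, 24 CPU cores).
    """
    ref_total_days = n_scripts * days_per_script  # 15 * 3 = 45 days
    return ref_total_days * n_molecules / ref_molecules


if __name__ == "__main__":
    # Compare the reference dataset size with two smaller candidates.
    for n in (1_000_000, 250_000, 100_000):
        print(f"{n:>9,} molecules -> ~{estimate_step6_days(n):.1f} days")
```

Under these assumptions, a dataset one tenth the size of the original would need roughly one tenth of the 45 days. Actual runtimes should be measured on a small subset before committing to a full run.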

HaotianZhangAI4Science commented 1 year ago

Thanks for your kind response. I will try retraining SQUID on a smaller dataset.