microsoft / foldingdiff

Diffusion models of protein structure; trigonometry and attention are all you need!
https://www.nature.com/articles/s41467-024-45051-2
MIT License

Minimum number of structures to train model #16

Open tanoramb opened 1 year ago

tanoramb commented 1 year ago

Hello,

I was running some tests, and it seems that a minimum number of protein structures is required to train a model. I tested datasets of 2 through 10 structures (similar domains), and the pipeline only runs once the dataset reaches 10 structures.

Is that correct, or is there something I am not considering?

Thanks

wukevin commented 1 year ago

I don't think there's anything that would cause it to fail with fewer structures. The only thing that comes to mind is that we filter out structures with too few or too many amino acids. Is it possible that the structures in your small datasets fall outside that length range and get filtered out, leaving an empty dataset?
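
One way to check this is to count how many structures survive a length filter before training. The snippet below is only a minimal sketch of that idea: `filter_by_length`, the `min_length`/`max_length` bounds, and the residue counts are all hypothetical placeholders for illustration, not the actual foldingdiff code or its defaults.

```python
from typing import Dict, List


def filter_by_length(
    lengths: Dict[str, int], min_length: int, max_length: int
) -> List[str]:
    """Return the names of structures whose residue count lies in
    [min_length, max_length]. Hypothetical helper, not part of foldingdiff."""
    return [
        name for name, n_residues in lengths.items()
        if min_length <= n_residues <= max_length
    ]


# Hypothetical residue counts for a tiny dataset of similar domains.
lengths = {"domain_a": 35, "domain_b": 38, "domain_c": 210}

# Assumed bounds for illustration; check the values your own run uses.
kept = filter_by_length(lengths, min_length=40, max_length=128)

if not kept:
    print("All structures were filtered out; the training set would be empty.")
else:
    print(f"{len(kept)} of {len(lengths)} structures kept: {kept}")
```

If every structure in a small dataset falls outside the bounds, the kept list is empty, which would match a run that fails regardless of how few or how many files were supplied.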