about sample in training dataset

jasonkyuyim / se3_diffusion

Implementation for SE(3) diffusion model with application to protein backbone generation

https://arxiv.org/abs/2302.02277

MIT License

305 stars 50 forks

about sample in training dataset #29

Closed WeianMao closed 5 months ago

WeianMao commented 1 year ago

As we all know, the data in the PDB is continuously being updated. However, I noticed that your data processing scripts have no step that filters samples by date. May I ask how you fix the set of samples used in your paper? As time goes by, the number of samples I end up with will keep growing.

jasonkyuyim commented 1 year ago

We don't filter by time because there is no notion of train/valid/test in unsupervised learning. The objective is to learn the data distribution. You're right that the PDB is constantly being updated, so one should re-download from time to time.
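
For anyone who still wants a fixed snapshot for comparison, a date cutoff could be sketched roughly as below. This is not part of the repository's processing scripts: the `release_date` column, the `filter_by_release_date` helper, and the example IDs and dates are all hypothetical, and it assumes you have stored a release date for each entry yourself.

```python
import csv
import io
from datetime import date


def filter_by_release_date(rows, cutoff):
    """Keep rows whose release_date (ISO format, YYYY-MM-DD) is on or before cutoff.

    Hypothetical helper: assumes each row is a dict with a "release_date" key.
    """
    kept = []
    for row in rows:
        released = date.fromisoformat(row["release_date"])
        if released <= cutoff:
            kept.append(row)
    return kept


# Tiny in-memory stand-in for a metadata file; IDs and dates are made up.
EXAMPLE_CSV = """pdb_name,release_date
aaaa,2019-05-01
bbbb,2022-08-01
cccc,2023-03-15
"""

rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
snapshot = filter_by_release_date(rows, cutoff=date(2022, 8, 1))
print([row["pdb_name"] for row in snapshot])  # ['aaaa', 'bbbb']
```

Recording the cutoff date alongside the trained model would let others rebuild the same set of samples even after the PDB has grown.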