jzhoulab / ddsm

Dirichlet Diffusion Score Model for Biological Sequence Generation.

The performance of DDSM for unconditional DNA generation #2

Open Zehui127 opened 10 months ago

Zehui127 commented 10 months ago

Dear Team,

I have been working on a generative model for DNA sequences. For a fair comparison, I am comparing different algorithms in the unconditional generation setting. It seems that DDSM fails to capture the motif distribution in this setting. By unconditional generation, I mean that the transcription profile is not supplied as a condition.

I wonder whether you have tried DDSM for unconditional DNA sequence generation, and what results you would expect.

PS: I tried training both with and without time dilation, and in neither case do the generated samples seem to capture the motif distribution of the input sequences. The training script is available.
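
One rough way to put a number on "capturing the motif distribution" is sketched below. It is not the evaluation used in this thread or in the DDSM paper. It assumes that k-mer frequencies are an acceptable proxy for motif content, and it compares the k-mer distribution of training and generated sequences with the Jensen-Shannon divergence (in bits, so 0 means identical and 1 means no overlap). The function names and the toy sequences are hypothetical. A proper motif scan with position weight matrices would be more faithful.

```python
from collections import Counter
from itertools import product
from math import log2


def kmer_distribution(seqs, k=3, alphabet="ACGT"):
    """Return the relative frequency of every k-mer over the given alphabet.

    k-mers containing characters outside the alphabet (e.g. N) are ignored.
    """
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[km] for km in kmers)
    if total == 0:
        raise ValueError("no valid k-mers found; check k and the sequences")
    return {km: counts[km] / total for km in kmers}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions
    defined over the same set of keys."""
    m = {x: 0.5 * (p[x] + q[x]) for x in p}

    def kl(a):
        # Terms with a[x] == 0 contribute nothing to the KL sum.
        return sum(a[x] * log2(a[x] / m[x]) for x in a if a[x] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


if __name__ == "__main__":
    # Toy placeholders, not real data: replace with training and sampled sequences.
    train_seqs = ["ACGTACGTTTGACA", "TTGACAACGTACGT"]
    generated_seqs = ["ACGTTTGACAACGT", "GGGGCCCCGGGGCC"]
    p = kmer_distribution(train_seqs, k=3)
    q = kmer_distribution(generated_seqs, k=3)
    print(f"JS divergence (k=3): {js_divergence(p, q):.4f}")
```

Computing the same score between two disjoint halves of the training set gives a baseline for how small the divergence can realistically get.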

Best, Zehui

PavelAvdeyev commented 10 months ago

Dear Zehui,

If you are using the training hyperparameters provided in the paper, you will get sub-optimal results on unconditional generation, because our training setup is not optimized for it. It would therefore be hard to compare rigorously against DDSM on unconditional generation. If you have to perform such a comparison, one option is to fine-tune the conditional generation model we provide for unconditional generation. You can also train DDSM for unconditional generation from scratch, but that will probably require some hyperparameter tweaks.

Zehui127 commented 10 months ago

Dear @PavelAvdeyev,

Thanks for your response. I did indeed train from scratch for unconditional generation, but there seems to be an issue with the quality of the generated sequences. One potential cause I noticed is that the scoreNet used in the current code is relatively small. We will make some incremental changes to the score network and see whether that helps.

Best, Zehui Li