I'm trying to reproduce the results in the paper, and it's not clear how many training iterations or epochs were done for each dataset. The default number of steps appears to be 1,300,001, but this is way too high. Could you clarify the right number for text8, 1BW, and OpenWebText?