mlfoundations / open_clip

An open source implementation of CLIP.

Confirmation: Pretraining is Non-Deterministic? #776

Closed RylanSchaeffer closed 8 months ago

RylanSchaeffer commented 9 months ago

We're pretraining small CLIP models and finding that repeated runs with identical configurations differ slightly from one another. Since the code is seeded, we were initially surprised, but then we found this issue:

https://github.com/mlfoundations/open_clip/issues/734

To confirm, is the pretraining code non-deterministic? If so, is there a way to make pretraining deterministic?
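For context, our seeding follows the standard recipe, roughly like the sketch below (`seed_everything` is just an illustrative name, not the function open_clip uses):

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:
    # Seeds the Python, NumPy, and PyTorch RNGs (torch.manual_seed also
    # seeds the CUDA generators on all devices). Note this alone does NOT
    # make GPU training deterministic: cuDNN autotuning and some CUDA
    # kernels (e.g. atomics-based reductions) can still vary run to run.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
```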

mitchellnw commented 8 months ago

Yes non-determinism is expected right now, sorry.

RylanSchaeffer commented 8 months ago

Thank you for confirming!

rwightman commented 8 months ago

@RylanSchaeffer you can make the dataset deterministic by trying what was mentioned in #734. Complete training determinism in PyTorch on a GPU is another matter entirely; you can look up what that entails, but it tends to be non-trivial and comes with a performance tradeoff.
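For anyone landing here later: independent of the dataset fix in #734, the general PyTorch recipe for full determinism looks roughly like the sketch below. This is a sketch of the standard knobs from PyTorch's reproducibility notes, not something open_clip exposes as a flag:

```python
import os

import torch

# Must be set before the first CUDA op to get deterministic cuBLAS GEMMs
# (see the PyTorch reproducibility notes).
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

torch.manual_seed(0)
torch.backends.cudnn.benchmark = False     # disable the cuDNN autotuner (slower)
torch.backends.cudnn.deterministic = True  # restrict cuDNN to deterministic kernels
torch.use_deterministic_algorithms(True)   # error on ops with no deterministic impl
```

Expect a throughput hit, and expect `use_deterministic_algorithms(True)` to raise on any op that has no deterministic implementation until you swap in an alternative.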

RylanSchaeffer commented 8 months ago

I actually don't need determinism. I just wanted to confirm that different runs were producing slightly different results because training was non-deterministic. If that weren't the case, I would have needed to investigate what I was doing wrong!