juliustao opened this pull request 2 years ago (Open)
Hi! I'm not sure doing this as part of the transforms is the optimal way, as every transform with randomness will have to add it, and users will have to set the seed many times. Moreover, it will be run at each batch, which will slow things down. Maybe there is a better place. Any idea how long the random state in numba lives? Is it per thread or per process?
Thanks for the quick response Guillaume!

I agree that this is not a great way to seed the random state in numba. I wasn't sure how to modify the Operation parent class so that np.random.seed(seed) could be called in an arbitrary function returned by generate_code.

A nicer solution would be to seed the numba random state once for all future JIT-compiled functions. I'm not too familiar with numba, but the documentation linked above says:

"Since version 0.28.0, the generator is thread-safe and fork-safe. Each thread and each process will produce independent streams of random numbers."
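To make that concrete, here is a minimal standalone sketch (the helper names are illustrative, not ffcv code): numba's generator can only be seeded from inside JIT-compiled code, and the seed only applies to the calling thread.

```python
import numpy as np
from numba import njit

@njit
def seed_numba(seed):
    # Calling np.random.seed inside nopython-compiled code seeds numba's
    # generator for the current thread; calling it from plain Python
    # only seeds NumPy's global generator.
    np.random.seed(seed)

@njit
def draw():
    return np.random.random()

seed_numba(0)
first = draw()
seed_numba(0)
second = draw()
assert first == second  # same seed, same stream on this thread
```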
This numba thread suggests that it's possible to set the numba random state once at the start for determinism in single-threaded code. I'll dig into this more and run some tests once the slurm cluster is back up.

Hope that clarified some questions :)
I think the best would be to do it at the start of loading and use the seed argument. Doing it in the operations means you get the same random sequence at each batch, which most likely will produce adverse results during training.
I was able to get determinism after using training.num_workers = 1 with

torch.backends.cudnn.deterministic = True
torch.manual_seed(SEED)
np.random.seed(SEED)

in train_cifar.py and setting the numba seed inside the EpochIterator thread. I cannot set the numba seed in train_cifar.py the way I do for numpy or torch, since every thread has an independent numba state.
This solution is still suboptimal, and I hope there's a simple fix that I overlooked.
Also, I'm confused about the threading in ffcv: why is the EpochIterator object returned by iter(Loader()) implemented as a Thread? Is it because waiting for the cuda stream takes significant time during which we can perform other CPU operations?
I suspect that it would work with multiple workers too, as workers are only active in the body of the transforms. Have you tried that?
EpochIterator is implemented as a thread so that the augmentations (especially the CPU ones) do not block the user's main training loop.
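To make that design concrete, here is a generic sketch of the pattern, not ffcv's actual EpochIterator implementation; make_batch stands for whatever decodes and augments one batch. A background thread fills a small queue so batch preparation overlaps with the training steps.

```python
import threading
import queue

def prefetching_iterator(make_batch, num_batches, depth=2):
    # Producer/consumer sketch: a worker thread prepares batches into a
    # bounded queue while the caller consumes them, overlapping CPU-side
    # work (decoding, augmentations) with the training loop.
    q = queue.Queue(maxsize=depth)
    SENTINEL = object()

    def producer():
        for i in range(num_batches):
            q.put(make_batch(i))
        q.put(SENTINEL)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        yield item

# usage: for batch in prefetching_iterator(load_batch, 100): train_step(batch)
```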
With the code above, setting training.num_workers > 1 does not give deterministic results :( I haven't confidently figured out why that's the case, but I suspect that the cause is numba threads interleaving randomly.
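A small experiment along these lines (illustrative helpers, not ffcv code) shows why seeding on one thread is not enough: each thread draws from its own numba stream.

```python
import threading
import numpy as np
from numba import njit

@njit
def seed_numba(seed):
    np.random.seed(seed)  # seeds only the calling thread's numba state

@njit
def draw():
    return np.random.random()

seed_numba(0)
main_value = draw()  # reproducible: governed by the seed set above

worker_value = [None]

def worker():
    # This thread's numba generator was never seeded, so the value it
    # draws is independent of the seed set on the main thread.
    worker_value[0] = draw()

t = threading.Thread(target=worker)
t.start()
t.join()
print(main_value, worker_value[0])
```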
Oh, that's a good call in my opinion. Not sure yet how to get around that problem. Do you personally have use cases where determinism is needed? Usually determinism doesn't play well with high-performance code (cudnn deterministic mode can be significantly slower too).
My current work looks at how fixing different sources of randomness affects training outcomes, and data augmentation is one such source. Maybe this use case is rather niche and the changes are not worth the hit in performance. Hopefully this thread can at least help others with similar issues :)
On a related note, is the desired default behavior of the Random TraversalOrder to have the same shuffle order across independent runs? The default is self.seed = self.loader.seed = 0, which implies the above since the seed for each epoch is always self.seed + epoch.
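As a hedged illustration (not the exact ffcv implementation), that default would mean every run walks through the same per-epoch permutations:

```python
import numpy as np

seed = 0          # default loader seed described above
num_samples = 10  # illustrative dataset size

for epoch in range(3):
    # With a fixed seed, these permutations are identical across runs.
    order = np.random.default_rng(seed + epoch).permutation(num_samples)
    print(epoch, order)
```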
Hello,
It was intended, but we decided to change this in v0.0.3; the release candidate is already available on pip. See the announcements for more details.
Did you succeed in running ffcv deterministically, @juliustao? I am facing a similar problem, with a gap of more than 5 points in my metric across runs of the same code. I seed everything, but I was looking for a way to seed the workers as with PyTorch dataloaders and could not find one.
Calling numpy.random.seed at the start of code can set the global seed for all numpy.random methods. However, calling this method from Python does not seed the numba generator, so numba JIT-compiled code is nondeterministic (e.g., the cutout_square() function returned by generate_code() in the Cutout operation). See this numba documentation for further details. Adding an optional seed argument to such random transforms allows reproducibility across runs.
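As a rough sketch of the idea (hypothetical factory and names, not the actual ffcv Operation/generate_code signatures), an optional seed can be baked into the JIT-compiled kernel, which then seeds numba's per-thread state from inside nopython code:

```python
import numpy as np
from numba import njit

def make_cutout_kernel(crop_size, seed=None):
    # Hypothetical factory in the spirit of generate_code(): returns a
    # numba-compiled kernel that blacks out one random square per image.
    @njit
    def cutout(images):
        n, h, w = images.shape[0], images.shape[1], images.shape[2]
        for i in range(n):
            y = np.random.randint(0, h - crop_size)
            x = np.random.randint(0, w - crop_size)
            images[i, y:y + crop_size, x:x + crop_size] = 0
        return images

    if seed is None:
        return cutout

    @njit
    def seeded_cutout(images):
        # np.random.seed inside nopython code seeds numba's per-thread
        # generator, so crop positions become reproducible across runs
        # (at the cost of repeating the same sequence on each call).
        np.random.seed(seed)
        return cutout(images)

    return seeded_cutout
```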