Closed elicassion closed 2 years ago
Hello,
Sorry the documentation was confusing.
If os_cache is True, then we use OSCacheManager, which delegates the management of the cache to the operating system and lets it cache whatever is possible on your machine. How much gets cached depends on whether you have enough RAM, whether there are other users on the machine, etc. But the idea here is that we let the OS do its thing. As a result, it also enables sharing the dataset between multiple processes (since it's handled by the OS itself).
SEQUENTIAL: it only keeps a couple of samples in memory (the ones that are loaded ahead of time).
RANDOM: it has to cache most of the dataset, since to sample perfectly at random you need to have the whole dataset.
QUASI_RANDOM: it is in between the two. It keeps enough to sample randomly, though the randomness is not as good as with RANDOM.
We would love a pull request on the documentation if you have a way to make it clearer to readers.
Hi Guillaume,
Thanks for the nice clarification! Order options will also affect RAM usage. I will try to improve the document.
This small patch fixes the issue that os_cache behaves the opposite of what the documentation declares (https://docs.ffcv.io/parameter_tuning.html#scenario-large-scale-datasets). According to the documentation, when os_cache=True, FFCV caches the whole dataset in RAM, so ProcessCacheManager() should be used, which creates an np array in memory. When os_cache=False, OSCacheManager() should be used, which uses np.memmap().
The current version is loading the whole dataset to RAM when os_cache=False.
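For readers unfamiliar with the two strategies discussed in this thread, here is a minimal sketch of the general difference between reading a whole file into process memory and memory-mapping it so the OS decides which pages stay cached. This is not FFCV's actual code: the function names and the tiny stand-in file are hypothetical, and Python's standard-library mmap module is used here in place of np.memmap.

```python
import mmap
import os
import tempfile


def read_fully(path):
    """Process-managed copy: the whole file is read into this process's memory."""
    with open(path, "rb") as f:
        return f.read()


def read_sample_mapped(path, offset, size):
    """OS-managed access: map the file and copy out only the requested bytes."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing the map returns a bytes copy of just this range;
            # the OS page cache decides what stays resident in RAM.
            return mapped[offset:offset + size]


def make_dataset(num_samples=4, sample_size=8):
    """Write a tiny stand-in dataset file (hypothetical layout) and return its path."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        for i in range(num_samples):
            f.write(bytes([i]) * sample_size)
    return path


path = make_dataset()
in_memory = read_fully(path)  # every byte now lives in this process
third_sample = read_sample_mapped(path, offset=2 * 8, size=8)  # only one sample copied
print(len(in_memory), third_sample)
os.remove(path)
```

With the full read, each process that opens the dataset holds its own copy, whereas mapped pages live in the OS page cache and can be shared between processes, which matches the sharing behavior described earlier in the thread.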