Fix opposite os_cache behavior when creating Loader

elicassion commented 2 years ago

This small patch fixes os_cache behaving the opposite of what the documentation describes (https://docs.ffcv.io/parameter_tuning.html#scenario-large-scale-datasets). According to the documentation, when os_cache=True, FFCV caches the whole dataset in RAM, so ProcessCacheManager(), which creates an np array in memory, should be used. When os_cache=False, OSCacheManager(), which uses np.memmap(), should be used.

The current version loads the whole dataset into RAM when os_cache=False.
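
For readers unfamiliar with the two storage strategies mentioned above (an in-memory array versus a memory-mapped file), here is a minimal sketch of the difference. It uses Python's standard-library `mmap` instead of `np.memmap`, a temporary file as a hypothetical stand-in for a dataset, and is not FFCV's actual code.

```python
import mmap
import os
import tempfile

# Hypothetical stand-in for a dataset file (a real FFCV dataset is not built this way).
payload = bytes(range(256)) * 16  # 4096 bytes
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

# Strategy A: copy the whole file into this process's own memory.
with open(path, "rb") as f:
    in_process = f.read()

# Strategy B: map the file. Pages are read lazily on access, and the
# operating system decides which pages stay cached in RAM.
with open(path, "rb") as f:
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first_bytes = mapped[:4]                 # touches only the first page
    same_content = mapped[:] == in_process   # both views hold the same data
    mapped.close()

os.remove(path)
```

Both strategies expose the same bytes; they differ in who owns the memory. With the copy, the process holds everything itself. With the mapping, the OS page cache holds the data and can share it with other processes that map the same file.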

GuillaumeLeclerc commented 2 years ago

Hello,

Sorry that the documentation was confusing.

When os_cache is True, we use OSCacheManager, which delegates cache management to the operating system and lets it cache whatever your machine allows. How much gets cached depends on whether you have enough RAM, whether there are other users on the machine, etc. The idea here is that we let the OS do its thing. As a result, it also enables sharing the dataset between multiple processes (since it's handled by the OS itself).
When it is False, FFCV takes over and keeps only what is strictly necessary. Because FFCV knows more about what will be needed in the future, it starts preloading ahead of time and optimizes the loading. The amount of RAM needed depends on the situation:
- If you are using SEQUENTIAL it only keeps a couple of samples in memory (the ones that are loaded ahead of time)
- If you are using RANDOM, it has to cache most of the dataset, since sampling perfectly at random requires access to the whole dataset
- If you are using QUASI_RANDOM, it falls in between the two: it keeps enough to sample randomly, though the randomness is not as good as with RANDOM.
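
The behavior described above can be summarized as a small lookup. This is only a sketch for documentation purposes: the `Order` enum is a hypothetical stand-in for FFCV's order options, and `describe_ram_usage` is not part of FFCV.

```python
from enum import Enum


class Order(Enum):
    """Hypothetical stand-in for the three order options discussed above."""
    SEQUENTIAL = "sequential"
    RANDOM = "random"
    QUASI_RANDOM = "quasi_random"


def describe_ram_usage(os_cache: bool, order: Order) -> str:
    """Summarize the expected RAM behavior for a given os_cache/order pair."""
    if os_cache:
        # The OS manages the cache, so the order option does not change who
        # holds the data; the cached amount depends on available RAM.
        return "OS-managed: caches what the machine allows, shareable across processes"
    # FFCV manages the cache itself and preloads only what it expects to need.
    return {
        Order.SEQUENTIAL: "FFCV-managed: only a few preloaded samples in memory",
        Order.RANDOM: "FFCV-managed: most of the dataset in memory",
        Order.QUASI_RANDOM: "FFCV-managed: between SEQUENTIAL and RANDOM",
    }[order]


for order in Order:
    print(order.name, "->", describe_ram_usage(False, order))
```

Read this way, os_cache=False does not by itself imply low or high memory use: the order option decides that.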

We would love a pull request on the documentation if you have a way to make it more clear to the readers.

elicassion commented 2 years ago

Hi Guillaume,

Thanks for the nice clarification! Order options will also affect RAM usage. I will try to improve the document.

libffcv / ffcv

Fix opposite os_cache behavior when creating Loader #74