Current FFCV version: v0.0.4 compiled from source.
I have two identical FFCV .beton files (ImageNet train dataset): one on an NVMe SSD and the other on an HDD.
Using the .beton file on the SSD, I can load the full dataset in less than 6 minutes, at around 7 it/s (batch_size=512, data loading only, with no model forward/backward pass). I have set os_cache=True in the Loader initialization, and I have checked with free -mh that the dataset is cached in RAM after this pass.
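For reference, the it/s figures above come from a data-loading-only pass. A minimal sketch of that kind of measurement is shown below; `measure_pass` is a hypothetical helper (not part of FFCV), and it assumes only that the loader is an ordinary Python iterable of batches.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_pass(
    batches: Iterable,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[int, float, float]:
    """Iterate once over `batches`, discarding every batch.

    Returns (number of iterations, elapsed seconds, iterations per second).
    `measure_pass` is a hypothetical helper for illustration, not an FFCV API.
    """
    start = clock()
    n = 0
    for _ in batches:
        # No forward/backward pass: only the cost of producing each batch is timed.
        n += 1
    elapsed = clock() - start
    rate = n / elapsed if elapsed > 0 else float("inf")
    return n, elapsed, rate


if __name__ == "__main__":
    # Stand-in iterable; in the real setup this would be the FFCV Loader.
    n, seconds, rate = measure_pass(range(1000))
    print(f"{n} iterations in {seconds:.4f} s ({rate:.1f} it/s)")
```

Running the same helper once per .beton path, with and without dropping the cache in between, gives directly comparable numbers.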
From my understanding, a subsequent pass through the same .beton file should read samples directly from the cache. However, when I run a data loading pass using the .beton file on my HDD, it seems as though the entire dataset is loaded into the cache again. From the HDD, data loading runs at around 1.2 it/s and takes a total of ~30 minutes to complete, which is the same time it takes to load the .beton file after freeing the cache.
Does the difference in path (SSD: /nvme/imagenet.beton, HDD: /hdd/imagenet.beton) have an effect on the way the samples are stored in the cache, regardless of whether the two .beton files are identical or not?
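A possibly relevant check, under the assumption that the Linux page cache tracks cached pages per file (device and inode) rather than per content: the sketch below tests whether two paths resolve to the same underlying file. `same_underlying_file` and `demo` are hypothetical names for illustration; the demo uses temporary files, not the actual .beton files.

```python
import os
import tempfile
from typing import Tuple


def same_underlying_file(path_a: str, path_b: str) -> bool:
    """True if both paths refer to the same (device, inode) pair."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def demo() -> Tuple[bool, bool]:
    """Compare a byte-identical copy and a hard link against one original file."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.bin")
        copy = os.path.join(tmp, "copy.bin")
        link = os.path.join(tmp, "link.bin")

        payload = b"same bytes" * 1024
        for path in (original, copy):
            with open(path, "wb") as f:
                f.write(payload)

        # A hard link is a second name for the same inode.
        os.link(original, link)

        return (
            same_underlying_file(original, copy),  # identical content, separate file
            same_underlying_file(original, link),  # same inode
        )


if __name__ == "__main__":
    copy_is_same, link_is_same = demo()
    print("identical copy is same file:", copy_is_same)
    print("hard link is same file:", link_is_same)
```

If that assumption holds, /nvme/imagenet.beton and /hdd/imagenet.beton would be distinct files on distinct devices, so pages cached for one would not be reused for the other even though the contents match.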