libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0

Can FFCV work on the fly with a PyTorch Dataset? #276

Closed Vishu26 closed 4 months ago

Vishu26 commented 1 year ago

My general understanding of how FFCV works is that it serializes a dataset of a fixed size and shape into a .beton file. In my workflow, I want to create FFCV data loaders on the fly so that I can sample data points of different sizes and dimensions. For example, I may want to sample an audio file at a higher sampling rate. This cannot be achieved through transforms, since it would require access to the source audio file. Is there a way to do this?

andrewilyas commented 1 year ago

Sadly, I don't think this is something we'll be able to support anytime soon, since the .beton file is central to FFCV's data loading strategy. I'd recommend either making separate datasets for different sampling rates, or making one high-rate dataset and downsampling from there. Sorry about that.
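
As a rough illustration of the second option (store the audio once at the highest rate you need, then reduce the rate at load time), here is a minimal sketch in plain Python. It does not use any FFCV API. The function name `downsample_by_averaging` is hypothetical, and block averaging is only a crude stand-in for a proper anti-aliased resampler such as `scipy.signal.resample_poly`. It also only handles integer rate ratios.

```python
from typing import List, Sequence


def downsample_by_averaging(samples: Sequence[float], factor: int) -> List[float]:
    """Reduce the sampling rate by an integer factor.

    Hypothetical helper for illustration: each output sample is the mean of
    `factor` consecutive input samples. Averaging acts as a crude low-pass
    filter before decimation; trailing samples that do not fill a complete
    block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    usable = len(samples) - len(samples) % factor
    return [
        sum(samples[start:start + factor]) / factor
        for start in range(0, usable, factor)
    ]


# Stand-in for one clip read from the single high-rate dataset.
high_rate_clip = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

half_rate_clip = downsample_by_averaging(high_rate_clip, 2)   # e.g. 48 kHz -> 24 kHz
third_rate_clip = downsample_by_averaging(high_rate_clip, 3)  # e.g. 48 kHz -> 16 kHz
```

Because this step only needs the stored high-rate samples and not the source audio file, it could in principle run as a per-sample transform after loading. How it would be wired into a loader pipeline is left open here.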