Closed sagnak closed 1 year ago
Hi @sagnak, one can use StreamingDataset for streaming the data from the cloud as well as for loading the data from local storage. For example, if your dataset resides locally, you can simply run it as
dataset = StreamingDataset(local='/mnt/data-obsd/mosaic/dataset') # This will read the data directly from the $local directory
dataset = StreamingDataset(remote='/mnt/data-obsd/mosaic/dataset', local='/tmp/dataset') # This will copy the data from $remote to $local and read the data from $local
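To make the copy-from-$remote-to-$local step more concrete, here is a rough, standard-library-only sketch of the general idea: copy each shard into a local cache directory and decompress it on the way if it is compressed. This is only an illustration, not how the streaming library is actually implemented. The helper name fetch_and_decompress, the gzip codec, and the shard file name are all assumptions made up for the example.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def fetch_and_decompress(remote_dir: Path, local_dir: Path) -> list:
    """Copy shards from remote_dir into local_dir, decompressing *.gz files.

    Hypothetical helper for illustration only; not part of the streaming library.
    """
    local_dir.mkdir(parents=True, exist_ok=True)
    ready = []
    for src in sorted(remote_dir.iterdir()):
        if not src.is_file():
            continue
        if src.suffix == ".gz":
            # "shard.00000.mds.gz" becomes "shard.00000.mds" in the local cache.
            dst = local_dir / src.stem
            if not dst.exists():
                with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
                    shutil.copyfileobj(fin, fout)
        else:
            # Uncompressed files are copied as they are.
            dst = local_dir / src.name
            if not dst.exists():
                shutil.copyfile(src, dst)
        ready.append(dst)
    return ready


def demo() -> bytes:
    """Write one compressed shard to a temporary 'remote', then fetch it."""
    with tempfile.TemporaryDirectory() as tmp:
        remote = Path(tmp) / "remote"
        local = Path(tmp) / "local"
        remote.mkdir()
        payload = b"sample shard bytes" * 4
        # The shard name is an assumption chosen for the example.
        with gzip.open(remote / "shard.00000.mds.gz", "wb") as f:
            f.write(payload)
        (shard,) = fetch_and_decompress(remote, local)
        return shard.read_bytes()
```

Calling demo() returns the original uncompressed payload, read back from the local cache directory. Existing files in the cache are skipped, so running the helper again does not repeat the decompression.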
Hi @sagnak, does my above solution work for you? I am closing this issue for now. Please feel free to re-open if you are still seeing the issue. Thanks!
Support online de-compressing of shards on LocalDataset as it is already done for StreamingDataset
When one creates a mosaic dataset to be streamed from the cloud using StreamingDataset, it is possible to delegate the online de-compression of each shard into an mds file to the library. This is, however, not supported for LocalDataset: if the same dataset were to be used on a local filesystem, the library does not automatically decompress it. Below is the error I get when I use a compressed shard dataset: