libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0

Support `TorchTensorField` #101

Closed: carmocca closed this 2 years ago

carmocca commented 2 years ago

There should be a counterpart to `NDArrayField` for torch Tensors. This would be useful for converting existing `torch.utils.data.Dataset` objects to `.beton`.

Following the API described in this guide (https://docs.ffcv.io/writing_datasets.html#writing-a-dataset-to-ffcv-format), usage could look like this:

dataset = Dataset(...)
fields = {"covariate": TorchTensorField(shape=..., dtype=torch.float32)}
writer = DatasetWriter(dataset_path, fields)
writer.from_indexed_dataset(dataset)

Using NDArrayField currently fails with:

  File "/home/carlos/miniconda3/envs/ffcv/lib/python3.8/site-packages/ffcv/fields/ndarray.py", line 94, in encode
    data_region[:] = field.reshape(-1).view('<u1')
TypeError: view() received an invalid combination of arguments - got (str), but expected one of:
 * (tuple of ints size)
      didn't match because some of the arguments have invalid types: (str)
 * (torch.dtype dtype)
      didn't match because some of the arguments have invalid types: (str)
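
The traceback comes down to a difference between the two libraries: `numpy.ndarray.view` accepts a dtype string such as `'<u1'` and reinterprets the buffer as raw bytes, while `torch.Tensor.view` only accepts a shape or a `torch.dtype`. Below is a minimal sketch of the mismatch and of one possible workaround, converting the tensor to NumPy before it reaches the field. The helper name `tensor_to_ndarray` is hypothetical and not part of FFCV.

```python
import numpy as np
import torch


def tensor_to_ndarray(value):
    """Hypothetical helper: convert a torch.Tensor to a NumPy array.

    Anything that is not a tensor is returned unchanged.
    """
    if isinstance(value, torch.Tensor):
        # detach() drops autograd history; cpu() ensures host memory.
        return value.detach().cpu().numpy()
    return value


covariate = torch.arange(6, dtype=torch.float32).reshape(2, 3)

# Same call as in the traceback: torch rejects the NumPy-style dtype string.
try:
    covariate.reshape(-1).view('<u1')
    torch_view_failed = False
except TypeError:
    torch_view_failed = True

# After conversion the byte view works: 6 float32 values become 24 bytes.
raw = tensor_to_ndarray(covariate).reshape(-1).view('<u1')

# Reinterpreting the bytes as float32 recovers the original values.
restored = raw.view(np.float32).reshape(2, 3)
```

Applying a helper like this to each sample before calling `writer.from_indexed_dataset` should let the existing `NDArrayField` be used unchanged, assuming the tensors live on the CPU or can be copied there.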

GuillaumeLeclerc commented 2 years ago

I think it would be easier to just check the type and convert tensors as part of the encode procedure of `NDArrayField`. Thoughts?

I could also add what you suggest and make it extend `NDArrayField`.

Any preference? (Both are super easy, and I'll add one of them in the next release for sure.)
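
For comparison, the two options could look roughly like the sketch below. The classes are toy stand-ins, not FFCV's real `NDArrayField` or its `encode` signature, and the tensor check is duck-typed only so that the sketch runs without torch installed. A real implementation would more likely use `isinstance(field, torch.Tensor)`.

```python
import numpy as np


def to_bytes(array):
    """Flatten and reinterpret as raw bytes, like the line in the traceback."""
    return array.reshape(-1).view('<u1')


def looks_like_tensor(value):
    """Duck-typed stand-in for isinstance(value, torch.Tensor)."""
    return all(callable(getattr(value, name, None))
               for name in ("detach", "cpu", "numpy"))


class ToyStrictArrayField:
    """Stand-in for the behaviour in the traceback: NumPy arrays only."""

    def encode(self, field):
        return to_bytes(field)


class ToyArrayField:
    """Option 1 (toy): a single field class that converts tensors in encode."""

    def encode(self, field):
        if looks_like_tensor(field):
            field = field.detach().cpu().numpy()
        return to_bytes(field)


class ToyTensorField(ToyStrictArrayField):
    """Option 2 (toy): a thin subclass that converts, then defers to the parent."""

    def encode(self, field):
        return super().encode(field.detach().cpu().numpy())
```

Option 1 keeps a single class and accepts both input types silently. Option 2 leaves the array field untouched and makes the expected input type explicit in the field's name.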

carmocca commented 2 years ago

I think the added class (even if it's a minimal extension) is the simplest option and the most natural for users.

GuillaumeLeclerc commented 2 years ago

I'm trying to balance ICML submissions with adding features to FFCV, but I'll try to have this done today.

GuillaumeLeclerc commented 2 years ago

Sorry for the ICML delay. This should land in v0.0.4 (possibly as a release candidate tonight).