libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0

How to properly handle uint16 dtype? #151

Closed rpartsey closed 2 years ago

rpartsey commented 2 years ago

Hi dear maintainers and contributors,

My team and I (we work on learned odometry estimation) found ffcv's data-loading speedup impressive and would like to integrate it into our training pipeline. Our original pipeline stores depth images as np.uint16, and we can successfully convert our dataset to the .beton format using the DatasetWriter below:

writer = DatasetWriter(write_path, {
        ...
        'depth': NDArrayField(shape=(180, 320), dtype=np.dtype(np.uint16)),
        ...
    }, 
    num_workers=num_workers
)

However, we have trouble reading the depth back from the .beton at data-loading time, since torch doesn't support uint16. If we define the Loader as

ffcv_loader = Loader(
    ...,
    pipelines={
        'depth': [NDArrayDecoder(), ToTensor()],
    }
)

we get

ffcv_loader = Loader(
  File "...ffcv/loader/loader.py", line 199, in __init__
    self.pipelines[field_name] = Pipeline(operations)
  File "...ffcv/pipeline/pipeline.py", line 25, in __init__
    self.operation_blocks, _ = self.parse_pipeline()
  File "...ffcv/pipeline/pipeline.py", line 42, in parse_pipeline
    current_state, memory_allocation = operation.declare_state_and_memory(
  File "...ffcv/transforms/ops.py", line 28, in declare_state_and_memory
    new_dtype = ch.from_numpy(np.empty((), dtype=previous_state.dtype)).dtype
TypeError: can't convert np.ndarray of type numpy.uint16. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.

We also tried defining the Loader as

ffcv_loader = Loader(
    ...,
    pipelines={
        'depth': [NDArrayDecoder(), Convert(np.dtype(np.int32)), ToTensor()],
    }
)

but we get

  File "...ffcv/loader/loader.py", line 214, in __iter__
    return EpochIterator(self, selected_order)
  File "...ffcv/loader/epoch_iterator.py", line 55, in __init__
    memory_allocations[p_id] = p.allocate_memory(self.loader.batch_size,
  File "...ffcv/pipeline/pipeline.py", line 103, in allocate_memory
    allocated_buffer = self.allocate_query(memory_allocation,
  File "...ffcv/pipeline/pipeline.py", line 82, in allocate_query
    ch_dtype = ch.from_numpy(np.empty(0, dtype=memory_allocation.dtype)).dtype
TypeError: can't convert np.ndarray of type numpy.uint16. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
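The intent behind Convert(np.dtype(np.int32)) is sound: widening uint16 to int32 preserves every value and yields a dtype torch accepts. A minimal sketch of that cast in plain NumPy (illustrative values, independent of ffcv):

```python
import numpy as np

# Illustrative depth values at the edges of the uint16 range.
depth_u16 = np.array([0, 40000, 65535], dtype=np.uint16)

# Widening to int32 is lossless: every uint16 value fits in int32,
# and int32 is one of the dtypes torch.from_numpy accepts.
depth_i32 = depth_u16.astype(np.int32)
assert np.array_equal(depth_i32, [0, 40000, 65535])
```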

We would be grateful if you could answer these questions about the issue described above:

  1. What is the recommended way to handle the uint16 dtype?
  2. Why doesn't the Convert transform return an AllocationQuery?

We appreciate any help you can provide. Looking forward to reading your paper.

GuillaumeLeclerc commented 2 years ago

Hi @rpartsey!

It seems the problem comes from PyTorch, which doesn't support uint16; I wasn't aware of that. I think the first thing to do is to make sure the PyTorch developers know there is a need for it. Since a fix wouldn't come anytime soon, I can suggest:

And yes, you are completely right about the missing AllocationQuery in Convert. There is a TODO here that was never converted into an issue. (@andrewilyas do you have a bit of time to help out with this? I'm still trying to get everything sorted for v1.0)

Thanks for your report!

rpartsey commented 2 years ago

Thank you for your response, @GuillaumeLeclerc!

My point was that ffcv successfully writes uint16 data via NDArrayField, but there is no way to read it back (at least in the current release). Maybe it's worth adding a warning not to use uint16 and to store the data as int16 instead, since the two dtypes have the same bit width.

It looks like the missing uint16 support is a known issue in the torch community, and they are not planning to change anything.

Yeah, we also thought about storing the data in int16.
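A minimal NumPy sketch of that int16 idea (illustrative values, independent of ffcv): reinterpret the raw bytes as int16 before writing, and view them back as uint16 after decoding. Values above 32767 appear negative in between, but the round trip is lossless:

```python
import numpy as np

# Illustrative depth frame covering the full uint16 range.
depth_u16 = np.array([0, 1000, 40000, 65535], dtype=np.uint16)

# Before writing to the .beton: reinterpret the same bytes as int16.
# Values above 32767 wrap to negative, but no bits are lost.
depth_i16 = depth_u16.view(np.int16)

# After decoding (torch supports int16): view the bytes back as uint16.
restored = depth_i16.view(np.uint16)
assert np.array_equal(restored, depth_u16)
```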

GuillaumeLeclerc commented 2 years ago

I created #153 and will have it in v1. Feel free to reopen if you have more questions.