libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0

Convert transform doesn't work with numpy array #328

Closed: kimihailv closed this issue 1 year ago

kimihailv commented 1 year ago

Hello. I have the following code:

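import numpy as np
from ffcv.fields.decoders import RandomResizedCropRGBImageDecoder
from ffcv.loader import Loader, OrderOption
from ffcv.transforms import Convert, RandomHorizontalFlip

pipelines = {
    'image': [RandomResizedCropRGBImageDecoder((224, 224)),
              RandomHorizontalFlip(),
              Convert(np.float32)]
}

loader = Loader('data.beton',
                batch_size=2048,
                num_workers=20,
                order=OrderOption.RANDOM,
                pipelines=pipelines)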

But it doesn't work:


  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.10/dist-packages/ffcv/loader/epoch_iterator.py", line 84, in run
    result = self.run_pipeline(b_ix, ixes, slot, events[slot])
  File "/usr/local/lib/python3.10/dist-packages/ffcv/loader/epoch_iterator.py", line 146, in run_pipeline
    results = stage_code(**args)
  File "", line 2, in stage_code_0
  File "/usr/local/lib/python3.10/dist-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/usr/local/lib/python3.10/dist-packages/numba/core/dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name 'self': Cannot determine Numba type of <class 'ffcv.transforms.ops.Convert'>

File "../../../usr/local/lib/python3.10/dist-packages/ffcv/transforms/ops.py", line 128:
        def convert(inp, dst):
            return inp.type(self.target_dtype)

kimihailv commented 1 year ago

I tried np.dtype('float32') too.

andrewilyas commented 1 year ago

As of now, Convert is implemented by calling x.type(dtype), which as far as I can tell is only available on torch tensors, and not numpy arrays. Try adding a ToTensor operation before the convert and then using torch.float32.
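A quick NumPy-only check of the difference described above: NumPy arrays convert dtypes with `astype()` and have no `type()` method (that method belongs to torch tensors). The batch shape below is only illustrative, chosen to resemble a batch of decoded 224x224 RGB images.

```python
import numpy as np

# Illustrative batch shaped like decoded 224x224 RGB images (uint8, NHWC).
batch = np.zeros((2, 224, 224, 3), dtype=np.uint8)

# ffcv's Convert calls inp.type(...), which NumPy arrays do not provide.
print(hasattr(batch, "type"))  # False

# NumPy's own dtype conversion method is astype(); it returns a converted copy.
as_float = batch.astype(np.float32)
print(as_float.dtype)  # float32
```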

kimihailv commented 1 year ago

Thanks

kimihailv commented 1 year ago

Actually, I don't need torch tensors; I want NumPy arrays. I tried rewriting the Convert transform, but encountered this error:

numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name 'self': Cannot determine Numba type of <class '__main__.Convert'>

File "../../../tmp/ipykernel_76423/2351147030.py", line 21:
<source missing, REPL/exec in use?>

Code of the transform:

class Convert(Operation):
    """Convert to target data type.

    Parameters
    ----------
    target_dtype: numpy.dtype or torch.dtype
        Target data type.
    """
    def __init__(self, target_dtype):
        super().__init__()
        self.target_dtype = target_dtype

    def generate_code(self) -> Callable:
        def convert(inp, dst):
            return inp.astype(self.target_dtype)

        convert.is_parallel = True

        return convert

    # TODO: something weird about device to allocate on
    def declare_state_and_memory(self, previous_state: State) -> Tuple[State, Optional[AllocationQuery]]:
        return replace(previous_state, dtype=self.target_dtype), None
andrewilyas commented 1 year ago

Hi! Try the following:

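from dataclasses import replace
from typing import Callable, Optional, Tuple

from ffcv.pipeline.allocation_query import AllocationQuery
from ffcv.pipeline.operation import Operation
from ffcv.pipeline.state import State


class Convert(Operation):
    """Convert to target data type.

    Parameters
    ----------
    target_dtype: numpy.dtype or torch.dtype
        Target data type.
    """
    def __init__(self, target_dtype):
        super().__init__()
        self.target_dtype = target_dtype

    def generate_code(self) -> Callable:
        # Bind the dtype to a local so the Numba-compiled closure captures
        # only the dtype, not `self` (which Numba cannot type).
        target_dtype = self.target_dtype
        def convert(inp, dst):
            return inp.astype(target_dtype)

        convert.is_parallel = True

        return convert

    # TODO: something weird about device to allocate on
    def declare_state_and_memory(self, previous_state: State) -> Tuple[State, Optional[AllocationQuery]]:
        return replace(previous_state, dtype=self.target_dtype), None
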
kimihailv commented 1 year ago

It works, thank you!