libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0

Image Range Issue!! #139

Closed ByungKwanLee closed 2 years ago

ByungKwanLee commented 2 years ago

First, thank you for your significant contributions to computer vision.

I expected the image values to lie between 0 and 1, but the decoded images actually range from 0 to 255. My understanding is that the FFCV developers chose this range because the values 0 to 255 carry the same information in both 16-bit (half) and 32-bit (full) floating point. My question is: if I normalize the values to lie between 0 and 1, is any information lost during the forward and backward passes when using torch.cuda.amp?

This is the code I currently have.

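from typing import List

import torch
from ffcv.fields.decoders import IntDecoder, SimpleRGBImageDecoder
from ffcv.loader import Loader, OrderOption
from ffcv.pipeline.operation import Operation
from ffcv.transforms import (Convert, RandomHorizontalFlip, RandomTranslate,
                             ToDevice, ToTensor, ToTorchImage)
from ffcv.transforms.common import Squeeze

# paths, train_batch_size, test_batch_size and num_workers are defined elsewhere.
gpu = 'cuda:0'
loaders = {}
for name in ['train', 'test']:
    image_pipeline: List[Operation] = [SimpleRGBImageDecoder()]
    label_pipeline: List[Operation] = [IntDecoder(), ToTensor(), ToDevice(gpu), Squeeze()]
    # Data augmentation is applied to the training split only.
    if name == 'train':
        image_pipeline.extend([
            RandomHorizontalFlip(),
            RandomTranslate(padding=2),
        ])
    # The images still hold values in 0-255 after this conversion.
    image_pipeline.extend([
        ToTensor(),
        ToDevice(gpu, non_blocking=True),
        ToTorchImage(),
        Convert(torch.float16),
    ])

    ordering = OrderOption.RANDOM if name == 'train' else OrderOption.SEQUENTIAL

    loaders[name] = Loader(paths[name], batch_size=train_batch_size if name == 'train' else test_batch_size,
                           num_workers=num_workers, order=ordering, drop_last=(name == 'train'),
                           pipelines={'image': image_pipeline, 'label': label_pipeline})

ByungKwanLee commented 2 years ago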

While waiting for an answer, I tried adding the NormalizeImage transform (mean=np.array([0,0,0]), std=np.array([255,255,255]), type=np.float16), on the assumption that normalizing the images does not cause information loss. However, it did not work, for reasons I could not determine (possibly a version problem?), and I do not think I can fix the NormalizeImage issue on my own. So I modified the code in ffcv.transforms instead, and that works well. I still need to double-check whether this code has any problems, so I would like to ask the developers to review it. Thank you!!

from typing import Callable, Optional, Tuple
from dataclasses import replace
from ffcv.pipeline.operation import Operation
from ffcv.pipeline.state import State
from ffcv.pipeline.allocation_query import AllocationQuery

class Normalize_and_Convert(Operation):
    def __init__(self, target_dtype, target_norm_bool):
        super().__init__()
        self.target_dtype = target_dtype
        self.target_norm_bool = target_norm_bool

    def generate_code(self) -> Callable:
        def convert(inp, dst):
            if self.target_norm_bool:
                inp = inp / 255.0
            return inp.type(self.target_dtype)

        convert.is_parallel = True

        return convert

    def declare_state_and_memory(self, previous_state: State) -> Tuple[State, Optional[AllocationQuery]]:
        return replace(previous_state, dtype=self.target_dtype), None

GuillaumeLeclerc commented 2 years ago

Hi @ByungKwanLee, the images that make up the dataset are stored as 0-255 values, so no information is lost there (you can't have more information than your source without a prior).
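
The storage side of this can be checked with a small sketch. It is not FFCV code, and the helper name `to_float16` is made up for illustration. It uses only the standard library's half-precision `struct` format to round values to float16, then tests whether the 256 pixel levels survive being scaled to [0, 1] and stored as float16. It covers only the representation of the input values, not rounding that accumulates inside a network under mixed precision.

```python
import struct


def to_float16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


pixels = list(range(256))

# Integers 0..255 fit in float16's 11-bit significand, so they are exact.
as_half = [to_float16(float(p)) for p in pixels]
ints_exact = all(h == p for h, p in zip(as_half, pixels))

# Scale to [0, 1], round to float16, then map back to 0..255.
scaled = [to_float16(p / 255.0) for p in pixels]
recovered = [round(s * 255.0) for s in scaled]
round_trip_exact = recovered == pixels

# Count how many distinct float16 values the 256 levels map to.
distinct_levels = len(set(scaled))

print(ints_exact, round_trip_exact, distinct_levels)
```

If the round trip is exact and all 256 levels stay distinct, dividing by 255 before converting to float16 does not merge any pixel values.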

Neural networks work best on zero-centered floating-point data, which is why we use NormalizeImage in our examples. If you really need the [0, 1] range, what you are doing is correct. The only issue is that you are passing integers. If you pass floating-point parameters, it should work:

NormalizeImage(mean=np.array([0.0, 0, 0]), std=np.array([255.0, 255, 255]))

(Note the 0.0, which forces the whole array to be float instead of int.)
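
The dtype point can be seen with plain NumPy. The sketch below does not call FFCV. It shows how NumPy infers the array dtype from the literals, then applies the usual (x - mean) / std normalization by hand to one example pixel. The variable names are illustrative.

```python
import numpy as np

# All-integer literals give an integer array.
int_std = np.array([255, 255, 255])
# A single float literal promotes the whole array to floating point.
float_std = np.array([255.0, 255, 255])
print(int_std.dtype.kind, float_std.dtype.kind)  # 'i' for integer, 'f' for float

# With mean 0 and std 255, (x - mean) / std is simply x / 255.
mean = np.array([0.0, 0, 0])
pixel = np.array([0, 128, 255], dtype=np.uint8)  # one RGB pixel
normalized = (pixel - mean) / float_std
print(normalized)
```

An existing integer array can also be converted explicitly with `astype(np.float32)` instead of relying on a float literal.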