libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0

FFCV vs Pytorch accuracy results #257

Closed huiwen99 closed 1 year ago

huiwen99 commented 1 year ago

Hi, thanks for this amazing library!

I have been trying to use FFCV to speed up model training on custom data. It does train faster, but I am concerned about the difference in accuracy between FFCV and PyTorch. The only difference is the dataloaders, yet PyTorch gives me about 80% val accuracy by the 2nd epoch, while FFCV gives me about 50-60% val accuracy (with little improvement over the 5 epochs I trained for).

For PyTorch, I used the following transform in my dataset:

transform = A.Compose(
    [
        A.Resize(self.img_size[0],self.img_size[1]),
        A.Normalize(mean=mean, std=std),
        ToTensorV2()
    ]
)

This is coupled with a simple dataloader:

train_ld = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

Whereas for FFCV, the transform is only for resizing to a standard size so that I can use SimpleRGBImageDecoder later on:

transform = A.Compose(
    [
        A.Resize(self.img_size[0],self.img_size[1])
    ]
)

My DatasetWriter and Loader are as follows:

## DatasetWriter
writer = DatasetWriter(write_path, {
    # Tune options to optimize dataset size, throughput at train-time
    'image': RGBImageField(max_resolution=256),
    'label': IntField()
})
# Write dataset
writer.from_indexed_dataset(dataset)

## Loader
image_pipeline = [
    SimpleRGBImageDecoder(),
    ToTensor(),
    ToDevice(device),
    ToTorchImage(),
    Convert(torch.float32),
    T.Normalize(mean, std)
]
label_pipeline = [IntDecoder(), ToTensor(), ToDevice(device)]

pipelines = {
    'image': image_pipeline,
    'label': label_pipeline
}
if shuffle:
    order = OrderOption.RANDOM
else:
    order = OrderOption.SEQUENTIAL

loader = Loader(write_path, batch_size=batch_size, num_workers=num_workers,
            order=order, pipelines=pipelines, drop_last=False)
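One difference between the two pipelines above that may be worth checking (the thread does not confirm it as the cause): albumentations' A.Normalize multiplies mean and std by max_pixel_value (255 by default) before applying them, whereas the FFCV pipeline casts the uint8 image to float32 without rescaling and then applies T.Normalize directly to values in the 0-255 range. If mean and std are on a 0-1 scale, which the post does not state, the two loaders would feed the model very different inputs. A minimal sketch of the arithmetic, using assumed ImageNet-style statistics and a hypothetical helper named to_255_scale:

```python
# Hypothetical helpers for illustration; not part of FFCV or albumentations.
# Assumption: mean/std are given on a 0-1 scale (the original post does not say).

def to_255_scale(stats):
    """Rescale 0-1 channel statistics to the 0-255 pixel range."""
    return tuple(s * 255.0 for s in stats)


def normalize(pixel, mean, std):
    """Per-channel (pixel - mean) / std."""
    return tuple((p - m) / s for p, m, s in zip(pixel, mean, std))


mean = (0.485, 0.456, 0.406)   # assumed values, 0-1 scale
std = (0.229, 0.224, 0.225)    # assumed values, 0-1 scale
pixel = (128.0, 128.0, 128.0)  # one mid-gray RGB pixel after a float32 cast

# What A.Normalize computes: statistics are rescaled by max_pixel_value=255.
reference = normalize(pixel, to_255_scale(mean), to_255_scale(std))

# What a plain normalization computes on 0-255 values with 0-1 statistics.
mismatched = normalize(pixel, mean, std)

print(reference)   # values of order 1
print(mismatched)  # values in the hundreds
```

If this assumption holds, passing to_255_scale(mean) and to_255_scale(std) to the normalization step of the FFCV pipeline would put both loaders on the same scale. Comparing the min, max, and per-channel mean of one batch from each loader is a quick way to confirm or rule this out.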

I noted that this issue also describes a problem similar to mine, but I do not understand how the inconsistent results were resolved there (I do not keep a reference to my data/target; I only use them within the iteration to track the loss and the number of correct predictions).

What should I do to make the FFCV results match the PyTorch ones?

Thanks!

PriceYH commented 1 year ago

The same strange issue happened to me. When I ran the full ImageNet training with the officially suggested settings, I reached accuracy similar to what the authors report. However, when I used the same procedure on my mini ImageNet (1000 classes in total, with 200 training images and 50 test images per class, all randomly picked from ImageNet-1K), a large gap between training and testing accuracy appeared (nearly 20 points). I have also run experiments on this dataset with the Torch loader, and there the gap between training and testing accuracy is the usual one (within 2 points).

After that, I also ran ablation experiments, using the FFCV training loader with the Torch val loader, and the Torch training loader with the FFCV val loader. The accuracy discrepancy persists in both cases.

kristian-georgiev commented 1 year ago

This sounds concerning. @huiwen99 @PriceYH, can you share a minimal example (including the data you used) that reproduces the behavior you observed? Thanks!

PriceYH commented 1 year ago

Dear:

It was my mistake, and I have since had a wonderful experience using FFCV for my project.

Thanks for your excellent and generous work!

andrewilyas commented 1 year ago

Closing this for now -- feel free to reopen if any new issues come up!