libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0

run slowly in real pytorch project #179

Closed zhoupan9109 closed 2 years ago

zhoupan9109 commented 2 years ago

Following the official documentation and sample code, I generated the .beton file successfully. The loading efficiency surprised me; however, when applying the loading code to a real PyTorch project, some unexpected results were observed in experiments.

The PyTorch code for the experiments is as follows:

Experiment 1:

    loader = Loader('./ds.beton')
    for data, label in loader:
        if torch.cuda.is_available():
            data = data.cuda()

Experiment 2:

    loader = Loader('./ds.beton')
    for data, label in loader:
        if torch.cuda.is_available():
            data = data.cuda()
        out = model(data)

The image pipeline: [SimpleRGBImageDecoder(), ToTensor(), ToTorchImage()]
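For context, a minimal sketch of how such a pipeline is typically passed to the FFCV Loader; the field names ('image', 'label'), the label decoder, batch size, and worker count here are assumptions and must match how the .beton file was written:

    from ffcv.loader import Loader, OrderOption
    from ffcv.fields.decoders import SimpleRGBImageDecoder, IntDecoder
    from ffcv.transforms import ToTensor, ToTorchImage

    # Assumed field names and hyperparameters; adjust to the actual .beton layout.
    loader = Loader(
        './ds.beton',
        batch_size=64,
        num_workers=10,
        order=OrderOption.RANDOM,
        pipelines={
            'image': [SimpleRGBImageDecoder(), ToTensor(), ToTorchImage()],
            'label': [IntDecoder(), ToTensor()],
        },
    )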

In experiment 2, the statement data = data.cuda() runs about 100x slower than the same statement at the same position in experiment 1.
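One thing worth checking here (an assumption, not something verified in this thread) is that CUDA operations execute asynchronously, so the cost of the previous forward pass can be charged to the next statement that has to wait on the GPU, which in experiment 2 is data = data.cuda(). A sketch of per-statement timing with explicit synchronization, placed inside the loop of experiment 2:

    import time
    import torch

    torch.cuda.synchronize()              # finish any previously queued GPU work
    t0 = time.perf_counter()
    data = data.cuda()                    # host-to-device copy
    torch.cuda.synchronize()              # wait for the copy itself to complete
    t_copy = time.perf_counter() - t0

    t0 = time.perf_counter()
    out = model(data)                     # forward pass (kernels launch asynchronously)
    torch.cuda.synchronize()              # wait for the forward pass to finish
    t_forward = time.perf_counter() - t0
    print(f'copy: {t_copy * 1e3:.2f} ms, forward: {t_forward * 1e3:.2f} ms')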

If you have any suggestions, please contact me without hesitation. Thank you very much.

GuillaumeLeclerc commented 2 years ago

Three things:

Feel free to reopen the issue if this doesn't resolve your problem

zhoupan9109 commented 2 years ago

Thanks for your reply!

Before the experiments above, the following code had been tested:

    loader = Loader('./ds.beton')
    for data, label in loader:
        out = model(data)

Image pipeline: [SimpleRGBImageDecoder(), ToTensor(), ToTorchImage(), ToDevice(0)]

Under this condition, the statement out = model(data) costs more time than with the official PyTorch DataLoader.
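For comparison, the pipelines in the FFCV examples usually move the batch to the GPU with ToDevice before ToTorchImage and normalize on the device; a sketch of such a pipeline (the normalization statistics and the float16 output type are illustrative assumptions, not values from this thread):

    import numpy as np
    import torch
    from ffcv.fields.decoders import SimpleRGBImageDecoder
    from ffcv.transforms import ToTensor, ToDevice, ToTorchImage, NormalizeImage

    # Illustrative ImageNet-style statistics; replace with the dataset's own values.
    MEAN = np.array([0.485, 0.456, 0.406]) * 255
    STD = np.array([0.229, 0.224, 0.225]) * 255

    image_pipeline = [
        SimpleRGBImageDecoder(),
        ToTensor(),
        ToDevice(torch.device('cuda:0'), non_blocking=True),  # move to GPU early
        ToTorchImage(),
        NormalizeImage(MEAN, STD, np.float16),                 # runs on the GPU here
    ]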

To contrast the efficiency of the two loaders, a complete PyTorch project was tested using the Loader from FFCV and the DataLoader from PyTorch respectively. Given FFCV's state-of-the-art efficiency, the FFCV Loader should intuitively reduce training time, but the experiments show the opposite result:

FFCV: 19 s / 10 iter; PyTorch: 15 s / 10 iter

The above tests were run under the following conditions:

hardware: GTX 1080
image resolution: 256x144
batch size: 64
num_workers: 10

To resolve these unexpected results, the above experiments were designed to locate the problem.
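One factor that could explain part of the gap (an assumption here, not confirmed in the thread) is that FFCV JIT-compiles its pipeline the first time the loader is iterated, so a 10-iteration measurement from a cold start includes that one-time cost. A sketch of a warm-up-then-measure loop usable with either loader:

    import time
    import torch

    def time_iterations(loader, model, n_iters=10, warmup_iters=10):
        # Warm-up iterations absorb one-time costs (FFCV pipeline compilation,
        # CUDA context setup, cuDNN autotuning) before measurement starts.
        # Assumes warmup_iters + n_iters fit within one pass over the loader.
        it = iter(loader)
        for _ in range(warmup_iters):
            data, label = next(it)
            out = model(data.cuda(non_blocking=True))
        torch.cuda.synchronize()

        t0 = time.perf_counter()
        for _ in range(n_iters):
            data, label = next(it)
            out = model(data.cuda(non_blocking=True))
        torch.cuda.synchronize()
        return time.perf_counter() - t0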