jiangyuang / ModelPruningLibrary

MIT License

Training time problem #2

Open · lahmuller opened 3 years ago

lahmuller commented 3 years ago

It seems to me that training with the sparse model (`model = model.to_sparse()`) costs much more time than training the dense model. I tried the code in 2020 and 2021, and the results are the same: sparse costs more time. Does this have anything to do with my system (Windows 10)? Hoping for an answer. Thank you.
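
For reference, the comparison I ran was shaped roughly like the sketch below (the tiny model here is only a stand-in, not one of the library's models; `to_sparse()` is this library's dense-to-sparse conversion):

```python
import time
import torch
import torch.nn as nn

# Placeholder model just to keep the sketch self-contained; in my runs this was
# one of the library's models, converted with model.to_sparse() for the sparse case.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(16 * 32 * 32, 10),
)

def time_one_iteration(model, n_warmup=2, n_runs=10):
    """Average wall-clock time of one CPU training iteration (forward + backward)."""
    x = torch.randn(20, 3, 32, 32)   # mini-batch of 20 random images
    y = torch.randint(0, 10, (20,))  # random labels
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(n_warmup):        # warm up allocator/caches before timing
        model.zero_grad()
        loss_fn(model(x), y).backward()
    start = time.perf_counter()
    for _ in range(n_runs):
        model.zero_grad()
        loss_fn(model(x), y).backward()
    return (time.perf_counter() - start) / n_runs

print("dense :", time_one_iteration(model))
# print("sparse:", time_one_iteration(model.to_sparse()))  # this library's conversion
```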

jiangyuang commented 3 years ago

@lahmuller Thanks for the question. In our documentation (also see Section C.3 of our arXiv paper), we stated:

> **extension**: the `extension.cpp` C++ file extends the current PyTorch implementation with sparse kernels (the installed module is called `sparse_conv2d`). However, please note that we only extend PyTorch's slow CPU version of conv2d forward/backward, with no groups and dilation = 1 (see PyTorch's C++ code here). In other words, we do not use acceleration packages such as MKL (which are not available on the Raspberry Pis on which our paper's experiments ran). Do not compare the speed of our implementation with the acceleration packages.

So please do not compare against code that uses any acceleration package such as MKL. Note that PyTorch automatically uses such acceleration packages unless specifically configured otherwise at installation. Also, the sparse model is faster only when its density is small, e.g. try < 10% density.
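
You can check whether your PyTorch build links these packages with standard PyTorch introspection calls, e.g.:

```python
import torch

# Standard PyTorch introspection (not part of this library):
print(torch.backends.mkl.is_available())     # True if the build links MKL
print(torch.backends.mkldnn.is_available())  # True if oneDNN (formerly MKL-DNN) is built in
print(torch.__config__.show())               # full build configuration string
```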

lahmuller commented 3 years ago

> @lahmuller Thanks for the question. In our documentation (also see Section C.3 of our arXiv paper), we stated:
>
> > **extension**: the `extension.cpp` C++ file extends the current PyTorch implementation with sparse kernels (the installed module is called `sparse_conv2d`). However, please note that we only extend PyTorch's slow CPU version of conv2d forward/backward, with no groups and dilation = 1 (see PyTorch's C++ code here). In other words, we do not use acceleration packages such as MKL (which are not available on the Raspberry Pis on which our paper's experiments ran). Do not compare the speed of our implementation with the acceleration packages.
>
> So please do not compare against code that uses any acceleration package such as MKL. Note that PyTorch automatically uses such acceleration packages unless specifically configured otherwise at installation. Also, the sparse model is faster only when its density is small, e.g. try < 10% density.

I thought PyTorch would accelerate the sparse model just as it does the dense model, since they run in the same environment. Maybe I should create a new environment without the acceleration packages to try this out. Thank you for the clarification.
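
Before building a whole new environment, it may be worth trying runtime knobs first (a sketch; these only disable the oneDNN backend and extra threads at runtime, while MKL BLAS kernels linked into the build can still run, so a from-source build without MKL remains the thorough check):

```python
import torch

# Runtime knobs only; they do not remove MKL code linked into the build.
torch.backends.mkldnn.enabled = False  # turn off the oneDNN (MKL-DNN) backend
torch.set_num_threads(1)               # limit intra-op threading for a fairer comparison
```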

Karry-Xu commented 3 years ago

Hi, I tried to use your sparse matrix kernel to speed things up, but it didn't work. You mentioned that PyTorch will automatically use MKL for acceleration. Would you be willing to run a comparative experiment to show the acceleration performance?

jiangyuang commented 3 years ago

@Karry-Xu We experimented with the models mentioned in our paper on a Raspberry Pi (version 4), where packages such as MKL are automatically not used, since Pis have ARM-based CPUs.

In the table below, we show the time for one training iteration with a mini-batch size of 20, both for the dense form and for the corresponding sparse form where every parameterized layer has only one weight. The sparse-form time is therefore a lower bound, meaning you can save at most 40% to 50% of the training time (when your model is sparse enough for the sparse form to pay off).


| Model | Dense form time | Sparse form time (each parameterized layer having one weight) |
| -- | -- | -- |
| Conv2 | 2.2486449218005875 | 1.2492789765 |
| Conv4 | 3.724286518478766 | 1.598415535595268 |
| VGG-11 | 31.514276721399803 | 17.36990105989628 |
| Resnet-18 | 25.3356386726 | 14.560097051459854 |

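Reading off the Conv2 row, for example: 1.249 / 2.249 ≈ 0.56, so even the best-case sparse form takes about 56% of the dense time, i.e. a saving of roughly 44%, consistent with the 40% to 50% figure above.
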
Karry-Xu commented 3 years ago

@jiangyuang Thanks for your reply