Closed BedirYilmaz closed 4 years ago
Could it be your image preprocessing? How many workers do you use?
From: Bedir Yılmaz notifications@github.com Sent: Wednesday, March 4, 2020 4:43 PM To: leeyeehoo/CSRNet-pytorch CSRNet-pytorch@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: [leeyeehoo/CSRNet-pytorch] What is your training time on ShanghaiTech Part A (#73)
Hi,
I have been suffering from serious slow-downs for two weeks now and have not been able to pinpoint the problem yet.
Here is how much time my machine (CUDA 10, NVIDIA GTX 1080 Ti, PyTorch 1.4) currently requires during training:
2020-02-10 09:48:29,344 - INFO - epoch 0, processed 0 samples, lr 0.0000010000, momentum 0.9500000000
2020-02-10 09:48:29,630 - INFO - Epoch: [0][0/960] Time 0.285 (0.285) Data 0.022 (0.022) Loss 25.5455 (25.5455)
2020-02-10 09:48:43,902 - INFO - Epoch: [0][30/960] Time 0.299 (0.470) Data 0.116 (0.091) Loss 246.4199 (297.7094)
2020-02-10 09:50:04,337 - INFO - Epoch: [0][180/960] Time 0.704 (0.525) Data 0.017 (0.058) Loss 15.9261 (269.7038)
2020-02-10 09:51:29,218 - INFO - Epoch: [0][330/960] Time 0.193 (0.543) Data 0.010 (0.047) Loss 16.2078 (351.5720)
2020-02-10 09:52:52,231 - INFO - Epoch: [0][480/960] Time 0.759 (0.547) Data 0.009 (0.040) Loss 2155.6362 (362.7242)
2020-02-10 09:54:14,097 - INFO - Epoch: [0][630/960] Time 0.276 (0.546) Data 0.014 (0.034) Loss 649.3963 (368.7552)
2020-02-10 09:55:39,716 - INFO - Epoch: [0][780/960] Time 0.382 (0.551) Data 0.013 (0.030) Loss 222.1103 (364.9167)
2020-02-10 09:57:03,813 - INFO - Epoch: [0][930/960] Time 0.735 (0.553) Data 0.016 (0.028) Loss 348.5327 (372.7292)
2020-02-10 09:57:20,402 - INFO - begin test
2020-02-10 09:57:26,264 - INFO - MAE 297.191
2020-02-10 09:57:26,276 - INFO - best MAE 297.191
Here is how it used to be:
2019-06-13 10:54:45,116 - INFO - epoch 0, processed 0 samples, lr 0.0000001000, momentum 0.9500000000
2019-06-13 10:54:45,667 - INFO - Epoch: [0][0/960] Time 0.551 (0.551) Data 0.015 (0.015) Loss 195.5461 (195.5461)
2019-06-13 10:55:09,175 - INFO - Epoch: [0][150/960] Time 0.084 (0.159) Data 0.015 (0.014) Loss 119.7359 (333.7037)
2019-06-13 10:55:34,399 - INFO - Epoch: [0][300/960] Time 0.208 (0.164) Data 0.015 (0.014) Loss 178.8816 (275.7937)
2019-06-13 10:56:00,454 - INFO - Epoch: [0][450/960] Time 0.203 (0.167) Data 0.015 (0.014) Loss 61.6186 (247.2942)
2019-06-13 10:56:24,246 - INFO - Epoch: [0][600/960] Time 0.133 (0.165) Data 0.010 (0.014) Loss 236.0601 (249.8360)
2019-06-13 10:56:49,196 - INFO - Epoch: [0][750/960] Time 0.189 (0.165) Data 0.011 (0.014) Loss 272.6378 (268.4480)
2019-06-13 10:57:12,888 - INFO - Epoch: [0][900/960] Time 0.224 (0.164) Data 0.019 (0.014) Loss 77.9398 (271.7229)
2019-06-13 10:57:17,520 - INFO - Epoch: [0][930/960] Time 0.208 (0.164) Data 0.015 (0.014) Loss 45.8458 (270.8347)
2019-06-13 10:57:22,781 - INFO - begin test
2019-06-13 10:57:26,680 - INFO - MAE 130.738
2019-06-13 10:57:26,680 - INFO - best MAE 130.738
Just to get an idea, may I know your training time on ShanghaiTech Part A?
Thank you
4 workers. I have also timed the training code:
img = img.cuda()
img = Variable(img)                # no-op wrapper on PyTorch >= 0.4
output = model(img)                # forward pass
target = target.type(torch.FloatTensor).unsqueeze(0).cuda()
target = Variable(target)
loss = criterion(output, target)
losses.update(loss.item(), img.size(0))
optimizer.zero_grad()
loss.backward()                    # backward pass
optimizer.step()
I used perf_counter to measure the time needed for each line of the code above and realized that backpropagation takes ~10x the time of the forward pass. I know backprop is expected to take longer than the forward pass, but I believe this difference is a bit too much.
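As a side note, here is a minimal sketch of that per-step timing approach, with a hypothetical stand-in workload instead of the real model. One caveat when timing CUDA code: kernels launch asynchronously, so you have to call torch.cuda.synchronize() before each clock read, otherwise perf_counter mostly measures launch overhead rather than the actual forward/backward work.

```python
import time

def timed(fn, *args):
    # With CUDA tensors, call torch.cuda.synchronize() before each
    # perf_counter() read; GPU kernels run asynchronously, so the wall
    # clock would otherwise only capture kernel-launch overhead.
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

# hypothetical stand-in for model(img) or loss.backward()
def busy_work(n):
    return sum(i * i for i in range(n))

_, t_fwd = timed(busy_work, 10_000)
print(t_fwd > 0.0)  # True
```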
The tests
Thinking that it might be an infrastructural problem, I have also run my code in a Colab environment. The result was more or less the same. Here are the timings of the training steps for the first 7 images from the first epoch of both trainings.
         colab                local
forward  backward    forward  backward
0.077s   0.818s      0.071s   0.699s
0.025s   0.238s      0.095s   1.043s
0.071s   0.751s      0.066s   0.751s
0.046s   0.486s      0.063s   0.685s
0.073s   0.778s      0.060s   0.619s
0.039s   0.385s      0.015s   0.139s
0.077s   0.818s      0.019s   0.176s
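These numbers bear out the ~10x backward/forward ratio; a quick sanity check over the Colab column above:

```python
# (forward, backward) pairs from the Colab column above
colab = [(0.077, 0.818), (0.025, 0.238), (0.071, 0.751),
         (0.046, 0.486), (0.073, 0.778), (0.039, 0.385),
         (0.077, 0.818)]

ratios = [b / f for f, b in colab]
avg_ratio = sum(ratios) / len(ratios)
print(round(avg_ratio, 1))  # 10.3
```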
May I know how much time it takes you to train on ShanghaiTech Part A? Mine is currently between 45 and 52 hours; it was between 15 and 18 hours when I started my research.
Could it be your image preprocessing?
When I think about it, I would say no: I cloned your repository to have fresh code and used my existing density maps and environments, and still had no luck. Unfortunately it is still slow, as seen in the table.
Here is the output of freshly cloned code in a fresh Python 2 environment. As you can see, training on a single image takes about 0.6 seconds on average.
/home/user/anaconda3/envs/torch_py2/lib/python2.7/site-packages/torch/nn/_reduction.py:43: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
warnings.warn(warning.format(ret))
epoch 0, processed 0 samples, lr 0.0000001000
Epoch: [0][0/960] Time 0.622 (0.622) Data 0.025 (0.025) Loss 782.0785 (782.0785)
Epoch: [0][150/960] Time 1.144 (0.610) Data 0.017 (0.017) Loss 139.2142 (403.8265)
Epoch: [0][300/960] Time 0.206 (0.597) Data 0.011 (0.015) Loss 3413.5181 (394.3032)
Epoch: [0][450/960] Time 0.344 (0.609) Data 0.016 (0.014) Loss 584.6290 (384.1994)
Epoch: [0][600/960] Time 0.147 (0.604) Data 0.007 (0.014) Loss 10.1923 (385.8242)
Epoch: [0][750/960] Time 0.352 (0.603) Data 0.015 (0.014) Loss 4906.9761 (374.7677)
Epoch: [0][900/960] Time 0.740 (0.597) Data 0.006 (0.013) Loss 29.9215 (376.6259)
Epoch: [0][930/960] Time 0.567 (0.596) Data 0.003 (0.013) Loss 28.5599 (371.4367)
begin test
* MAE 310.739
* best MAE 310.739
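As an aside for anyone reading these logs: in each `Time cur (avg)` pair, the first number is the current batch and the parenthesized one a running average, in the style of the usual AverageMeter helper (a sketch; the repo's own implementation may differ slightly):

```python
class AverageMeter:
    """Tracks the 'cur (avg)' pair printed in the training logs."""
    def __init__(self):
        self.val = 0.0   # most recent value
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0   # running average

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

batch_time = AverageMeter()
for t in (0.622, 0.598):
    batch_time.update(t)
print("Time %.3f (%.3f)" % (batch_time.val, batch_time.avg))  # Time 0.598 (0.610)
```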
It looks like I was able to solve the problem. I finally found out that it was one of the libraries I had recently updated: it caused a drastic decrease in speed.
2020-03-04 17:53:20,067 - INFO - epoch 0, processed 0 samples, lr 0.0000001000, momentum 0.9500000000
2020-03-04 17:53:21,407 - INFO - Epoch: [0][0/960] Time 1.339 (1.339) Data 0.003 (0.003) Loss 1626.9767 (1626.9767)
2020-03-04 17:53:44,776 - INFO - Epoch: [0][150/960] Time 0.215 (0.164) Data 0.004 (0.003) Loss 3370.8374 (326.4953)
2020-03-04 17:54:09,898 - INFO - Epoch: [0][300/960] Time 0.086 (0.166) Data 0.004 (0.003) Loss 39.0026 (350.3883)
2020-03-04 17:54:33,277 - INFO - Epoch: [0][450/960] Time 0.230 (0.162) Data 0.006 (0.003) Loss 360.8945 (370.3947)
2020-03-04 17:54:58,589 - INFO - Epoch: [0][600/960] Time 0.162 (0.164) Data 0.002 (0.003) Loss 14.5095 (371.6881)
2020-03-04 17:55:24,841 - INFO - Epoch: [0][750/960] Time 0.134 (0.166) Data 0.011 (0.003) Loss 33.2953 (366.6922)
2020-03-04 17:55:30,198 - INFO - Epoch: [0][780/960] Time 0.209 (0.167) Data 0.004 (0.003) Loss 46.6512 (366.9303)
2020-03-04 17:55:54,898 - INFO - Epoch: [0][930/960] Time 0.210 (0.166) Data 0.003 (0.003) Loss 318.1886 (367.5900)
2020-03-04 17:55:59,934 - INFO - begin test
2020-03-04 17:56:05,370 - INFO - * MAE 330.800
2020-03-04 17:56:05,378 - INFO - * best MAE 330.800
I would also like to share my current environment for others who run into the same problem:
name: torchold
channels:
- albumentations
- pytorch
- conda-forge
- defaults
dependencies:
- albumentations=0.4.0=py36_0
- geos=3.7.2=he1b5a44_2
- giflib=5.1.4=0
- graphite2=1.3.11=0
- imageio=2.6.1=py36_0
- imgaug=0.3.0=py_0
- jupyter_contrib_core=0.3.3=py_2
- jupyter_contrib_nbextensions=0.5.0=py36_1000
- jupyter_highlight_selected_word=0.2.0=py36_1000
- jupyter_latex_envs=1.4.4=py36_1000
- jupyter_nbextensions_configurator=0.4.0=py36_1000
- jupyterlab=0.35.4=py36_0
- jupyterlab_server=0.2.0=py_0
- libiconv
- libwebp=0.5.2=7
- libxslt=1.1.32=h88dbc4e_2
- lxml=4.2.5=py36hc9114bc_0
- mkl_fft=1.0.10=py36_0
- mkl_random=1.0.2=py36_0
- openblas=0.2.20=8
- pixman=0.34.0=2
- pywavelets=1.1.1=py36hc1659b7_0
- scikit-image=0.14.2=py36hf484d3e_0
- shapely=1.6.4=py36hec07ddf_1006
- tbb=2019.9=hc9558a2_0
- tbb4py=2019.9=py36hc9558a2_0
- x264=20131218=0
- _libgcc_mutex=0.1=main
- astroid=2.3.1=py36_0
- attrs=19.1.0=py36_1
- autopep8=1.4.4=py_0
- backcall=0.1.0=py36_0
- blas=1.0=mkl
- bleach=3.1.0=py36_0
- bokeh=0.12.16=py36_0
- bzip2=1.0.6=h9a117a8_4
- ca-certificates=2019.10.16=0
- cairo=1.14.12=h8948797_3
- certifi=2019.9.11=py36_0
- cffi=1.11.5=py36h9745a5d_0
- click=6.7=py36h5253387_0
- cloudpickle=0.5.3=py36_0
- cudatoolkit=9.0=h13b8566_0
- cudnn=7.1.2=cuda9.0_0
- cycler=0.10.0=py36h93f1223_0
- cytoolz=0.9.0.1=py36h14c3975_0
- dask=0.17.4=py36_0
- dask-core=0.17.4=py36_0
- dbus=1.13.2=h714fa37_1
- decorator=4.4.0=py36_1
- defusedxml=0.6.0=py_0
- distributed=1.21.8=py36_0
- entrypoints=0.3=py36_0
- expat=2.2.5=he0dffb1_0
- ffmpeg=4.0=hcdf2ecd_0
- fontconfig=2.13.0=h9420a91_0
- freeglut=3.0.0=hf484d3e_5
- freetype=2.9.1=h8a8886c_1
- glib=2.56.2=hd408876_0
- gmp=6.1.2=h6c8ec71_1
- gst-plugins-base=1.14.0=hbbd80ab_1
- gstreamer=1.14.0=hb453b48_1
- h5py=2.8.0=py36ha1f6525_0
- harfbuzz=1.8.8=hffaf4a1_0
- hdf5=1.10.2=hba1933b_1
- heapdict=1.0.0=py36_2
- html5lib=1.0.1=py36h2f9c1c0_0
- icu=58.2=h9c2bf20_1
- intel-openmp=2018.0.0=8
- ipykernel=5.1.2=py36h39e3cac_0
- ipython=7.8.0=py36h39e3cac_0
- ipython_genutils=0.2.0=py36_0
- ipywidgets=7.5.1=py_0
- isort=4.3.21=py36_0
- jasper=2.0.14=h07fcdf6_1
- jedi=0.15.1=py36_0
- jinja2=2.10.1=py36_0
- jpeg=9b=h024ee3a_2
- jsonschema=3.0.2=py36_0
- jupyter=1.0.0=py36_7
- jupyter_client=5.3.3=py36_1
- jupyter_console=6.0.0=py36_0
- jupyter_core=4.5.0=py_0
- kiwisolver=1.0.1=py36h764f252_0
- lazy-object-proxy=1.4.2=py36h7b6447c_0
- libedit=3.1.20181209=hc058e9b_0
- libffi=3.2.1=hd88cf55_4
- libgcc=7.2.0=h69d50b8_2
- libgcc-ng=9.1.0=hdf63c60_0
- libgfortran=3.0.0=1
- libgfortran-ng=7.2.0=hdf63c60_3
- libglu=9.0.0=hf484d3e_1
- libopencv=3.4.2=hb342d67_1
- libopus=1.2.1=hb9ed12e_0
- libpng=1.6.37=hbc83047_0
- libprotobuf=3.5.2=h6f1eeef_0
- libsodium=1.0.16=h1bed415_0
- libstdcxx-ng=9.1.0=hdf63c60_0
- libtiff=4.0.9=he85c1e1_1
- libuuid=1.0.3=h1bed415_2
- libvpx=1.7.0=h439df22_0
- libxcb=1.13=h1bed415_1
- libxml2=2.9.8=hf84eae3_0
- locket=0.2.0=py36h787c0ad_1
- markupsafe=1.1.1=py36h7b6447c_0
- matplotlib=2.2.2=py36hb69df0a_2
- mccabe=0.6.1=py36_1
- mistune=0.8.4=py36h7b6447c_0
- mkl=2018.0.3=1
- msgpack-python=0.5.6=py36h6bb024c_0
- nbconvert=5.6.0=py36_1
- nbformat=4.4.0=py36_0
- nccl=1.3.5=cuda9.0_0
- ncurses=6.1=hf484d3e_0
- networkx=2.1=py36_0
- ninja=1.8.2=py36h6bb024c_1
- notebook=6.0.1=py36_0
- numpy=1.15.4=py36h1d66e8a_0
- numpy-base=1.15.4=py36h81de0dd_0
- olefile=0.45.1=py36_0
- opencv=3.4.2=py36h6fd60c2_1
- openssl=1.1.1d=h7b6447c_3
- packaging=17.1=py36_0
- pandas=0.23.0=py36h637b7d7_0
- pandoc=1.19.2.1=hea2e7c5_1
- pandocfilters=1.4.2=py36_1
- parso=0.5.1=py_0
- partd=0.3.8=py36h36fd896_0
- pcre=8.42=h439df22_0
- pexpect=4.7.0=py36_0
- pickleshare=0.7.5=py36_0
- pillow=5.4.1=py36h34e0f95_0
- pip=19.2.3=py36_0
- prometheus_client=0.7.1=py_0
- prompt_toolkit=2.0.9=py36_0
- psutil=5.4.5=py36h14c3975_0
- ptyprocess=0.6.0=py36_0
- py-opencv=3.4.2=py36hb342d67_1
- pycodestyle=2.5.0=py36_0
- pycparser=2.18=py36hf9f622e_1
- pygments=2.4.2=py_0
- pylint=2.4.2=py36_0
- pyparsing=2.2.0=py36hee85983_1
- pyqt=5.6.0=py36h22d08a2_6
- pyrsistent=0.15.4=py36h7b6447c_0
- python=3.6.9=h265db76_0
- python-dateutil=2.8.0=py36_0
- pytorch=0.4.0=py36hdf912b8_0
- pytz=2018.4=py36_0
- pyyaml=3.12=py36hafb9ca4_1
- pyzmq=17.1.2=py36h14c3975_0
- qt=5.6.3=h8bf5577_3
- qtconsole=4.5.5=py_0
- readline=7.0=h7b6447c_5
- scikit-learn=0.19.1=py36h7aa7ec6_0
- scipy=1.1.0=py36hfa4b5c9_1
- send2trash=1.5.0=py36_0
- setuptools=41.2.0=py36_0
- simplegeneric=0.8.1=py36_2
- sip=4.18.1=py36hf484d3e_2
- six=1.12.0=py36_0
- sortedcontainers=1.5.10=py36_0
- sqlite=3.29.0=h7b6447c_0
- tblib=1.3.2=py36h34cf8b6_0
- terminado=0.8.2=py36_0
- testpath=0.4.2=py36_0
- tk=8.6.8=hbc83047_0
- toolz=0.9.0=py36_0
- tornado=6.0.3=py36h7b6447c_0
- traitlets=4.3.2=py36_0
- typed-ast=1.4.0=py36h7b6447c_0
- wcwidth=0.1.7=py36_0
- webencodings=0.5.1=py36_1
- wheel=0.33.6=py36_0
- widgetsnbextension=3.5.1=py36_0
- wrapt=1.11.2=py36h7b6447c_0
- xz=5.2.4=h14c3975_4
- yaml=0.1.7=had09818_2
- zeromq=4.2.5=h439df22_0
- zict=0.1.3=py36h3a3bf81_0
- zlib=1.2.11=h7b6447c_3
- cuda90=1.0=h6433d27_0
- torchvision=0.2.1=py36_1
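The pins that matter here are presumably pytorch=0.4.0, cudatoolkit=9.0, and torchvision=0.2.1, since the slow setup was running PyTorch 1.4. If you need to compare pins between two exported environments, a quick helper (illustrative only, no PyYAML needed; conda export entries have the form name=version=build):

```python
env_text = """\
dependencies:
  - pytorch=0.4.0=py36hdf912b8_0
  - cudatoolkit=9.0=h13b8566_0
  - torchvision=0.2.1=py36_1
"""

def pinned_version(yaml_text, package):
    # conda export entries look like "name=version=build"
    for line in yaml_text.splitlines():
        entry = line.strip().lstrip("- ").strip()
        if entry.startswith(package + "="):
            return entry.split("=")[1]
    return None

print(pinned_version(env_text, "pytorch"))  # 0.4.0
```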
@BedirYilmaz @leeyeehoo Sorry to comment on this, but I don't know how to configure CSRNet for training on the TRANCOS dataset to get performance like in the paper. Do you have any pointers for me? Thanks.