leeyeehoo / CSRNet-pytorch

CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

What is your training time on ShanghaiTech Part A #73

Closed BedirYilmaz closed 4 years ago

BedirYilmaz commented 4 years ago

Hi,

I have been experiencing serious slow-downs for two weeks now and have not been able to pinpoint the problem yet.

Here is how long my machine (CUDA 10, Nvidia GTX 1080 Ti, PyTorch 1.4) currently takes during training:

2020-02-10 09:48:29,344 - INFO - epoch 0, processed 0 samples, lr 0.0000010000, momentum 0.9500000000
2020-02-10 09:48:29,630 - INFO - Epoch: [0][0/960]  Time 0.285 (0.285)  Data 0.022 (0.022)  Loss 25.5455 (25.5455)  
2020-02-10 09:48:43,902 - INFO - Epoch: [0][30/960] Time 0.299 (0.470)  Data 0.116 (0.091)  Loss 246.4199 (297.7094)    
2020-02-10 09:50:04,337 - INFO - Epoch: [0][180/960]    Time 0.704 (0.525)  Data 0.017 (0.058)  Loss 15.9261 (269.7038) 
2020-02-10 09:51:29,218 - INFO - Epoch: [0][330/960]    Time 0.193 (0.543)  Data 0.010 (0.047)  Loss 16.2078 (351.5720) 
2020-02-10 09:52:52,231 - INFO - Epoch: [0][480/960]    Time 0.759 (0.547)  Data 0.009 (0.040)  Loss 2155.6362 (362.7242)   
2020-02-10 09:54:14,097 - INFO - Epoch: [0][630/960]    Time 0.276 (0.546)  Data 0.014 (0.034)  Loss 649.3963 (368.7552)    
2020-02-10 09:55:39,716 - INFO - Epoch: [0][780/960]    Time 0.382 (0.551)  Data 0.013 (0.030)  Loss 222.1103 (364.9167)    
2020-02-10 09:57:03,813 - INFO - Epoch: [0][930/960]    Time 0.735 (0.553)  Data 0.016 (0.028)  Loss 348.5327 (372.7292)    
2020-02-10 09:57:20,402 - INFO - begin test
2020-02-10 09:57:26,264 - INFO -  * MAE 297.191 
2020-02-10 09:57:26,276 - INFO -  * best MAE 297.191

Here is how it used to be:

2019-06-13 10:54:45,116 - INFO - epoch 0, processed 0 samples, lr 0.0000001000, momentum 0.9500000000
2019-06-13 10:54:45,667 - INFO - Epoch: [0][0/960]  Time 0.551 (0.551)  Data 0.015 (0.015)  Loss 195.5461 (195.5461)    
2019-06-13 10:55:09,175 - INFO - Epoch: [0][150/960]    Time 0.084 (0.159)  Data 0.015 (0.014)  Loss 119.7359 (333.7037)    
2019-06-13 10:55:34,399 - INFO - Epoch: [0][300/960]    Time 0.208 (0.164)  Data 0.015 (0.014)  Loss 178.8816 (275.7937)    
2019-06-13 10:56:00,454 - INFO - Epoch: [0][450/960]    Time 0.203 (0.167)  Data 0.015 (0.014)  Loss 61.6186 (247.2942) 
2019-06-13 10:56:24,246 - INFO - Epoch: [0][600/960]    Time 0.133 (0.165)  Data 0.010 (0.014)  Loss 236.0601 (249.8360)    
2019-06-13 10:56:49,196 - INFO - Epoch: [0][750/960]    Time 0.189 (0.165)  Data 0.011 (0.014)  Loss 272.6378 (268.4480)    
2019-06-13 10:57:12,888 - INFO - Epoch: [0][900/960]    Time 0.224 (0.164)  Data 0.019 (0.014)  Loss 77.9398 (271.7229) 
2019-06-13 10:57:17,520 - INFO - Epoch: [0][930/960]    Time 0.208 (0.164)  Data 0.015 (0.014)  Loss 45.8458 (270.8347) 
2019-06-13 10:57:22,781 - INFO - begin test
2019-06-13 10:57:26,680 - INFO -  * MAE 130.738 
2019-06-13 10:57:26,680 - INFO -  * best MAE 130.738 

Just to get an idea, may I know what your training time is on ShanghaiTech Part A?

Thank you

leeyeehoo commented 4 years ago

Could it be your image preprocessing? How many workers do you use?
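For reference, the worker count is the `num_workers` argument of `torch.utils.data.DataLoader`. A minimal sketch, assuming a standard PyTorch data pipeline (the dataset and tensor sizes here are placeholders, not the actual CSRNet loader):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset standing in for the CSRNet image/density-map pairs.
dataset = TensorDataset(torch.randn(8, 3, 32, 32), torch.randn(8, 1, 32, 32))

loader = DataLoader(
    dataset,
    batch_size=1,
    shuffle=True,
    num_workers=4,    # subprocesses used for loading/preprocessing
    pin_memory=True,  # faster host-to-GPU copies when training on CUDA
)
print(len(loader))  # batches per epoch: 8 samples / batch_size 1 = 8
```

If preprocessing is the bottleneck, the `Data` column in the log above grows; a compute-side slow-down shows up in `Time` instead.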



BedirYilmaz commented 4 years ago

I use 4 workers. I have also timed the training code:

img = img.cuda()     # move the batch to the GPU
img = Variable(img)  # Variable is a no-op wrapper in PyTorch >= 0.4

output = model(img)  # forward pass
target = target.type(torch.FloatTensor).unsqueeze(0).cuda()
target = Variable(target)

loss = criterion(output, target)

losses.update(loss.item(), img.size(0))

optimizer.zero_grad()  # clear gradients from the previous step
loss.backward()        # backward pass
optimizer.step()       # parameter update

I used perf_counter to measure the time needed for each line of the above code and found that backpropagation takes roughly 10x as long as the forward pass. I know that backprop is expected to take longer than the forward pass, but I believe this difference is too large.
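One caveat when timing CUDA code with perf_counter: kernels launch asynchronously, so without a `torch.cuda.synchronize()` before each timestamp you measure only the Python-side launch cost, and the backward pass can absorb the wait for unfinished forward kernels. A minimal sketch of synchronized timing (the model and tensor shapes are placeholders, not CSRNet):

```python
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Conv2d(3, 1, kernel_size=3, padding=1).to(device)
criterion = nn.MSELoss(reduction="sum")
img = torch.randn(1, 3, 64, 64, device=device)
target = torch.randn(1, 1, 64, 64, device=device)

def sync():
    # Wait for all queued GPU kernels so perf_counter sees real GPU time.
    if device == "cuda":
        torch.cuda.synchronize()

sync()
t0 = time.perf_counter()
output = model(img)
loss = criterion(output, target)
sync()
t_fwd = time.perf_counter() - t0

t0 = time.perf_counter()
loss.backward()
sync()
t_bwd = time.perf_counter() - t0
print("forward %.4fs  backward %.4fs" % (t_fwd, t_bwd))
```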

The tests

Thinking that it might be an infrastructure problem, I also ran my code in a Colab environment. The result was more or less the same. Here are the timings of the training steps for the first 7 images of the first epoch of both runs.

colab              local                   
forward  backward  forward  backward                                      
0.077s   0.818s    0.071s   0.699s         
0.025s   0.238s    0.095s   1.043s         
0.071s   0.751s    0.066s   0.751s         
0.046s   0.486s    0.063s   0.685s         
0.073s   0.778s    0.060s   0.619s         
0.039s   0.385s    0.015s   0.139s         
0.077s   0.818s    0.019s   0.176s        
BedirYilmaz commented 4 years ago

May I know how much time it takes for you to train on ShanghaiTech Part A?

Mine currently takes 45-52 hours; it was 15-18 hours when I started my research.

BedirYilmaz commented 4 years ago

> Could it be your image preprocessing?

When I think about it, I would say no, since I cloned your repository for a fresh copy of the code and used my existing density maps and environment, still with no luck. Unfortunately it is still as slow, as seen in the table above.

BedirYilmaz commented 4 years ago

Here is the output of freshly cloned code in a fresh Python 2 environment. As you can see, training on a single image takes about 0.6 seconds on average.

/home/user/anaconda3/envs/torch_py2/lib/python2.7/site-packages/torch/nn/_reduction.py:43: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
epoch 0, processed 0 samples, lr 0.0000001000
Epoch: [0][0/960]       Time 0.622 (0.622)      Data 0.025 (0.025)      Loss 782.0785 (782.0785)
Epoch: [0][150/960]     Time 1.144 (0.610)      Data 0.017 (0.017)      Loss 139.2142 (403.8265)
Epoch: [0][300/960]     Time 0.206 (0.597)      Data 0.011 (0.015)      Loss 3413.5181 (394.3032)
Epoch: [0][450/960]     Time 0.344 (0.609)      Data 0.016 (0.014)      Loss 584.6290 (384.1994)
Epoch: [0][600/960]     Time 0.147 (0.604)      Data 0.007 (0.014)      Loss 10.1923 (385.8242)
Epoch: [0][750/960]     Time 0.352 (0.603)      Data 0.015 (0.014)      Loss 4906.9761 (374.7677)
Epoch: [0][900/960]     Time 0.740 (0.597)      Data 0.006 (0.013)      Loss 29.9215 (376.6259)
Epoch: [0][930/960]     Time 0.567 (0.596)      Data 0.003 (0.013)      Loss 28.5599 (371.4367)
begin test
 * MAE 310.739 
 * best MAE 310.739 
BedirYilmaz commented 4 years ago

It looks like I was able to solve the problem. I finally found out that it was one of the libraries I had recently updated; it caused a drastic decrease in speed.
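For anyone hunting a similar regression, printing the versions of the usual suspects can help narrow down which update changed things. The library names below are only illustrative candidates; the culprit is not named in this thread:

```python
import importlib

# Collect versions of libraries commonly implicated in training slow-downs.
versions = {}
for name in ("torch", "torchvision", "numpy", "PIL"):
    try:
        mod = importlib.import_module(name)
        versions[name] = getattr(mod, "__version__", "unknown")
    except ImportError:
        versions[name] = "not installed"
print(versions)
```

Comparing this output between the fast and slow environments points at the changed package.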

Here is the latest training

2020-03-04 17:53:20,067 - INFO - epoch 0, processed 0 samples, lr 0.0000001000, momentum 0.9500000000
2020-03-04 17:53:21,407 - INFO - Epoch: [0][0/960]      Time 1.339 (1.339)      Data 0.003 (0.003)      Loss 1626.9767 (1626.9767)
2020-03-04 17:53:44,776 - INFO - Epoch: [0][150/960]    Time 0.215 (0.164)      Data 0.004 (0.003)      Loss 3370.8374 (326.4953)
2020-03-04 17:54:09,898 - INFO - Epoch: [0][300/960]    Time 0.086 (0.166)      Data 0.004 (0.003)      Loss 39.0026 (350.3883)
2020-03-04 17:54:33,277 - INFO - Epoch: [0][450/960]    Time 0.230 (0.162)      Data 0.006 (0.003)      Loss 360.8945 (370.3947)
2020-03-04 17:54:58,589 - INFO - Epoch: [0][600/960]    Time 0.162 (0.164)      Data 0.002 (0.003)      Loss 14.5095 (371.6881)
2020-03-04 17:55:24,841 - INFO - Epoch: [0][750/960]    Time 0.134 (0.166)      Data 0.011 (0.003)      Loss 33.2953 (366.6922)
2020-03-04 17:55:30,198 - INFO - Epoch: [0][780/960]    Time 0.209 (0.167)      Data 0.004 (0.003)      Loss 46.6512 (366.9303)
2020-03-04 17:55:54,898 - INFO - Epoch: [0][930/960]    Time 0.210 (0.166)      Data 0.003 (0.003)      Loss 318.1886 (367.5900)
2020-03-04 17:55:59,934 - INFO - begin test

2020-03-04 17:56:05,370 - INFO -  * MAE 330.800 
2020-03-04 17:56:05,378 - INFO -  * best MAE 330.800 

I would also like to share my current environment for others who run into the same problem.

name: torchold
channels:
  - albumentations
  - pytorch
  - conda-forge
  - defaults
dependencies:
  - albumentations=0.4.0=py36_0
  - geos=3.7.2=he1b5a44_2
  - giflib=5.1.4=0
  - graphite2=1.3.11=0
  - imageio=2.6.1=py36_0
  - imgaug=0.3.0=py_0
  - jupyter_contrib_core=0.3.3=py_2
  - jupyter_contrib_nbextensions=0.5.0=py36_1000
  - jupyter_highlight_selected_word=0.2.0=py36_1000
  - jupyter_latex_envs=1.4.4=py36_1000
  - jupyter_nbextensions_configurator=0.4.0=py36_1000
  - jupyterlab=0.35.4=py36_0
  - jupyterlab_server=0.2.0=py_0
  - libiconv
  - libwebp=0.5.2=7
  - libxslt=1.1.32=h88dbc4e_2
  - lxml=4.2.5=py36hc9114bc_0
  - mkl_fft=1.0.10=py36_0
  - mkl_random=1.0.2=py36_0
  - openblas=0.2.20=8
  - pixman=0.34.0=2
  - pywavelets=1.1.1=py36hc1659b7_0
  - scikit-image=0.14.2=py36hf484d3e_0
  - shapely=1.6.4=py36hec07ddf_1006
  - tbb=2019.9=hc9558a2_0
  - tbb4py=2019.9=py36hc9558a2_0
  - x264=20131218=0
  - _libgcc_mutex=0.1=main
  - astroid=2.3.1=py36_0
  - attrs=19.1.0=py36_1
  - autopep8=1.4.4=py_0
  - backcall=0.1.0=py36_0
  - blas=1.0=mkl
  - bleach=3.1.0=py36_0
  - bokeh=0.12.16=py36_0
  - bzip2=1.0.6=h9a117a8_4
  - ca-certificates=2019.10.16=0
  - cairo=1.14.12=h8948797_3
  - certifi=2019.9.11=py36_0
  - cffi=1.11.5=py36h9745a5d_0
  - click=6.7=py36h5253387_0
  - cloudpickle=0.5.3=py36_0
  - cudatoolkit=9.0=h13b8566_0
  - cudnn=7.1.2=cuda9.0_0
  - cycler=0.10.0=py36h93f1223_0
  - cytoolz=0.9.0.1=py36h14c3975_0
  - dask=0.17.4=py36_0
  - dask-core=0.17.4=py36_0
  - dbus=1.13.2=h714fa37_1
  - decorator=4.4.0=py36_1
  - defusedxml=0.6.0=py_0
  - distributed=1.21.8=py36_0
  - entrypoints=0.3=py36_0
  - expat=2.2.5=he0dffb1_0
  - ffmpeg=4.0=hcdf2ecd_0
  - fontconfig=2.13.0=h9420a91_0
  - freeglut=3.0.0=hf484d3e_5
  - freetype=2.9.1=h8a8886c_1
  - glib=2.56.2=hd408876_0
  - gmp=6.1.2=h6c8ec71_1
  - gst-plugins-base=1.14.0=hbbd80ab_1
  - gstreamer=1.14.0=hb453b48_1
  - h5py=2.8.0=py36ha1f6525_0
  - harfbuzz=1.8.8=hffaf4a1_0
  - hdf5=1.10.2=hba1933b_1
  - heapdict=1.0.0=py36_2
  - html5lib=1.0.1=py36h2f9c1c0_0
  - icu=58.2=h9c2bf20_1
  - intel-openmp=2018.0.0=8
  - ipykernel=5.1.2=py36h39e3cac_0
  - ipython=7.8.0=py36h39e3cac_0
  - ipython_genutils=0.2.0=py36_0
  - ipywidgets=7.5.1=py_0
  - isort=4.3.21=py36_0
  - jasper=2.0.14=h07fcdf6_1
  - jedi=0.15.1=py36_0
  - jinja2=2.10.1=py36_0
  - jpeg=9b=h024ee3a_2
  - jsonschema=3.0.2=py36_0
  - jupyter=1.0.0=py36_7
  - jupyter_client=5.3.3=py36_1
  - jupyter_console=6.0.0=py36_0
  - jupyter_core=4.5.0=py_0
  - kiwisolver=1.0.1=py36h764f252_0
  - lazy-object-proxy=1.4.2=py36h7b6447c_0
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc=7.2.0=h69d50b8_2
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran=3.0.0=1
  - libgfortran-ng=7.2.0=hdf63c60_3
  - libglu=9.0.0=hf484d3e_1
  - libopencv=3.4.2=hb342d67_1
  - libopus=1.2.1=hb9ed12e_0
  - libpng=1.6.37=hbc83047_0
  - libprotobuf=3.5.2=h6f1eeef_0
  - libsodium=1.0.16=h1bed415_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libtiff=4.0.9=he85c1e1_1
  - libuuid=1.0.3=h1bed415_2
  - libvpx=1.7.0=h439df22_0
  - libxcb=1.13=h1bed415_1
  - libxml2=2.9.8=hf84eae3_0
  - locket=0.2.0=py36h787c0ad_1
  - markupsafe=1.1.1=py36h7b6447c_0
  - matplotlib=2.2.2=py36hb69df0a_2
  - mccabe=0.6.1=py36_1
  - mistune=0.8.4=py36h7b6447c_0
  - mkl=2018.0.3=1
  - msgpack-python=0.5.6=py36h6bb024c_0
  - nbconvert=5.6.0=py36_1
  - nbformat=4.4.0=py36_0
  - nccl=1.3.5=cuda9.0_0
  - ncurses=6.1=hf484d3e_0
  - networkx=2.1=py36_0
  - ninja=1.8.2=py36h6bb024c_1
  - notebook=6.0.1=py36_0
  - numpy=1.15.4=py36h1d66e8a_0
  - numpy-base=1.15.4=py36h81de0dd_0
  - olefile=0.45.1=py36_0
  - opencv=3.4.2=py36h6fd60c2_1
  - openssl=1.1.1d=h7b6447c_3
  - packaging=17.1=py36_0
  - pandas=0.23.0=py36h637b7d7_0
  - pandoc=1.19.2.1=hea2e7c5_1
  - pandocfilters=1.4.2=py36_1
  - parso=0.5.1=py_0
  - partd=0.3.8=py36h36fd896_0
  - pcre=8.42=h439df22_0
  - pexpect=4.7.0=py36_0
  - pickleshare=0.7.5=py36_0
  - pillow=5.4.1=py36h34e0f95_0
  - pip=19.2.3=py36_0
  - prometheus_client=0.7.1=py_0
  - prompt_toolkit=2.0.9=py36_0
  - psutil=5.4.5=py36h14c3975_0
  - ptyprocess=0.6.0=py36_0
  - py-opencv=3.4.2=py36hb342d67_1
  - pycodestyle=2.5.0=py36_0
  - pycparser=2.18=py36hf9f622e_1
  - pygments=2.4.2=py_0
  - pylint=2.4.2=py36_0
  - pyparsing=2.2.0=py36hee85983_1
  - pyqt=5.6.0=py36h22d08a2_6
  - pyrsistent=0.15.4=py36h7b6447c_0
  - python=3.6.9=h265db76_0
  - python-dateutil=2.8.0=py36_0
  - pytorch=0.4.0=py36hdf912b8_0
  - pytz=2018.4=py36_0
  - pyyaml=3.12=py36hafb9ca4_1
  - pyzmq=17.1.2=py36h14c3975_0
  - qt=5.6.3=h8bf5577_3
  - qtconsole=4.5.5=py_0
  - readline=7.0=h7b6447c_5
  - scikit-learn=0.19.1=py36h7aa7ec6_0
  - scipy=1.1.0=py36hfa4b5c9_1
  - send2trash=1.5.0=py36_0
  - setuptools=41.2.0=py36_0
  - simplegeneric=0.8.1=py36_2
  - sip=4.18.1=py36hf484d3e_2
  - six=1.12.0=py36_0
  - sortedcontainers=1.5.10=py36_0
  - sqlite=3.29.0=h7b6447c_0
  - tblib=1.3.2=py36h34cf8b6_0
  - terminado=0.8.2=py36_0
  - testpath=0.4.2=py36_0
  - tk=8.6.8=hbc83047_0
  - toolz=0.9.0=py36_0
  - tornado=6.0.3=py36h7b6447c_0
  - traitlets=4.3.2=py36_0
  - typed-ast=1.4.0=py36h7b6447c_0
  - wcwidth=0.1.7=py36_0
  - webencodings=0.5.1=py36_1
  - wheel=0.33.6=py36_0
  - widgetsnbextension=3.5.1=py36_0
  - wrapt=1.11.2=py36h7b6447c_0
  - xz=5.2.4=h14c3975_4
  - yaml=0.1.7=had09818_2
  - zeromq=4.2.5=h439df22_0
  - zict=0.1.3=py36h3a3bf81_0
  - zlib=1.2.11=h7b6447c_3
  - cuda90=1.0=h6433d27_0
  - torchvision=0.2.1=py36_1
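To reproduce this setup, the YAML above can be saved as a conda environment file and recreated as follows (the filename is an assumption; the environment name `torchold` comes from the YAML's `name:` field):

```shell
# Save the YAML above as environment.yml (any filename works), then:
conda env create -f environment.yml
conda activate torchold
```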
ThanhNhann commented 4 years ago

@BedirYilmaz @leeyeehoo Sorry to comment on this, but I don't know how to configure CSRNet for training on the TRANCOS dataset to get performance like in the paper. Do you have any pointers for me? Thanks.