zhanghang1989 / PyTorch-Encoding

A CV toolkit for my papers.
https://hangzhang.org/PyTorch-Encoding/
MIT License
2.04k stars · 450 forks

question #127

Open qiulesun opened 5 years ago

qiulesun commented 5 years ago

The released code worked without problems before the update. Now I've noticed the code was recently updated, so I couldn't wait to try it: python setup.py install runs successfully, but when I run import encoding, I get the error below.

>>> import encoding
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/media/rudycv/SSD500G/pytorch_dir/PyTorch-Encoding-Updated/PyTorch-Encoding/encoding/__init__.py", line 13, in <module>
    from . import nn, functions, dilated, parallel, utils, models, datasets, optimizer
  File "/media/rudycv/SSD500G/pytorch_dir/PyTorch-Encoding-Updated/PyTorch-Encoding/encoding/nn/__init__.py", line 12, in <module>
    from .encoding import *
  File "/media/rudycv/SSD500G/pytorch_dir/PyTorch-Encoding-Updated/PyTorch-Encoding/encoding/nn/encoding.py", line 18, in <module>
    from ..functions import scaled_l2, aggregate, pairwise_cosine
  File "/media/rudycv/SSD500G/pytorch_dir/PyTorch-Encoding-Updated/PyTorch-Encoding/encoding/functions/__init__.py", line 2, in <module>
    from .encoding import *
  File "/media/rudycv/SSD500G/pytorch_dir/PyTorch-Encoding-Updated/PyTorch-Encoding/encoding/functions/encoding.py", line 14, in <module>
    from .. import lib
  File "/media/rudycv/SSD500G/pytorch_dir/PyTorch-Encoding-Updated/PyTorch-Encoding/encoding/lib/__init__.py", line 25, in <module>
    ], build_directory=gpu_path, verbose=False)
  File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 494, in load
    with_cuda=with_cuda)
  File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 670, in _jit_compile
    return _import_module_from_library(name, build_directory)
  File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 753, in _import_module_from_library
    return imp.load_module(module_name, file, path, description)
  File "/usr/local/anaconda3/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/local/anaconda3/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: /media/rudycv/SSD500G/pytorch_dir/PyTorch-Encoding-Updated/PyTorch-Encoding/encoding/lib/gpu/enclib_gpu.so: undefined symbol: _ZN2at4cuda20getCurrentCUDAStreamEv

zhanghang1989 commented 5 years ago

Could you please pull the most recent version of this package or try:

pip install torch-encoding --upgrade
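For context, that undefined symbol demangles to at::cuda::getCurrentCUDAStream(), which typically means enclib_gpu.so was JIT-compiled against a different PyTorch/ATen version than the one now installed. A minimal sketch of the cleanup step that forces a rebuild on the next import (the helper name and path layout are my assumptions, not part of the repo):

```python
# Sketch: delete cached extension binaries so torch.utils.cpp_extension
# rebuilds them against the currently installed PyTorch on the next import.
# remove_stale_binaries and the directory layout are illustrative assumptions.
from pathlib import Path

def remove_stale_binaries(build_dir):
    """Delete cached .so files built against an older ATen; return their names."""
    removed = []
    for so in Path(build_dir).glob("*.so"):
        so.unlink()
        removed.append(so.name)
    return removed

# Afterwards, upgrade the package and re-import:
#   pip install torch-encoding --upgrade
```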
qiulesun commented 5 years ago

In the table, EncNet_ResNet50_ADE achieves 80.1 pixAcc and 41.5 mIoU (https://hangzhang.org/PyTorch-Encoding/experiments/segmentation.html). However, even with more epochs (160) and a much larger input size (base_size 608, crop_size 576), the corresponding log file shows only 78.0 pixAcc and 40.2 mIoU, lower than the results reported in both the table and the paper.

zhanghang1989 commented 5 years ago
  1. The performance improved after the paper was published.
  2. The log file is out of date.
  3. The validation score in the log file uses only a single size with a center crop, which is meant for monitoring the training process. For correct multi-size evaluation, please use test.py.
zhanghang1989 commented 5 years ago

For the command reproducing the results, please click the cmd button.

qiulesun commented 5 years ago

the effectiveness of SyncBN: I believe you have systematically evaluated the effectiveness of the proposed SyncBN. Could you show ablation results comparing it with standard BN (or Group Norm, if possible) on ImageNet-2012 or the segmentation datasets you have used?

zhanghang1989 commented 5 years ago

SyncBN is different from standard BN or Group Norm because those methods DO NOT compute statistics across GPUs. I don't think SyncBN is helpful for batch sizes > 16, such as in ImageNet-2012 training.
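A toy, GPU-free sketch of why the distinction matters (pure Python, illustrative numbers only): unsynchronized BN normalizes each device's activations with that device's own sub-batch statistics, while SyncBN uses the statistics of the whole global batch. With small per-device batches the variance in particular can differ a lot:

```python
# Compare the global batch variance (what SyncBN would use) with the
# per-device variances (what unsynchronized BN would use on each GPU).
def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

global_batch = [1.0, 2.0, 3.0, 10.0]   # a batch of 4 activations
shards = [[1.0, 2.0], [3.0, 10.0]]     # the same batch split across 2 "GPUs"

global_var = var(global_batch)          # statistics SyncBN would normalize with
per_gpu_vars = [var(s) for s in shards] # statistics per-device BN would use
print(global_var, per_gpu_vars)         # 12.5 vs [0.25, 12.25]
```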

qiulesun commented 5 years ago

ablation study: SyncBN is outstanding work and I understand its underlying mechanism. However, I unfortunately can't tell from the CVPR18 paper how much SyncBN helps. Given the same batch size (e.g., 16), do you have ablation results illustrating the performance of SyncBN compared with other BN variants on a segmentation dataset?

zhanghang1989 commented 5 years ago

Hi @qiulesun, thanks for your interest in this work.

I do have a table in the paper's supplementary material benchmarking SyncBN on the PASCAL-Context dataset:

method         BN           pixAcc   mIoU
FCN (4 GPUs)   standard BN  47.7     20.8
FCN            fixed BN     72.5     40.5
FCN            sync BN      73.4     41.0

Fixed BN means using the ImageNet-pretrained mean and variance. Note that fixed BN won't work for the ADE20K dataset due to the large learning rate.
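A minimal pure-Python sketch of the two baselines in the table (scale/shift parameters omitted; the "pretrained" statistics are made-up illustrative numbers, not real ImageNet values): standard BN normalizes with the current mini-batch statistics, while fixed BN reuses frozen pretrained statistics and never updates them:

```python
# Normalize activations with a given mean/variance (affine params omitted).
def normalize(xs, m, v, eps=1e-5):
    return [(x - m) / (v + eps) ** 0.5 for x in xs]

batch = [0.5, 1.5, 2.0]

# standard BN: statistics come from the current mini-batch
bm = sum(batch) / len(batch)
bv = sum((x - bm) ** 2 for x in batch) / len(batch)
std_bn_out = normalize(batch, bm, bv)

# fixed BN: reuse frozen "pretrained" running statistics (illustrative values)
pretrained_mean, pretrained_var = 0.0, 1.0
fixed_bn_out = normalize(batch, pretrained_mean, pretrained_var)
```

Standard BN re-centers every mini-batch to zero mean, whereas fixed BN leaves the inputs on the scale the pretrained statistics define, which is why it behaves differently under a large learning rate.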

qiulesun commented 5 years ago

supplementary material: The compared results are meaningful, but the supplementary material can be found neither on your homepage nor at CVPR 2018 open access (http://openaccess.thecvf.com/CVPR2018_search.py). Would you mind showing me a link to download it?

zhanghang1989 commented 5 years ago

The supplementary material, consisting of some basic information and experimental studies, was provided during the double-blind review, but it was not included in the final copy because the writing was not polished. I can send you a copy if you provide an email address.

qiulesun commented 5 years ago

Thank you! My email address is qiulesun@163.com. This paper (http://bzhou.ie.cuhk.edu.hk/publication/ADE20K_IJCV.pdf) also did an ablation study on various normalizations, i.e., synchronized BN, unsynchronized BN and frozen BN.

qiulesun commented 5 years ago

setting of workers: Sorry to bother you again. I am not sure how to set the value of workers in the option script.

  1. Should workers equal the batch size, the number of CPU cores on my machine, or the number of GPUs? Is there a guideline for choosing workers?
  2. Do you plan to report results on the Cityscapes dataset?
zhanghang1989 commented 5 years ago
  1. I usually set the number of workers to 16 (the same as the batch size), but it also depends on your CPU.
  2. I will release training and testing on Cityscapes later.
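A common heuristic for the answer above (my assumption, not a rule from this repo): start from the batch size but cap at the number of CPU cores, since each worker is a separate data-loading process:

```python
import os

def suggest_num_workers(batch_size):
    """Heuristic only: use the batch size as a starting point,
    but never exceed the CPU cores available on this machine."""
    cpus = os.cpu_count() or 1
    return min(batch_size, cpus)
```

The result would be passed to torch.utils.data.DataLoader(..., num_workers=suggest_num_workers(batch_size)); profiling a few values around it is usually worthwhile.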
qiulesun commented 5 years ago

trivial question

  1. I want to get results on PContext with background, i.e., 60 classes. Where do I modify the code to achieve that? Is changing NUM_CLASS from 59 to 60 enough? (https://github.com/zhanghang1989/PyTorch-Encoding/blob/master/encoding/datasets/pcontext.py#L19)
zhanghang1989 commented 5 years ago

The background IoU is counted as 0, so the mIoU over 60 classes equals mIoU_59 * 59 / 60.
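That conversion can be sketched in one line (the helper name is mine, not from the repo):

```python
def miou_with_background(miou_59, num_fg=59, num_total=60):
    """Treat the background IoU as 0 and re-average over all 60 classes."""
    return miou_59 * num_fg / num_total

# e.g. 41.0 mIoU over 59 classes becomes 41.0 * 59 / 60, about 40.3 over 60
```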

qiulesun commented 5 years ago

It is an awesome and actively developed repo, containing many SOTA methods.

Question 1: Following up on the last question, to compute mIoU with background (i.e., 60 classes) I first get the mIoU without background (mIoU_59) as you do in the repo, then mIoU_60 is directly mIoU_59 * 59 / 60 and will be slightly lower than mIoU_59. Do I understand correctly?

Question 2: For multi-size evaluation, would you consider applying dense crops on the feature map rather than on the input image? That drastically reduces computational overhead and may further boost performance.

Question 3: Do you consider using a gradient-accumulation strategy to update parameters, given limited GPU memory (small batch size)?

Question 4: Your work appeals to me. When will you release your CVPR 2019 paper, Co-occurrent Features in Semantic Segmentation?

Thank you for your consideration and I am looking forward to your reply.
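The gradient-accumulation strategy asked about above can be sketched without any framework: for a mean-squared toy loss, summing per-micro-batch gradients weighted by each micro-batch's share of the full batch reproduces the full-batch gradient, so one optimizer step after several small forward/backward passes mimics one large-batch step (the function names are illustrative, not from the repo):

```python
# Toy loss L(w) = mean_i (w - x_i)^2, with gradient 2 * mean_i (w - x_i).
def grad(w, xs):
    return 2.0 * sum(w - x for x in xs) / len(xs)

def accumulated_grad(w, micro_batches):
    """Sum micro-batch gradients, each weighted by its share of the batch."""
    total = sum(len(mb) for mb in micro_batches)
    return sum(grad(w, mb) * len(mb) / total for mb in micro_batches)

w = 0.5
full = [1.0, 2.0, 3.0, 4.0]
micros = [[1.0, 2.0], [3.0, 4.0]]
print(grad(w, full), accumulated_grad(w, micros))  # both are -4.0
```

In a real PyTorch loop the same idea amounts to calling backward() on each micro-batch loss scaled by its batch share and stepping the optimizer only once per accumulation cycle.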

Yuxiang1995 commented 5 years ago

> I use the released code without problem before updating. [...] ImportError: /media/rudycv/SSD500G/pytorch_dir/PyTorch-Encoding-Updated/PyTorch-Encoding/encoding/lib/gpu/enclib_gpu.so: undefined symbol: _ZN2at4cuda20getCurrentCUDAStreamEv

I met the same error as you. How did you solve it? Just getting the latest torch-encoding does not fix it.