zhanghang1989 / PyTorch-Encoding

A CV toolkit for my papers.
https://hangzhang.org/PyTorch-Encoding/
MIT License

No module named cpp_extension #67

Closed qiulesun closed 6 years ago

qiulesun commented 6 years ago

Hi, I get the error `No module named cpp_extension` (from the line `from torch.utils.cpp_extension import load`) when I run the quick demo http://hangzh.com/PyTorch-Encoding/experiments/segmentation.html#install-package. The versions of Python and torch are 2.7 and 0.3.1, respectively. How can I handle it?

zhanghang1989 commented 6 years ago

0.3.1 is way too old. Please install the PyTorch master branch (> 0.5.0).
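`torch.utils.cpp_extension` only exists from PyTorch 0.4.0 onward, so a quick version guard before importing avoids the confusing `ImportError`. A minimal sketch using plain dotted version strings (real PyTorch builds may carry suffixes like `0.5.0a0`, which this deliberately does not handle):

```python
def _as_tuple(version):
    """Turn a dotted version string like '0.3.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def meets_minimum(installed, minimum):
    """True if the installed dotted version is at least the minimum."""
    return _as_tuple(installed) >= _as_tuple(minimum)
```

With `torch.__version__` as the first argument, this would flag 0.3.1 as too old before the failing import is ever reached.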

qiulesun commented 6 years ago

The versions of Python and torch have been updated to 3.6 and 0.4.0, respectively. Following the link you provided, https://www.claudiokuenzler.com/blog/756/install-newer-ninja-build-tools-ubuntu-14.04-trusty#.WxYrvFMvzJw, I installed ninja 1.8.2. However, when I run the quick demo http://hangzh.com/PyTorch-Encoding/experiments/segmentation.html#install-package again, I get another error. How can I solve it? Your papers and code have really gotten me interested in semantic segmentation tasks.

root@hh-Z97X-UD3H:/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master# python quick_demo.py
Traceback (most recent call last):
  File "/usr/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 576, in _build_extension_module
    ['ninja', '-v'], stderr=subprocess.STDOUT, cwd=build_directory)
  File "/usr/anaconda3/lib/python3.6/subprocess.py", line 336, in check_output
    **kwargs).stdout
  File "/usr/anaconda3/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "demo.py", line 2, in <module>
    import encoding
  File "/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/__init__.py", line 13, in <module>
    from . import nn, functions, dilated, parallel, utils, models, datasets
  File "/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/nn/__init__.py", line 12, in <module>
    from .encoding import *
  File "/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/nn/encoding.py", line 18, in <module>
    from ..functions import scaledL2, aggregate
  File "/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/functions/__init__.py", line 2, in <module>
    from .encoding import *
  File "/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/functions/encoding.py", line 13, in <module>
    from .. import lib
  File "/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/lib/__init__.py", line 12, in <module>
    ], build_directory=cpu_path, verbose=False)
  File "/usr/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 501, in load
    _build_extension_module(name, build_directory)
  File "/usr/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 582, in _build_extension_module
    name, error.output.decode()))
RuntimeError: Error building extension 'enclib_cpu': [1/2] c++ -MMD -MF roi_align_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -I/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/anaconda3/include/python3.6m -fPIC -std=c++11 -c /media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/lib/cpu/roi_align_cpu.cpp -o roi_align_cpu.o
FAILED: roi_align_cpu.o
/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/lib/cpu/roi_align_cpu.cpp: In function ‘at::Tensor ROIAlignForwardCPU(const at::Tensor&, const at::Tensor&, int64_t, int64_t, double, int64_t)’:
/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/ATen/Error.h:281:18: error: expected primary-expression before ‘(’ token
   throw at::Error({__func__, __FILE__, __LINE__}, __VA_ARGS__)
/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/ATen/Error.h:285:5: note: in expansion of macro ‘AT_ERROR’
   AT_ERROR(__VA_ARGS__); \
/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/lib/cpu/roi_align_cpu.cpp:388:3: note: in expansion of macro ‘AT_ASSERT’
   AT_ASSERT(input.is_contiguous());
[The same pair of ‘expected primary-expression before ‘(’/‘)’ token’ errors repeats for every AT_ASSERT in roi_align_cpu.cpp: lines 388-392 (AT_ASSERT(input.is_contiguous()), AT_ASSERT(bottom_rois.is_contiguous()), AT_ASSERT(input.ndimension() == 4), AT_ASSERT(bottom_rois.ndimension() == 2), AT_ASSERT(bottom_rois.size(1) == 5)), line 404 (AT_ASSERT(roi_cols == 4 || roi_cols == 5)), lines 409-410, and, in ROIAlignBackwardCPU, lines 444-446, 451, and 456.]
ninja: build stopped: subcommand failed.

zhanghang1989 commented 6 years ago

This package depends on a slightly higher version than PyTorch 0.4.0. Please follow the instructions to install PyTorch from source: https://github.com/pytorch/pytorch#from-source

qiulesun commented 6 years ago

In your paper, the sentence "The ground truth labels for SE-loss are generated by 'unique' operation finding the categories presented in the given ground-truth segmentation mask." means that every input image has multiple labels. As far as I know, binary cross-entropy loss can handle binary or multi-class tasks, but not multi-label ones.

zhanghang1989 commented 6 years ago

I didn’t get the difference between multi-class and multi-label. Could you please explain in detail? Btw, the NN already has a sigmoid activation.

qiulesun commented 6 years ago

Multiclass classification means a classification task with more than two classes; e.g., classifying a set of images of fruits that may be oranges, apples, or pears. Multiclass classification makes the assumption that each sample is assigned one and only one label: a fruit can be either an apple or a pear, but not both at the same time.
Multilabel classification assigns each sample a set of target labels. This can be thought of as predicting properties of a data point that are not mutually exclusive, such as the topics relevant to a document. A text might be about any of religion, politics, finance, or education at the same time, or none of these. I note that the NN has a sigmoid activation. My question remains: in your case, does the input image have multiple labels or one?

zhanghang1989 commented 6 years ago

The presence of the object categories is indeed a multi-label task. Each category is predicted independently as a binary prediction. I hope that addresses your concern.

zhanghang1989 commented 6 years ago

Please refer to the docs for binary cross entropy loss https://pytorch.org/docs/stable/nn.html?highlight=bceloss#torch.nn.BCELoss

qiulesun commented 6 years ago

In binary classification, the number of classes equals 2. The object categories in an input image number more than 2 (Figure 2 in the paper). So I don't understand why binary cross-entropy loss is employed and why "Each category is predicted independently using a binary prediction."

zhanghang1989 commented 6 years ago

Each category is a binary classification problem. For 150 categories, there are 150 individual binary classification problems. I hope this explanation is clear enough. If you still have difficulties, feel free to ask questions in Chinese.

qiulesun commented 6 years ago

Thank you for your patience. Your explanation is clear. Binary cross-entropy loss can handle the multi-label classification task. Its target is something like [1,0,0,1,0...]. Sigmoid, unlike softmax, does not output a probability distribution over the NCLASS classes, but independent per-class probabilities.
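To make the target construction concrete, here is a minimal pure-Python sketch of how a multi-hot SE-loss target could be built from a segmentation mask. NCLASS and the mask values are illustrative only; the paper's pipeline applies `torch.unique` to the ground-truth mask tensor rather than Python's `set`.

```python
NCLASS = 5  # illustrative; the ADE20K setting in the paper uses 150

def se_target(mask, nclass=NCLASS):
    """Multi-hot target: 1.0 for every category id present in the mask.

    `mask` is a flat iterable of per-pixel category ids. Taking the set of
    ids plays the role of the "unique" operation described in the paper.
    """
    target = [0.0] * nclass
    for cat in set(mask):
        target[cat] = 1.0
    return target
```

For a mask containing categories 0, 1, and 3, this yields `[1.0, 1.0, 0.0, 1.0, 0.0]`, exactly the `[1,0,0,1,0...]`-style target discussed above, with each entry an independent binary label.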

zhanghang1989 commented 6 years ago

You’re welcome. That is correct.

qiulesun commented 6 years ago

I am really sorry for disturbing you again. I shouldn't ask a question about installing PyTorch from source here, but I have no idea how to solve it. Can you help me fix it?

System Info:

How you installed PyTorch (conda, pip, source): source
Build command you used (if compiling from source): python setup.py install
OS: ubuntu14.04
PyTorch version: master
Python version: 3.6
CUDA/cuDNN version: cuda8.0 + cudnn5.0
GPU models and configuration: GTX1080Ti
GCC version (if compiling from source): 4.9.4
CMake version: 3.7.2

Issue description:

3 errors detected in the compilation of "/tmp/tmpxft_00002a14_00000000-7_THCTensorMath.cpp1.ii".
CMake Error at caffe2_gpu_generated_THCTensorMath.cu.o.Release.cmake:279 (message):
  Error generating file /media/hh/pytorch_dir/pytorch/build/caffe2/CMakeFiles/caffe2_gpu.dir/__/aten/src/THC/./caffe2_gpu_generated_THCTensorMath.cu.o

make[2]: *** [caffe2/CMakeFiles/caffe2_gpu.dir/__/aten/src/THC/caffe2_gpu_generated_THCTensorMath.cu.o] Error 1
make[1]: *** [caffe2/CMakeFiles/caffe2_gpu.dir/all] Error 2
make: *** [all] Error 2
Failed to run 'bash tools/build_pytorch_libs.sh --use-cuda --use-nnpack --use-mkldnn nccl caffe2 nanopb libshm gloo THD c10d'

zhanghang1989 commented 6 years ago

Try installing the dependencies as follows first:

export CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" # [anaconda root directory]

# Install basic dependencies
conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
conda install -c mingfeima mkldnn

# Add LAPACK support for the GPU
conda install -c pytorch magma-cuda80 # or magma-cuda90 if CUDA 9

You may want to ask on the PyTorch repo for further help.

qiulesun commented 6 years ago

Are the models you released (model_zoo.py) all trained with two Context Encoding Modules? Can you detail the MS evaluation in Table 1?

models = {
    'encnet_resnet50_pcontext': get_encnet_resnet50_pcontext,
    'encnet_resnet101_pcontext': get_encnet_resnet101_pcontext,
    'encnet_resnet50_ade': get_encnet_resnet50_ade,
}
zhanghang1989 commented 6 years ago

We only use one Context Encoding Module now, which is more efficient and makes the model compatible with EncNetV2.

qiulesun commented 6 years ago

Can the released code run on Ubuntu, Mac, and Windows?

zhanghang1989 commented 6 years ago

It mainly depends on PyTorch. If PyTorch compiles successfully on your system, there won't be a problem. I am using both Mac and Ubuntu. Note that the PyTorch master branch is required.

qiulesun commented 6 years ago

Does the training command (e.g., CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset PContext --model EncNet --aux --se-loss --backbone resnet101) train resnet101 from scratch or fine-tune it?

zhanghang1989 commented 6 years ago

resnet101 is pretrained from ImageNet.

qiulesun commented 6 years ago

I used the command (CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset PContext --model EncNet --aux --se-loss) to train the resnet50 model. However, when it reached epoch 12, I stopped it. When I restarted it, I unluckily found it had started again from epoch 0 rather than epoch 12. What should I do to resume from epoch 12?

zhanghang1989 commented 6 years ago

Please resume by adding the option --resume path/to/checkpoint.pth.tar to the command.

qiulesun commented 6 years ago

Thank you. I have another question. When will PyTorch 0.4.0 meet the requirements for running the released code?

zhanghang1989 commented 6 years ago

This package won't be compatible with PyTorch 0.4.0, but it will be compatible with the next stable release.

qiulesun commented 6 years ago

A question about the selayer: why does the selayer have no sigmoid activation function?

(encmodule): EncModule(
  (encoding): Sequential(
    (0): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace)
    (3): Encoding(N x 512=>32x512)
    (4): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace)
    (6): Mean()
  )
  (fc): Sequential(
    (0): Linear(in_features=512, out_features=512, bias=True)
    (1): Sigmoid()
  )
  (selayer): Linear(in_features=512, out_features=59, bias=True)
)

zhanghang1989 commented 6 years ago

That is the prediction layer for minimizing the SE-loss. The sigmoid function is applied during the loss calculation: https://github.com/zhanghang1989/PyTorch-Encoding/blob/master/encoding/nn/customize.py#L65
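In other words, the selayer emits raw logits and the sigmoid lives inside the loss. A minimal numeric sketch of that pattern for a single category (pure Python for illustration, not the repo's implementation, which uses PyTorch's built-in loss):

```python
import math

def sigmoid(x):
    """Logistic function mapping a raw logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def bce_with_logits(logit, target, eps=1e-7):
    """Binary cross-entropy that applies the sigmoid itself, so the
    prediction layer needs no activation of its own. `eps` guards log(0)."""
    p = sigmoid(logit)
    return -(target * math.log(p + eps)
             + (1.0 - target) * math.log(1.0 - p + eps))
```

A confident positive logit (e.g. 10.0) against target 1 gives a near-zero loss, while a confident negative logit against the same target is heavily penalized, which is why no explicit Sigmoid() module is needed on the selayer.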

qiulesun commented 6 years ago

Sorry for bothering you again. I have no idea about the following errors, which occur when I run CUDA_VISIBLE_DEVICES=0,1 python train.py --dataset pcontext --model encnet --aux --se-loss. Running import encoding gives similar errors.

OS: ubuntu14.04
PyTorch version: 0.5.0 (from source)
Python version: 3.6
CUDA: 8.0
cuDNN: 6.0.21
GPU: 2 x 1080

/usr/local/anaconda3/bin/python3.6 /media/cv-pc-00/QL_480G/sql/pytorch_dir/PyTorch-Encoding/experiments/segmentation/train.py --dataset PContext --model EncNet --se-loss
Traceback (most recent call last):
  File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 742, in _build_extension_module
    ['ninja', '-v'], stderr=subprocess.STDOUT, cwd=build_directory)
  File "/usr/local/anaconda3/lib/python3.6/subprocess.py", line 336, in check_output
    **kwargs).stdout
  File "/usr/local/anaconda3/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/cv-pc-00/QL_480G/sql/pytorch_dir/PyTorch-Encoding/experiments/segmentation/train.py", line 17, in <module>
    import encoding.utils as utils
  File "/usr/local/anaconda3/lib/python3.6/site-packages/encoding/__init__.py", line 13, in <module>
    from . import nn, functions, dilated, parallel, utils, models, datasets
  File "/usr/local/anaconda3/lib/python3.6/site-packages/encoding/nn/__init__.py", line 12, in <module>
    from .encoding import *
  File "/usr/local/anaconda3/lib/python3.6/site-packages/encoding/nn/encoding.py", line 18, in <module>
    from ..functions import scaledL2, aggregate, pairwise_cosine
  File "/usr/local/anaconda3/lib/python3.6/site-packages/encoding/functions/__init__.py", line 2, in <module>
    from .encoding import *
  File "/usr/local/anaconda3/lib/python3.6/site-packages/encoding/functions/encoding.py", line 14, in <module>
    from .. import lib
  File "/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/__init__.py", line 20, in <module>
    ], build_directory=gpu_path, verbose=False)
  File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 496, in load
    with_cuda=with_cuda)
  File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 664, in _jit_compile
    _build_extension_module(name, build_directory)
  File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 748, in _build_extension_module
    name, error.output.decode()))
RuntimeError: Error building extension 'enclib_gpu': [1/4] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/local/anaconda3/include/python3.6m --compiler-options '-fPIC' -std=c++11 -c /usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/roi_align_kernel.cu -o roi_align_kernel.cuda.o
FAILED: roi_align_kernel.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/local/anaconda3/include/python3.6m --compiler-options '-fPIC' -std=c++11 -c /usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/roi_align_kernel.cu -o roi_align_kernel.cuda.o
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/roi_align_kernel.cu(373): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/roi_align_kernel.cu(373): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/roi_align_kernel.cu(420): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/roi_align_kernel.cu(420): error: class "at::Context" has no member "getCurrentCUDAStream"

4 errors detected in the compilation of "/tmp/tmpxft_0000662c_00000000-7_roi_align_kernel.cpp1.ii".
[2/4] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/local/anaconda3/include/python3.6m --compiler-options '-fPIC' -std=c++11 -c /usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/encoding_kernel.cu -o encoding_kernel.cuda.o
FAILED: encoding_kernel.cuda.o
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/encoding_kernel.cu(315): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/encoding_kernel.cu(341): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/encoding_kernel.cu(364): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/encoding_kernel.cu(391): error: class "at::Context" has no member "getCurrentCUDAStream"

4 errors detected in the compilation of "/tmp/tmpxft_00006623_00000000-7_encoding_kernel.cpp1.ii".
[3/4] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/local/anaconda3/include/python3.6m --compiler-options '-fPIC' -std=c++11 -c /usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu -o syncbn_kernel.cuda.o
FAILED: syncbn_kernel.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/local/anaconda3/include/python3.6m --compiler-options '-fPIC' -std=c++11 -c /usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu -o syncbn_kernel.cuda.o
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu(183): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu(217): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu(249): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu(272): error: class "at::Context" has no member "getCurrentCUDAStream"

4 errors detected in the compilation of "/tmp/tmpxft_00006627_00000000-7_syncbn_kernel.cpp1.ii".
ninja: build stopped: subcommand failed.

Process finished with exit code 1

zhanghang1989 commented 6 years ago

Hi, that is because PyTorch updated its backend.

  1. Could you change at::Context::getCurrentCUDAStream() to at::cuda::getCurrentCUDAStream(), e.g. cudaStream_t stream = at::cuda::getCurrentCUDAStream();
  2. Also add #include <ATen/cuda/CUDAContext.h> to each affected .cu file.

This will be fixed in the next version.

qiulesun commented 6 years ago

Thanks for your attention. It works! However, three warnings occur; do they matter?

  1. /usr/local/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py:1940: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead. warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")

  2. /usr/local/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py:1025: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead. warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")

  3. /usr/local/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='elementwise_mean' instead. warnings.warn(warning.format(ret))

zhanghang1989 commented 6 years ago

The deprecation warnings are okay for now.

qiulesun commented 6 years ago

Problem with debugging the backward method of the Function class

Hi, aggregate(A, X, C) and scaledL2(X, C, S) in encoding/functions/encoding.py implement the forward and backward of your custom function, and I want to step through both. PyCharm Community 2018.1.4 on Ubuntu 16.04 LTS lets me debug the forward step by step, but I cannot debug the backward the same way on my two 1080 GPUs. Could you tell me whether this is possible and how to do it? (PS: I face the same problem with my own custom functions based on your code.)

zhanghang1989 commented 6 years ago

You can directly call the backend function for debugging: https://github.com/zhanghang1989/PyTorch-Encoding/blob/master/encoding/functions/encoding.py#L77

qiulesun commented 6 years ago

For my case, I want to run the code with one GPU even though my machine has two, for example when debugging. Does the code support single-GPU operation on a machine equipped with two GPUs? Is multi-GPU running the default when the machine has multiple GPUs?

zhanghang1989 commented 6 years ago

CUDA_VISIBLE_DEVICES=0 python train.py ...
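If launching from an IDE instead of a shell, an equivalent approach is to set the variable in Python before anything initializes CUDA (a generic sketch, not part of train.py):

```python
import os

# Must run before the first CUDA call (i.e. before torch initializes CUDA),
# otherwise the driver has already enumerated the GPUs and the setting is ignored.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only GPU 0 to this process

# ... then import torch / encoding and run training as usual
```

PyCharm run configurations also have an "Environment variables" field, which sets the variable without touching the script or the command line.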

qiulesun commented 6 years ago

Question 1 I use PyCharm Community 2018.1.4 to make debugging easier, and CUDA_VISIBLE_DEVICES=0 --dataset PContext --model EncNet --se-loss is given in the debug configuration. However, I get the error train.py: error: unrecognized arguments: CUDA_VISIBLE_DEVICES=0. What should I do next to debug the code with a single GPU in PyCharm?

Connected to pydev debugger (build 181.5087.37) usage: train.py [-h] [--model MODEL] [--backbone BACKBONE] [--dataset DATASET] [--data-folder DATA_FOLDER] [--workers N] [--aux] [--se-loss] [--epochs N] [--start_epoch N] [--batch-size N] [--test-batch-size N] [--lr LR] [--lr-scheduler LR_SCHEDULER] [--momentum M] [--weight-decay M] [--no-cuda] [--seed S] [--resume RESUME] [--checkname CHECKNAME] [--model-zoo MODEL_ZOO] [--ft] [--pre-class PRE_CLASS] [--ema] [--eval] [--no-val] [--test-folder TEST_FOLDER] train.py: error: unrecognized arguments: CUDA_VISIBLE_DEVICES=0

Question 2 args.lr = lrs[args.dataset.lower()] / 16 * args.batch_size in option.py means the LR depends on the batch size you give. So the LR is not fixed but scales with the batch size (GPU memory)? In my experiments I set args.lr = lrs[args.dataset.lower()]; is that reasonable and feasible, and does it respect your paper and intentions?

Question 3 For multi-scale evaluation, the 27th line base_size=576, crop_size=608 (base_size less than crop_size) in encoding/models/base.py should perhaps be base_size=608, crop_size=576? Previously you set base_size=520, crop_size=480, and now you change them to base_size=576, crop_size=608. A crop_size smaller than the base_size seems more reasonable to me. Which settings should I follow to reproduce your results?

I am looking forward to your reply.

zhanghang1989 commented 6 years ago

Q1: please use the terminal to launch the program.
Q2: That is a standard setting for the LR. When increasing the batch size, people typically increase the LR accordingly.
Q3: That is a bug. It will be fixed in the next release.

qiulesun commented 6 years ago

For Q2 above, due to limited GPU memory, the batch size unfortunately has to be small (typically less than 16). Does that mean I have to use a smaller LR according to the standard setting, i.e. args.lr = lrs[args.dataset.lower()] / 16 * args.batch_size?
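The linear scaling rule in option.py can be sketched as follows. The base LR values below are hypothetical placeholders, not the repo's actual table:

```python
# Hypothetical per-dataset base LRs (placeholders, not the repo's real values),
# tuned for a reference batch size of 16.
BASE_LRS = {"ade20k": 0.01, "pcontext": 0.001}
REFERENCE_BATCH = 16

def scaled_lr(dataset: str, batch_size: int) -> float:
    """Linear LR scaling rule: halve the batch size -> halve the learning rate."""
    return BASE_LRS[dataset.lower()] / REFERENCE_BATCH * batch_size
```

So with batch size 8 instead of 16, the LR is half the base value, following the same convention as the repo's formula.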

zhanghang1989 commented 6 years ago

Yes. If the batch size is too small, the model will get worse results, because the working batch size for batch normalization is small.

qiulesun commented 6 years ago

I only have two 1080 GPUs with 16 GB of memory in total, so the batch size in my experiments is less than 16. Can I alleviate the side effect (the worse results you mentioned) by using a larger LR, i.e. setting args.lr = lrs[args.dataset.lower()] independent of the batch size?

zhanghang1989 commented 6 years ago

The batch size matters for the segmentation task because of the working batch size of the Synchronized Batch Normalization. A batch size of 16 yields the best performance.
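The "working batch size" point can be made concrete with a small arithmetic sketch (my own illustration, not the repo's code): plain BN computes statistics on each GPU's slice of the batch, while SyncBN gathers statistics across all GPUs.

```python
def bn_working_batch(total_batch: int, n_gpus: int, sync: bool) -> int:
    """Batch size actually seen by the BN statistics.

    Plain BN normalizes over the per-GPU slice of the batch; synchronized BN
    gathers mean/variance across all GPUs, so the full batch contributes.
    """
    return total_batch if sync else total_batch // n_gpus
```

For example, a global batch of 16 on 2 GPUs gives BN statistics over 8 samples with plain BN, but over all 16 with SyncBN.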

qiulesun commented 6 years ago

What is the main difference between encoding.nn.BatchNorm1d and encoding.nn.BatchNorm2d?

zhanghang1989 commented 6 years ago

They are the same as torch.nn.BatchNorm1d and torch.nn.BatchNorm2d.
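In both cases the statistics are computed over every axis except the channel axis (dim 1); the difference is only the expected input shape. A small sketch of the reduction axes (my own illustration, not library code):

```python
def bn_reduce_axes(ndim: int) -> tuple:
    """Axes over which BN statistics are computed: every axis except channels (dim 1)."""
    return tuple(d for d in range(ndim) if d != 1)

# BatchNorm1d on (N, C) input:        reduces over (0,)
# BatchNorm2d on (N, C, H, W) input:  reduces over (0, 2, 3)
```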

qiulesun commented 6 years ago

I have two questions. (1) For the cos and poly LR schedules, every batch (iteration) has a different LR, rather than all iterations in one epoch sharing the same LR. Is that right? (2) For CIFAR-10 recognition, the scaling factor s_k is not learned but randomly sampled from a uniform distribution between 0 and 1, which is different from the segmentation tasks. Is that right?
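The per-iteration schedule described in (1) can be sketched with the usual poly-decay formula (a generic sketch; the repo's exact implementation and default power are assumptions here):

```python
def poly_lr(base_lr: float, cur_iter: int, max_iter: int, power: float = 0.9) -> float:
    """Polynomial LR decay evaluated at every iteration (not once per epoch)."""
    return base_lr * (1.0 - cur_iter / max_iter) ** power
```

Because `cur_iter` counts batches rather than epochs, two batches inside the same epoch get slightly different learning rates.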

qiulesun commented 6 years ago

I'm sorry for disturbing you again. Your work is very encouraging to me. I notice that the scaled_l2 and aggregate operators of the proposed encoding layer are implemented in C++. Since I am not good at it, could you share the corresponding implementation in Python, if possible?

zhanghang1989 commented 6 years ago

We change the LR every iteration. The CIFAR experiment uses a shake-out-like regularization. Scaled L2 and aggregate are easy to implement in Python, but that would be memory-consuming.
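For reference, here is a pure-Python, list-based sketch of the two operators as I understand their definitions (assumed shapes: X is an N×D list of inputs, C is a K×D list of codewords, S is a list of K scales, A is an N×K list of assignment weights; this is an illustration, not the repo's implementation, and a tensor version would be far more memory- and speed-efficient):

```python
def scaled_l2(X, C, S):
    """s_ik = S[k] * ||X[i] - C[k]||^2  ->  N x K matrix of scaled distances."""
    return [[S[k] * sum((x[d] - c[d]) ** 2 for d in range(len(c)))
             for k, c in enumerate(C)]
            for x in X]

def aggregate(A, X, C):
    """e_kd = sum_i A[i][k] * (X[i][d] - C[k][d])  ->  K x D aggregated residuals."""
    K, D = len(C), len(C[0])
    E = [[0.0] * D for _ in range(K)]
    for i, x in enumerate(X):
        for k in range(K):
            for d in range(D):
                E[k][d] += A[i][k] * (x[d] - C[k][d])
    return E
```

The memory cost the author mentions comes from materializing the full N×K×D residual tensor (X[i] - C[k]) when this is vectorized with broadcasting.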

qiulesun commented 6 years ago

question 1: Sorry to ask a stupid question. The augmented PASCAL VOC 2012 has 11533 images in trainval.txt rather than the 10582 used in the paper, which confuses me. I also cannot find how the 1464 training images of PASCAL VOC 2012 are augmented into the 10582; in other words, I do not understand the relationship between PASCAL VOC 2012 and its augmented version. Could I know your opinion? If you think this question is not worth answering, I completely understand.

question 2: As far as I know, Group Normalization (https://arxiv.org/pdf/1803.08494.pdf) is independent of the batch size, making it well suited to the semantic segmentation task, which requires small batches constrained by memory consumption. Could you consider employing it in an updated version?

zhanghang1989 commented 6 years ago

Q1. For the VOC experiments, first pretrain on COCO, then finetune on "pascal_aug" and finally on "pascal_voc". I am releasing the training details for reproducing the VOC experiments this weekend.
Q2. Group Norm still has inferior performance compared to BN. You can easily use it by changing the code a little bit.

qiulesun commented 6 years ago

Question 1: I see base_size=608 and crop_size=576 in the training log of EncNet_ResNet50_ADE (https://raw.githubusercontent.com/zhanghang1989/image-data/master/encoding/segmentation/logs/encnet_resnet50_ade.log); however, base_size and crop_size are set to 520 and 480 respectively in https://github.com/zhanghang1989/PyTorch-Encoding/blob/master/encoding/datasets/base.py#L17. This confuses me. Is it that ADE20K specifically uses base_size=608 and crop_size=576, while PASCAL Context and PASCAL VOC12 use base_size=520 and crop_size=480?
Question 2: Besides, is base_size=576 and crop_size=608 in https://github.com/zhanghang1989/PyTorch-Encoding/blob/master/encoding/models/base.py#L27 only for multi-scale testing?

zhanghang1989 commented 6 years ago

There are some bugs in the existing code. I am updating them soon.

qiulesun commented 6 years ago

Question 1: As mentioned above, there are some bugs in the existing code, but I still have a question. EncNet_ResNet50_ADE achieves 79.9 pixAcc and 41.2 mIoU in the last row of the table (https://hangzhang.org/PyTorch-Encoding/experiments/segmentation.html); however, the training log (https://raw.githubusercontent.com/zhanghang1989/image-data/master/encoding/segmentation/logs/encnet_resnet50_ade.log) shows that it obtains 78.0 pixAcc and 40.2 mIoU, lower than the results you reported. Is this because you use a multi-scale testing strategy on the ADE20K val set, or something else?