nv-tlabs / GSCNN

Gated-Shape CNN for Semantic Segmentation (ICCV 2019)
https://nv-tlabs.github.io/GSCNN/
Other

Error trying to run code #6

ShreyasSkandanS closed this issue 4 years ago.

ShreyasSkandanS commented 4 years ago

```
/usr/local/lib/python3.5/dist-packages/torch/nn/modules/loss.py:217: UserWarning: NLLLoss2d has been deprecated. Please use NLLLoss instead as a drop-in replacement and see https://pytorch.org/docs/master/nn.html#torch.nn.NLLLoss for more details.
  warnings.warn("NLLLoss2d has been deprecated. "
/usr/local/lib/python3.5/dist-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
  warnings.warn(warning.format(ret))
08-27 20:15:43.742 Using Cross Entropy Loss
/usr/local/lib/python3.5/dist-packages/encoding/nn/syncbn.py:149: EncodingDeprecationWarning: encoding.nn.BatchNorm2d is now deprecated in favor of encoding.nn.SyncBatchNorm.
  .format('BatchNorm2d', SyncBatchNorm.__name__), EncodingDeprecationWarning)
Traceback (most recent call last):
  File "train.py", line 380, in <module>
    main()
  File "train.py", line 132, in main
    net = network.get_net(args, criterion)
  File "/data/code/GSCNN/network/__init__.py", line 12, in get_net
    criterion=criterion, trunk=args.trunk)
  File "/data/code/GSCNN/network/__init__.py", line 27, in get_model
    net = net_func(num_classes=num_classes, trunk=trunk, criterion=criterion)
  File "/data/code/GSCNN/network/gscnn.py", line 233, in __init__
    self.gate1 = gsc.GatedSpatialConv2d(32, 32)
  File "/data/code/GSCNN/my_functionals/GatedSpatialConv.py", line 36, in __init__
    False, _pair(0), groups, bias, 'zeros')
TypeError: __init__() takes 11 positional arguments but 12 were given
```

I have tried my best to make sure all the necessary libraries are the right versions:

```
absl-py (0.8.0)
astor (0.8.0)
certifi (2019.6.16)
chardet (3.0.4)
cycler (0.10.0)
decorator (4.4.0)
gast (0.2.2)
google-pasta (0.1.7)
grpcio (1.23.0)
h5py (2.9.0)
idna (2.8)
imageio (2.5.0)
joblib (0.13.2)
Keras-Applications (1.0.8)
Keras-Preprocessing (1.1.0)
kiwisolver (1.1.0)
Markdown (3.1.1)
matplotlib (3.0.3)
networkx (2.3)
nose (1.3.7)
numpy (1.17.1)
opencv-python (4.1.0.25)
Pillow (6.1.0)
pip (9.0.1)
protobuf (3.9.1)
pyparsing (2.4.2)
python-dateutil (2.8.0)
PyWavelets (1.0.3)
PyYAML (5.1.2)
requests (2.22.0)
scikit-image (0.15.0)
scikit-learn (0.21.3)
scipy (1.1.0)
setuptools (20.7.0)
six (1.12.0)
tensorboard (1.14.0)
tensorboardX (1.8)
tensorflow (1.14.0)
tensorflow-estimator (1.14.0)
termcolor (1.1.0)
torch (1.0.0)
torch-encoding (1.0.1)
torchvision (0.2.0)
tqdm (4.35.0)
urllib3 (1.25.3)
Werkzeug (0.15.5)
wheel (0.29.0)
wrapt (1.11.2)
```

Do you have any suggestions on how to go about fixing this?

Best regards, Shreyas

Tetsujinfr commented 4 years ago

This is strange, because line 36 of my_functionals/GatedSpatialConv.py, which raises the error, passes 11 arguments, not 12, so everything looks right despite what the error message says: `super(GatedSpatialConv2d, self).__init__(in_channels, out_channels, kernel_size, stride, padding, dilation, False, _pair(0), groups, bias, 'zeros')`

Did you modify the code in any way after installing?

ShreyasSkandanS commented 4 years ago

Hi, thanks for the prompt response.

No, I haven't changed anything in the code. I just pointed config.py to where I'm storing the Cityscapes dataset and ran the evaluation command listed in the repo README.

Best regards, Shreyas

Tetsujinfr commented 4 years ago

Quick question: which version of ninja do you have installed? I don't see it in your list of libraries above, but presumably you already have it installed.

ShreyasSkandanS commented 4 years ago

`ninja --version` says I'm on 1.8.2.

Tetsujinfr commented 4 years ago

Out of curiosity, how did you choose your torch-encoding version (v1.0.1) for this project? Did you run into runtime errors with a more recent version?

ShreyasSkandanS commented 4 years ago

I just naively ran `pip3 install torch-encoding`. Is there a version you recommend?

Tetsujinfr commented 4 years ago

My bad, it seems 1.0.1 is the most recent version of torch-encoding. When I tried to run the evaluation, compiling the encoding CUDA extensions failed with the error `subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.` I had to follow this fix [https://github.com/zhanghang1989/PyTorch-Encoding/issues/161], which basically means editing all the .cu files of the encoding library in my virtual environment's Python packages folder. Didn't you run into this on your side?

ShreyasSkandanS commented 4 years ago

That's strange, I don't have any of those errors with 1.0.1. I even tried with torch-encoding==1.0.0 and had the same problem.

I just tried the same procedure with torch-encoding==0.5.0 and I see the error you mentioned above: "subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1"

I looked at the reference you mentioned above but don't fully understand the solution.

My next question, then: do you recommend using torch-encoding==0.5.0? And if so, what fix would you recommend for the above problem?

Best regards, S

ShreyasSkandanS commented 4 years ago

@Tetsujinfr just wanted to say thanks for the help. I was under the impression that you were managing this repo but just noticed that you're trying to get it running too.

Tetsujinfr commented 4 years ago

I used torch-encoding 1.0.1 in the end. I had the same compilation errors anyway, so I followed the proposed fix and it worked. If you don't have this issue, lucky you: keep the version you currently have, which is the latest and the same one I am using. I have now reached the step where I need the Cityscapes dataset, so it will take some time to download (probably a day). Once that's done, I will let you know whether I reproduce the error you mentioned above or the evaluation passes.

Any chance someone from the project team can advise on Shreyas's issue?

tovacinni commented 4 years ago

Hi, thanks for your interest in our work.

> this is strange because the line 36 generating the error from myFunctionals/GatedSpatialConv.py does get passed 11 arguments, and not 12, so all looks to be right despite the error message description.

This is because `self` is implicitly passed to the constructor, so counting `self` it receives 12 arguments. As for why this happens: starting with PyTorch v1.1.0, another argument (`padding_mode`, which the `'zeros'` value fills) was added to the `_ConvNd` constructor, as per this pull request (https://github.com/pytorch/pytorch/pull/17240).
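To see why the message says 12 when only 11 arguments appear at the call site, here is a minimal sketch that reproduces the error without PyTorch. `OldConvNd` and `GatedConvLike` are hypothetical stand-ins: the first mimics a 10-parameter constructor like PyTorch 1.0's `_ConvNd` (no `padding_mode`), the second forwards 11 arguments the way `GatedSpatialConv2d` does. Python counts the implicit `self` on both sides of the message.

```python
class OldConvNd:
    """Hypothetical stand-in mimicking a PyTorch 1.0-style _ConvNd constructor (no padding_mode)."""

    def __init__(self, in_channels, out_channels, kernel_size, stride, padding,
                 dilation, transposed, output_padding, groups, bias):
        self.in_channels = in_channels


class GatedConvLike(OldConvNd):
    """Hypothetical subclass that forwards 11 explicit arguments, like GatedSpatialConv2d."""

    def __init__(self, in_channels, out_channels):
        # 11 explicit arguments + the implicit self = 12 given,
        # but OldConvNd.__init__ accepts only 11 (self + 10).
        super(GatedConvLike, self).__init__(
            in_channels, out_channels, (1, 1), (1, 1), (0, 0), (1, 1),
            False, (0, 0), 1, False, 'zeros')


message = None
try:
    GatedConvLike(32, 32)
except TypeError as err:
    message = str(err)
    print(message)
```

Depending on the Python version, the message is prefixed with either `__init__()` or `OldConvNd.__init__()`, but the counts are the same as in the traceback above.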

I updated parts of my code for the transition from PyTorch <1.0 to 1.1+ but forgot to update the version numbers in the listed dependencies. I have amended the README to reflect this change. If you want to use PyTorch <1.1, you can simply remove the `'zeros'` argument passed to the constructor and it should work.
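If the same code needs to run on both sides of the change, one option (a sketch under assumptions, not the project's fix) is to check whether the base constructor declares a `padding_mode` parameter and only pass `'zeros'` when it does. `conv_base_args`, `OldConvNd`, and `NewConvNd` are hypothetical names; the two stand-in classes mimic the pre-1.1 and 1.1+ constructor signatures so the sketch runs without PyTorch.

```python
import inspect


def conv_base_args(base_init, in_channels, out_channels, kernel_size, stride,
                   padding, dilation, output_padding, groups, bias):
    """Hypothetical helper: build positional args for a _ConvNd-style constructor.

    Appends padding_mode='zeros' only if base_init declares a `padding_mode`
    parameter; constructors without it would raise the TypeError above.
    """
    args = [in_channels, out_channels, kernel_size, stride, padding, dilation,
            False,  # transposed
            output_padding, groups, bias]
    if "padding_mode" in inspect.signature(base_init).parameters:
        args.append("zeros")
    return tuple(args)


class OldConvNd:
    """Hypothetical stand-in for the pre-1.1 signature (no padding_mode)."""

    def __init__(self, in_channels, out_channels, kernel_size, stride, padding,
                 dilation, transposed, output_padding, groups, bias):
        self.padding_mode = None


class NewConvNd:
    """Hypothetical stand-in for the 1.1+ signature (adds padding_mode)."""

    def __init__(self, in_channels, out_channels, kernel_size, stride, padding,
                 dilation, transposed, output_padding, groups, bias, padding_mode):
        self.padding_mode = padding_mode


for base in (OldConvNd, NewConvNd):
    args = conv_base_args(base.__init__, 32, 32, (1, 1), (1, 1), (0, 0),
                          (1, 1), (0, 0), 1, False)
    conv = base(*args)  # constructs without a TypeError for either signature
    print(base.__name__, len(args), conv.padding_mode)
```

In GatedSpatialConv.py you would pass `torch.nn.modules.conv._ConvNd.__init__` as `base_init` and call `super(GatedSpatialConv2d, self).__init__(*conv_base_args(...))`. Keep in mind that `_ConvNd` is a private PyTorch class whose signature can change again, so pinning PyTorch 1.1+ as the amended README does is the simpler route.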