terminate called after throwing an instance of 'torch::jit::script::ErrorReport'

1171257311 commented 4 years ago

Torch version: 1.1.0. Previously I used v1.0.1 and everything was OK. However, after switching PyTorch to v1.1.0, the line torch::jit::script::Module module = torch::jit::load("../gcn2_320x240.pt"); fails with:

`terminate called after throwing an instance of 'torch::jit::script::ErrorReport'
what():
Arguments for call are not valid. The following operator variants are available:

aten::grid_sampler(Tensor input, Tensor grid, int interpolation_mode, int padding_mode, bool align_corners) -> (Tensor): Argument align_corners not provided.

The original call is:
at code/gcn.py:67:22
  _25 = torch.slice(vu, 0, 0, 9223372036854775807, 1)
  _26 = torch.select(_25, 1, 0)
  _27 = torch.slice(vu, 0, 0, 9223372036854775807, 1)
  _28 = torch.select(_27, 1, 1)
  _29 = torch.to(_26, dtype=4, layout=0, device=torch.device("cuda:0"), non_blocking=False, copy=False)
  _30 = torch.to(_28, dtype=4, layout=0, device=torch.device("cuda:0"), non_blocking=False, copy=False)
  _31 = torch.unsqueeze(torch.index(CONSTANTS.c1, [_29, _30]), 1)
  ref_points = torch.cat([_24, _31], 1)
  grid = torch.view(ref_points, [1, 1, -1, 2])
  _32 = torch.squeeze(torch.grid_sampler(input, grid, 0, 0))
  desc_1 = torch.t(_32)
  desc_2 = torch.to(torch.gt(desc_1, 0), 6, False, False)
  _33 = ops.prim.NumToTensor(torch.size(desc_2, 0))
  desc_3 = torch.view(desc_2, [int(_33), 32, 8])
  desc_4 = torch.mul(desc_3, CONSTANTS.c2)
  desc = torch.sum(desc_4, [2], False)
  _34 = (_17, torch.to(desc, 0, False, False))
  return _34`
Can anyone help me?

1171257311 commented 4 years ago

Could you please release the weights as a .pth file rather than the .pt file? Then I can convert it to a .pt file myself. Otherwise, the existing file seems likely to keep causing this problem. Thank you so much! @jiexiong2016

1171257311 commented 4 years ago

@jiexiong2016

ZHN2ZHN commented 4 years ago

Open the .pt file with an archive tool, without extracting it, and find the file named 'gcn.py' under the folder './gcn/code/'. Then find the line `_32 = torch.squeeze(torch.grid_sampler(input, grid, 0, 0))` (probably line 67) and replace it with `_32 = torch.squeeze(torch.grid_sampler(input, grid, 0, 0, True))`.
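The same edit can be scripted instead of done by hand in an archive tool. This is only a minimal sketch, assuming the .pt file is an ordinary zip archive whose TorchScript source sits in an entry ending in `code/gcn.py`, as described above. The helper name `patch_pt` and the output file name are made up for illustration. `True` is used for `align_corners` because it is the value suggested above; as far as I know it also matches how grid sampling behaved before the argument existed, but treat that as an assumption to check against the documentation for your version. I have not verified that every libtorch release accepts an archive rewritten this way.

```python
import zipfile

# The four-argument call stored in the old archive, and the patched form.
OLD_CALL = "torch.grid_sampler(input, grid, 0, 0)"
NEW_CALL = "torch.grid_sampler(input, grid, 0, 0, True)"  # assumption: align_corners=True


def patch_pt(src_path, dst_path, suffix="code/gcn.py"):
    """Copy a .pt zip archive, appending the missing argument in one source file.

    `suffix` is the assumed location of the TorchScript source inside the archive.
    Returns the number of calls that were rewritten.
    """
    patched = 0
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename.endswith(suffix):
                text = data.decode("utf-8")
                patched += text.count(OLD_CALL)
                data = text.replace(OLD_CALL, NEW_CALL).encode("utf-8")
            # Reusing the original entry metadata keeps each entry's
            # name and compression method unchanged.
            dst.writestr(info, data)
    return patched
```

Calling `patch_pt("gcn2_320x240.pt", "gcn2_320x240_patched.pt")` would write a patched copy next to the original and leave the original untouched, so the change is easy to undo. A return value of 0 means the expected call was not found and nothing was changed.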

padmasreenagarajan commented 4 years ago

Hi, I faced the same issue with libtorch too, and it was resolved with the nightly version of libtorch. But I don't know why this happens. @zhn-svg

ZHN2ZHN commented 4 years ago

Since PyTorch 1.3.0, the function torch.grid_sampler needs five input parameters; the fifth parameter (align_corners) is True or False.

padmasreenagarajan commented 4 years ago

@zhn-svg I recently succeeded in building YOLOv3 in C++. The error mentioned above is due to a version incompatibility between PyTorch and libtorch. Check that the PyTorch version used to generate the .pt file matches the libtorch you downloaded from pytorch.org. I used PyTorch 1.5.1 in Python to generate the .pt file, and it works well with both libtorch 1.5.1 (stable) and the preview (nightly) build.

ZHN2ZHN commented 4 years ago

@padmasreenagarajan Yes, this error is caused by incompatible versions. What I mean is that it can also be corrected without changing the libtorch version. The error above is:

aten::grid_sampler(Tensor input, Tensor grid, int interpolation_mode, int padding_mode, bool align_corners) -> (Tensor): Argument align_corners not provided.

_32 = torch.squeeze(torch.grid_sampler(input, grid, 0, 0))

In the .pt file, torch.grid_sampler is called with only four arguments, but **aten::grid_sampler(Tensor input, Tensor grid, int interpolation_mode, int padding_mode, bool align_corners)** needs five. The fifth parameter, **bool align_corners**, is missing here, so you only need to append 'True' or 'False' at the end:
**_32 = torch.squeeze(torch.grid_sampler(input, grid, 0, 0, True))**
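The four-versus-five argument mismatch described above can also be checked mechanically before editing anything. The sketch below scans TorchScript source text and reports calls to `torch.grid_sampler` that pass fewer than five arguments. The helper names `split_args` and `find_short_calls` are hypothetical, and the scan assumes each call fits on a single line and that no string literal in the call contains parentheses or commas.

```python
def split_args(arg_text):
    """Split a call's argument text on top-level commas only."""
    args, depth, current = [], 0, []
    for ch in arg_text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        args.append(tail)
    return args


def find_short_calls(source, name="torch.grid_sampler", expected=5):
    """Return (line_number, args) for each call to `name` with fewer than `expected` arguments."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        start = line.find(name + "(")
        while start != -1:
            open_idx = start + len(name)  # position of the opening "("
            depth = 0
            for j in range(open_idx, len(line)):
                if line[j] == "(":
                    depth += 1
                elif line[j] == ")":
                    depth -= 1
                    if depth == 0:  # matching close of the call
                        args = split_args(line[open_idx + 1:j])
                        if len(args) < expected:
                            hits.append((lineno, args))
                        break
            start = line.find(name + "(", start + 1)
    return hits
```

Run on the text of gcn.py from the archive, a non-empty result would point at the lines that still need the extra `align_corners` argument.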

padmasreenagarajan commented 4 years ago

Oh, that's cool. Thank you. I'm very new to this. I was using Netron to read .pt files, but now I know another way too. Could you please explain why this error was resolved by using matching versions, and also how it was resolved by changing this line of code? Here is my error; could you please tell me how to fix it? @zhn-svg

terminate called after throwing an instance of 'torch::jit::script::ErrorReport' what(): Arguments for call are not valid. The following variants are available:

aten::upsample_nearest2d(Tensor self, int[2] output_size) -> (Tensor): Expected at most 2 arguments but found 5 positional arguments.

aten::upsample_nearest2d.out(Tensor self, int[2] output_size, *, Tensor(a!) out) -> (Tensor(a!)): Argument out not provided.

ZHN2ZHN commented 4 years ago

There is no doubt that the matching version will work. Different versions expose different APIs for the same function, so I suggest reading the official documentation for the corresponding version.

padmasreenagarajan commented 4 years ago

Okay, thanks a lot @zhn-svg. I got a segfault after all image detection was done, i.e., the output was perfect, but a segfault occurred at the end of the main function with LibTorch stable 1.5.1. It was resolved in the nightly build. Do you have any idea about this?

ZHN2ZHN commented 4 years ago

@padmasreenagarajan Sorry, it is difficult to know what happened in your code without the source. The main function probably needs to be checked carefully, especially the part after the network output.

padmasreenagarajan commented 4 years ago

@zhn-svg Yes, it has already been reported on GitHub. I'll share the link below: https://github.com/pytorch/pytorch/issues/38385 Could you please explain why it was resolved with the nightly build? @zhn-svg

ZHN2ZHN commented 4 years ago

@padmasreenagarajan OK, let me take a look.

ZHN2ZHN commented 4 years ago

@padmasreenagarajan I tested with my libtorch, which is also a stable version, 1.4.0 (libtorch-cxx11-abi-shared-with-deps-1.4.0.zip), but I didn't hit the error you described. The output is as follows:

    $ ./build/classifier ./resnet50.pt ./label.txt
    == Switch to GPU mode
    == ResNet50 loaded!
    == Label loaded! Let's try it
    == Input image path: [enter Q to exit]
    dog.jpg
    == image size: [768 x 576] ==
    == simply resize: [224 x 224] ==
    ============= Top-1 =============
    Label: malamute, malemute, Alaskan malamute
    With Probability: 47.3742%
    ============= Top-2 =============
    Label: Eskimo dog, husky
    With Probability: 29.409%
    ============= Top-3 =============
    Label: Siberian husky
    With Probability: 15.0109%
    == Input image path: [enter Q to exit]
    1.jpg
    == image size: [899 x 600] ==
    == simply resize: [224 x 224] ==
    ============= Top-1 =============
    Label: pier
    With Probability: 40.1278%
    ============= Top-2 =============
    Label: breakwater, groin, groyne, mole, bulwark, seawall, jetty
    With Probability: 22.6904%
    ============= Top-3 =============
    Label: suspension bridge
    With Probability: 3.49556%
    == Input image path: [enter Q to exit]
    Q

padmasreenagarajan commented 4 years ago

@zhn-svg Actually, the model has not been trained on negative samples. As a temporary fix, the main function has a try-catch block for images with negative samples. Could this try-catch be the reason for the segfault?

ZHN2ZHN commented 4 years ago

@padmasreenagarajan I think that training without negative samples is more likely to produce bad predictions than segfaults. Can you show the source code you wrote? Otherwise it is difficult for me to tell where the trouble is.

padmasreenagarajan commented 4 years ago

@zhn-svg Thank you. Actually, it is confidential, so I'm unable to share it here. As you suggested, I will train my model with negative samples and then import it into the C++ environment.

ZHN2ZHN commented 4 years ago

@padmasreenagarajan All right. If you really need to deploy in a C++ environment, I recommend Caffe, which I think is a better choice for training your network. Compared with libtorch, it's much lighter.

padmasreenagarajan commented 4 years ago

Yes, will do, and I'll update you soon. Thank you @zhn-svg

jiexiong2016 / GCNv2_SLAM

terminate called after throwing an instance of 'torch::jit::script::ErrorReport' #44