lsrock1 / maskrcnn_benchmark.cpp

Implementation of maskrcnn-benchmark using the PyTorch C++ frontend
MIT License

Failed to convert the Python model to C++ #16

Closed: TensorFlowWangHT closed this issue 4 years ago

TensorFlowWangHT commented 4 years ago

I tried to use the same e2e_faster_rcnn_101_fpn_1x.yaml configuration to convert the model into TorchScript that can be called from C++. However, when iterating over module.get_attributes() in jit_to_cpp.cpp, I get an error saying that a Boolean cannot be converted to a tensor. I checked the model in Python, and there is indeed a Boolean attribute (e.g. training). Have you encountered this problem during conversion?

lsrock1 commented 4 years ago

Could you show me the error? libtorch keeps changing, so the API may have changed.

TensorFlowWangHT commented 4 years ago

terminate called after throwing an instance of 'c10::Error'
what(): isTensor() INTERNAL ASSERT FAILED at /home/fzj/.vs/maskrcnn_benchmark.cpp-master/2e8ef9ac-b9b0-472e-adaa-4e43f32945f1/src/lib/libtorch/include/ATen/core/ivalue_inl.h:90, please report a bug to PyTorch. Expected Tensor but got Bool (toTensor at /home/fzj/.vs/maskrcnn_benchmark.cpp-master/2e8ef9ac-b9b0-472e-adaa-4e43f32945f1/src/lib/libtorch/include/ATen/core/ivalue_inl.h:90)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6a (0x7fffab618c4a in /home/fzj/.vs/maskrcnn_benchmark.cpp-master/2e8ef9ac-b9b0-472e-adaa-4e43f32945f1/src/lib/libtorch/lib/libc10.so)
frame #1: <unknown function> + 0x135f47 (0x555555689f47 in /home/fzj/.vs/maskrcnn_benchmark.cpp-master/2e8ef9ac-b9b0-472e-adaa-4e43f32945f1/out/build/Linux-Debug/run.out)
frame #2: <unknown function> + 0x133bcc (0x555555687bcc in /home/fzj/.vs/maskrcnn_benchmark.cpp-master/2e8ef9ac-b9b0-472e-adaa-4e43f32945f1/out/build/Linux-Debug/run.out)
frame #3: <unknown function> + 0x3415f (0x55555558815f in /home/fzj/.vs/maskrcnn_benchmark.cpp-master/2e8ef9ac-b9b0-472e-adaa-4e43f32945f1/out/build/Linux-Debug/run.out)
frame #4: <unknown function> + 0x2d712 (0x555555581712 in /home/fzj/.vs/maskrcnn_benchmark.cpp-master/2e8ef9ac-b9b0-472e-adaa-4e43f32945f1/out/build/Linux-Debug/run.out)
frame #5: __libc_start_main + 0xe7 (0x7fffaa452b97 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x2d17a (0x55555558117a in /home/fzj/.vs/maskrcnn_benchmark.cpp-master/2e8ef9ac-b9b0-472e-adaa-4e43f32945f1/out/build/Linux-Debug/run.out)

My libtorch version is 1.3. The error is raised at line 20 of jit_to_cpp.cpp. If I add a continue there to skip Boolean attributes, another error occurs at line 36 of jit_to_cpp.h, where the backbone parameter names are matched: the names do not match.
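The assert in the trace says toTensor() was called on a value holding a Bool. In libtorch the usual guard is to check IValue::isTensor() before calling toTensor(), which is what the continue described above amounts to. The sketch below shows that pattern without depending on libtorch: AttrValue is a hypothetical std::variant stand-in for c10::IValue, Tensor is just a float vector, and collect_tensor_attributes is an illustrative name, not a function from this repository.

```cpp
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for c10::IValue: an attribute holds either a Bool
// (such as the module's `training` flag) or a "tensor", modeled here as a
// plain float vector so the sketch compiles without libtorch.
using Tensor = std::vector<float>;
using AttrValue = std::variant<bool, Tensor>;
using Attribute = std::pair<std::string, AttrValue>;

// Keep only tensor-valued attributes. Checking the alternative before
// std::get mirrors calling isTensor() before toTensor() on an IValue.
std::vector<std::pair<std::string, Tensor>> collect_tensor_attributes(
    const std::vector<Attribute>& attrs) {
  std::vector<std::pair<std::string, Tensor>> tensors;
  for (const auto& [name, value] : attrs) {
    if (!std::holds_alternative<Tensor>(value)) {
      continue;  // skip non-tensor attributes such as `training`
    }
    tensors.emplace_back(name, std::get<Tensor>(value));
  }
  return tensors;
}
```

As the thread shows, skipping Booleans alone may not be enough: the remaining names still have to match the C++ model's parameter names.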

Do I need to change the default settings in default.cpp when using the e2e_faster_rcnn_101_fpn_1x.yaml configuration file?

Thank you very much for your patience!

lsrock1 commented 4 years ago

Did you use python_utils/to_jit.py to convert the Python model to a JIT model?

TensorFlowWangHT commented 4 years ago

Yes, that succeeded. I debugged in Python and saw that the backbone network in the model does have a Boolean attribute (the attribute is training).

lsrock1 commented 4 years ago

I think TorchScript has changed (it tracks more things than before). I will check it and get back to you!

TensorFlowWangHT commented 4 years ago

Thank you so much

TensorFlowWangHT commented 4 years ago

Which version of PyTorch did you use? I will try switching versions and tracing again.

lsrock1 commented 4 years ago

I am not sure, maybe 0.4

TensorFlowWangHT commented 4 years ago

Really? According to https://github.com/facebookresearch/maskrcnn-benchmark, the minimum required PyTorch version is 1.0. Can it really be installed with PyTorch 0.4?

lsrock1 commented 4 years ago

I tested it in June and July; version 1.1.0 should work.

TensorFlowWangHT commented 4 years ago

I tried PyTorch 1.1 and 1.2, and both were successful.

However, while iterating over named_buffers() at line 63 of jit_to_cpp.h, an error occurred for rpn.anchor_generator.anchors.0 because no matching name was found.

So I added the following code at line 123 of jit_to_cpp.cpp, and it now runs successfully. The AP obtained in testing matches the originally expected value.

else if (name.find("anchors") != std::string::npos) {
  new_name = name;
  return new_name;
}
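A standalone sketch of where such a pass-through sits in a name-mapping function may help. map_parameter_name and rename_rules are hypothetical: the actual renaming rules in jit_to_cpp.cpp are not shown in this thread, so an exact-match lookup table stands in for them, and an empty string is used here to signal "no matching name".

```cpp
#include <map>
#include <string>

// Hypothetical name-mapping helper. The real rules in jit_to_cpp.cpp are not
// reproduced; `rename_rules` stands in for them as an exact-match table.
std::string map_parameter_name(
    const std::string& name,
    const std::map<std::string, std::string>& rename_rules) {
  // Anchor buffers (e.g. rpn.anchor_generator.anchors.0) are passed through
  // unchanged, matching the "anchors" branch from the thread.
  if (name.find("anchors") != std::string::npos) {
    return name;
  }
  auto it = rename_rules.find(name);
  // An empty string signals "no matching name", the failure reported above.
  return it != rename_rules.end() ? it->second : std::string{};
}
```

Checking for "anchors" before the general rules means the anchor buffers never reach the lookup that would otherwise fail for them.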

Do you think this is a version problem, or was it missed before? When I load the model you provided earlier, it does contain rpn.anchor_generator.anchors.0.

lsrock1 commented 4 years ago

Because the anchors are generated from the config and are not learnable parameters, they don't actually have to be loaded from the weights. I load them only to keep things consistent. When I tested it, it worked well, so it is strange that it now raises an error.
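Why anchors need not come from the weight file can be seen from a simplified generator: the boxes are a pure function of configuration values, so regenerating them always gives the same result. This is an illustrative sketch, not the repository's code. It assumes one box per (size, aspect ratio) pair centered at the origin, with aspect ratio taken as height / width; maskrcnn-benchmark's own anchor generator may differ in details such as rounding and centering.

```cpp
#include <array>
#include <cmath>
#include <vector>

using Box = std::array<double, 4>;  // x1, y1, x2, y2

// Simplified anchor generator (assumption of this sketch): one box per
// (size, aspect ratio) pair, centered at the origin, ratio = height / width.
std::vector<Box> generate_cell_anchors(const std::vector<double>& sizes,
                                       const std::vector<double>& aspect_ratios) {
  std::vector<Box> anchors;
  for (double size : sizes) {
    for (double ratio : aspect_ratios) {
      // Keep the area equal to size * size while setting height / width = ratio.
      const double w = size / std::sqrt(ratio);
      const double h = size * std::sqrt(ratio);
      anchors.push_back(Box{-w / 2.0, -h / 2.0, w / 2.0, h / 2.0});
    }
  }
  return anchors;
}
```

Because nothing here is learned, comparing regenerated anchors with the loaded buffer is a cheap consistency check rather than a requirement.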

Anyway, thank you for your attention. I think I need to test the code again. If you can, a contribution regarding the anchors is welcome!

TensorFlowWangHT commented 4 years ago

This should be a version issue; in my case it is caused by libtorch 1.3.1.

I hope this issue can help other people in need.

Thank you very much for your help!