leoxiaobin / deep-high-resolution-net.pytorch

The project is an official implementation of our CVPR2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"
https://jingdongwang2017.github.io/Projects/HRNet/PoseEstimation.html
MIT License
4.31k stars 908 forks

training on MPII data.... why #146

Open henbucuoshanghai opened 4 years ago

henbucuoshanghai commented 4 years ago

OPTIMIZER: adam
RESUME: False
SHUFFLE: True
WD: 0.0001
WORKERS: 24
=> init weights from normal distribution
=> loading pretrained model models/pytorch/imagenet/hrnet_w32-36af842e.pth
Traceback (most recent call last):
  File "tools/train.py", line 223, in <module>
    main()
  File "tools/train.py", line 111, in main
    writer_dict['writer'].add_graph(model, (dump_input, ))
  File "/home/li/.local/lib/python3.6/site-packages/tensorboardX/writer.py", line 566, in add_graph
    self.file_writer.add_graph(graph(model, input_to_model, verbose))
  File "/home/li/.local/lib/python3.6/site-packages/tensorboardX/pytorch_graph.py", line 235, in graph
    _optimize_trace(trace, torch.onnx.utils.OperatorExportTypes.ONNX)
  File "/home/li/.local/lib/python3.6/site-packages/tensorboardX/pytorch_graph.py", line 175, in _optimize_trace
    trace.set_graph(_optimize_graph(trace.graph(), operator_export_type))
  File "/home/li/.local/lib/python3.6/site-packages/tensorboardX/pytorch_graph.py", line 201, in _optimize_graph
    torch._C._jit_pass_lower_alltuples(graph)
RuntimeError: kind.is_prim() INTERNAL ASSERT FAILED at /tmp/pip-req-build-p5q91txh/torch/csrc/jit/ir.cpp:904, please report a bug to PyTorch. Only prim ops are allowed to not have a registered operator but aten::_convolution doesn't have one either. We don't know if this op has side effects. (hasSideEffects at /tmp/pip-req-build-p5q91txh/torch/csrc/jit/ir.cpp:904)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6d (0x7fa05b20f1cd in /home/li/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: torch::jit::Node::hasSideEffects() const + 0x32d (0x7fa014ead8fd in /home/li/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #2: <unknown function> + 0x37a9406 (0x7fa014f17406 in /home/li/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x37aa455 (0x7fa014f18455 in /home/li/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #4: torch::jit::EliminateDeadCode(torch::jit::Block*, bool, torch::jit::DCESideEffectPolicy) + 0x14b (0x7fa014f18aeb in /home/li/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #5: torch::jit::LowerAllTuples(std::shared_ptr<torch::jit::Graph>&) + 0x2a (0x7fa014f38c9a in /home/li/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #6: <unknown function> + 0x455eeb (0x7fa05b6b3eeb in /home/li/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0x1c7506 (0x7fa05b425506 in /home/li/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

frame #37: __libc_start_main + 0xe7 (0x7fa05fa8ab97 in /lib/x86_64-linux-gnu/libc.so.6)

738654805 commented 4 years ago

Try installing tensorboardX==1.5. Good luck, @henbucuoshanghai.

henbucuoshanghai commented 4 years ago

Same error.

henbucuoshanghai commented 4 years ago

I have installed it, but it doesn't work.

738654805 commented 4 years ago

Take a closer look at it yourself. This isn't a big problem; everyone runs into it when starting out.

henbucuoshanghai commented 4 years ago

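The cpu_nms part also keeps failing, and I don't know how to handle the GPU part either. I commented out a few of the GPU lines, and then ran into the current problem.

I don't know what to do... I can't figure it out.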

738654805 commented 4 years ago

@henbucuoshanghai I ran into a similar problem today. Try updating your nms files.

henbucuoshanghai commented 4 years ago

Where do I update them?

tshrjn commented 4 years ago

Is there an update on this issue? Facing the same error.

onepiece666 commented 4 years ago

Have you solved this problem?

atomtony commented 4 years ago

EasyDict==1.7
opencv-python==3.4.8.29
shapely==1.6.4
Cython
scipy
pandas
pyyaml
json_tricks
scikit-image
yacs>=0.1.5
tensorboardX==1.4
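
To see whether an environment actually matches pins like these, a small check can help. This is only a sketch: the helper name `find_mismatches` and the `PINS` dictionary are made up for illustration, only the exact `==` pins from the list are included, and `importlib.metadata` needs Python 3.8 or newer (the traceback in this thread comes from Python 3.6, where it is not available).

```python
from importlib.metadata import PackageNotFoundError, version

# Exact-version pins copied from the list above (hypothetical container name).
PINS = {
    "EasyDict": "1.7",
    "opencv-python": "3.4.8.29",
    "shapely": "1.6.4",
    "tensorboardX": "1.4",
}


def find_mismatches(pins, lookup=version):
    """Return {package: (wanted, installed_or_None)} for every pin that is not met."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(PINS).items():
        print(f"{name}: want {wanted}, found {installed}")
```

An empty result means every exact pin is satisfied; it says nothing about the unpinned packages or about the installed torch version.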

Bennnun commented 4 years ago

I had this problem and tried different versions of both tensorboardX and torch, but I couldn't make it work. Eventually, I removed tensorboardX from the code, and I can now train models using the training script.
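
A lighter alternative to removing tensorboardX entirely might be to guard only the `add_graph` call that appears in the traceback, so that a failure there does not stop training. The sketch below is not what the repository does, and the helper name `try_add_graph` is hypothetical; it assumes the failure surfaces as an ordinary Python exception, as the `RuntimeError` above does.

```python
def try_add_graph(writer, model, dump_input):
    """Try to log the model graph; return True on success, False if it failed."""
    try:
        # Same call as in tools/train.py, line 111 of the traceback.
        writer.add_graph(model, (dump_input,))
        return True
    except Exception as exc:  # e.g. the RuntimeError raised during graph tracing
        print(f"Skipping add_graph, graph logging failed: {exc}")
        return False
```

In `tools/train.py` this would stand in for the direct call, for example `try_add_graph(writer_dict['writer'], model, dump_input)`. Scalar logging through the same writer is unaffected, since only the graph export is skipped.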