Open ylc2580 opened 4 years ago
Please check your MXNet version. This generally happens when you try to load a model with a very old version of MXNet.
On Tue, Dec 10, 2019 at 10:16 AM ylc2580 notifications@github.com wrote:
I followed the steps in your instructions and ran them in order. When I reached the command python detection_train.py --config config/faster_r50v1_fpn_1x.py, the program failed with the following error:
load pretrain_model/resnet-v1-50-0000.params
Traceback (most recent call last):
File "detection_train.py", line 311, in <module>
train_net(parse_args())
File "detection_train.py", line 135, in train_net
arg_params, aux_params = load_checkpoint(pretrain_prefix, pretrain_epoch)
File "/media/ubuntu_data2/02_dataset/Audio_Classification/\u5b89\u88c5mxnet\u4e34\u65f6\u5efa/simpledet/utils/load_model.py", line 31, in load_checkpoint
save_dict = mx.nd.load('./pretrain_model/resnet-v1-50-0000.params')
File "/root/anaconda3/envs/python37/lib/python3.7/site-packages/mxnet/ndarray/utils.py", line 175, in load
ctypes.byref(names)))
File "/root/anaconda3/envs/python37/lib/python3.7/site-packages/mxnet/base.py", line 254, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [10:00:45] src/ndarray/ndarray.cc:1851: Check failed: fi->Read(data): Invalid NDArray file format
Stack trace:
[bt] (0) /root/anaconda3/envs/python37/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x56bffb) [0x7f7011024ffb]
[bt] (1) /root/anaconda3/envs/python37/lib/python3.7/site-packages/mxnet/libmxnet.so(mxnet::NDArray::Load(dmlc::Stream*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> >*, std::vector<std::string, std::allocator<std::string> >*)+0x1d6) [0x7f70137d7756]
[bt] (2) /root/anaconda3/envs/python37/lib/python3.7/site-packages/mxnet/libmxnet.so(MXNDArrayLoad+0x263) [0x7f7013522fc3]
[bt] (3) /root/anaconda3/envs/python37/lib/python3.7/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7f704c11eec0]
[bt] (4) /root/anaconda3/envs/python37/lib/python3.7/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x7f704c11e87d]
[bt] (5) /root/anaconda3/envs/python37/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x7f704c33401e]
[bt] (6) /root/anaconda3/envs/python37/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so(+0x12a54) [0x7f704c334a54]
[bt] (7) python(_PyObject_FastCallKeywords+0x49b) [0x560441c0d19b]
[bt] (8) python(_PyEval_EvalFrameDefault+0x52e6) [0x560441c724d6]
terminate called without an active exception
Note: my CUDA version is 10.0, which differs from yours. Is the failure to train caused by my configuration, or by something else? Thanks.
I followed your installation steps. Below is my MXNet version. Is it correct, or is it too old? Thanks.
>>> import mxnet as mx
>>> mx.__version__
'1.6.0'
Which pip wheel did you install?
https://1dv.alarge.space/mxnet_cu101-1.6.0b20190820-py2.py3-none-manylinux1_x86_64.whl
or
https://1dv.alarge.space/mxnet_cu100-1.6.0b20190820-py2.py3-none-manylinux1_x86_64.whl
Yes, I installed this one: "https://1dv.alarge.space/mxnet_cu100-1.6.0b20190820-py2.py3-none-manylinux1_x86_64.whl", but the problem is still there. -.-
Interesting, could you please try another pretrained model?
On Thu, Dec 12, 2019 at 5:31 PM ylc2580 notifications@github.com wrote:
Yes, I installed this one: "https://1dv.alarge.space/mxnet_cu100-1.6.0b20190820-py2.py3-none-manylinux1_x86_64.whl", but the problem is still there. -.-
(1) Amazingly, after I deleted all the pretrained models I had downloaded myself and let the code download them itself, it now works.
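For anyone who hits the same "Invalid NDArray file format" error: the fix above suggests the manually downloaded file was incomplete or corrupted, although the thread does not confirm why. Below is a minimal sketch for spotting such files before training starts. The helper name `find_unloadable_params` and its `loader` parameter are made up for illustration and are not part of simpledet; by default the helper falls back to `mx.nd.load`, the same call that fails in the traceback.

```python
import os


def find_unloadable_params(directory, loader=None):
    """Return the names of .params files in `directory` that fail to load.

    `loader` is any callable that takes a file path and raises on failure.
    When omitted, mx.nd.load is used (MXNet is imported lazily so the
    helper can be defined without MXNet installed).
    """
    if loader is None:
        import mxnet as mx  # only needed for the default loader
        loader = mx.nd.load

    bad = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".params"):
            continue  # ignore anything that is not a weight file
        path = os.path.join(directory, name)
        try:
            loader(path)
        except Exception:
            # MXNetError derives from Exception, so a file that triggers
            # "Invalid NDArray file format" ends up in this list.
            bad.append(name)
    return bad
```

Calling `find_unloadable_params("pretrain_model")` from the simpledet directory would list the files worth deleting and downloading again.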
Faster R-CNN FPN uses less than 5 GB of GPU memory. How about changing the GPU ids so that only the two 1080s are used, and seeing whether the OOM problem persists?
On Fri, Dec 13, 2019 at 10:17 AM ylc2580 notifications@github.com wrote:
(1) Amazingly, after I deleted all the pretrained models I had downloaded myself and let the code download them itself, it now works. (2) It looks like every model config you provide needs seven GPUs? My GPUs have about 26 GB of memory in total (one K80 and two 1080s), yet training runs out of memory. Why?
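One way to follow the suggestion about GPU ids is to first look up which device ids belong to the 1080s. This is a small sketch, assuming the listing comes from `nvidia-smi --query-gpu=index,name --format=csv,noheader`. The helper names `pick_gpu_ids` and `query_gpus` and the sample listing are hypothetical; the listing is not output from the reporter's machine.

```python
import subprocess


def pick_gpu_ids(smi_csv, wanted="1080"):
    """Return the device ids whose name contains `wanted`.

    `smi_csv` is text in the "index, name" form printed by
    nvidia-smi --query-gpu=index,name --format=csv,noheader
    """
    ids = []
    for line in smi_csv.strip().splitlines():
        index, name = line.split(",", 1)
        if wanted in name:
            ids.append(int(index.strip()))
    return ids


def query_gpus():
    """Ask nvidia-smi for the installed GPUs (requires an NVIDIA driver)."""
    return subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,name", "--format=csv,noheader"],
        text=True,
    )


# Made-up listing for illustration only.
sample = "0, Tesla K80\n1, GeForce GTX 1080\n2, GeForce GTX 1080\n"
print(pick_gpu_ids(sample))  # prints [1, 2]
```

On a real machine, `pick_gpu_ids(query_gpus())` gives the ids to put into the GPU list of the config file passed via --config. Check that file for the exact field name.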
Refer to https://github.com/TuSimple/simpledet/blob/master/MODEL_ZOO.md
...
# download them yourself into ~/simpledet/pretrain_model
wget https://1dv.aflat.top/resnet-v1-50-0000.params
wget https://1dv.aflat.top/resnet-v1-101-0000.params
wget https://1dv.aflat.top/resnet-50-0000.params
wget https://1dv.aflat.top/resnet-101-0000.params
wget https://1dv.aflat.top/resnet50_v1b-0000.params
wget https://1dv.aflat.top/resnet101_v1b-0000.params
wget https://1dv.aflat.top/resnet152_v1b-0000.params
wget https://1dv.aflat.top/resnext-101-64x4d-0000.params
wget https://1dv.aflat.top/resnext-101-32x8d-0000.params
wget https://1dv.aflat.top/resnext-152-32x8d-IN5k-0000.params
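Since the problem in this thread went away once the weights were downloaded again, a partially downloaded file is a plausible culprit, though the thread does not confirm it. As an alternative to plain wget, here is a sketch that downloads to a temporary file and renames it into place only after the byte count matches the server's Content-Length, when the server sends one. The function name `download_atomically` is hypothetical and not part of simpledet.

```python
import os
import shutil
import tempfile
import urllib.request


def download_atomically(url, dest_path):
    """Download `url` to `dest_path` without ever leaving a partial file there."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    os.makedirs(dest_dir, exist_ok=True)

    # Write to a temporary file in the same directory so the final rename
    # stays on one filesystem and is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(url) as resp:
            shutil.copyfileobj(resp, tmp)
            expected = resp.headers.get("Content-Length")

        written = os.path.getsize(tmp_path)
        if expected is not None and int(expected) != written:
            raise IOError(
                "incomplete download: got %d of %s bytes" % (written, expected)
            )

        os.replace(tmp_path, dest_path)  # only now does dest_path appear
    finally:
        # After a successful rename the temporary file no longer exists;
        # on any failure it is cleaned up here.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return dest_path
```

For example, `download_atomically("https://1dv.aflat.top/resnet-v1-50-0000.params", "pretrain_model/resnet-v1-50-0000.params")` either produces the complete file or raises, so an interrupted transfer cannot be mistaken for a finished one.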
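I followed the steps in your instructions and ran them in order. When I reached the command python detection_train.py --config config/faster_r50v1_fpn_1x.py, the program failed with the following error:
load pretrain_model/resnet-v1-50-0000.params
Traceback (most recent call last):
File "detection_train.py", line 311, in <module>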
train_net(parse_args())
File "detection_train.py", line 135, in train_net
arg_params, aux_params = load_checkpoint(pretrain_prefix, pretrain_epoch)
File "/media/ubuntu_data2/02_dataset/Audio_Classification/\u5b89\u88c5mxnet\u4e34\u65f6\u5efa/simpledet/utils/load_model.py", line 31, in load_checkpoint
save_dict = mx.nd.load('./pretrain_model/resnet-v1-50-0000.params')
File "/root/anaconda3/envs/python37/lib/python3.7/site-packages/mxnet/ndarray/utils.py", line 175, in load
ctypes.byref(names)))
File "/root/anaconda3/envs/python37/lib/python3.7/site-packages/mxnet/base.py", line 254, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [10:00:45] src/ndarray/ndarray.cc:1851: Check failed: fi->Read(data): Invalid NDArray file format
Stack trace:
[bt] (0) /root/anaconda3/envs/python37/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x56bffb) [0x7f7011024ffb]
[bt] (1) /root/anaconda3/envs/python37/lib/python3.7/site-packages/mxnet/libmxnet.so(mxnet::NDArray::Load(dmlc::Stream*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> >*, std::vector<std::string, std::allocator<std::string> >*)+0x1d6) [0x7f70137d7756]
[bt] (2) /root/anaconda3/envs/python37/lib/python3.7/site-packages/mxnet/libmxnet.so(MXNDArrayLoad+0x263) [0x7f7013522fc3]
[bt] (3) /root/anaconda3/envs/python37/lib/python3.7/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7f704c11eec0]
[bt] (4) /root/anaconda3/envs/python37/lib/python3.7/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x7f704c11e87d]
[bt] (5) /root/anaconda3/envs/python37/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x7f704c33401e]
[bt] (6) /root/anaconda3/envs/python37/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so(+0x12a54) [0x7f704c334a54]
[bt] (7) python(_PyObject_FastCallKeywords+0x49b) [0x560441c0d19b]
[bt] (8) python(_PyEval_EvalFrameDefault+0x52e6) [0x560441c724d6]
terminate called without an active exception
Note: my CUDA version is 10.0, which differs from yours. Is the failure to train caused by my configuration, or by something else? Thanks.