traveller59 / second.pytorch

SECOND for KITTI/NuScenes object detection
MIT License
1.73k stars 722 forks

error evaluate #20

Open poodarchu opened 6 years ago

poodarchu commented 6 years ago

When I tried to evaluate the trained model, I got:

```
python ./pytorch/train.py evaluate --config_path=./configs/car.config --model_dir=./data/models
/home/users/benjin.zhu/data2/libs/anaconda3.6/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
/home/users/benjin.zhu/data2/libs/anaconda3.6/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
[ 11 400 352]
Restoring parameters from data/models/voxelnet-0.tckpt
remain number of infos: 3769
Generate output labels...
[1]    20341 segmentation fault  python ./pytorch/train.py evaluate --config_path=./configs/car.config
```

Then I modified convolution.py and submanifoldConvolution.py as described in the README, and got:

```
RuntimeError: Error(s) in loading state_dict for VoxelNet:
    size mismatch for middle_feature_extractor.middle_conv.0.weight: copying a param of torch.Size([27, 128, 64]) from checkpoint, where the shape is torch.Size([3456, 64]) in current model.
    size mismatch for middle_feature_extractor.middle_conv.2.weight: copying a param of torch.Size([3, 64, 64]) from checkpoint, where the shape is torch.Size([192, 64]) in current model.
    size mismatch for middle_feature_extractor.middle_conv.4.weight: copying a param of torch.Size([27, 64, 64]) from checkpoint, where the shape is torch.Size([1728, 64]) in current model.
    size mismatch for middle_feature_extractor.middle_conv.6.weight: copying a param of torch.Size([27, 64, 64]) from checkpoint, where the shape is torch.Size([1728, 64]) in current model.
    size mismatch for middle_feature_extractor.middle_conv.8.weight: copying a param of torch.Size([3, 64, 64]) from checkpoint, where the shape is torch.Size([192, 64]) in current model.
```
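For what it's worth, every mismatch above is the same tensor with its first two dimensions flattened (27×128 = 3456, 27×64 = 1728, 3×64 = 192), i.e. the checkpoint stores `[kernel_volume, in_ch, out_ch]` while the current model reports `[kernel_volume * in_ch, out_ch]`. A minimal sketch of that shape relation (pure shape arithmetic, not code from this repo; with real tensors this would be a single `tensor.reshape` call, and which direction the real fix goes depends on which SparseConvNet version produced the checkpoint):

```python
def flatten_first_two_dims(shape):
    """Map a checkpoint shape [kernel_volume, in_ch, out_ch] to the
    flattened [kernel_volume * in_ch, out_ch] shape the current
    model reports in the size-mismatch errors."""
    k, cin, cout = shape
    return (k * cin, cout)

# Every mismatch in the error message follows this pattern:
assert flatten_first_two_dims((27, 128, 64)) == (3456, 64)
assert flatten_first_two_dims((3, 64, 64)) == (192, 64)
assert flatten_first_two_dims((27, 64, 64)) == (1728, 64)
```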

traveller59 commented 6 years ago

The sparseconvnet file modification is only needed when you evaluate pretrained models. You need to use tools like pdb or gdb to locate the segfault; I can't solve this from the segfault message alone.

poodarchu commented 6 years ago

I still got:

```
(gdb) backtrace
#0  0x00007fff341d10df in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) ()
   from /home/users/benjin.zhu/data2/second/second.pth/second/core/non_max_suppression/nms.so
#1  0x0000555555662b94 in _PyCFunction_FastCallDict ()
#2  0x00005555556f267c in call_function ()
#3  0x0000555555714cba in _PyEval_EvalFrameDefault ()
#4  0x00005555556ec70b in fast_function ()
```

poodarchu commented 6 years ago

I've fixed the above problem. But now I'm having trouble with sparseconvnet: after compiling it with gcc 5.4 and pytorch 1.0, it said:

```
Installed /data-sdb/benjin.zhu/libs/anaconda3.6/lib/python3.6/site-packages/sparseconvnet-0.2-py3.6-linux-x86_64.egg
Processing dependencies for sparseconvnet==0.2
Finished processing dependencies for sparseconvnet==0.2
```

But when I then tried `python examples/hello-world.py`, I got:

```
$ python examples/hello-world.py
Traceback (most recent call last):
  File "examples/hello-world.py", line 30, in <module>
    input = scn.InputBatch(2, inputSpatialSize)
  File "/data-sdb/benjin.zhu/second/SparseConvNet/sparseconvnet/inputBatch.py", line 19, in __init__
    self.metadata = Metadata(dimension)
  File "/data-sdb/benjin.zhu/second/SparseConvNet/sparseconvnet/metadata.py", line 17, in Metadata
    return getattr(sparseconvnet.SCN, 'Metadata_%d' % dim)()
AttributeError: module 'sparseconvnet.SCN' has no attribute 'Metadata_2'
```

This has had me stuck for a long time.

traveller59 commented 6 years ago

I have no idea, but someone asked me previously:

The Metadata_3 problem is caused by a wrong import of sparseconvnet.
Actually I shouldn't have added the path of SparseConvNet to PYTHONPATH, which caused the wrong import of sparseconvnet.

I am using older sparseconvnet edf89af339ee929d9416f3509ff405450949f606 with pytorch 0.4.1.
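The wrong-import problem described above is ordinary module shadowing: when two directories on `sys.path` both provide a module with the same name, Python imports whichever comes first, so a SparseConvNet source checkout on PYTHONPATH can hide the installed, compiled egg. A self-contained sketch of the mechanism (the module name `mypkg` and the `origin` strings are made up for illustration):

```python
import importlib
import os
import sys
import tempfile

# Two directories, each providing a module named 'mypkg' (stand-ins
# for the SparseConvNet source tree and the installed egg).
d1 = tempfile.mkdtemp()
d2 = tempfile.mkdtemp()
with open(os.path.join(d1, "mypkg.py"), "w") as f:
    f.write("origin = 'source checkout'\n")
with open(os.path.join(d2, "mypkg.py"), "w") as f:
    f.write("origin = 'installed egg'\n")
importlib.invalidate_caches()

# Whichever directory is earlier on sys.path wins -- exactly what
# happens when the SparseConvNet checkout is on PYTHONPATH.
sys.path.insert(0, d2)
sys.path.insert(0, d1)
import mypkg
print(mypkg.origin)    # the source checkout shadows the egg
print(mypkg.__file__)  # shows which copy was actually imported
```

Checking the real case works the same way: `import sparseconvnet; print(sparseconvnet.__file__)` tells you which copy you are actually running.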

poodarchu commented 6 years ago

I am using older sparseconvnet edf89af339ee929d9416f3509ff405450949f606 with pytorch 0.4.1.

That's very useful.

TWJianNuo commented 5 years ago

You forgot to change the corresponding files in SparseConvNet as instructed in the README.

songanz commented 5 years ago

I have no idea, but someone asked me previously:

The Metadata_3 problem is caused by a wrong import of sparseconvnet.
Actually I shouldn't have added the path of SparseConvNet to PYTHONPATH, which caused the wrong import of sparseconvnet.

I am using older sparseconvnet edf89af339ee929d9416f3509ff405450949f606 with pytorch 0.4.1.

In the README you say we need to use pytorch 1.0. So do we need to downgrade to pytorch 0.4.1 to install sparseconvnet?

traveller59 commented 5 years ago

@songanz SparseConvNet is deprecated in the newest code; you need to use spconv instead.

defypp commented 5 years ago

> I still got:
>
> ```
> (gdb) backtrace
> #0  0x00007fff341d10df in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) ()
>    from /home/users/benjin.zhu/data2/second/second.pth/second/core/non_max_suppression/nms.so
> #1  0x0000555555662b94 in _PyCFunction_FastCallDict ()
> #2  0x00005555556f267c in call_function ()
> #3  0x0000555555714cba in _PyEval_EvalFrameDefault ()
> #4  0x00005555556ec70b in fast_function ()
> ```

I debugged a SIGSEGV (segmentation fault) and got the same output as you. Could you tell us how you solved this problem? I am using torch 1.0 + cuda 9.0 + gcc 4.9.2.