Thanks for this issue. Maybe there is more code dependent on the old invalid index value than we thought. Could you please try with #82 (branch: fix/revertInvalidIndexToZero)? This way we can be sure that it is indeed the new invalid index value causing your issue.
Really appreciate the fast reply! Will test out first thing tomorrow morning.
Does this issue appear consistently?
Yes, it appears consistently, seemingly when fast-moving dynamic objects enter the scene.
@bochen87 When you evaluate the results of libnabo, it's important to check distance(i, j) != std::numeric_limits<float>::infinity() to verify that you have a valid result. It seems you aren't doing this and are thus using invalid results in your code.
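For reference, a minimal sketch of such a check against the public libnabo API (the matrix names, K, and the maxRadius value are illustrative, not taken from the mapper code):

#include <limits>
#include <Eigen/Core>
#include "nabo/nabo.h"

// Query a kd-tree and skip invalid matches.
void queryAndFilter(const Eigen::MatrixXf& M, const Eigen::MatrixXf& q)
{
    const int K = 5;
    Nabo::NNSearchF* nns = Nabo::NNSearchF::createKDTreeLinearHeap(M);
    Eigen::MatrixXi indices(K, q.cols());
    Eigen::MatrixXf dists2(K, q.cols());
    nns->knn(q, indices, dists2, K, 0, 0, 2.0f /* maxRadius, illustrative */);
    for (int i = 0; i < q.cols(); ++i)
        for (int j = 0; j < K; ++j)
        {
            if (dists2(j, i) == std::numeric_limits<float>::infinity())
                continue; // no neighbour within maxRadius: indices(j, i) is invalid
            // only results reaching this point are safe to use
        }
    delete nns;
}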
@simonlynen I'm using the dynamic_mapper.cpp from https://github.com/ethz-asl/ethzasl_icp_mapping/tree/reintegrate/master_into_indigo_devel The GDB stack trace suggests that this code causes the problem: https://github.com/ethz-asl/ethzasl_icp_mapping/blob/ec96db832be5ad4cf95e2551ef33797903a39bbb/ethzasl_icp_mapper/src/dynamic_mapper.cpp#L847
Could it be a fix to check whether tmp_map.features.rows() > 1?
Hi @bochen87, yes, from the code it makes sense that this has issues when you don't get readings from the sensors.
For instance, here the check whether the returned result is valid (i.e., non-infinity) is missing; the code always uses the results, even if there are no points or none of the points falls within max-radius.
@simonlynen According to the stack trace posted by @bochen87, the crash happens when executing create, and more specifically when a Node is push_back'ed inside buildNodes. Nonetheless, I agree with you that the icp_mapper code should check for invalid results anyway. However, the crash seems to be related not to the actual query of the kd-tree but to building the tree itself, which indicates that these are separate issues.
@bochen87 An outside check is not needed as it is checked internally. Unfortunately, I cannot reproduce this bug with the provided tests. Would it be possible for you to provide input data that causes the crash?
@magehrig Unfortunately, the input data is pretty big; we're using a Velodyne VLP-16. I can try to record something tonight or tomorrow. The crash happens pretty consistently, usually with faster-moving objects in front of the vehicle, i.e. humans or bicycles moving into close range quickly. If you have a setup with a Velodyne, it should be easy to reproduce.
@bochen87 I do not have the resources to record my own dataset. We only need the Eigen matrix that causes the crash. You could use the std::ofstream class from the <fstream> header for quick, although inefficient (not sure if that is feasible in your case), logging of the matrix:
#include <fstream>

{
    // Scope block so the file is flushed and closed right away.
    std::ofstream file("tmp-map-features.txt");
    if (file.is_open())
        file << tmp_map.features;
}
If you put this right before the call to create, you should be able to retrieve the Eigen matrix.
If you or somebody else has a better approach for debugging this, let me know.
Ok, thanks for providing this more efficient method! I'll try to record something tonight or tomorrow.
@magehrig, I agree, that could be the other position calling into NNS::create.
@bochen87, if you do the instrumentation with the matrix dump, please do it at both positions.
If the suggested code happens to be too slow, you could use this answer's first function for a binary write: https://stackoverflow.com/a/25389481 (there is no need to put it in the Eigen namespace; any other namespace should be fine). It should be much faster. (I assume you are working on an x86-64 architecture? Then the binary data should be just as easy for us to use.)
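For reference, a sketch of that answer's write function plus a matching reader, lightly adapted (assumes a dense, default-layout Eigen matrix; error handling omitted):

#include <fstream>

// Binary dump/load of a dense Eigen matrix, adapted from the linked answer.
template <class Matrix>
void write_binary(const char* filename, const Matrix& matrix)
{
    std::ofstream out(filename, std::ios::out | std::ios::binary | std::ios::trunc);
    typename Matrix::Index rows = matrix.rows(), cols = matrix.cols();
    out.write(reinterpret_cast<const char*>(&rows), sizeof(typename Matrix::Index));
    out.write(reinterpret_cast<const char*>(&cols), sizeof(typename Matrix::Index));
    out.write(reinterpret_cast<const char*>(matrix.data()),
              rows * cols * sizeof(typename Matrix::Scalar));
}

template <class Matrix>
void read_binary(const char* filename, Matrix& matrix)
{
    std::ifstream in(filename, std::ios::in | std::ios::binary);
    typename Matrix::Index rows = 0, cols = 0;
    in.read(reinterpret_cast<char*>(&rows), sizeof(typename Matrix::Index));
    in.read(reinterpret_cast<char*>(&cols), sizeof(typename Matrix::Index));
    matrix.resize(rows, cols);
    in.read(reinterpret_cast<char*>(matrix.data()),
            rows * cols * sizeof(typename Matrix::Scalar));
}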
Thanks for helping hunt the bug!
Created the file that retrieves the Eigen matrix; please download it here: https://ufile.io/i8mqa
Will do a new one tomorrow for the other section.
Yes, I'm working on an x86-64 architecture. Two binary files, named after each of the variables: https://drive.google.com/drive/folders/1H1EaBVcUSEGT4iWR2ggMiNjPTvlHhJEJ?usp=sharing
Thanks a lot! I'll have a look.
Reading the matrices worked well, but unfortunately I could not reproduce the issue :(
@bochen87, are you really writing the matrix directly before the create call? If so, we might need to start thinking of exotic issues.
I've also uploaded the CPP file to the drive; see lines 697 and 857.
Basically, if I want to reproduce the problem, I start the robot, give it a goal, and suddenly jump in front of it (acting as a dynamic object).
I just looked at the stack trace again and GDB is signaled from thread 1 (line 210). Unless I am mistaken, we have looked at the wrong part of the code. Libnabo does not appear to be the direct source of this issue because the segfault appears somewhere in libpointmatcher.
Awesome, @magehrig! That's it. Interesting how we all got fooled by the stack traces. Makes much more sense. It probably crashes here: https://github.com/ethz-asl/libpointmatcher/blob/75044815d40ff934fe0bb7e05ed8bbf18c06493b/pointmatcher/OutlierFiltersImpl.cpp#L253, given that idRef can now be -1 instead of 0.
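A minimal illustration of the suspected mechanism (not the actual libpointmatcher code; the matrix and idRef are placeholders):

#include <Eigen/Core>
#include <iostream>

int main()
{
    Eigen::MatrixXf refFeatures = Eigen::MatrixXf::Random(4, 10);

    int idRef = 0;  // old invalid-match sentinel: a wrong but in-bounds column
    std::cout << refFeatures.col(idRef).transpose() << std::endl;

    idRef = -1;     // new sentinel: out of bounds if used unguarded
    // refFeatures.col(idRef) asserts in debug builds and is undefined
    // behaviour (typically a segfault) in release builds.
    return 0;
}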
We should probably "move" this issue over to libpointmatcher.
@magehrig @HannesSommer gosh! 👍 I guess that also explains the behavior with "fast-moving objects".
Since the latest commits, we've been having issues with ICP_Mapping in ROS. When reverting to commit 243a708e84969b8611200dfd68af7f2cd4a49a0b (currently HEAD~3), it works fine again.
I suspect it has something to do with the changes in 34e2cc0fba58c15f64c24aa3a7c4184235fba1b9.
See the GDB stack trace here: https://paste.ee/p/var2y