iisys-hof / map-matching-2

High Performance Map Matching with Markov Decision Processes (MDPs) and Hidden Markov Models (HMMs).
GNU Affero General Public License v3.0

Uninformative segmentation fault message #2

Closed KrasnovPavel closed 2 years ago

KrasnovPavel commented 2 years ago

Hello, again!

I took the first example from your README and ran it in Docker:

./map_matching_2 \
  --network "data/oberfranken-latest.osm.pbf" \
  --tracks "data/points_anonymized.csv" \
  --delimiter ";" \
  --id "device" --id "subid" \
  --x "lon" --y "lat" \
  --time "timestamp" --time-format "%F %T%Oz" \
  --output "data/matches.csv" \
  --verbose

and made a mistake in it: I downloaded and used oberbayern-latest.osm.pbf instead of oberfranken-latest.osm.pbf. Obviously that could not work, but I expected a message like "No matches were found"; instead I only got "Segmentation fault" and the program exited.

Start Preparation ...
Import network ... done in 29.7127s
Graph has 3415701 vertices and 6973660 edges
Simplifying ... done in 13.3145s
Simplified graph has 616925 vertices and 1477208 edges
Baking graph ... done in 1.22351s
Baked graph has 616925 vertices and 1477208 edges
Building spatial indices ... done in 7.66701s
Import tracks ... done in 1.28773s
Number of tracks: 1300
Finished Preparation after 56.4901 seconds.

Start Matching ...
Segmentation fault

I fixed my mistake and everything worked correctly.

Now I want to test my own data. I took track data from the 4Seasons dataset (it was collected near Munich), added the origin point (in UTM zone 32) to all coordinates, rounded them to whole meters, and removed duplicates for faster matching. The result is attached as filtered.csv.

I launched matcher:

./map_matching_2 \
  --network "/app/data/oberbayern-latest.osm.pbf" \
  --network-transform-srs +proj=utm +zone=32 \
  --tracks "/app/data/filtered.csv" \
  --tracks-srs +proj=utm +zone=32 \
  --delimiter " " \
  --no-header --no-id --x 0 --y 1 \
  --output "/app/data/f_matches.csv" \
  --verbose

I didn't expect good results, since the track is obviously wrongly oriented, but I again get only a segmentation fault message and don't know how to debug it. I checked in QGIS that the coordinates from filtered.csv lie inside oberbayern-latest.osm.pbf.

Could you please make this message more informative? Thanks!

P.S. I tried to compile your tool for Windows and gave up; the external dependencies just did not compile. Could you consider using vcpkg to manage them instead of CMake's fetch mechanism? In my experience, vcpkg works well enough for building Windows, Linux, or cross-platform applications.

addy90 commented 2 years ago

Hi @KrasnovPavel,

thank you for testing my tool thoroughly and for finding these bugs and problems!

I looked into it and think I could solve most issues.

Concerning the segmentation fault: unfortunately, it is not easy to give a meaningful message for this error when the software is compiled as a Release build with high compiler optimizations, because it is a memory access error that is raised by the system. What I do instead is compile the software in Debug mode (CMAKE_BUILD_TYPE=Debug), without compiler optimizations and with debugging information. When I run this Debug build in a debugger, execution halts directly at the offending line as soon as the segmentation fault is raised (breakpoint on errors), with the stack trace available.

This way I immediately found the line of code that caused the problem, as you can see in my commit bbfecdf. When you imported the wrong region with the given tracks, no candidates could be found for any track, and the candidate adoption step then crashed because I had forgotten a check for the case that no candidates are found. For performance reasons I keep memory access checks to the necessary minimum; for example, I mostly use [index] instead of .at(index) for array access, because I guarantee beforehand that the access is valid. Well, except when I forget a necessary check, as in this case.

Problem solved, pushed, and the Docker images on Docker Hub are fixed, too! Thanks for finding this problem! Now the program runs and finds no result for any of the tracks, as it should.
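Until a Debug build is at hand, a small shell wrapper can at least make the bare "Segmentation fault" a little more informative. This is only a sketch: the function name `run_with_hint` and the wording of the hint are made up for illustration and are not part of the tool. It relies on the shell convention that a process killed by SIGSEGV (signal 11) is reported with exit status 128 + 11 = 139.

```shell
# Hypothetical helper, not part of map_matching_2: runs any command and
# prints a hint when the exit status indicates a segmentation fault.
run_with_hint() {
  status=0
  "$@" || status=$?
  # Shells report "killed by signal N" as 128 + N; SIGSEGV is signal 11.
  if [ "$status" -eq 139 ]; then
    echo "Exit status 139 (SIGSEGV): rebuild with -DCMAKE_BUILD_TYPE=Debug and run under a debugger to get a stack trace." >&2
  fi
  return "$status"
}
```

The wrapper is simply put in front of the normal call, for example `run_with_hint ./map_matching_2 --network "data/oberfranken-latest.osm.pbf" --tracks "data/points_anonymized.csv" --verbose`. It does not replace the debugger; it only points to the next step.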

On to the second problem, with your filtered.csv data. First, thanks for providing the file and the command; this made it easy for me to reproduce your setup!

The problem was that in your case you missed "" around the PROJ4 strings of the network-transform-srs and tracks-srs settings. Because of the space in between, the parameter was not parsed correctly. So you need the following command:

./map_matching_2 \
  --network "/app/data/oberbayern-latest.osm.pbf" \
  --network-transform-srs "+proj=utm +zone=32" \
  --tracks "/app/data/filtered.csv" \
  --tracks-srs "+proj=utm +zone=32" \
  --delimiter " " \
  --no-header --no-id --x 0 --y 1 \
  --output "/app/data/f_matches.csv" \
  --verbose
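The effect of the missing quotes can be reproduced without the tool. In this small sketch, `count_args` is a helper made up for illustration; it only prints how many arguments the shell passes on.

```shell
# Hypothetical helper: prints the number of arguments it receives.
count_args() { echo "$#"; }

# Unquoted: the shell splits at the space, so the option is followed by
# two separate arguments, "+proj=utm" and "+zone=32".
count_args --tracks-srs +proj=utm +zone=32     # prints 3

# Quoted: the whole PROJ string arrives as a single argument.
count_args --tracks-srs "+proj=utm +zone=32"   # prints 2
```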

You got the segmentation fault because of the same problem: no candidate was found, which triggered the same error that I fixed above. With the fix, your old command runs but gives no result, because the projection is incorrect.

With the corrected command, I get the following result. Unfortunately, the red track goes around multiple times (by the way, you get the orange policy arrows when you use the --export-candidates option): (image: filtered)

This is because the track in filtered.csv is in fact quite difficult to match and goes around that location several times. Here, in blue, is the track formed by the green points: (image: track)

But nevertheless it works as you can see!

By the way, you don't need to filter spatially duplicate points; my tool does this automatically, see help.txt.

I hope you can continue with your setup now! Concerning the orientation: maybe disabling the azimuth weight by setting it to 0.0 helps here. Without the azimuth, only lengths and direction changes (and maybe distances) are taken into consideration, so it might work better in this case. Just an idea; I don't know whether it really works.

Concerning the build under Windows: unfortunately, as you have seen, it is not easily doable, which is the first reason why I currently have no plans to make a native Windows build possible. But you can build the tool with the Windows Subsystem for Linux (WSL). When you install, for example, the Ubuntu LTS image, you can build the tool and run it from the WSL shell, pointing it to files in the mounted Windows folders. So it works under Windows via WSL. I tried this myself, which is the second reason why I currently have no plans for native Windows builds. Nevertheless, thank you for telling me about vcpkg; maybe I will find time to look further into it.

In case you build it locally, you don't need -DINSTALL_ALL_SHARED_LIBS=ON in the cmake command. This is only for the Dockerfile because I use a build container and a run container that is smaller but misses the dependencies from the build container.

KrasnovPavel commented 2 years ago

Wow! That was a quick response! Thanks!

The problem was that in your case you missed "" around the PROJ4 strings

🤦‍♂️ Silly me :)

Concerning the build under Windows.

I still hope that I can build it as a dynamic library. I am going to try again and will provide a PR if I succeed. I want it as a dynamic library because that way I can use it interactively, without importing the network every time I want to try a new set of parameters.

addy90 commented 2 years ago

Glad I could help!

I hope you can accomplish something with my code as a dynamic library! It should not be too difficult, as the code is already modularized and split into static libraries. One difficulty with varying parameters is that, depending on the setting, the network or the tracks have to be reimported. But you are right: for the matching phase itself, it would be nice to be able to try different settings without having to reimport the network and tracks.

One idea would be to specify settings files instead of (or in addition to) the CLI parameters. With a mechanism that allows multiple settings files, the program could loop over the three phases (network import, track import, map matching): a phase is re-executed when the next settings file changes its settings, and skipped when they stay the same. For example, with three settings files that share the network and track import settings but differ in the map matching settings, only the map matching phase would be re-executed in each loop.

Right now there are no plans for such a settings system or for library usage, but I will note both ideas. In any case, if you have a good PR, please feel free to submit it!

One idea that can help you right now: use a smaller network extract! With the osmium tool and a poly file, you can extract the part of the OSM network that you are specifically interested in: https://osmcode.org/osmium-tool/manual.html#creating-geographic-extracts

Geofabrik uses the same concept: https://blog.geofabrik.de/?p=397

Maybe this QGIS plugin helps for creating poly files: https://plugins.qgis.org/plugins/osmpoly_export/

Alternatively, there are many standalone scripts that help with this: https://wiki.openstreetmap.org/wiki/Osmosis/Polygon_Filter_File_Format#Converting_to.2Ffrom_POLY_format
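For a simple rectangular area, a poly file can also be written by hand. The sketch below assumes the Osmosis polygon filter file format linked above (a name line, a numbered ring with one "longitude latitude" pair per line, and two END lines). The function name `bbox_to_poly`, the file names, and the coordinates are placeholders chosen for illustration, not values from this issue.

```shell
# Hypothetical helper: prints a poly file for a rectangular bounding box.
# Arguments: name, left (min lon), bottom (min lat), right (max lon), top (max lat)
bbox_to_poly() {
  name=$1; left=$2; bottom=$3; right=$4; top=$5
  printf '%s\n' "$name"                 # name of the polygon file
  printf '%s\n' "1"                     # first (and only) ring
  # printf reuses the format for each lon/lat pair: five coordinate lines.
  printf '   %s   %s\n' \
    "$left"  "$bottom" \
    "$right" "$bottom" \
    "$right" "$top" \
    "$left"  "$top" \
    "$left"  "$bottom"                  # first point repeated to close the ring
  printf '%s\n' "END"                   # end of ring
  printf '%s\n' "END"                   # end of file
}

# Example with placeholder coordinates (not executed here):
#   bbox_to_poly area 11.30 48.00 11.80 48.30 > area.poly
#   osmium extract -p area.poly oberbayern-latest.osm.pbf -o area.osm.pbf
```

For a plain rectangle, `osmium extract` also accepts a bounding box directly via `-b LEFT,BOTTOM,RIGHT,TOP`; a poly file becomes useful once the area of interest has an irregular shape.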

With a very small OSM extract, the network import becomes so fast that you can try out different settings with a loop in a bash script; the import may then even be faster than the matching phase, so you can neglect it. I used this approach during development!
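Such a loop could look roughly like the following sketch. The flags are the ones that appear in this thread; the extract name `area.osm.pbf`, the output file names, the list of factors, and the `DRY_RUN` switch are assumptions for illustration. By default the script only prints the commands; setting `DRY_RUN=0` would execute them.

```shell
# Prints the command when DRY_RUN is 1 (the default), runs it otherwise.
# Note: the printout does not preserve the original quoting.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# One matching run per azimuth factor, each writing its own output file.
sweep() {
  for factor in 0.0 0.5 1.0; do
    run ./map_matching_2 \
      --network "data/area.osm.pbf" \
      --network-transform-srs "+proj=utm +zone=32" \
      --tracks "data/filtered.csv" \
      --tracks-srs "+proj=utm +zone=32" \
      --delimiter " " \
      --no-header --no-id --x 0 --y 1 \
      --mdp-azimuth-factor "$factor" \
      --output "data/matches_azimuth_${factor}.csv"
  done
}

sweep
```

Writing each run to its own output file keeps the results side by side, so they can be compared in QGIS afterwards.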

addy90 commented 2 years ago

I ran some more tests with different settings and got the best result for your points with the following settings (this might be a coincidence):

--mdp-distance-factor 0.5 \
--mdp-length-factor 0.5 \
--mdp-azimuth-factor 0.5 \
--mdp-direction-factor 0.5 \

I also tried setting factors to 0.0, but it looks like the results become quite poor, due to the shortest-path comparisons and the preference for shorter routes that reduces round trips. Maybe I can look further into this use case, but at the moment it looks like the points need the right orientation and positions close to the original roads to match well with the default settings. That is something GPS-recorded tracks guarantee, and that is the data this tool was designed for.

I have an idea why disabling distance and azimuth does not have the desired effect, but it is difficult to explain briefly. It comes from the design of the matching process: always two adjacent positions are matched (in HMMs as well as in my MDP implementation), and the whole track is not taken into consideration as in typical curve-to-curve matching. So this is rather a segment-to-segment matching that is chained to a global minimum error. I have some workarounds prepared; for example, the state-size parameter turns the program into a multiple-segment-to-multiple-segment matching, but this is extremely expensive in the current design, and I think it also does not work correctly in the current version. I have to look further into it; it was also never designed for this specific use case. Nevertheless, I think it should in general be possible to find the right roads even when the positions do not have the right orientation and place. I just don't know when I can look further into it, but I am thinking about it!