iisys-hof / map-matching-2

High Performance Map Matching with Markov Decision Processes (MDPs) and Hidden Markov Models (HMMs).
GNU Affero General Public License v3.0

File damaged after using --network-output #3

KrasnovPavel closed this issue 2 years ago

KrasnovPavel commented 2 years ago

Hello!

I've run into some problems again :) I tried to transform the network's SRS and export it to a file for other uses. I used the latest Docker image with this command:

./map_matching_2 \
  --network "/app/data/4seasons.osm.pbf" \
  --network-transform-srs "+proj=utm +zone=32" \
  --network-output "/app/data/4seasons_utm.osm.pbf" \
  --tracks-srs "+proj=utm +zone=32" \
  --readline \
  --verbose
Enabled network output.

Start Preparation ...
Import network ... done in 3.52163s
Graph has 320719 vertices and 629287 edges
Simplifying ... done in 0.911626s
Simplified graph has 89211 vertices and 205315 edges
Transform graph ... done in 1.19944s
Transformed graph has 89211 vertices and 205315 edges
Baking graph ... done in 0.132387s
Baked graph has 89211 vertices and 205315 edges
Saving osm file ... done in 0.655331s
Building spatial indices ... done in 0.129311s
Please input tracks one per line, either as LINESTRING or comma separated sequence of POINT (enter empty line for stopping):

Number of tracks: 0
Please input ground truth tracks for comparison one per line, either as LINESTRING or comma separated sequence of POINT (enter empty line for stopping):

Number of ground truth tracks for comparison: 0
Finished Preparation after 8.93809 seconds.

but I didn't get the output file 4seasons_utm.osm.pbf. Instead, my input file 4seasons.osm.pbf was changed (4seasons.osm.pbf before and after). Moreover, I now can't use this modified file as input because I get errors:

./map_matching_2 \
  --network "/app/data/4seasons.osm.pbf" \
  --network-srs "+proj=utm +zone=32" \
  --tracks-srs "+proj=utm +zone=32" \
  --readline \
  --verbose
Start Preparation ...
Import network ... terminate called after throwing an instance of 'osmium::invalid_location'
  what():  invalid location
Aborted

I hope it can be easily fixed. Thanks!

P.S. Maybe you could add an option like --export-only? Then users would be able to transform the network without providing fake track data.

addy90 commented 2 years ago

Hi @KrasnovPavel

thank you for finding new bugs in my software!

Unfortunately, the OSM file format does not support Cartesian coordinates; only WGS84 coordinates can be set, see here: https://github.com/osmcode/libosmium/blob/c7f136fb9c6df1c98c166944f0ab9673683e47a8/include/osmium/osm/location.hpp#L331

I accidentally added a Cartesian export method to my main method (and I used the wrong output file: in the Cartesian case I wrote to the import file, which is why your import file was overwritten). This cannot work as the OSM file format does not support Cartesian coordinates and so the coordinates were out of bounds. In other words, my export method created a corrupt file, and this file could not be imported any more because its coordinates were invalid (out of bounds).
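For illustration, here is a minimal Python sketch of the kind of range check that makes the import fail. It assumes the fixed-point representation libosmium uses for OSM coordinates (integers in units of 1e-7 degrees) and the WGS84 limits of ±180° longitude and ±90° latitude; the function names and sample points are hypothetical, and this is not the actual libosmium or map_matching_2 code.

```python
# Simplified Python mirror of the validity check behind osmium::invalid_location.
# Assumption: coordinates are stored as fixed-point integers in 1e-7 degree units.
COORDINATE_PRECISION = 10_000_000


def to_fixed(value_deg: float) -> int:
    """Convert a coordinate in degrees to fixed-point units (1e-7 degrees)."""
    return round(value_deg * COORDINATE_PRECISION)


def is_valid_osm_location(lon: float, lat: float) -> bool:
    """True if (lon, lat) lies inside the WGS84 range an OSM file can hold."""
    x, y = to_fixed(lon), to_fixed(lat)
    return (
        -180 * COORDINATE_PRECISION <= x <= 180 * COORDINATE_PRECISION
        and -90 * COORDINATE_PRECISION <= y <= 90 * COORDINATE_PRECISION
    )


# A geographic point in degrees is accepted ...
print(is_valid_osm_location(11.57, 48.14))  # True
# ... but UTM-style easting/northing values in metres are far out of range.
print(is_valid_osm_location(691_000.0, 5_335_000.0))  # False
```

Writing UTM eastings and northings into the longitude and latitude fields therefore yields locations that fail such a check on the next import.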

I removed the non-working code and now print a warning instead. It should be up to the user to export a network file only in the right coordinate system, so I don't want to include overly strict checks.

What should be allowed is importing a Cartesian network (for example in arcs-nodes mode or other future import modes), transforming it to WGS84 (or leaving it as is if it is already in WGS84), and exporting it to an osm.pbf file. The idea was to be able to transform the map matching dataset into the OSM files that I linked in the README. Apart from that, the network output is meant for exporting what was read into the graph, for example after retaining only the largest subgraph, or after removing buildings and similar features from an osm.pbf file that contains all of them. The output OSM file is therefore smaller, as it contains no unnecessary data, only what was imported; it is built freshly from the current network graph. In any case, the network needs to be in geographic coordinates to be written in the OSM format.

What is still possible is exporting the network into the CSV format in any coordinate system for viewing it in QGIS. From QGIS it can then be exported into other formats such as Shapefile or GeoPackage if needed. So the export-network parameters differ from network-output here: the CSV format is indifferent to the SRS, but the OSM file format is not. I hope this explains my intentions a little.
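A guard along these lines could decide whether a requested export is possible. This is a hypothetical sketch, not the actual check in map_matching_2: `is_geographic` and `can_export` are made-up names, the heuristic only recognizes PROJ strings whose `+proj=` value is one of the lon/lat aliases (or the literal `EPSG:4326`), and it assumes that a missing SRS means the default WGS84.

```python
from typing import Optional

# PROJ aliases for a plain longitude/latitude (geographic) system.
GEOGRAPHIC_PROJ = {"longlat", "latlong", "lonlat", "latlon"}


def is_geographic(srs: Optional[str]) -> bool:
    """Heuristic check whether an SRS string describes lon/lat coordinates."""
    if not srs:
        return True  # assumption: no SRS given means the default WGS84
    if srs.strip().upper() == "EPSG:4326":
        return True
    # Parse "+key=value" tokens such as "+proj=utm +zone=32".
    params = dict(
        token[1:].split("=", 1)
        for token in srs.split()
        if token.startswith("+") and "=" in token
    )
    return params.get("proj") in GEOGRAPHIC_PROJ


def can_export(fmt: str, srs: Optional[str]) -> bool:
    """OSM output needs geographic coordinates; CSV works in any SRS."""
    if fmt == "csv":
        return True
    if fmt == "osm":
        if is_geographic(srs):
            return True
        print(f"Warning: skipping OSM output, network SRS '{srs}' is not lon/lat.")
        return False
    raise ValueError(f"unknown export format: {fmt}")


print(can_export("osm", None))                  # True
print(can_export("csv", "+proj=utm +zone=32"))  # True
print(can_export("osm", "+proj=utm +zone=32"))  # prints a warning, then False
```

Warning and skipping, rather than aborting, matches the lenient approach described above: the user stays responsible for choosing the right coordinate system.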

So, concerning your data, see the following example:

./map_matching_2 \
    --network "/app/data/4seasons.osm.pbf" \
    --network-output "/app/data/4seasons_graph.osm.pbf" \
    --verbose

Enabled network output.

Start Preparation ...
Import network ... done in 3.22105s
Graph has 320719 vertices and 629287 edges
Simplifying ... done in 1.01816s
Simplified graph has 89211 vertices and 205315 edges
Baking graph ... done in 0.199019s
Baked graph has 89211 vertices and 205315 edges
Saving osm file ... done in 0.634499s
Building spatial indices ... done in 0.647489s
Finished Preparation after 6.04694 seconds.

Process finished with exit code 0

After this we have two files:

37M    4seasons.osm.pbf
4,2M   4seasons_graph.osm.pbf

The output file is much smaller because only the imported graph is written, without any buildings and so on. In some cases the numbers of nodes and edges are not the same as before; I have not yet figured out exactly where this happens or whether it is problematic. I will test this further in the future, for example by comparing the export-network CSV files and looking for differences, or I will have to try something else. I have not had any matching problems from this yet, but it is not exactly expected.
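Comparing two exports could look roughly like the following Python sketch. The CSV layout is an assumption: it supposes each row has a hypothetical `id` column, which may not match what the export-network option actually writes, so the key column and delimiter are parameters.

```python
import csv
import io
from typing import List, TextIO, Tuple


def load_keys(f: TextIO, key: str = "id", delimiter: str = ",") -> set:
    """Collect the values of the key column from a CSV export."""
    return {row[key] for row in csv.DictReader(f, delimiter=delimiter)}


def diff_exports(before: TextIO, after: TextIO, key: str = "id",
                 delimiter: str = ",") -> Tuple[List[str], List[str]]:
    """Return (keys only in `before`, keys only in `after`), sorted."""
    a = load_keys(before, key, delimiter)
    b = load_keys(after, key, delimiter)
    return sorted(a - b), sorted(b - a)


# Demo with in-memory data; for real files use open(path, newline="").
before = io.StringIO("id,geometry\n1,POINT(0 0)\n2,POINT(1 1)\n3,POINT(2 2)\n")
after = io.StringIO("id,geometry\n1,POINT(0 0)\n3,POINT(2 2)\n4,POINT(3 3)\n")
print(diff_exports(before, after))  # (['2'], ['4'])
```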

You can also see that in WGS84 no tracks have to be specified for network-output. Exporting the network without specifying tracks is intended, but it did not work in your case because you transformed the network into a Cartesian coordinate system. Instead of specifying readline mode, it should also be possible to trick the program by simply specifying tracks-srs or tracks-transform-srs, but I have not tried this out.

Nevertheless, when you use the output file now, it is imported a little faster, as you can see, because only the necessary data remains and does not have to be skipped again as in the original OSM file. But now you have to transform to UTM when you want to use UTM tracks:

./map_matching_2 \
    --network "/app/data/4seasons_graph.osm.pbf" \
    --network-transform-srs "+proj=utm +zone=32" \
    --tracks-srs "+proj=utm +zone=32" \
    --readline \
    --verbose

Start Preparation ...
Import network ... done in 2.23767s
Graph has 320719 vertices and 629287 edges
Simplifying ... done in 1.31037s
Simplified graph has 89215 vertices and 205323 edges
Transform graph ... done in 2.23078s
Transformed graph has 89215 vertices and 205323 edges
Baking graph ... done in 0.225789s
Baked graph has 89215 vertices and 205323 edges
Building spatial indices ... done in 0.227087s
Please input tracks one per line, either as LINESTRING or comma separated sequence of POINT (enter empty line for stopping):

You can see that the numbers of nodes and edges are slightly different after simplification; as I said, I still have to figure out exactly what happens here. The difference is very small and does not occur for every OSM file that I tried, but it does happen sometimes, as we can see. Maybe it comes from overlapping nodes with the same coordinates, but I am not sure yet.
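One way to test the overlapping-nodes hypothesis is to group nodes by their coordinates at OSM precision and look for groups with more than one node. The Python sketch below assumes the nodes are available as hypothetical `(id, lon, lat)` tuples, for example read from an exported nodes CSV. Because it rounds to 1e-7 degrees, it would also flag nodes that only become identical once they are written to an OSM file.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumption: OSM-style precision of 1e-7 degrees.
COORDINATE_PRECISION = 10_000_000


def find_overlapping_nodes(
    nodes: Iterable[Tuple[int, float, float]],
) -> List[List[int]]:
    """Return groups of node ids that share the same rounded coordinates."""
    groups: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for node_id, lon, lat in nodes:
        key = (round(lon * COORDINATE_PRECISION), round(lat * COORDINATE_PRECISION))
        groups[key].append(node_id)
    return [ids for ids in groups.values() if len(ids) > 1]


nodes = [
    (1, 11.5, 48.1),
    (2, 11.5, 48.1),            # exact duplicate of node 1
    (3, 11.6, 48.2),
    (4, 11.60000000001, 48.2),  # differs from node 3 only below 1e-7 degrees
    (5, 11.7, 48.3),
]
print(find_overlapping_nodes(nodes))  # [[1, 2], [3, 4]]
```

If such groups show up in the original file, they would be a plausible, but unconfirmed, source of the small count differences after simplification.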

If you want to use the network data in another SRS for other software, maybe you should try the --export-transformed-network option. You specify it twice, once for the nodes.csv and once for the edges.csv, and you can then import the edges.csv into QGIS and export it into whatever file format you want (Shapefile, GeoPackage, and whatever else QGIS supports). Maybe with a Shapefile in UTM you can then continue with the software you want to use.

I hope this helps! Sorry that transformed networks in the OSM file format don't work, but as far as I understand, this limitation comes from the format itself. OSM is always in WGS84 by definition.

KrasnovPavel commented 2 years ago

This cannot work as the OSM file format does not support Cartesian coordinates and so the coordinates were out of bounds.

Oh, I see, it looks like I am again using your software in a way it was not meant to be used. :)

If you want to use the network data in another SRS for other software, maybe you should try the --export-transformed-network option. You specify it twice, once for the nodes.csv and once for the edges.csv, and you can then import the edges.csv into QGIS and export it into whatever file format you want (Shapefile, GeoPackage, and whatever else QGIS supports). Maybe with a Shapefile in UTM you can then continue with the software you want to use.

I will definitely try it.

Thank you for your support and explanations!

addy90 commented 2 years ago

Oh, I see, it looks like I am again using your software in a way it was not meant to be used. :)

:smile: I am sorry that my software is not as clear as it should be in some cases. As with many research projects, documentation is sparse at the beginning, since everything is inside the authors' heads and first has to be written into papers... I am glad that you are helping me by showing me where my software can be extended in terms of features and explanation! Unfortunately, I cannot solve every problem, but the discussion helps anyway!

I am sure it will eventually become better and clearer; more development is ongoing, but there is no timeline yet.