valhalla / valhalla

Open Source Routing Engine for OpenStreetMap
https://valhalla.github.io/valhalla/

valhalla_export_edges for planet seems to be stuck #2426

Open hannojg opened 4 years ago

hannojg commented 4 years ago

Hey, I am trying to generate polylines for the whole planet. The server specs are:

64GB RAM
1TB NVMe SSD
AMD Ryzen 7 3700X

I ran the following commands:

# Get Planet
wget https://planet.openstreetmap.org/pbf/planet-latest.osm.pbf

# Prepare valhalla
mkdir -p valhalla_tiles/
valhalla_build_config --mjolnir-tile-dir ${PWD}/valhalla_tiles --mjolnir-tile-extract ${PWD}/valhalla_tiles.tar --mjolnir-timezone ${PWD}/valhalla_tiles/timezones.sqlite --mjolnir-admin ${PWD}/valhalla_tiles/admins.sqlite > valhalla.json

# Generate tiles
valhalla_build_tiles -c valhalla.json planet-latest.osm.pbf

# Tar tiles
find valhalla_tiles | sort -n | tar cf valhalla_tiles.tar --no-recursion -T -

# Extract polylines (this is where it is stuck)
valhalla_export_edges --config valhalla.json > planet_extract.polyline

When I run the last command I get no output. It has been running for 8 hours now without printing anything, although it pegs the CPU (one core at 99-100%).

I tried the same procedure with the portland-metro extract, and it worked.

Can you tell me whether the issue is with valhalla or with my procedure? Thank you so much in advance!
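For context on what the last step is meant to produce: the patch later in this thread shows valhalla_export_edges writing each edge run as an encoded shape (via its own encode()) followed by its names. Below is a minimal sketch of the standard Google encoded-polyline algorithm, useful for checking or decoding that output. It is not Valhalla's implementation and the function names are hypothetical; precision is a parameter because Valhalla commonly uses 6 decimal digits (polyline6) instead of Google's 5, which is worth confirming against midgard's encode().

```cpp
#include <cmath>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Append one signed delta in the encoded-polyline format: zig-zag the sign
// into the lowest bit, then emit 5-bit chunks (low bits first), each offset
// by 63, with 0x20 set on every chunk except the last.
void append_polyline_value(int64_t value, std::string& out) {
  uint64_t v = static_cast<uint64_t>(value) << 1;  // shift unsigned to avoid UB
  if (value < 0) v = ~v;
  while (v >= 0x20) {
    out.push_back(static_cast<char>((0x20 | (v & 0x1f)) + 63));
    v >>= 5;
  }
  out.push_back(static_cast<char>(v + 63));
}

// Encode (lat, lon) pairs as deltas from the previous point.
// precision = 1e5 for Google's classic format, 1e6 for "polyline6".
std::string encode_polyline(const std::vector<std::pair<double, double>>& latlons,
                            double precision) {
  std::string out;
  int64_t prev_lat = 0, prev_lon = 0;
  for (const auto& [lat, lon] : latlons) {
    const int64_t ilat = std::llround(lat * precision);
    const int64_t ilon = std::llround(lon * precision);
    append_polyline_value(ilat - prev_lat, out);
    append_polyline_value(ilon - prev_lon, out);
    prev_lat = ilat;
    prev_lon = ilon;
  }
  return out;
}
```

Decoding runs the same steps in reverse; decoding with the wrong precision gives coordinates scaled by a factor of 10.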

kevinkreiser commented 4 years ago

This doesn't look like an error in your procedure. The planet used to complete in about 7 minutes, so to me it sounds like this thing is stuck in an infinite loop. My first question: did you build this from source or use docker, and what version of the code are you running? If it's not the latest code, please try that.

If it is the latest code, we'll need to figure out what data is triggering this. To do that, I suspect we'll need to track how the graph expansion algorithm in the code visits edges. It shouldn't be possible, but it seems to me it must somehow be getting stuck in a loop of the same edges.

Perhaps you can modify the code to print the edge ids that have been visited instead of the normal output. If this sounds like too much, no worries; we'll have to mark it as a bug and give you a hand.
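To illustrate why a loop of the same edges "shouldn't be possible": an expansion like this is usually made loop-proof by marking each edge as visited before following it, so a chain stops as soon as it reaches an edge it has already taken. The sketch below uses a hypothetical, flattened graph (next_edge[e] is the single edge continuing e, or -1) instead of Valhalla's tiled GraphReader, and Chain and follow_chain are illustrative names. If the real export does loop forever, some equivalent of this check would presumably be missing or bypassed for the data involved.

```cpp
#include <optional>
#include <vector>

// Hypothetical, flattened graph: next_edge[e] is the single edge that
// continues edge e, or -1 where the chain ends. Valhalla's real graph is
// tiled and much richer; this only illustrates the visited-edge guard.
struct Chain {
  std::vector<int> edges;        // edges collected, in order
  std::optional<int> revisited;  // first edge found already marked, if any
};

// Follow a chain from `start`, marking each edge before taking it. Because a
// marked edge is never taken again, the walk visits at most next_edge.size()
// edges even if the data contains a cycle.
Chain follow_chain(const std::vector<int>& next_edge, int start, std::vector<bool>& visited) {
  Chain chain;
  for (int e = start; e != -1; e = next_edge[e]) {
    if (visited[e]) {
      chain.revisited = e;  // stop here instead of going around again
      break;
    }
    visited[e] = true;
    chain.edges.push_back(e);
  }
  return chain;
}
```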

hannojg commented 4 years ago

Thanks for getting back to me! I built this from source and will retry now with the latest commits. The version I am running is 3.0.9. I used this script, by the way: https://github.com/valhalla/valhalla/blob/master/scripts/Ubuntu_Bionic_Install.sh

Could you give me some guidance on how to do this:

modify the code to remove the normal output of the program but rather the edgeids that have been visited.

Thank you so much in advance!

kevinkreiser commented 4 years ago

@hannojg yeah, I would try with master. I'm fairly certain that script is now redundant: I recently made what I think are the last changes needed for the code to compile out of the box (with the right dependencies) on Ubuntu up to 20.04.

Try this patch:

diff --git a/src/valhalla_export_edges.cc b/src/valhalla_export_edges.cc
index 4ae1ae9b5..6147345d3 100644
--- a/src/valhalla_export_edges.cc
+++ b/src/valhalla_export_edges.cc
@@ -328,13 +328,13 @@ int main(int argc, char* argv[]) {
         // keep this
         edges.push_front(other);
       }
-
       // get the shape
       std::list<PointLL> shape;
       for (const auto& e : edges) {
-        extend(reader, t, e, shape);
+        //extend(reader, t, e, shape);
+        std::cout << e.i << std::endl;
       }
-
+/*
       // output it as: shape,name,name,...
       auto encoded = encode(shape);
       std::cout << encoded << column_separator;
@@ -342,6 +342,7 @@ int main(int argc, char* argv[]) {
         std::cout << name << (&name == &names.back() ? "" : column_separator);
       }
       std::cout << row_separator;
+*/
       std::cout.flush();
     }

If the code is stuck in a loop of edges, you should see a repeating pattern in the output.

hannojg commented 4 years ago

I changed src/valhalla_export_edges.cc as you suggested and then rebuilt the code with these commands:

cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
sudo make install

When I then run the command

valhalla_export_edges  --config valhalla.json > planet_extract.polyline

The same thing happens as before: no output at all.

What confuses me is that I then added a simple log statement as the first line of main, but even that isn't printed. Here are my diffs:

diff --git a/src/valhalla_export_edges.cc b/src/valhalla_export_edges.cc
index 4ae1ae9b5..df49b627c 100644
--- a/src/valhalla_export_edges.cc
+++ b/src/valhalla_export_edges.cc
@@ -154,6 +154,7 @@ void extend(GraphReader& reader,

 // program entry point
 int main(int argc, char* argv[]) {
+  LOG_INFO("Running this valhalla thing.");
   bpo::options_description options("valhalla_export_edges " VALHALLA_VERSION "\n"
                                    "\n"
                                    " Usage: valhalla_export_edges [options]\n"
@@ -332,9 +333,10 @@ int main(int argc, char* argv[]) {
       // get the shape
       std::list<PointLL> shape;
       for (const auto& e : edges) {
-        extend(reader, t, e, shape);
+        //extend(reader, t, e, shape);
+        std::cout << e.i << std::endl;
       }
-
+      /*
       // output it as: shape,name,name,...
       auto encoded = encode(shape);
       std::cout << encoded << column_separator;
@@ -342,8 +344,9 @@ int main(int argc, char* argv[]) {
         std::cout << name << (&name == &names.back() ? "" : column_separator);
       }
       std::cout << row_separator;
+      */
       std::cout.flush();
-    }
+   }

     // check progress
     int procent = (100.f * set) / edge_count;

kevinkreiser commented 4 years ago

@hannojg can you run it with gdb, stop it, and see where the code is? Basically, revert the changes, then compile it like this:

rm -rf build
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Debug
make -j$(nproc) valhalla_export_edges

Then run it directly from there with gdb:

gdb --args valhalla_export_edges your_config.json
#gdb will open up; then you need to start the program
run
#let it run for a while, at least a few minutes, then press ctrl-c to stop the debugger
where
#this will show the current stack; copy/paste that output into this issue

hannojg commented 4 years ago

One mistake in my procedure: I was redirecting the output to the file where I wanted to save the polyline information, which is why I didn't see any output:

valhalla_export_edges  --config valhalla.json > planet_extract.polyline

When I remove the >, I see the info log I added, but nothing happens after that.

When reverting and using gdb as you described I get the following output:

Starting program: /home/pelias/valhalla/build/valhalla_export_edges ../../valhalla.json
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
^C
Program received signal SIGINT, Interrupt.
0x00005555555c2491 in valhalla::midgard::tar::header_t::verify (this=0x7fe71faa9000) at /home/pelias/valhalla/valhalla/midgard/sequence.h:572
572             sum += ((char*)&temp)[i];
(gdb) where
#0  0x00005555555c2491 in valhalla::midgard::tar::header_t::verify (this=0x7fe71faa9000) at /home/pelias/valhalla/valhalla/midgard/sequence.h:572
#1  0x00005555555c2c33 in valhalla::midgard::tar::tar (this=0x5555559db230, tar_file="/home/pelias/valhalla_tiles.tar", regular_files_only=true) at /home/pelias/valhalla/valhalla/midgard/sequence.h:608
#2  0x00005555555b8139 in valhalla::baldr::GraphReader::tile_extract_t::tile_extract_t (this=0x5555559db190, pt=...) at /home/pelias/valhalla/src/baldr/graphreader.cc:31
#3  0x00005555555b9cc6 in valhalla::baldr::GraphReader::get_extract_instance (pt=...) at /home/pelias/valhalla/src/baldr/graphreader.cc:100
#4  0x00005555555bb036 in valhalla::baldr::GraphReader::GraphReader (this=0x7fffffffe210, pt=..., tile_getter=...) at /home/pelias/valhalla/src/baldr/graphreader.cc:349
#5  0x0000555555564737 in main (argc=2, argv=0x7fffffffe408) at /home/pelias/valhalla/src/valhalla_export_edges.cc:211
(gdb) 

kevinkreiser commented 4 years ago

It seems your tar is either corrupt or you are using a super slow hard drive. The first thing the code does when you use it with a tar is scan the whole thing to see which tiles are in it; that's what this stack trace is showing.

The next thing to test is whether this works with a smaller tar file. Instead of the planet, can you build a small country like Liechtenstein (check Geofabrik for a download) and then run the extraction procedure? If that finishes, we can say it's something about the planet tar or the drive you are reading it from. If it doesn't finish, then something is wrong with your data creation.

hannojg commented 4 years ago

Okay, so the Liechtenstein polyline extract finished within seconds! I think we can rule out the hard drive, since this runs on a dedicated server from a well-known provider with an NVMe SSD.

Would it help if I provided the tar, or the output log from tile generation?

// EDIT: I just ran tar -tvvf valhalla_tiles.tar and it lists the entire contents of the tar without any error.

kevinkreiser commented 4 years ago

@hannojg I think we need to keep debugging this to really see what is going on. The stack above shows that this is just the initial scan of the tar file. One possibility is that a misconfiguration of memory mapping on your machine makes it take a crazy long time just to loop over the tar. Are you comfortable with gdb? If so, could you check whether it's getting through any of the files in the tar, or whether it's stuck on just one? You'll see there is a loop over the contents of the tar; I would set a breakpoint in there and see if it ever gets to a second iteration of the loop. Or you can just add a log statement there, like LOG_WARN("checking header");

Specifically, I'm talking about this loop: https://github.com/valhalla/valhalla/blob/master/valhalla/midgard/sequence.h#L603-L624

You'll see there is an if on line 608. Maybe it's getting stuck in there and just looking at the file one block at a time? If so, that would be something like 50,000,000,000 / 512 = 97,656,250 iterations, which could take a while...
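Working through that estimate, taking the roughly 50 GB archive size from the comment above and tar's 512-byte blocks: if the scan advances one block at a time instead of jumping from header to header, it performs one header check per block, and each check sums a full 512-byte block, so the scan ends up summing every byte of the archive. The helper names are hypothetical.

```cpp
#include <cstdint>

constexpr uint64_t kTarBlock = 512;

// Header checks performed if the scan advances one 512-byte block at a time.
constexpr uint64_t blockwise_iterations(uint64_t archive_bytes) {
  return archive_bytes / kTarBlock;
}

// Bytes summed by those checks: each one checksums a whole block, so a
// block-by-block scan reads (and sums) the entire archive.
constexpr uint64_t blockwise_bytes_summed(uint64_t archive_bytes) {
  return blockwise_iterations(archive_bytes) * kTarBlock;
}

static_assert(blockwise_iterations(50'000'000'000ULL) == 97'656'250ULL);
static_assert(blockwise_bytes_summed(50'000'000'000ULL) == 50'000'000'000ULL);
```

A header-to-header scan, by contrast, performs one check per member file, independent of how large the files are.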

kevinkreiser commented 4 years ago

Also, if you are sick of debugging this and just want to move on, you can extract the tar to a directory, configure your tile_dir to point at it, and then run valhalla_export_edges that way. The tar configuration is faster, but using a directory of tiles will work just fine.