pelias / docker-valhalla-baseimage

Pelias Docker Baseimage with Valhalla additionally installed

Process logs mixed into extract.0sv output #3

Open answerquest opened 7 months ago

answerquest commented 7 months ago

Thank you for this repo.

Describe the bug: The first lines of the generated extract.0sv contain process logs. The expected content starts only after 7 lines of logs.

$ head -12 extract.0sv 
2024/02/24 02:36:24.521843 [ERROR] (stat): /data/valhalla/valhalla_tiles.tar No such file or directory
2024/02/24 02:36:24.522029 [WARN] Tile extract could not be loaded
2024/02/24 02:36:24.522087 [WARN] (stat): /data/valhalla/traffic.tar No such file or directory
2024/02/24 02:36:24.522095 [WARN] Traffic tile extract could not be loaded
2024/02/24 02:36:24.546321 [INFO] Enumerating edges...
2024/02/24 02:36:31.550266 [INFO] Exporting 9501786 edges
2024/02/24 02:36:31.552174 [INFO] 0%
saybeBehfpKm_@}\mKoGmGyN_G}LwC_HeG{D{S{]Marten Toonder Sr. pad
{ik`eBadd|JbMOfYwB|VmB`VdGrEtDlI~Tf@|EfCpV`@|SiL`|@eFtKuBlLyArG_Kp]YxYgAvg@tFh~@fOzhIK`EgEtMr@pUnFr\d@hCv]dhQaIjUKfMj@fMbBnCbCpBzDhGxCx~AVpc@fAv^fEb~B{@rVhB`c@`FhxB}FvWYvK@dLtDbyA~Il^`@zn@aCphAdBl[pHl[dW`zNmBzV[lG?vDXhFxBx[jBt^z@jr@InTQzMHfOv@bJdAnG`@jGPlH{AxYwAvJqEtOsArIw@xW[xVDhIdArKn@xT\~AfKdEdEjAzCfAjDnGn@jCMfI]`I]bGDxFTdEWaterstaatpad
}ncqdBemhjLtKmo@rEkXzRrPjJdItCg@hKiBpHoAzUaEcKcr@De Kamp
{|dqdBawijL~G_OtEeK~EsKhHqObCXhOxLxKxIDe Kamp
}xqqdBsdujLbnEumFbSiVj_BimBdhCw|COosterhamriker Maarweg

After this, log lines reporting percentage completion are interspersed with the data throughout the file. They are easy to spot because each one starts with a timestamp.

}iudbBmq_{JrCv@de@jMhElAbAbF�Hummeloseweg
ecxdbBskt{JdKnC|QzDnYlHhz@lP~}@t_Bja@lr@fBzClM|SfYre@dSl[bDzF�Ruurlose Allee
2024/02/23 15:58:12.405713 [INFO] 54%
cdruaBeky}I|Aep@PyG`C_i@`A}R|AiZlDeYxD}WjF_YzFwWdBiH�de Chamillylaan

And then finally ending with "Done":

aj~{`BsdfsEnBjBrI`KfMxL�Nieuwstraat�N253
_t~{`BwbgsEmF}JkEiKmI{O~CbMvEbNdGlQ�Commerswerveweg�N253
2024/02/23 15:58:16.367603 [INFO] 100%
2024/02/23 15:58:16.369502 [INFO] Done

My initial guess (note: I haven't inspected the code yet) is that the script redirects all of stdout to the extract file, so these progress messages end up mixed in with the data.
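To illustrate that guess, here is a minimal sketch of the mechanism. The functions `noisy_export` and `quiet_export` are hypothetical stand-ins, not the contents of scripts/export_edges.sh, and the assumption that the real exporter logs to stdout is unverified. The sketch only shows that redirecting stdout captures log lines when they share the stream with the data, and that they stay out of the file when they go to stderr.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in: an exporter that prints data AND log lines to stdout.
noisy_export() {
  echo "2024/02/24 02:36:31.552174 [INFO] 0%"
  echo "polyline-data-1"
  echo "2024/02/24 02:36:31.560000 [INFO] Done"
}

# Hypothetical stand-in: same output, but log lines are sent to stderr.
quiet_export() {
  echo "2024/02/24 02:36:31.552174 [INFO] 0%" >&2
  echo "polyline-data-1"
  echo "2024/02/24 02:36:31.560000 [INFO] Done" >&2
}

workdir="$(mktemp -d)"

# Redirecting stdout captures everything noisy_export prints, logs included.
noisy_export > "$workdir/mixed.0sv"

# With logs on stderr, the data file stays clean and logs land in their own file.
quiet_export > "$workdir/clean.0sv" 2> "$workdir/export.log"

echo "lines in mixed.0sv: $(wc -l < "$workdir/mixed.0sv")"
echo "lines in clean.0sv: $(wc -l < "$workdir/clean.0sv")"
```

If the guess is right, a fix in the export script would amount to keeping log output off the stream that is redirected into extract.0sv.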

Steps to Reproduce

  1. Clone this repo and build the docker image locally as instructed: docker build -t pelias/valhalla_baseimage .
  2. Change to a separate working folder that is empty.
  3. Download Netherlands pbf to current working folder: wget https://download.geofabrik.de/europe/netherlands-latest.osm.pbf
  4. Generate tiles:
    docker run \
    --rm -it \
    -v './:/data/valhalla' \
    -v './:/data/openstreetmap' \
    pelias/valhalla_baseimage \
    /bin/bash -c './scripts/build_tiles.sh'
  5. Export polylines:
    docker run \
    --rm -it \
    -v './:/data/valhalla' \
    -v './:/data/polylines' \
    pelias/valhalla_baseimage \
    /bin/bash -c './scripts/export_edges.sh'
  6. An extract.0sv file is now created in the working folder (size: 39.0 MiB).
  7. On inspection, the file has process logs mixed in with the expected output.
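The inspection in step 7 can be made quantitative with a small helper that counts log-like lines. This is a sketch under assumptions: `count_log_lines` is a hypothetical name, and the pattern is inferred only from the log lines quoted in this issue (timestamp followed by a bracketed level). The sample file is synthetic.

```shell
#!/usr/bin/env bash
# Count lines that look like the log lines quoted above, e.g.
# "2024/02/24 02:36:31.552174 [INFO] 0%".
# -a treats the binary file as text, -c prints the match count, -E enables {n} syntax.
count_log_lines() {
  LC_ALL=C grep -acE '^[0-9]{4}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+ \[[A-Z]+\]' "$1"
}

# Synthetic sample: two log lines around one data line containing a NUL byte.
sample="$(mktemp)"
printf '2024/02/24 02:36:31.552174 [INFO] 0%%\nabc\000Some Street\n2024/02/24 02:36:31.560000 [INFO] Done\n' > "$sample"

count_log_lines "$sample"
```

On a real file this would be run as `count_log_lines extract.0sv`; a count of 0 would mean no lines match the assumed log format.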

Expected behavior: We should get an extract.0sv file with no process log entries mixed in.

Additional context: I made no changes to the code snippets given in the README.

References

I'm here because when I run the pelias prepare all command, I get this:

Creating extract at /data/placeholder/wof.extract
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
/data/openstreetmap/italy-latest.osm.pbf is very large.
We strongly recommend using Valhalla to produce extracts for large PBF extracts.
You can also download pre-processed polyline extracts from Geocode Earth.
see: https://github.com/pelias/polylines#download-data
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Generating polylines from /data/openstreetmap/italy-latest.osm.pbf failed! The file is too large.
Exiting...
import...
populate fts...
optimize...
close...
Done!
- importing polylines
- archiving street database
- conflating openaddresses
Tue Feb 20 11:29:09 AM UTC 2024 /data/openaddresses/it/countrywide.csv

(This output is for another country, Italy; I was trying with multiple countries.) From the logs, it looks like I should generate the polylines extract.0sv file independently instead of having the pelias executable do it. Also, it would be great if someone could tell me exactly what happens if I place the created extract.0sv file in the data/polylines folder, as per the pelias.json entry:

    "polyline": {
      "datapath": "/data/polylines",
      "files": [ "extract.0sv" ]
    },

Will it spot the file and use it instead of trying to create one?

answerquest commented 7 months ago

Sharing for those who are looking for a workaround:

With the generated extract.0sv:

grep -av "^2024" extract.0sv > clean_extract.0sv

Please change 2024 to a later year if you're reading this in the future. Since every log line starts with a datestamp, this greps for the lines that don't start with the year (-v inverts the match). Extend the pattern to yyyy/mm or yyyy/mm/dd if you're paranoid. grep treats the file as binary (even though its data is organized in lines), so -a is required; otherwise grep won't print the lines.
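A variant that does not hard-code the year could match the whole timestamp prefix instead. This is a sketch, not a tested fix for this repo: `strip_log_lines` is a hypothetical helper name, and the pattern assumes every log line has the shape seen in this issue (date, time with fractional seconds, bracketed level). The sample data is synthetic.

```shell
#!/usr/bin/env bash
# Print every line of the given file EXCEPT those that look like log lines.
# -a treats the binary file as text, -v inverts the match, -E enables {n} syntax.
strip_log_lines() {
  LC_ALL=C grep -avE '^[0-9]{4}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+ \[[A-Z]+\] ' "$1"
}

# Synthetic sample: log lines from two different years mixed with two data lines.
raw="$(mktemp)"
cleaned="$(mktemp)"
{
  printf '2024/02/24 02:36:31.552174 [INFO] 0%%\n'
  printf 'abc\000First Street\n'
  printf '2025/01/05 10:00:00.000001 [WARN] Tile extract could not be loaded\n'
  printf 'def\000Second Street\n'
} > "$raw"

strip_log_lines "$raw" > "$cleaned"
echo "lines kept: $(wc -l < "$cleaned")"
```

On a real file the call would be `strip_log_lines extract.0sv > clean_extract.0sv`, mirroring the command above.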

lvalnegri commented 6 months ago

I actually run this from the country root folder (with YYYY/MM/DD standing for the date in the log lines):

grep -av "^YYYY/MM/DD" ./data/openstreetmap/extract.0sv > ./data/polylines/extract.0sv

then it imports it like a charm.

Many thanks for opening this very detailed and informative issue and subsequent comments, you've saved me quite a lot of time!