navigatorsguild / osm-io

Apache License 2.0

[BUG] Writing is (very) slow #36

Status: Open (opened by zdila 1 week ago)

zdila commented 1 week ago

Describe the bug

I created a simple program that only reads and writes an osm.pbf file in parallel. It is a copy of parallel-pbf-io.rs without any filtering.

Package version

Desktop or server: Linux bono 6.11.2-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.11.2-1 (2024-10-05) x86_64 GNU/Linux

Additional context

Reading only. This uses all CPU cores (I have 24):

time cargo run --release
real    0m3.509s
user    1m16.277s
sys     0m0.992s

Reading and writing. This uses about 5 cores:

time cargo run --release
real    3m45.996s
user    83m0.485s
sys     0m13.260s

Reading and writing with modified thread pool sizes (every pool size multiplied by 4). This uses all cores:

time cargo run --release
real    3m45.996s
user    83m0.485s
sys     0m13.260s

BTW why are those pool sizes hardcoded?
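One way to avoid hardcoded pool sizes is to derive them from the core count at startup, with an optional override. The sketch below is hypothetical: `PoolSizes`, its fields, the 1:2:1 split, and the `OSM_IO_THREADS` variable are illustrative names and values, not osm-io's actual API. Only `std::thread::available_parallelism` is a real standard-library call.

```rust
use std::env;
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical pool-size settings; field names are illustrative only
/// and do not correspond to osm-io's real configuration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PoolSizes {
    readers: usize,
    processors: usize,
    writers: usize,
}

impl PoolSizes {
    /// Split `cores` across three stages, keeping at least one thread per stage.
    /// The 1:2:1 ratio is an arbitrary example, not a tuned value.
    fn for_cores(cores: usize) -> Self {
        let quarter = (cores / 4).max(1);
        PoolSizes {
            readers: quarter,
            processors: cores.saturating_sub(2 * quarter).max(1),
            writers: quarter,
        }
    }

    /// Use the (hypothetical) OSM_IO_THREADS variable if it holds a positive
    /// number, otherwise query the machine, falling back to 1 core.
    fn from_env_or_machine() -> Self {
        let cores = env::var("OSM_IO_THREADS")
            .ok()
            .and_then(|v| v.parse::<usize>().ok())
            .filter(|&n| n > 0)
            .unwrap_or_else(|| {
                thread::available_parallelism()
                    .map(NonZeroUsize::get)
                    .unwrap_or(1)
            });
        Self::for_cores(cores)
    }
}

fn main() {
    let sizes = PoolSizes::from_env_or_machine();
    println!("{sizes:?}");
}
```

On the 24-core machine above, `PoolSizes::for_cores(24)` gives 6 readers, 12 processors and 6 writers; the right ratio would have to come from profiling.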

Now osmium, reading and writing. CPU utilization is low (maybe it is bound by I/O speed):

time osmium cat -o out.osm.pbf in.osm.pbf
real    0m8.552s
user    0m28.301s
sys     0m1.401s
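The utilization figures can be checked from the `time` output: the average number of busy cores is roughly (user + sys) / real. A small sketch, assuming the `<minutes>m<seconds>s` format that bash's `time` builtin prints above:

```rust
/// Parse a duration such as "1m16.277s" (bash `time` format) into seconds.
fn parse_time(s: &str) -> Option<f64> {
    let s = s.trim().strip_suffix('s')?;
    let (min, sec) = s.split_once('m')?;
    let min: f64 = min.parse().ok()?;
    let sec: f64 = sec.parse().ok()?;
    Some(min * 60.0 + sec)
}

/// Average number of cores kept busy over the run: CPU time / wall time.
fn effective_cores(real: &str, user: &str, sys: &str) -> Option<f64> {
    let real = parse_time(real)?;
    Some((parse_time(user)? + parse_time(sys)?) / real)
}

fn main() {
    // Reading only, with the osm-io based program (24-core machine).
    let read_only = effective_cores("0m3.509s", "1m16.277s", "0m0.992s").unwrap();
    // `osmium cat`, reading and writing.
    let osmium = effective_cores("0m8.552s", "0m28.301s", "0m1.401s").unwrap();
    println!("read only: {read_only:.1} cores, osmium: {osmium:.1} cores");
}
```

This prints `read only: 22.0 cores, osmium: 3.5 cores`, consistent with "uses all CPU cores" for reading and "CPU utilization is low" for osmium.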

Osmium is unbeatable here. Do you see any areas for improvement? I may try to fine-tune it.
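One general design for a parallel writer (a common technique, not a claim about how osm-io or osmium are actually implemented) is to encode and compress blocks on a worker pool while a single writer emits them in their original order, buffering results that finish early. A minimal std-only sketch; `encode_block` and `ordered_parallel_encode` are hypothetical names, and `encode_block` merely reverses bytes as a stand-in for real PBF encoding and zlib compression:

```rust
use std::collections::BTreeMap;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Stand-in for the expensive per-block work (hypothetical; a real writer
/// would serialize and compress a PBF block here).
fn encode_block(block: Vec<u8>) -> Vec<u8> {
    block.into_iter().rev().collect()
}

/// Encode `blocks` on `workers` threads, then emit results in input order.
fn ordered_parallel_encode(blocks: Vec<Vec<u8>>, workers: usize) -> Vec<Vec<u8>> {
    let (job_tx, job_rx) = mpsc::channel::<(usize, Vec<u8>)>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (done_tx, done_rx) = mpsc::channel::<(usize, Vec<u8>)>();

    let mut handles = Vec::new();
    for _ in 0..workers.max(1) {
        let job_rx = Arc::clone(&job_rx);
        let done_tx = done_tx.clone();
        handles.push(thread::spawn(move || loop {
            // The lock is held only while receiving, not while encoding.
            let job = job_rx.lock().unwrap().recv();
            match job {
                Ok((seq, block)) => done_tx.send((seq, encode_block(block))).unwrap(),
                Err(_) => break, // job channel closed: no more work
            }
        }));
    }
    drop(done_tx); // only the workers hold result senders now

    let total = blocks.len();
    for (seq, block) in blocks.into_iter().enumerate() {
        job_tx.send((seq, block)).unwrap();
    }
    drop(job_tx); // lets idle workers exit once the queue is drained

    // Single writer: buffer out-of-order results, release them in sequence.
    let mut pending = BTreeMap::new();
    let mut next = 0;
    let mut out = Vec::with_capacity(total);
    for (seq, encoded) in done_rx {
        pending.insert(seq, encoded);
        while let Some(encoded) = pending.remove(&next) {
            out.push(encoded); // a real writer would write to the file here
            next += 1;
        }
    }
    for h in handles {
        h.join().unwrap();
    }
    out
}

fn main() {
    let blocks: Vec<Vec<u8>> = (0..8u8).map(|i| vec![i, i + 1, i + 2]).collect();
    let encoded = ordered_parallel_encode(blocks, 4);
    // Order matches the input regardless of which worker finished first.
    assert_eq!(encoded[0], vec![2, 1, 0]);
    println!("{} blocks encoded in order", encoded.len());
}
```

The point of this layout is that only the cheap sequential write is serialized; the CPU-heavy encoding scales with the worker count.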

zdila commented 1 week ago
cargo flamegraph -- input.osm.pbf output.osm.pbf
automatically selected target osm-patcher in package osm-patcher as it is the only valid target
warning: unused manifest key: patchcrates-io
    Finished `release` profile [optimized + debuginfo] target(s) in 0.08s
FLUSH
[ perf record: Woken up 56217 times to write data ]
Warning:
Processed 1433604 events and lost 3467 chunks!

Check IO/CPU overload!

Warning:
Processed 1626145 samples and lost 39.44%!

[ perf record: Captured and wrote 15641.616 MB perf.data (984761 samples) ]
writing flamegraph to "flamegraph.svg"

[Image: flamegraph]

Gzipped interactive version: flamegraph.svg.gz