osmcode / osmium-tool

Command line tool for working with OpenStreetMap data based on the Osmium library.
https://osmcode.org/osmium-tool/
GNU General Public License v3.0

Half as fast as osmfilter #253

Closed by xeruf 1 year ago

xeruf commented 1 year ago

What operating system version are you using?

Arch

Tell us something about your system

Desktop, 32 GB RAM, 4 CPUs at 4 GHz

What did you do exactly?

❯ time osmium tags-filter central-america-latest.o5m office -o central-america-latest-office-osmium.osm --overwrite
[======================================================================] 100% 
osmium tags-filter central-america-latest.o5m office -o  --overwrite  12.75s user 2.03s system 136% cpu 10.841 total
❯ time osmfilter central-america-latest.o5m --keep="office" >central-america-latest-office.osm
osmfilter central-america-latest.o5m --keep="office" >   6.69s user 0.89s system 99% cpu 7.649 total
❯ osmium --version
osmium version 1.14.0
libosmium version 2.18.0
Supported PBF compression types: none zlib lz4

osmfilter 1.4.4

What did you expect to happen?

osmium to be about as fast as osmfilter

What did happen instead?

Across various datasets, osmium tags-filter is consistently about half as fast as osmfilter. Is there an explanation?

joto commented 1 year ago

With osmium it is much better to use the PBF file format. You avoid converting downloaded files to o5m, an unnecessary extra step that takes time and uses a lot of disk space, and osmium reads PBF more efficiently because it can use multiple processors.

xeruf commented 1 year ago

I see, parallelization makes sense, though in terms of total CPU time osmium still needs 3-4x as much:

❯ time osmium tags-filter central-america-latest.osm.pbf -o central-america-latest-office-osmium.osm --overwrite office
[======================================================================] 100% 
osmium tags-filter central-america-latest.osm.pbf -o  --overwrite office  41.26s user 3.29s system 471% cpu 9.451 total
❯ time osmium tags-filter central-america-latest.osm.pbf -o central-america-latest-office-osmium.osm --omit-referenced --overwrite nw/office
[======================================================================] 100% 
osmium tags-filter central-america-latest.osm.pbf -o  --omit-referenced    18.30s user 1.45s system 499% cpu 3.957 total
❯ time osmfilter central-america-latest.o5m --keep="office" >central-america-latest-office.osm
osmfilter central-america-latest.o5m --keep="office" >   9.62s user 1.72s system 84% cpu 13.478 total
❯ time osmfilter central-america-latest.o5m --keep=office >central-america-latest-office.osm
osmfilter central-america-latest.o5m --keep=office >   9.44s user 1.29s system 93% cpu 11.457 total

What decided it for me is --omit-referenced, as I don't want all the references anyways ;)
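
For anyone who wants to check the CPU-time comparison, here is a minimal sketch that adds up user and system time from the `time` lines above and prints the osmium-to-osmfilter ratio. The helper name `cpu_ratio` is made up for illustration; the numbers are copied from the runs in this comment.

```shell
# cpu_ratio USER_A SYS_A USER_B SYS_B
# Prints (USER_A + SYS_A) / (USER_B + SYS_B), rounded to two decimals.
# Hypothetical helper, not part of osmium or osmfilter.
cpu_ratio() {
    LC_ALL=C awk -v ua="$1" -v sa="$2" -v ub="$3" -v sb="$4" \
        'BEGIN { printf "%.2f\n", (ua + sa) / (ub + sb) }'
}

# osmium on PBF with referenced objects vs. first osmfilter run
cpu_ratio 41.26 3.29 9.62 1.72    # prints 3.93

# osmium with --omit-referenced vs. second osmfilter run
cpu_ratio 18.30 1.45 9.44 1.29    # prints 1.84
```

So the 3-4x figure matches the first run. With --omit-referenced the CPU-time gap is closer to 2x, while the wall-clock time is lower than osmfilter's in both runs.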

dieterdreist commented 1 year ago

Out of curiosity, are the results the same, or are there differences?

joto commented 1 year ago

@xeruf osmfilter is a very specialized tool that can take all sorts of shortcuts and is more efficient for that reason. Osmium is based on a general library which is more flexible to use, so to some degree you are comparing apples and oranges here.

xeruf commented 1 year ago

The results differ marginally, but that might also be related to the conversion from osm.pbf to o5m.

I see your point, joto. I was just curious whether that was to be expected :)
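
One rough way to quantify how much two outputs differ is to count the nodes, ways and relations in each OSM XML file and compare the counts. The sketch below is only an illustration: `count_objects` is a made-up helper, and it assumes each element starts on its own line in the XML output, which is an assumption about the file layout rather than something either tool guarantees.

```shell
# Count top-level OSM objects in an OSM XML file, one line per type.
# Hypothetical helper; assumes one XML element per line.
count_objects() {
    for kind in node way relation; do
        # grep -c prints the number of matching lines (0 if none)
        printf '%s %s\n' "$kind" "$(grep -c "<$kind[ >]" "$1")"
    done
}

# Possible usage with the two output files named in this thread:
#   count_objects central-america-latest-office-osmium.osm > osmium.counts
#   count_objects central-america-latest-office.osm > osmfilter.counts
#   diff osmium.counts osmfilter.counts
```

Equal counts do not prove the files are identical, but differing counts show quickly which object type the two tools disagree on.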