osmcode / osmium-tool

Command line tool for working with OpenStreetMap data based on the Osmium library.
https://osmcode.org/osmium-tool/
GNU General Public License v3.0

osmium extract: Too many open files #265

Closed frafra closed 1 year ago

frafra commented 1 year ago

What version of osmium-tool are you using?

$ osmium --version
osmium version 1.14.0
libosmium version 2.18.0
Supported PBF compression types: none zlib lz4

Copyright (C) 2013-2022  Jochen Topf <jochen@topf.org>
License: GNU GENERAL PUBLIC LICENSE Version 3 <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

What operating system version are you using?

$ lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: Fedora
Description:    Fedora release 37 (Thirty Seven)
Release:    37
Codename:   ThirtySeven

Tell us something about your system

Bare metal, 8 core, 16 GB of RAM.

What did you do exactly?

I tried to extract ~9000 files at the same time, using a ~200 MB configuration file.

$ osmium extract --overwrite --progress --config osmium_conf.json italy-latest.osm.pbf 
Open failed for '075049_Montesano Salentino.osm.pbf': Too many open files

What did you expect to happen?

I was expecting to see osmium extracting areas.

What did happen instead?

osmium failed with "Too many open files" when creating file number 1023.

What did you do to try analyzing the problem?

I noticed that osmium fails after creating 1022 empty files.

I tried limiting the number of extracts to 1000; extraction starts, but I have to stop it because it uses too much memory. Even with only 10 extracts configured, the process uses more than ~20 GB by the time it reaches 33 % progress (memory usage grows linearly over time).

joto commented 1 year ago

Please read the section MEMORY USAGE in the man page.

frafra commented 1 year ago

This is not a memory issue: it also happens with a rather small PBF file (~200 MB). The problem is the maximum number of file descriptors a process may open.

Workaround: ulimit -n 10000
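As an aside, the same effect as the `ulimit -n` workaround can be achieved from inside a process: on POSIX systems an unprivileged process may raise its own soft file-descriptor limit up to the hard limit. A minimal Python sketch (using the standard `resource` module, which wraps `getrlimit`/`setrlimit`):

```python
import resource

# Raise the soft open-file limit to the hard limit for this process:
# the in-process equivalent of running `ulimit -n <hard>` in the shell.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
if soft < hard:
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))

new_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit raised from {soft} to {new_soft}")
```

Raising the limit beyond the hard limit still requires privileges (or a change to the system configuration), so this only helps up to whatever ceiling the distribution sets.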

I still think the memory usage is anomalous: osmium could stream the input and check each feature against the rules for each output file, so memory usage should stay roughly constant. It looks like it is loading the input file into memory multiple times (once per extract). This seems similar to #109.

frafra commented 1 year ago

Shouldn't osmium-tool check the limit up front, and avoid overwriting and/or polluting the directory with empty files in this case? It would also be good to mention this in the documentation as a current limitation.
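The pre-flight check suggested above could look something like the following Python sketch. This is not osmium's code; `check_fd_budget` and the `reserved` margin are hypothetical names used only for illustration. The idea is to compare the number of planned output files against the soft `RLIMIT_NOFILE` before creating anything:

```python
import resource

def check_fd_budget(n_outputs, reserved=32):
    """Return True if n_outputs files can plausibly be open at once,
    given the process's soft RLIMIT_NOFILE and a margin ('reserved')
    for stdin/stdout/stderr, the input file, shared libraries, etc.
    Both the function and the margin value are illustrative guesses."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return n_outputs + reserved <= soft

# With a typical default soft limit of 1024, ~9000 extracts would be
# rejected up front instead of failing after 1022 files were created.
print(check_fd_budget(10))
print(check_fd_budget(9000))
```

Failing fast like this would leave the output directory untouched, rather than littering it with a thousand empty files before the error surfaces.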

frafra commented 1 year ago

Please revert your commit. There is no universal hard limit of 500: the actual limit depends on the distribution and its configuration. Capping the maximum number of open files inside the software only restricts users from using it as they please. A paragraph in the documentation mentioning the issue is great, but it would be better to catch the specific error than to impose a hard limit on every user.
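Catching the specific error, as suggested here, is straightforward because the OS reports it as a distinct errno (`EMFILE` per process, `ENFILE` system-wide). A minimal Python sketch of this alternative; `open_outputs` is a hypothetical helper, not part of osmium, and the cleanup step also addresses the earlier complaint about leftover empty files:

```python
import errno
import os
import shutil
import tempfile

def open_outputs(paths):
    """Open every output file for writing. On hitting the descriptor
    limit (EMFILE/ENFILE), raise a clear message and remove the files
    created so far instead of leaving empty files behind."""
    handles = []
    try:
        for path in paths:
            try:
                handles.append(open(path, "wb"))
            except OSError as exc:
                if exc.errno in (errno.EMFILE, errno.ENFILE):
                    raise RuntimeError(
                        f"hit the open-file limit after {len(handles)} "
                        "files; raise it with `ulimit -n` or split the "
                        "extract config into smaller batches"
                    ) from exc
                raise
        return handles
    except Exception:
        # Clean up partial state: close and delete what was created.
        for handle in handles:
            handle.close()
            os.unlink(handle.name)
        raise
```

This keeps the decision about limits where it belongs, in the system configuration, while still giving users an actionable error message.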