osm2pgsql-dev / osm2pgsql

OpenStreetMap data to PostgreSQL converter
https://osm2pgsql.org
GNU General Public License v2.0
Default value for `--cache` #1501

Closed joto closed 2 months ago

joto commented 3 years ago

The default value for the --cache/-C parameter is currently 800 (MB), which is rather small for current hardware and typical use.

Changing the default would be a potentially breaking change, so we want to consider this carefully.

hug-fr commented 2 years ago

I don't want to open a new issue because my comment is only about the documentation of the --cache option, which seems a bit ambiguous. It starts with "Only for slim mode: Use up to NUM MB of RAM for caching nodes.", although the option can be used without the -s switch, where it can really help an import to complete.

lonvia commented 2 years ago

@hug-fr Don't worry, we are not going to run out of issue forms. No need to economize. ;) More seriously: please always open a new issue when you are not commenting directly on the issue at hand, even when it seems to be just a tiny thing. And when you do, a link to the part of the documentation you are talking about would be helpful.

hug-fr commented 2 years ago

@lonvia Thanks for your reply. I'll open a new, more complete report or start a new discussion. (I hope it will help and that I have understood things correctly.)

joto commented 1 year ago

A few notes on this:

We don't know how much memory a system has and, even if we find out, we don't know how much of that is needed elsewhere. So I don't think it is a good idea to set this automatically. We could maybe add some warnings though: "Your cache setting seems quite large for the amount of memory you have." or so. But we do need some guidelines in the manual that tell users what a sensible cache value could be.
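A minimal sketch of what such a warning check could look like, written in Python for illustration only. It is not how osm2pgsql implements anything: the helper names `total_memory_mb` and `cache_warning` and the 75% threshold are assumptions, and the memory lookup relies on `os.sysconf`, which is only available on Unix-like systems.

```python
import os
from typing import Optional

# Warning text taken from the suggestion above.
WARNING = "Your cache setting seems quite large for the amount of memory you have."


def total_memory_mb() -> Optional[int]:
    """Return physical memory in MB, or None if it cannot be determined."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing (e.g. Windows) or the names are unsupported.
        return None
    if pages <= 0 or page_size <= 0:
        return None
    return pages * page_size // (1024 * 1024)


def cache_warning(cache_mb: int, total_mb: Optional[int],
                  max_fraction: float = 0.75) -> Optional[str]:
    """Return a warning if cache_mb exceeds max_fraction of total memory.

    max_fraction is an assumed threshold, not a value from osm2pgsql.
    """
    if total_mb is None:
        return None  # Unknown memory size: stay silent rather than guess.
    if cache_mb > total_mb * max_fraction:
        return WARNING
    return None


if __name__ == "__main__":
    message = cache_warning(cache_mb=800, total_mb=total_memory_mb())
    if message:
        print("WARNING:", message)
```

Note that the check only compares against installed memory. As the comment above points out, it cannot know how much of that memory is needed by PostgreSQL or other processes, which is why it only warns and never changes the setting.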

pnorman commented 1 year ago

When I have enough RAM but not so much that I don't care, my default cache setting is the size of the OSM PBF.
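As a rough illustration of this rule of thumb, the Python sketch below derives a `--cache` value from the size of the input file and builds a matching command line. The helper names `suggest_cache_mb` and `build_command`, the database name `gis`, and the file name are assumptions for illustration; only the `--slim`, `--cache`, and `-d` options are taken from osm2pgsql itself.

```python
import math
import os


def suggest_cache_mb(pbf_size_bytes: int) -> int:
    """Return a cache size in MB roughly equal to the PBF size, rounded up.

    Assumes 1 MB = 1024 * 1024 bytes; this is a heuristic, not an exact rule.
    """
    return max(1, math.ceil(pbf_size_bytes / (1024 * 1024)))


def build_command(pbf_path: str, cache_mb: int, database: str = "gis") -> list[str]:
    """Build an osm2pgsql slim-mode command line with an explicit cache size."""
    return ["osm2pgsql", "--slim", "--cache", str(cache_mb), "-d", database, pbf_path]


if __name__ == "__main__":
    path = "extract.osm.pbf"  # hypothetical input file
    if os.path.exists(path):
        cache = suggest_cache_mb(os.path.getsize(path))
        print(" ".join(build_command(path, cache)))
```

The command is only printed, not executed, so the suggested value can be checked against the available memory before starting an import.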

  • For the slim mode, the cache should not be used together with a flat-node file

I'm confident this is true for memory-constrained environments. Is it still true if you have loads of excess RAM? (e.g. 256GB+)

mboeringa commented 1 year ago

  • The --cache/-C setting has been ignored in non-slim mode for a while.

@joto, does this remark mean that the cache setting is completely ignored in non-slim mode, so that setting it has no effect at all?

  • TODO: Add warning when running osm2pgsql in that case that cache setting is ignored.

Am I right that this is still on your TODO list? Yesterday I ran a (successful) planet import in non-slim mode. I set a large --cache value based on the remark quoted below, from this section of the osm2pgsql manual, which says that non-slim mode also needs to store ways and relations. No such warning showed up with osm2pgsql version 1.8.1.

In --slim mode, this is just node positions while in non-slim mode it has to store information about ways and relations too.

joto commented 2 months ago

There is a warning in place now if --cache is used in non-slim mode. The default value is okay, so we can leave it as it is. We certainly don't want to make it smaller, and if we made it bigger, it would use too much memory for small input files.