joto / osmium

C++/Javascript framework for working with OSM files.
http://wiki.openstreetmap.org/wiki/Osmium
GNU General Public License v3.0
123 stars 31 forks source link

Segmentation fault / incorrect operation when using SparseTable storage #70

Closed alex85k closed 11 years ago

alex85k commented 11 years ago

I found this bug wlile adopting the code to MinGW but it also exists on Ubuntu 12.04.

All the sample programs (and tagstats from taginfo) give memory access error on line

        m_items.resize(id + m_grow_size)

in sparse_table.hpp after many successful calls to this line, checked by adding

  std::cout<<id + m_grow_size<<std::endl;

. I thought it was "low memory" error, but it disappears after inserting

m_items.resize(4000000000ULL);

into SparseTable constructor. I checked at least on osmium_toshape example. It may be sparsehash bug or some incorrect usage, I do not know exactly.

joto commented 11 years ago

Please describe exactly what you were doing, ie what program(s) in what versions with what command line and what data.

alex85k commented 11 years ago

Standard test:

    wget http://data.gis-lab.info/osm_dump/dump/latest/RU-TY.osm.pbf 
    examples/osmium_toshape RU-TY.osm.pbf 

=> Memory error. The bug is system-dependent. My Windows build is experimental, so I'll test on Ubuntu one more.

alex85k commented 11 years ago

I retested on virtual machine with Ubuntu 12.04 and 512 Mb RAM:

apt-get install libboost-dev zlib1g-dev libshp-dev libsqlite3-dev libgd2-xpm-dev libexpat1-dev  libgdal1-dev libgeos++-dev
apt-get install libsparsehash-dev  libv8-dev libicu-dev libprotobuf-dev protobuf-compiler libboost-test-dev libosmpbf-dev
git clone https://github.com/joto/osmium.git
cd osmium/examples
make
wget http://data.gis-lab.info/osm_dump/dump/latest/RU-TY.osm.pbf
./osmium_toshape RU-TY.osm.pbf

=> Segmentation fault (on the line mentioned above), when calling resize(867612189) PBF file is 1 Mb.

Out of memory error? Resize in constructor does not help on this system. But when I tested on Windows 7 64 with 4Gb RAM, 32bit build (<3Gb used?), m_items.resize(4000000000ULL) in constructor helped. Maybe resizing in the process of calculation needs even more memory?

(mingw-compatibility changes and CMake scripts are pushed to my fork of osmium, but not checked on Unix systems, MSVC compatibility will be also added. Dirtily added sparse_map.hpp and simple_map.hpp storage classes work without errors but they may be not suitable for big planet files, tested on 135Mb pbf only).

joto commented 11 years ago

These things will most likely not work on 32bit systems, even if you don't need that much memory, because it might still try to address more.

And if you don't have enough RAM this can't work. You can't use 850 MB RAM if you only have 512 MB. You will at least need some swap space, but then it will become unbearably slow. Also the resize() might (depending on the exact implementation inside the library) need up to twice the memory you think because it might reserve new memory, copy the data over, and then release the old memory.

Your command above gave a different error when I tested it: terminate called after throwing an instance of 'std::runtime_error' what(): node coordinates not available for building way geometry The reason is probably that the RU-TY.osm.pbf file is not built properly, it doesn't contain all the nodes it needs for the ways it has. There is an option in osmosis to do this properly, complete-ways or something like it.

alex85k commented 11 years ago

I have tested the same script for home Ubuntu 12.04 x86_64 with 3 Gb RAM - there are no memory errors.

alex85k commented 11 years ago

Thank you for explaining 'std::runtime_error' - I was ready to create another issue :)

It seems you are also right about twice-memory for resizing - that was the case on Windows.

By the way, why we are allocating too few items for sparsetable from the start? if its size depends on maximal ID which is about 2^32 in any freshly edited area of the world?

Maybe it is better to allocate 2^32 bits in SparseTable constructor and send expicit "Out of memory" error from it to avoid confusion?

joto commented 11 years ago

We don't know what the maximum id will be and we don't want to hardcode this. Detecting out of memory conditions is not possible in the general case, because operating systems tend to overcommit memory anyway.

I'll close this issue now as there seems to be no bug, just not enough memory.

alex85k commented 11 years ago

I agree the bug is to be closed, memory overflows can not be catched. But slow growing causes extra CPU work and 2x memory consumption, which is no good.

Please place at least 10^9 or at least provide constructor with initial capacity (SparseTable(initialsize, growsize)) to avoid any hardcoding. Any examples with small IDs are suitable only for testing, they do not exist in real world...

joto commented 11 years ago

I don't think this is an actual problem in this case. If you can provide benchmarks that show that this is a problem, we can change the code. If you actually use the whole planet or large portions of it, you don't use sparsetable anyway.

alex85k commented 11 years ago

I tried to process Ukraine (135Mb) by tagstats (taginfo) using sparsetable. Without initial capacity there was out-of memory error on 4Gb 32 bit machine, after adding initialization it was processed correctly and took only 1G of RAM.

For my tagstats on Windows I used std::map and google sparse map for this size files (rough files are in my repository), they took about 650-680 Mb on the same file (when the file is greater, sparsetable may win, then - mmaps)

Can it be called bechmark? :smile: It would be very good to have some possibility to change initial size (for testing even 1Gb may be too big) and have it almost-real-world by default.