omniscale / imposm3

Imposm imports OpenStreetMap data into PostGIS
http://imposm.org/docs/imposm3/latest/
Apache License 2.0

Error while updating planet import #122

Closed svancise closed 7 years ago

svancise commented 7 years ago

Hi, I've had a cron job running every day to update a planet OSM import, and it has been working great. However, I checked my logs this morning and found that the update has crashed for the past two days (October 5th and 6th). The error is below. I've tried to re-run it a few times, but it hits the same error while parsing the diff. It seems to happen while parsing a feature's tags.

OS is Ubuntu 14.04. Build is 0.2.0dev-20160615-d495ca4.

Thanks!

[Oct  6 17:17:33] Processing /home/ubuntu/osm_data/changes.osc.gz
[Oct  6 17:17:33] Parsing changes, updating cache and removing elements
[Oct  6 17:18:33] [INFO] [  1m0s] C:   13000/s (815821) N:     400/s (29836) W:       0/s (0) R:      0/s (0)
[Oct  6 17:19:33] [INFO] [  2m0s] C:   22000/s (2667879) N:     600/s (74916) W:       0/s (0) R:      0/s (0)
[Oct  6 17:19:47] [INFO] [diff] Processing /home/ubuntu/osm_data/changes.osc.gz took: 2m13.958160413s
panic: runtime error: index out of range

goroutine 1 [running]:
panic(0x990960, 0xc82000e0b0)
        /root/imposm/go/src/runtime/panic.go:464 +0x3e6
github.com/omniscale/imposm3/cache/binary.tagsFromArray(0xc82249b420, 0x2, 0x2, 0x7fe452fcfc40)
        /root/imposm/gopath/src/github.com/omniscale/imposm3/cache/binary/tags.go:85 +0x43e
github.com/omniscale/imposm3/cache/binary.UnmarshalNode(0xc82244bbc0, 0x21, 0x21, 0xc82249b440, 0x0, 0x0)
        /root/imposm/gopath/src/github.com/omniscale/imposm3/cache/binary/serialize.go:43 +0x1d4
github.com/omniscale/imposm3/cache.(*NodesCache).GetNode(0xc82011ab70, 0x1081266ff, 0x0, 0x0, 0x0)
        /root/imposm/gopath/src/github.com/omniscale/imposm3/cache/nodes.go:70 +0x291
github.com/omniscale/imposm3/diff.(*Deleter).deleteNode(0xc8211bf960, 0x1081266ff, 0x0, 0x0)
        /root/imposm/gopath/src/github.com/omniscale/imposm3/diff/deleter.go:188 +0x57
github.com/omniscale/imposm3/diff.(*Deleter).Delete(0xc8211bf960, 0x1, 0xc82249b0a0, 0x0, 0x0, 0x0, 0x0)
        /root/imposm/gopath/src/github.com/omniscale/imposm3/diff/deleter.go:235 +0x2af
github.com/omniscale/imposm3/diff.Update(0x7fff361c4930, 0x24, 0x0, 0x0, 0x0, 0xc820019000, 0xc82011ac30, 0x0, 0x0, 0x0)
        /root/imposm/gopath/src/github.com/omniscale/imposm3/diff/process.go:209 +0x3434
github.com/omniscale/imposm3/diff.Diff()
        /root/imposm/gopath/src/github.com/omniscale/imposm3/diff/process.go:60 +0x835
github.com/omniscale/imposm3/cmd.Main(0xb4bea8)
        /root/imposm/gopath/src/github.com/omniscale/imposm3/cmd/main.go:53 +0x1f1
main.main()
        /root/imposm/gopath/src/github.com/omniscale/imposm3/imposm3.go:8 +0x23
ImreSamu commented 7 years ago

I've tried to re run a few times, but it seems to get the same error while parsing the diff

Please re-check the input file (changes.osc.gz). Is it valid XML, or is it damaged? You can check it with, for example, the osmium tool.

What is the output of osmium fileinfo?

osmium fileinfo changes.osc.gz  -e
olt commented 7 years ago

The error happened when Imposm was reading the internal cache. So it's not related to the changes file. Unfortunately, this means you need to make a new import.

You compiled Imposm on your own; did you make any changes to tags.go (addCommonKey/addTagCodePoint)?

If this is not the case, then I'd like to take a closer look at your nodes cache. Can you upload the whole nodes directory of your cache somewhere? This should be around 2-3GB; I don't need the other directories. You can mail me a link at olt@omniscale.de

svancise commented 7 years ago

Thanks for taking a look. I've sent you an email with a link to the nodes cache. I did not compile Imposm myself, I used a static build.

Thanks

olt commented 7 years ago

Can you share your mapping? Do you use __any__=__any__?

svancise commented 7 years ago

Here are my mappings, no __any__=__any__.

{
    "tags": {
        "load_all": true,
        "exclude": [
            "created_by",
            "source"
        ]
    },
    "use_single_id_space": true,
    "tables": {
        "osm_planet": {
            "columns": [{
                "type": "id",
                "name": "osm_id",
                "key": null
            }, {
                "type": "geometry",
                "name": "geom",
                "key": null
            }, {
                "type": "hstore_tags",
                "name": "tags",
                "key": null
            }],
            "type": "geometry",
            "type_mappings": {
                "points": {
                    "amenity": ["__any__"],
                    "denomination": ["__any__"],
                    "leisure": ["__any__"],
                    "military": ["__any__"],
                    "name": ["__any__"],
                    "poi": ["__any__"],
                    "religion": ["__any__"],
                    "shop": ["__any__"],
                    "sport": ["__any__"],
                    "tourism": ["__any__"]
                },
                "linestrings": {
                    "amenity": ["__any__"],
                    "denomination": ["__any__"],
                    "leisure": ["__any__"],
                    "military": ["__any__"],
                    "name": ["__any__"],
                    "poi": ["__any__"],
                    "religion": ["__any__"],
                    "shop": ["__any__"],
                    "sport": ["__any__"],
                    "tourism": ["__any__"]
                },
                "polygons": {
                    "amenity": ["__any__"],
                    "denomination": ["__any__"],
                    "leisure": ["__any__"],
                    "military": ["__any__"],
                    "name": ["__any__"],
                    "poi": ["__any__"],
                    "religion": ["__any__"],
                    "shop": ["__any__"],
                    "sport": ["__any__"],
                    "tourism": ["__any__"]
                }
            }
        }
    }
}
bdon commented 7 years ago

Hi,

I can reproduce this error exactly on two different planet databases I have updating minutely/hourly. Both are using my own fork of imposm3 but it doesn't have any changes other than a small modification of the multipolygon routine, and since the error is the same as svancise's I'm inferring that it's not related to my changes.

DB 1 is branched from imposm3 build 0.1dev, here is the error:

panic: runtime error: index out of range

goroutine 1 [running]:
github.com/omniscale/imposm3/cache/binary.tagsFromArray(0xc8216fe8a0, 0x2, 0x2, 0x7fd05429c920)
    /home/ubuntu/gopath/src/github.com/omniscale/imposm3/cache/binary/tags.go:85 +0x44a
github.com/omniscale/imposm3/cache/binary.UnmarshalNode(0xc82171ea80, 0x21, 0x21, 0xc8216fe8c0, 0x0, 0x0)
    /home/ubuntu/gopath/src/github.com/omniscale/imposm3/cache/binary/serialize.go:43 +0x1d4
github.com/omniscale/imposm3/cache.(*NodesCache).GetNode(0xc8200ea870, 0x1081266ff, 0x0, 0x0, 0x0)
    /home/ubuntu/gopath/src/github.com/omniscale/imposm3/cache/nodes.go:70 +0x291
github.com/omniscale/imposm3/diff.(*Deleter).deleteNode(0xc8200af980, 0x1081266ff, 0x0, 0x0)
    /home/ubuntu/gopath/src/github.com/omniscale/imposm3/diff/deleter.go:194 +0x57
github.com/omniscale/imposm3/diff.(*Deleter).Delete(0xc8200af980, 0x1, 0xc8216fe620, 0x0, 0x0, 0x0, 0x0)
    /home/ubuntu/gopath/src/github.com/omniscale/imposm3/diff/deleter.go:241 +0x2ac
github.com/omniscale/imposm3/diff.Update(0x7ffd4cad596b, 0x2a, 0x0, 0x0, 0x0, 0xc820019040, 0xc8200ea9f0, 0x415000, 0x0, 0x0)
    /home/ubuntu/gopath/src/github.com/omniscale/imposm3/diff/process.go:207 +0x31c6
github.com/omniscale/imposm3/diff.Diff()
    /home/ubuntu/gopath/src/github.com/omniscale/imposm3/diff/process.go:60 +0x838
github.com/omniscale/imposm3/cmd.Main(0xaa6680)
    /home/ubuntu/gopath/src/github.com/omniscale/imposm3/cmd/main.go:55 +0x1eb
main.main()
    /home/ubuntu/gopath/src/github.com/omniscale/imposm3/imposm3.go:8 +0x23

The Osmosis sequence number of the diff file that caused the error was 2125792, so the problem is in one of the changes within an hour of that sequence.

Link to the OSM diff file: https://s3-us-west-2.amazonaws.com/openmassing/misc/2125792.osc.gz

I started a new import on DB 2 after seeing the error on DB 1, it is branched from imposm3 build 0.2.0dev (current master as of Oct 5, 2016). DB 1 had a corrupted LevelDB that I repaired, so I suspected that data loss from repairing might have caused the error. Here is the error:

panic: runtime error: index out of range

goroutine 1 [running]:
github.com/omniscale/imposm3/cache/binary.tagsFromArray(0xc8213fc320, 0x2, 0x2, 0x7f62baaeda48)
        /home/ubuntu/gopath/src/github.com/omniscale/imposm3/cache/binary/tags.go:85 +0x44a
github.com/omniscale/imposm3/cache/binary.UnmarshalNode(0xc8218173e0, 0x21, 0x21, 0xc8213fc340, 0x0, 0x0)
        /home/ubuntu/gopath/src/github.com/omniscale/imposm3/cache/binary/serialize.go:43 +0x1d4
github.com/omniscale/imposm3/cache.(*NodesCache).GetNode(0xc8201301b0, 0x1081266ff, 0x0, 0x0, 0x0)
        /home/ubuntu/gopath/src/github.com/omniscale/imposm3/cache/nodes.go:70 +0x291
github.com/omniscale/imposm3/diff.(*Deleter).deleteNode(0xc8200af980, 0x1081266ff, 0x0, 0x0)
        /home/ubuntu/gopath/src/github.com/omniscale/imposm3/diff/deleter.go:194 +0x57
github.com/omniscale/imposm3/diff.(*Deleter).Delete(0xc8200af980, 0x1, 0xc8213fc140, 0x0, 0x0, 0x0, 0x0)
        /home/ubuntu/gopath/src/github.com/omniscale/imposm3/diff/deleter.go:241 +0x2ac
github.com/omniscale/imposm3/diff.Update(0x7fff6872a878, 0x2a, 0x0, 0x0, 0x0, 0xc8200981c0, 0xc820130330, 0x415000, 0x0, 0x0)
        /home/ubuntu/gopath/src/github.com/omniscale/imposm3/diff/process.go:207 +0x31c6
github.com/omniscale/imposm3/diff.Diff()
        /home/ubuntu/gopath/src/github.com/omniscale/imposm3/diff/process.go:60 +0x838
github.com/omniscale/imposm3/cmd.Main(0xaa6680)
        /home/ubuntu/gopath/src/github.com/omniscale/imposm3/cmd/main.go:55 +0x1eb
main.main()
        /home/ubuntu/gopath/src/github.com/omniscale/imposm3/imposm3.go:8 +0x23

The sequence number of the diff file that caused the error is 2125774.

Link to the OSM diff file: https://s3-us-west-2.amazonaws.com/openmassing/misc/2125774.osc.gz

I haven't really dug into this bug yet. If you have any leads on the root cause, I would be happy to attempt a patch.

ImreSamu commented 7 years ago

In the 2125774.osc.gz file (thanks @bdon) I found a strange key:
<tag k="&#10;bulding" v="warehouse"/>

Probably the first character (&#10;, a line feed) is being interpreted as one of the "code points for common keys":

<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="CGImap 0.5.4 (26385 thorn-03.openstreetmap.org)" copyright="OpenStreetMap and contributors" attribution="http://www.openstreetmap.org/copyright" license="http://opendatacommons.org/licenses/odbl/1-0/">
 <node id="4430391039" visible="true" version="1" changeset="42622374" timestamp="2016-10-04T08:13:53Z" user="ROGERRE" uid="4569166" lat="13.9160490" lon="121.8154030">
  <tag k="&#10;bulding" v="warehouse"/>
 </node>
</osm>

Or in OPL format:

osmium cat 2125774.osc.gz -f opl | grep 4430391039

n4430391039 v1 dV c42622374 t2016-10-04T08:13:53Z i4569166 uROGERRE T%0a%bulding=warehouse x121.8154030 y13.9160490
ImreSamu commented 7 years ago

My quick fix/workaround: exclude all keys that start with one of the 31 "code points for common keys" ("\u0001*" ... "\u001F*"). This resolved the error message for me, but it needs more testing!

{
    "tags": {
        "load_all": true,
        "exclude": [
            "created_by",
            "source",
            "\u0001*", 
            "\u0002*",
            "\u0003*",
            "\u0004*",
            "\u0005*",
            "\u0006*",
            "\u0007*", 
            "\u0008*",
            "\u0009*",                                                                                                      
            "\u000A*",
            "\u000B*", 
            "\u000C*",
            "\u000D*",
            "\u000E*",
            "\u000F*",
            "\u0010*",
            "\u0011*", 
            "\u0012*",
            "\u0013*",                                                                                                      
            "\u0014*",
            "\u0015*",
            "\u0016*",
            "\u0017*", 
            "\u0018*",
            "\u0019*",                                                                                                      
            "\u001A*",
            "\u001B*", 
            "\u001C*",
            "\u001D*",
            "\u001E*",
            "\u001F*"      
            ]
    },
    "use_single_id_space": true,

...
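
The 31 patterns do not have to be typed by hand. As a rough sketch (the helper name controlPrefixPatterns is hypothetical, and this is not an imposm3 tool), the exclude array above could be generated like this:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// controlPrefixPatterns (hypothetical helper) returns one "<char>*" pattern
// for each code point from U+0001 to U+001F, 31 patterns in total.
func controlPrefixPatterns() []string {
	patterns := make([]string, 0, 31)
	for c := rune(1); c < 32; c++ {
		patterns = append(patterns, string(c)+"*")
	}
	return patterns
}

func main() {
	// Start from the two excludes already used in the mapping above.
	exclude := append([]string{"created_by", "source"}, controlPrefixPatterns()...)
	out, err := json.MarshalIndent(exclude, "", "    ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Go's JSON encoder may write some of these characters in short form (for example \n instead of \u000A); both spellings are equivalent JSON.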
svancise commented 7 years ago

Great find @ImreSamu, the change sets I've been generating have the same key. Is a re-import still necessary after making those changes to the mappings? Is it possible to update with a different mapping than the one used for the import (the only thing changed was the exclude array)? I tested the update with the modified mappings and still ran into the same error, so I just wanted to be clear. Thanks!

bdon commented 7 years ago

@svancise @ImreSamu same here - calling imposm3 update with the new mapping doesn't fix the problem. I suspect we may need a small script to delete or modify the erroneous node's entry in LevelDB?

ImreSamu commented 7 years ago

@svancise @bdon yes, my workaround does not fix the already incorrect imposm3 cache file [1],
and to the best of my knowledge

fixing the cache:

[1] ./imposm3 query-cache -cachedir /tmp/imposm3cache -node 4430391039

root@53350ac54421:/go/src/github.com/omniscale/imposm3# ./imposm3 query-cache -cachedir /tmp/imposm3cache  -node 4430391039
panic: runtime error: index out of range

goroutine 1 [running]:
panic(0x84cee0, 0xc420012150)
    /usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/omniscale/imposm3/cache/binary.tagsFromArray(0xc4200861c0, 0x2, 0x2, 0xcd4340)
    /go/src/github.com/omniscale/imposm3/cache/binary/tags.go:85 +0x1ae
github.com/omniscale/imposm3/cache/binary.UnmarshalNode(0xc42007ad50, 0x21, 0x21, 0x8, 0x8, 0xc42007ad50)
    /go/src/github.com/omniscale/imposm3/cache/binary/serialize.go:43 +0x156
github.com/omniscale/imposm3/cache.(*NodesCache).GetNode(0xc42012ca50, 0x1081266ff, 0xc420724080, 0xa, 0xc42007ad20)
    /go/src/github.com/omniscale/imposm3/cache/nodes.go:72 +0x11a
github.com/omniscale/imposm3/cache/query.collectNodes(0xc420037e80, 0xc420037e50, 0xc420724070, 0x1, 0x1, 0x0, 0x906434)
    /go/src/github.com/omniscale/imposm3/cache/query/query.go:108 +0xec
github.com/omniscale/imposm3/cache/query.Query(0xc420072080, 0x4, 0x4)
    /go/src/github.com/omniscale/imposm3/cache/query/query.go:203 +0x2c9
github.com/omniscale/imposm3/cmd.Main(0x9087e0)
    /go/src/github.com/omniscale/imposm3/cmd/main.go:56 +0x395
main.main()
    /go/src/github.com/omniscale/imposm3/imposm3.go:8 +0x2d
ImreSamu commented 7 years ago

I have created a proof-of-concept fix; see #123.

olt commented 7 years ago

Imposm reduces the cache size by encoding common keys and tag combinations (like highway=residential) with "ASCII" control characters and private Unicode symbols. These symbols were not escaped, because the user defined which tags were cached and all other tags were filtered out anyway. That assumption no longer holds with the load_all feature, and Imposm crashed when a user added a control character (here a line feed, &#10;) at the start of a key (Thanks @ImreSamu for finding this one).

All encoding symbols are now properly escaped.
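
To illustrate the idea, here is a minimal sketch of such an escaping scheme. It is not imposm3's actual implementation: the key table, the escape marker 0x1F and the function names are all hypothetical. Common keys are stored as a single byte below 32, and any raw key that starts with a byte below 32 gets an escape prefix so the decoder cannot mistake it for a code.

```go
package main

import (
	"errors"
	"fmt"
)

// escapeMarker is a hypothetical reserved byte. It prefixes raw keys whose
// first byte would otherwise look like a code.
const escapeMarker = 0x1F

// commonKeys is a hypothetical code table; all codes stay below escapeMarker.
var commonKeys = map[string]byte{"highway": 1, "building": 2, "name": 3}

// keysByCode is the reverse lookup used when decoding.
var keysByCode = func() map[byte]string {
	m := make(map[byte]string, len(commonKeys))
	for k, c := range commonKeys {
		m[c] = k
	}
	return m
}()

// encodeKey replaces common keys with their code and escapes raw keys that
// start with a control character.
func encodeKey(key string) string {
	if code, ok := commonKeys[key]; ok {
		return string([]byte{code})
	}
	if len(key) > 0 && key[0] < 32 {
		return string([]byte{escapeMarker}) + key
	}
	return key
}

// decodeKey reverses encodeKey and returns an error instead of panicking
// when it meets an unknown or malformed code.
func decodeKey(enc string) (string, error) {
	switch {
	case len(enc) == 0 || enc[0] >= 32:
		return enc, nil // plain key, stored as-is
	case enc[0] == escapeMarker:
		return enc[1:], nil // escaped raw key
	case len(enc) == 1:
		if key, ok := keysByCode[enc[0]]; ok {
			return key, nil // coded common key
		}
	}
	return "", errors.New("unknown or malformed key code")
}

func main() {
	for _, key := range []string{"highway", "\nbulding", "shop"} {
		enc := encodeKey(key)
		dec, err := decodeKey(enc)
		fmt.Printf("%q -> %q -> %q (err: %v)\n", key, enc, dec, err)
	}
}
```

In this sketch, an unescaped entry such as "\nbulding" is rejected by decodeKey with an error, which is the situation that led to the index out of range panic in the traces above.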

All users who use load_all with diff imports should upgrade to 4a472f5 from 2016-10-10 or higher and do a full re-import.

There is a new binary at https://imposm.org/static/rel/

Thanks for reporting this issue and sorry for the trouble it might have caused you.

svancise commented 7 years ago

Great! Thanks for the help and the quick resolution @olt @ImreSamu

svancise commented 7 years ago

@olt thanks again for the quick response. I downloaded the new binary, got everything set up and ran a re-import. When I ran the update following that import, I received what seems to be a similar error:

panic: runtime error: index out of range

goroutine 1 [running]:
panic(0x990cc0, 0xc82000e0b0)
        /root/imposm/go/src/runtime/panic.go:464 +0x3e6
github.com/omniscale/imposm3/cache/binary.appendTag(0xc8275fa900, 0x0, 0x4, 0x0, 0x0, 0xc8213a4f18, 0x4, 0x0, 0x0, 0x0)
        /root/imposm/gopath/src/github.com/omniscale/imposm3/cache/binary/tags.go:129 +0x5a8
github.com/omniscale/imposm3/cache/binary.tagsAsArray(0xc829495860, 0x0, 0x0, 0x0)
        /root/imposm/gopath/src/github.com/omniscale/imposm3/cache/binary/tags.go:114 +0x1b1
github.com/omniscale/imposm3/cache/binary.MarshalNode(0xc829813a20, 0x0, 0x0, 0x0, 0x0, 0x0)
        /root/imposm/gopath/src/github.com/omniscale/imposm3/cache/binary/serialize.go:30 +0x184
github.com/omniscale/imposm3/cache.(*NodesCache).PutNode(0xc82011ab70, 0xc829813a20, 0x0, 0x0)
        /root/imposm/gopath/src/github.com/omniscale/imposm3/cache/nodes.go:31 +0x211
github.com/omniscale/imposm3/diff.Update(0x7ffe2d2a0931, 0x24, 0x0, 0x0, 0x0, 0xc820019000, 0xc82011ac30, 0x0, 0x0, 0x0)
        /root/imposm/gopath/src/github.com/omniscale/imposm3/diff/process.go:278 +0x3ec1
github.com/omniscale/imposm3/diff.Diff()
        /root/imposm/gopath/src/github.com/omniscale/imposm3/diff/process.go:60 +0x835
github.com/omniscale/imposm3/cmd.Main(0xb4c2a8)
        /root/imposm/gopath/src/github.com/omniscale/imposm3/cmd/main.go:53 +0x1f1
main.main()
        /root/imposm/gopath/src/github.com/omniscale/imposm3/imposm3.go:8 +0x23

Here is the change set that was generated: https://s3.amazonaws.com/osm-changes/changes.osc.gz

The starting sequence number was: 1482.

I used the same mappings I posted above. Is there anything additional I need to do to get this to work?

ImreSamu commented 7 years ago

@svancise thanks, replicated ..

Probably we have some zero-length keys, e.g. <tag k="" v="Panichi"/>:

$ osmium cat changes.osc.gz -f opl | grep ' T='
n4429280790 v1 dV c42612165 t2016-10-03T18:29:38Z i4061992 usabulous T=tree,natural=tree x47.0666971 y34.3585209
n4430966919 v1 dV c42634088 t2016-10-04T16:03:15Z i461682 uGriphon T=Panichi,shop=jewelry x12.1383492 y43.5723971
w446121192 v1 dV c42686649 t2016-10-06T15:00:51Z i2298977 uPeZa T=h,barrier=hedge Nn4434454690,n4434414895

see version="1" : http://www.openstreetmap.org/node/4429280790/history
see version="1" : http://www.openstreetmap.org/node/4430966919/history
see version="1" : http://www.openstreetmap.org/way/446121192/history

<node id="4430966919" changeset="42634088" timestamp="2016-10-04T16:03:15Z" version="1" visible="true" user="Griphon" uid="461682" lat="43.5723971" lon="12.1383492">
<tag k="" v="Panichi"/>
<tag k="shop" v="jewelry"/>
</node>

So we need to add an extra check at https://github.com/omniscale/imposm3/blob/master/cache/binary/tags.go#L129. From:

     if key[0] < 32 {

proposed:

     if len(key) > 0 && key[0] < 32 {
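
A small standalone sketch (the function names are hypothetical, not taken from imposm3) shows why the length check matters: indexing byte 0 of an empty string panics with "index out of range", while the guarded condition simply evaluates to false.

```go
package main

import "fmt"

// startsWithControlChar uses the proposed guarded condition.
func startsWithControlChar(key string) bool {
	return len(key) > 0 && key[0] < 32
}

// unguarded evaluates the original condition and reports whether doing so
// panicked. The recover is only here to keep the demonstration running.
func unguarded(key string) (result bool, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	return key[0] < 32, false
}

func main() {
	for _, key := range []string{"", "\nbulding", "shop"} {
		_, panicked := unguarded(key)
		fmt.Printf("key %q: guarded=%v, unguarded panicked=%v\n",
			key, startsWithControlChar(key), panicked)
	}
}
```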
ImreSamu commented 7 years ago

Alternative solution: extend the imposm3 mapping file with an empty string in the tags exclude list:

{
    "tags": {
        "load_all": true,
        "exclude": [
            "created_by",
            "source",
            ""
            ]
    },
...
olt commented 7 years ago

Uh. I've fixed this and extended the tests. Thanks!

svancise commented 7 years ago

Great, thanks for the quick response!