pelias / placeholder

stand-alone coarse geocoder
https://placeholder.demo.geocode.earth
MIT License
314 stars 35 forks source link

Sqlite Error: malformed JSON #183

Closed getorca closed 4 years ago

getorca commented 4 years ago

Describe the bug When using pelias prepare placeholder with 'whosonfirst-data-admin-ca-latest.db', 'whosonfirst-data-admin-us-latest.db', 'whosonfirst-data-postalcode-ca-latest.db', and 'whosonfirst-data-postalcode-us-latest.db', in '/data/whosonfirst/sqlite', download on may 4th, I get the following errors:

Creating extract at /data/placeholder/wof.extract
import...
SqliteError: malformed JSON
    at Statement.run (<anonymous>)
    at DocStore.set (/code/pelias/placeholder/lib/DocStore.js:67:10)
    at Placeholder.insertWofRecord (/code/pelias/placeholder/prototype/wof.js:210:14)
    at DestroyableTransform.insert [as _transform] (/code/pelias/placeholder/cmd/load.js:15:19)
    at DestroyableTransform.Transform._read (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_transform.js:184:1>
    at DestroyableTransform.Transform._write (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_transform.js:172:>
    at doWrite (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_writable.js:428:64)
    at writeOrBuffer (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_writable.js:417:5)
    at DestroyableTransform.Writable.write (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_writable.js:334:11)
    at DestroyableTransform.ondata (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_readable.js:619:20)
INSERT INTO docs (id, json) VALUES ($id, $json)
SqliteError: malformed JSON
    at Statement.run (<anonymous>)
    at DocStore.set (/code/pelias/placeholder/lib/DocStore.js:67:10)
    at Placeholder.insertWofRecord (/code/pelias/placeholder/prototype/wof.js:210:14)
    at DestroyableTransform.insert [as _transform] (/code/pelias/placeholder/cmd/load.js:15:19)
    at DestroyableTransform.Transform._read (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_transform.js:184:1>
    at DestroyableTransform.Transform._write (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_transform.js:172:>
    at doWrite (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_writable.js:428:64)
    at writeOrBuffer (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_writable.js:417:5)
    at DestroyableTransform.Writable.write (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_writable.js:334:11)
    at DestroyableTransform.ondata (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_readable.js:619:20)
populate fts...
optimize...
close...

Steps to Reproduce

Expected behavior pelias prepare placeholder extracts wof files and imports into sqlite3.

Environment (please complete the following information):

Additional context Download both US & Canada wof admin & postals

    "whosonfirst": {
      "datapath": "/data/whosonfirst",
      "importPostalcodes": true,
      "countryCode": ["CA", "US"],
      "importPlace": ["85633041", "85633793"]
    }

full pelias config available @ https://github.com/getorca/docker/blob/master/projects/canada-usa/pelias.json

References

missinglink commented 4 years ago

Ugh, it looks like this is also affected by the 'Canada issue', I'll have a look at fixing it tomorrow.

missinglink commented 4 years ago

It also seems to affect our download hosted on data.geocode.earth.

I've asked the :robot: to rebuild the file and hope to have that uploaded tonight, pending testing.

getorca commented 4 years ago

Still getting the same error with the previously mentioned pelias.json config. It seems to be be with the US place ID, '85633793' in imports.whosonfirst.importPlace. I can't confirm for any other places IDs that may or may not work.

The following, with Canada's place ID, works as expected:

    "whosonfirst": {
      "datapath": "/data/whosonfirst",
      "importPostalcodes": true,
      "countryCode": ["CA", "US"],
      "importPlace": ["85633041"]
    }

while the following, with the US place ID, does not:

    "whosonfirst": {
      "datapath": "/data/whosonfirst",
      "importPostalcodes": true,
      "countryCode": ["CA", "US"],
      "importPlace": ["85633793"]
    }
missinglink commented 4 years ago

I haven't had time to look at this yet, two bugs have been fixed this week related to the SQLite schema, so it may be worth re-downloading the data and trying again.

🤞

getorca commented 4 years ago

sadly nope, same issue.

missinglink commented 4 years ago

Ok, can you please try one last time pelias compose pull and with the latest data? The previous docker image didn't include https://github.com/pelias/whosonfirst/pull/487/commits/f8b6479254a973116cc128bf902200706f6c2d66 which I suspect was the root cause.

Sorry, I know it's annoying but I have a thousand things going on right now and can't afford the time to rebuild USA on my laptop right now.

getorca commented 4 years ago

not working, same error.

Anything else I can do to help to debug or build?

missinglink commented 4 years ago

This has been reported twice now, by @getorca and @divE4pip

I had another quick look at the code and it seems that the malformed JSON error should only be generated in one situation, when the json_extract function is being called on an invalid JSON value:

example:

sqlite3 :memory: "SELECT json_extract( 'not valid json', '$.geom.bbox' );"
Error: malformed JSON

however if the JSON is simply empty, or doesn't contain the property we are targeting ('geom.bbox') then it doesn't emit an error:

sqlite3 :memory: "SELECT json_extract( '{}', '$.geom.bbox' );"

there is a little complexity here as the json_extract function is executed in an SQL trigger but I'm confused how it's possible to insert invalid JSON from this line https://github.com/pelias/placeholder/blob/master/lib/DocStore.js#L67 which is using JSON.stringify

We host pre-generated copies of the data here https://geocode.earth/data.

I suspect that there may be one or more JSON objects in the source data which are invalid for some reason, so I ran this query against the database:

sqlite3 store.sqlite3 'SELECT * from docs WHERE json_valid(json) = 0;'

This returned zero rows, so I'm still not able to track down the error... could you please try that query against your database and hopefully we can find out which ID is causing it?

In the meantime, if there are only one or two of these errors then you're likely safe to continue with the build, you'll only be missing those records and it may not be a reason to discard the whole build.

Another option is to download the copies from our data portal and use those instead.

missinglink commented 4 years ago

I'm adding https://github.com/pelias/placeholder/pull/188 so that any future reports will contain enough info in the logs to solve this.

divE4pip commented 4 years ago

@missinglink tried to run your command but looks like the docs table is not present:

sqlite3 whosonfirst-data-admin-ca-latest.db 'SELECT * from docs WHERE json_valid(json) = 0;'
Error: no such table: docs
sqlite3 whosonfirst-data-admin-ca-latest.db '.tables'
ancestors     concordances  geojson       names         spr   
missinglink commented 4 years ago

@divE4pip the docs table is from the store.sqlite3 database, you should find it in your placeholder directory

divE4pip commented 4 years ago

@missinglink I see, just tried to run your command in the placeholder dir and it yields 0 results:

$ sqlite3 store.sqlite3 'SELECT * from docs WHERE json_valid(json) = 0;'
$
ls -lah 
total 1.2G
780M Jun  7 19:48 store.sqlite3
427M Jun  7 19:31 wof.extract
getorca commented 4 years ago

If I'm remembering correctly the malformed json stops the build. I'll try to recreate it shortly. My work around was too download the build from https://data.geocode.earth/placeholder/store.sqlite3.gz. However I see a couple potential issues:

I'm unsure of whether the issue is better to be handled fixing the WOF files, or the build script. As mentioned above the download from geocode earth looks good. I guess end users of Pelias are limited to the scripts, but I believe there where some changes in the WOF files recently, when the host switched?

agalewsky-pnn commented 4 years ago

I am having this problem as well. I have pruned down my json file to just use US and it still bombs out. I may try as above to just download the sqlite file and go from there....

-Andy

missinglink commented 4 years ago

@agalewsky-pnn I recently added https://github.com/pelias/placeholder/pull/188 which should provide enough info in the logs to diagnose the problem.

Can you please paste the relevant part of your logs so I can fix this?

blackmad commented 4 years ago

I'm getting this too. here's the doc + id

SqliteError: malformed JSON at Statement.run () at DocStore.set (/code/pelias/placeholder/lib/DocStore.js:67:10) at Placeholder.insertWofRecord (/code/pelias/placeholder/prototype/wof.js:210:14) at DestroyableTransform.insert [as _transform] (/code/pelias/placeholder/cmd/load.js:15:19) at DestroyableTransform.Transform._read (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_transform.js:184:10) at DestroyableTransform.Transform._write (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_transform.js:172:83) at doWrite (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_writable.js:428:64) at writeOrBuffer (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_writable.js:417:5) at DestroyableTransform.Writable.write (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_writable.js:334:11) at DestroyableTransform.ondata (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_readable.js:619:20) INSERT INTO docs (id, json) VALUES ($id, $json) 1713109987 { id: 1713109987, name: 'Wayne', abbr: undefined, placetype: 'locality', rank: { min: 9, max: 10 }, population: 30892, popularity: undefined, lineage: [ { continent_id: 102191575, country_id: 85633793, county_id: 102080913, localadmin_id: 404486571, locality_id: 1713109987, region_id: 85688481 } ], geom: { area: undefined, bbox: '-75.3876940.044,-75.38769,40.044', lat: 40.044, lon: -75.38769 }, names: { cat: [ 'Wayne' ], ceb: [ 'Wayne' ], eus: [ 'Wayne' ], ita: [ 'Wayne' ], mlg: [ 'Wayne' ], nld: [ 'Wayne' ], por: [ 'Wayne' ], spa: [ 'Wayne' ], uzb: [ 'Wayne' ] } } SqliteError: malformed JSON at Statement.run () at DocStore.set (/code/pelias/placeholder/lib/DocStore.js:67:10) at Placeholder.insertWofRecord (/code/pelias/placeholder/prototype/wof.js:210:14) at DestroyableTransform.insert [as _transform] (/code/pelias/placeholder/cmd/load.js:15:19) at DestroyableTransform.Transform._read (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_transform.js:184:10) at DestroyableTransform.Transform._write (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_transform.js:172:83) at doWrite (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_writable.js:428:64) at writeOrBuffer (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_writable.js:417:5) at DestroyableTransform.Writable.write (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_writable.js:334:11) at DestroyableTransform.ondata (/code/pelias/placeholder/node_modules/readable-stream/lib/_stream_readable.js:619:20) populate fts... optimize... close... Done!

missinglink commented 4 years ago

resolved in https://github.com/pelias/placeholder/pull/191

stepps00 commented 4 years ago

Thanks for the issue and upstream fix!