pelias / whosonfirst

Importer for Who's on First gazetteer
MIT License

Encoding error during import #249

Closed: vincbon closed this issue 7 years ago

vincbon commented 7 years ago

An encoding error occurs as soon as I start the import. I have tried downloading the data again (with the script), with the same result. Maybe it's the connection: data downloads at about 300 kb/s, and the final size is 8 GB (hierarchy only), as opposed to the "tens of GB" described here. I'm on a Mac too.

Here's the output:

pelias-whosonfirst@0.0.0-semantic-release start /Users/vincent.bonhomme/Documents/pelias/whosonfirst
node import.js

2017-07-10T08:00:49.140Z - info: [whosonfirst] Loading wof-continent-latest.csv records from /Users/vincent.bonhomme/Downloads/data/whosonfirst/meta
buffer.js:557
    throw new TypeError('Unknown encoding: ' + encoding);
    ^

TypeError: Unknown encoding: 
    at stringSlice (buffer.js:557:9)
    at Buffer.toString (buffer.js:593:10)
    at CSVStream.write (/Users/vincent.bonhomme/Documents/pelias/whosonfirst/node_modules/csv-stream/index.js:59:34)
    at ReadStream.ondata (_stream_readable.js:628:20)
    at emitOne (events.js:115:13)
    at ReadStream.emit (events.js:210:7)
    at addChunk (_stream_readable.js:252:12)
    at readableAddChunk (_stream_readable.js:239:11)
    at ReadStream.Readable.push (_stream_readable.js:197:10)
    at onread (fs.js:2004:12)

npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! pelias-whosonfirst@0.0.0-semantic-release start: node import.js
npm ERR! Exit status 1

missinglink commented 7 years ago

Hi @vincbon, that's unusual.

It's possible that the data was corrupted during download, but since you've tried twice, that seems fairly unlikely.

I'm assuming you aren't seeing any other errors printed to the console and your disk isn't full :)
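If corruption is suspected, individual files can be spot-checked against the file_hash column of the meta CSVs. The sketch below assumes file_hash is an MD5 hex digest of the raw GeoJSON file (its 32-hex-character width suggests this, but it is not confirmed in this thread); md5OfFile and matchesFileHash are hypothetical helper names.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Hypothetical helper: MD5 hex digest of a file's raw bytes.
// ASSUMPTION: WOF's file_hash column is an MD5 of the GeoJSON file as stored on disk.
// Reads the whole file into memory, which is fine for spot checks.
function md5OfFile(filePath) {
  return crypto.createHash('md5').update(fs.readFileSync(filePath)).digest('hex');
}

// True when the file on disk matches the expected hex digest (case-insensitive).
function matchesFileHash(filePath, expectedHash) {
  return md5OfFile(filePath) === String(expectedHash).toLowerCase();
}
```

For example, matchesFileHash('whosonfirst-data/data/102/191/573/102191573.geojson', '20f182732c9bd83e3e5f4663066b8634') compares the Africa record against the hash listed for it in wof-continent-latest.csv.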

I have a copy I downloaded using that script last week:

$ du -sh whosonfirst-data
9.5G    whosonfirst-data
$ head whosonfirst-data/meta/wof-continent-latest.csv 
bbox,cessation,country_id,deprecated,file_hash,fullname,geom_hash,geom_latitude,geom_longitude,id,inception,iso,iso_country,lastmodified,lbl_latitude,lbl_longitude,locality_id,name,parent_id,path,placetype,region_id,source,superseded_by,supersedes,wof_country
"-25.360422,-34.821954,51.417038,37.345201",uuuu,0,,20f182732c9bd83e3e5f4663066b8634,,dff78d8fecedbc6608382c5488c043e4,6.435268,18.269761,102191573,uuuu,,,1487379600,21.638471,3.924068,0,Africa,-1,102/191/573/102191573.geojson,continent,0,mz,,,
"-179.143503,5.515082,179.780935,83.634101",uuuu,0,,458689b94527634d9d0d4ec3d63f7aa5,,9bacf2657d678c2c30c9da2b68c6eaf3,56.498698,-92.335587,102191575,uuuu,,,1487379583,38.309137,-101.328273,0,North America,-1,102/191/575/102191575.geojson,continent,0,mz,,,
"-92.011586,-55.918504,-28.877065,15.702948",uuuu,0,,f5a041799c58148b2e3f8e742e7d4547,,71e0910557e86eb7c2cd223e617d2fdc,-15.160376,-60.790535,102191577,uuuu,,,1487379577,-12.819233,-50.657067,0,South America,-1,102/191/577/102191577.geojson,continent,0,mz,,,
"-180,-90,180,-54.380141",uuuu,0,,a2b590b24d5afa2e5ee18320ed589ebb,,04025d1d992fe82595c94c34ad7431f5,-80.508459,19.909273,102191579,uuuu,,,1471031114,-79.843222,35.885456,0,Antarctica,-1,102/191/579/102191579.geojson,continent,0,naturalearth,,,
"-24.539906,34.815009,69.033946,81.85871",uuuu,0,,952f39e202036998392efccfdca0043e,,00b0e73398b201525a4f0468eb3deb91,55.924452,28.098120,102191581,uuuu,,,1487379590,55.859135,39.579995,0,Europe,-1,102/191/581/102191581.geojson,continent,0,mz,,,
"-180,-54.750421,180,28.401679",uuuu,0,,44df6f6a99ed2123545d8ecd65baab6b,,bda22634994d5429cd01986d10d74d0e,-25.194861,135.850421,102191583,uuuu,,,1480370012,-24.130111,134.050312,0,Oceania,-1,102/191/583/102191583.geojson,continent,0,naturalearth,,,
"-109.234242,-60.772719,94.290782,39.728339",uuuu,0,,f3e3802909e3b56cd67d94f9667631f9,,751f818b7ef554ea6388d199a84247aa,-25.693528,38.764014,102193527,uuuu,,,1480369942,-54.385042,-36.591464,0,Seven Seas,-1,102/193/527/102193527.geojson,continent,0,naturalearth,,,
"-180,-12.199965,180,81.288804",uuuu,0,,0303bbefc07bb0fa96c41acfd6273444,,36c59fe9681e427a0130d8390c3d5a82,45.292720,95.729651,102191569,uuuu,,,1487379596,49.512481,94.464337,0,Asia,-1,102/191/569/102191569.geojson,continent,0,mz,,,
$ head whosonfirst-data/data/102/191/573/102191573.geojson
{
  "id": 102191573,
  "type": "Feature",
  "properties": {
    "edtf:cessation":"uuuu",
    "edtf:inception":"uuuu",
    "geom:area":2558.608491,
    "geom:area_square_m":29997481964416.835938,
    "geom:bbox":"-25.360422,-34.821954,51.417038,37.345201",
    "geom:latitude":6.435268,

Do you get the same (or very similar) results for these commands on your machine?
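The same first-lines check can be run from Node itself with an explicit 'utf8' decoding, which takes the shell out of the comparison. This is a minimal sketch; headLines and csvColumns are hypothetical names, and it only inspects the first chunk of the file (64 KiB by default), so it is meant for peeking at headers rather than parsing whole files.

```javascript
const fs = require('fs');

// Hypothetical diagnostic: return the first `n` lines of a file, decoding the
// first `chunkSize` bytes explicitly as UTF-8. Only that chunk is read, so the
// requested lines must fit inside it; a multi-byte character split at the
// chunk boundary would decode as U+FFFD.
function headLines(filePath, n, chunkSize = 64 * 1024) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const buf = Buffer.alloc(chunkSize);
    const bytesRead = fs.readSync(fd, buf, 0, chunkSize, 0);
    return buf.toString('utf8', 0, bytesRead).split(/\r?\n/).slice(0, n);
  } finally {
    fs.closeSync(fd);
  }
}

// Column names from the header row (WOF meta headers contain no quoted commas).
function csvColumns(filePath) {
  return headLines(filePath, 1)[0].split(',');
}
```

For the meta file shown above, csvColumns('whosonfirst-data/meta/wof-continent-latest.csv') should begin with 'bbox', 'cessation', 'country_id'.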

vincbon commented 7 years ago

Hi @missinglink and thanks for the fast reply.

Indeed, the disk isn't full and inodes are OK:

$ df -ih
Filesystem      Size   Used  Avail Capacity iused      ifree %iused  Mounted on
/dev/disk1     930Gi   79Gi  851Gi     9% 2103540 4292863739    0%   /

I get the same output for both head commands, and du -sh prints 7.6G.
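The size gap between the two downloads (7.6G versus 9.5G) can also be measured from Node. The hypothetical apparentSize helper below sums the sizes of regular files under a directory; du counts allocated disk blocks instead, so the two figures will not match exactly, but a large difference between two machines points at missing files.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: sum of regular-file sizes (bytes) under `dir`.
// lstat is used so symlinks are neither followed nor counted.
function apparentSize(dir) {
  let total = 0;
  for (const name of fs.readdirSync(dir)) {
    const full = path.join(dir, name);
    const stats = fs.lstatSync(full);
    if (stats.isDirectory()) {
      total += apparentSize(full);
    } else if (stats.isFile()) {
      total += stats.size;
    }
  }
  return total;
}

// Format bytes in binary gigabytes, similar to du -h output.
function toGiB(bytes) {
  return (bytes / (1024 * 1024 * 1024)).toFixed(1) + 'G';
}
```

Running toGiB(apparentSize('whosonfirst-data')) on both machines gives directly comparable numbers, since both use the same measure.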

I should mention that I modified download_data.js: at line 46, I set a maxBuffer larger than the default. Without it, some bundles fail during download with a "stdout maxBuffer exceeded" error.

From:

child_process.exec(cmd, function commandCallback(error, stdout, stderr) {

To:

child_process.exec(cmd, {maxBuffer: 1024 * 1000 * 10}, function commandCallback(error, stdout, stderr) {
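To see what maxBuffer does in isolation, the sketch below captures a child process's stdout with a configurable limit. It uses the synchronous execFileSync for brevity, whereas download_data.js uses the callback form of exec; captureOutput is a hypothetical helper. When the child writes more than maxBuffer bytes, Node terminates it and the call throws instead of returning the output.

```javascript
const { execFileSync } = require('child_process');

// Hypothetical helper: run a Node one-liner that writes `bytes` characters to
// stdout, capturing at most `maxBuffer` bytes of output.
function captureOutput(bytes, maxBuffer) {
  const script = `process.stdout.write('x'.repeat(${Number(bytes)}))`;
  // execFileSync does not go through a shell, so the script needs no extra quoting.
  return execFileSync(process.execPath, ['-e', script], {
    encoding: 'utf8',
    maxBuffer: maxBuffer,
  });
}
```

With this definition, captureOutput(64 * 1024, 1024) throws because the output exceeds the limit, while captureOutput(64 * 1024, 1024 * 1000 * 10) (the value used in the change above) returns all 65,536 characters. Raising maxBuffer only moves the ceiling; streaming the output with child_process.spawn avoids buffering it in the first place.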

In case it matters, my Node version is 8.1.2.

missinglink commented 7 years ago

I'm running v4.8.3 locally. I believe we only officially support up to v6; could you try downgrading?
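A start-up check could surface this kind of mismatch earlier. The sketch below warns when the running Node.js major version falls outside v4 to v6, the range mentioned in this comment; the constants and function names are hypothetical and not part of pelias/whosonfirst. npm's engines field in package.json is the declarative alternative, though npm only warns about a mismatch unless engine-strict is enabled.

```javascript
// Hypothetical start-up guard. The supported range (v4 to v6) comes from the
// comment above, not from the project's package.json.
const MIN_MAJOR = 4;
const MAX_MAJOR = 6;

// Extract the major version from strings like "8.1.2" or "v4.8.3".
function nodeMajor(version) {
  return parseInt(String(version).replace(/^v/, '').split('.')[0], 10);
}

function isSupportedNode(version) {
  const major = nodeMajor(version);
  return major >= MIN_MAJOR && major <= MAX_MAJOR;
}

if (!isSupportedNode(process.versions.node)) {
  console.warn(
    `Warning: Node.js ${process.versions.node} is outside the tested range ` +
    `v${MIN_MAJOR} to v${MAX_MAJOR}; the import may fail.`
  );
}
```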

missinglink commented 7 years ago

The docs mention some changes to the Buffer API as of Node 8; you could try running with --zero-fill-buffers or something similar to see if that helps.
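For context on the error itself: the message in the stack trace ends right after "Unknown encoding: ", which is consistent with an empty string being passed as the encoding. Judging by the Node.js sources, v4 and v6 appear to treat any falsy encoding as 'utf8', whereas stringSlice in v8 only defaults when the encoding is undefined. That csv-stream passed '' here is an assumption, not something confirmed in this thread. A defensive wrapper such as the hypothetical toStringSafe below sidesteps the difference; it is a sketch, not the fix that eventually landed.

```javascript
// Hypothetical wrapper: decode a Buffer, falling back to UTF-8 when the
// encoding is missing, empty, or not recognised by Buffer.isEncoding().
// ASSUMPTION: the failing call in csv-stream passed an empty-string encoding.
function toStringSafe(buf, encoding) {
  const enc = encoding && Buffer.isEncoding(encoding) ? encoding : 'utf8';
  return buf.toString(enc);
}
```

On current Node.js releases, Buffer.from('Africa').toString('') throws a TypeError, while toStringSafe(Buffer.from('Africa'), '') returns 'Africa'.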

missinglink commented 7 years ago

FYI: you can use nave, a shell script, to switch between Node versions.

vincbon commented 7 years ago

Problem solved! Downgrading to Node 4 fixed it, without re-downloading any data.

Maybe this part

Requirements: Node.js 4 or higher is required

is a bit misleading :)

Thanks for the pointer to nave; I'll be using it from now on.

missinglink commented 7 years ago

Cool, glad that worked for you. We always try to stay up to date, but breaking API changes upstream make life more difficult :)

orangejulius commented 6 years ago

Hello! Just for the record, we've fixed this issue in https://github.com/pelias/whosonfirst/pull/304. Anyone running Node.js 8 should feel encouraged to try it out and let us know how it goes.