pelias / openaddresses

Pelias import pipeline for OpenAddresses.
MIT License

Error: Delimiter not found in the file "," #189

Closed: rsv-code closed this issue 6 years ago

rsv-code commented 8 years ago

I ran the import against all OpenAddresses state files and the following error was thrown. I'm not sure whether the process actually completed at this point. It looks like it may have, judging from the final info log record.

2016-11-10T17:36:53.553Z - verbose: [dbclient]  paused=false, transient=4, current_length=18, indexed=5463000, batch_ok=10926, batch_retries=0, failed_records=0, address=5463000, persec=2750
2016-11-10T17:36:56.697Z - verbose: [openaddresses] Number of bad records: 58679
2016-11-10T17:37:03.757Z - verbose: [dbclient]  paused=false, transient=4, current_length=406, indexed=5490500, batch_ok=10981, batch_retries=0, failed_records=0, address=5490500, persec=2750
2016-11-10T17:37:06.953Z - verbose: [openaddresses] Number of bad records: 64436
2016-11-10T17:37:13.662Z - info: [openaddresses] Total time taken: 1747.274s
events.js:141
      throw er; // Unhandled 'error' event
      ^

Error: Delimiter not found in the file ","
    at Error (native)
    at Parser.__write (/home/vagrant/openaddresses/node_modules/csv-parse/lib/index.js:439:13)
    at Parser._transform (/home/vagrant/openaddresses/node_modules/csv-parse/lib/index.js:172:10)
    at Transform._read (_stream_transform.js:167:10)
    at Transform._write (_stream_transform.js:155:12)
    at doWrite (_stream_writable.js:300:12)
    at writeOrBuffer (_stream_writable.js:286:5)
    at Writable.write (_stream_writable.js:214:11)
    at ReadStream.ondata (_stream_readable.js:542:20)
    at emitOne (events.js:77:13)
    at ReadStream.emit (events.js:169:7)

orangejulius commented 8 years ago

Hey @rsv-code, thanks for the report! Hopefully the output you've pasted means that the import mostly completed (judging by the info: [openaddresses] Total time taken: 1747.274s line, which is printed at the end of an import), but that one file had trouble (again, just a reasonable guess based on the Error: Delimiter not found in the file "," line).

Can you tell me exactly what OA data you're using? In the meantime I can try to reproduce this. It's very possible there's an error in one of the OA files, and we'll have to track it down.
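
For context on the events.js frames at the top of the trace: Node.js streams are EventEmitters, and emitting 'error' on an emitter that has no 'error' listener throws, which is why a single bad file can end the whole process. A minimal sketch of that behavior follows, using a plain EventEmitter as a stand-in for the parser stream. The emitParseError helper is hypothetical and not part of the importer.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: emits the same kind of error the CSV parser reported.
function emitParseError(emitter) {
  emitter.emit('error', new Error('Delimiter not found in the file ","'));
}

// Case 1: no 'error' listener, so emit() throws the error itself.
const unguarded = new EventEmitter();
let crashed = false;
try {
  emitParseError(unguarded);
} catch (err) {
  crashed = true; // without this try/catch the process would exit
}

// Case 2: an 'error' listener is attached, so the error is delivered to it instead.
const guarded = new EventEmitter();
const logged = [];
guarded.on('error', (err) => logged.push(err.message));
emitParseError(guarded);

console.log({ crashed, logged });
```

In the importer the equivalent would be an 'error' listener on the CSV parser stream, though where exactly to attach it depends on the current code.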

rsv-code commented 8 years ago

Hi @orangejulius,

Thanks very much for the assistance. I've been following along with the tutorial here: https://mapzen.com/blog/pelias-setup-tutorial/ The data set I'm using is http://s3.amazonaws.com/openaddresses/openaddresses-processed.zip and I'm importing all US files. I was able to figure out that the error was occurring in the us-ca-san_diego.csv file, but so far I haven't been able to find the line that's causing it. What I've done in the meantime is remove the partial index in ES and update the code in the csv-parse module to ignore bad chunks for now.

node_modules/csv-parse/lib/index.js:

Parser.prototype._transform = function(chunk, encoding, callback) {
  var err, error;
  if (chunk instanceof Buffer) {
    chunk = this.decoder.write(chunk);
  }
  try {
    this.__write(chunk, false);
    return callback();
  } catch (error) {
    //err = error;
    //return this.emit('error', err);
    // Do nothing since we had a problem with this chunk ...
    process.stdout.write("node_modules/csv-parse/lib/index.js . _transform(): Bad chunk processed, doing nothing ...");
  }
};

I'm still running the import and just made it past the initial bad chunk in the file. If I see any other bad chunks I'll report those as well.

rsv-code commented 8 years ago

Also, sorry, the code change I posted previously isn't working: it keeps trying the same chunk over and over again. I'm still trying to work around it. Unfortunately I'm not too familiar with the code base.
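
One thing that stands out in the patch above: the catch branch never calls callback(), so the Transform stream is never told that the chunk is finished. Whether that fully explains the behavior is a guess, but a skip-and-continue transform normally looks like the sketch below. parseChunk is a hypothetical stand-in for the real parsing step, not csv-parse's API.

```javascript
const { Transform } = require('stream');

// Hypothetical parser used only for illustration: rejects text with no comma.
function parseChunk(text) {
  if (!text.includes(',')) {
    throw new Error('Delimiter not found');
  }
  return text.toUpperCase();
}

// Builds a transform that records failures in `skipped` and keeps going.
function createTolerantTransform(skipped) {
  return new Transform({
    transform(chunk, encoding, callback) {
      let output;
      try {
        output = parseChunk(chunk.toString());
      } catch (err) {
        skipped.push(err.message);
        // Still signal completion, otherwise the stream stalls on this chunk.
        return callback();
      }
      callback(null, output);
    }
  });
}

const skipped = [];
const tolerant = createTolerantTransform(skipped);
tolerant.write('no delimiter here');
tolerant.write('a,b');
console.log(skipped);
```

Note that this drops the bad chunk silently, so any valid rows inside it are lost as well.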

orangejulius commented 8 years ago

Another thing is that the blog post you referenced is quite old: it's from January 2015. Things have changed a lot since then. We're adding a message to it linking to our newer installation docs.

Perhaps most notably, the OpenAddresses data URL from the old blog post points to a file that is about two and a half years out of date:

julian@julian-mapzen ~/data $ curl -I http://s3.amazonaws.com/openaddresses/openaddresses-processed.zip
HTTP/1.1 200 OK
x-amz-id-2: 706Vez85+j5aoQDrvjQFWqie1F86TRE2awQXRsHJtR+eIJLXPaswbcc+ujGS+GWTSGLEo1L/ViY=
x-amz-request-id: 863D352D726CCFA0
Date: Thu, 10 Nov 2016 19:37:01 GMT
Last-Modified: Sun, 04 May 2014 19:30:36 GMT
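
The same staleness check can be scripted. This is a sketch only: ageInDays and fetchLastModified are hypothetical helper names, and the HEAD request assumes the server still returns a Last-Modified header.

```javascript
const http = require('http');

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days between a Last-Modified header value and a reference date.
function ageInDays(lastModified, now = new Date()) {
  const modified = new Date(lastModified);
  if (Number.isNaN(modified.getTime())) {
    throw new Error('Unparseable Last-Modified value: ' + lastModified);
  }
  return Math.floor((now.getTime() - modified.getTime()) / MS_PER_DAY);
}

// Sends a HEAD request and resolves with the Last-Modified header, if any.
function fetchLastModified(url) {
  return new Promise((resolve, reject) => {
    const req = http.request(url, { method: 'HEAD' }, (res) => {
      res.resume(); // discard any body so the socket is released
      resolve(res.headers['last-modified']);
    });
    req.on('error', reject);
    req.end();
  });
}

// Uses the header and date from the curl output above.
const age = ageInDays(
  'Sun, 04 May 2014 19:30:36 GMT',
  new Date('Thu, 10 Nov 2016 19:37:01 GMT')
);
console.log(age); // 921
```

Calling fetchLastModified with the URL from the blog post and passing the result to ageInDays would reproduce the check against the live file.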

Let us know if the updated installation docs clear things up, and if you have more issues don't hesitate to let us know so we can help. Thanks!

rsv-code commented 8 years ago

Ok, perfect, thanks! I wasn't aware of the new doc, I'll use that instead.