marklogic / marklogic-contentpump

MarkLogic Contentpump (mlcp)
http://developer.marklogic.com/products/mlcp
Apache License 2.0

Save/export rows that failed ingest due to Delimited Text Ingest Fails on Unescaped Quotes #84

Open janmichaelyu opened 6 years ago

janmichaelyu commented 6 years ago

We're encountering an issue similar to https://github.com/marklogic/marklogic-contentpump/issues/68 with files that are tab-delimited but contain unescaped quotes:

Sample:

11:16:43.614 [pool-1-thread-1] WARN  c.m.contentpump.DocumentMapper - Skipped record: () in file:/homes/local/projects/data-hub/data/omop/all/CONCEPT/CONCEPT.csv at line 1999360, reason: invalid char between encapsulated token and delimiter

02020201 "opt out" service Observation   DOMAIN   DOMAIN 

It would be great if mlcp could write the failed records to a separate file, or to the log, so that we could quickly examine what went wrong during the ingest, identify the formatting error, and fix it.
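Until something like this exists in mlcp, one possible stopgap is to pre-scan the file before ingest and set aside the rows that look likely to fail. The following is a minimal Python sketch, not an mlcp feature: the function names (`is_suspect_field`, `split_suspect_rows`, `export_suspect_rows`) are hypothetical, and the quoting rule is a conservative heuristic rather than mlcp's actual parser logic. It flags any field that contains a double quote without being a cleanly encapsulated token, and it assumes one record per line with no tabs or newlines inside quoted fields.

```python
import re
from typing import Iterable, List, Tuple

# Heuristic (an assumption, not mlcp's real parsing rules): a field that
# contains a quote is acceptable only if the whole field is encapsulated,
# i.e. opening quote, then non-quote characters or doubled quotes, then
# a closing quote with nothing after it.
_WELL_QUOTED = re.compile(r'"(?:[^"]|"")*"')


def is_suspect_field(field: str) -> bool:
    """Return True if the field has quotes but is not cleanly encapsulated."""
    if '"' not in field:
        return False
    return _WELL_QUOTED.fullmatch(field) is None


def split_suspect_rows(
    lines: Iterable[str], delimiter: str = "\t"
) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Split lines into clean rows and (line_number, row) suspect rows.

    Fields are split naively on the delimiter, so quoted fields that
    contain the delimiter or a newline are not handled.
    """
    clean: List[str] = []
    suspect: List[Tuple[int, str]] = []
    for line_no, raw in enumerate(lines, start=1):
        row = raw.rstrip("\r\n")
        if any(is_suspect_field(f) for f in row.split(delimiter)):
            suspect.append((line_no, row))
        else:
            clean.append(row)
    return clean, suspect


def export_suspect_rows(src_path: str, rejects_path: str) -> int:
    """Write suspect rows, prefixed with their line number, to rejects_path."""
    with open(src_path, encoding="utf-8", newline="") as src:
        _, suspect = split_suspect_rows(src)
    with open(rejects_path, "w", encoding="utf-8") as out:
        for line_no, row in suspect:
            out.write(f"{line_no}\t{row}\n")
    return len(suspect)
```

For example, `export_suspect_rows("CONCEPT.csv", "CONCEPT.rejects.tsv")` would write every flagged row, with its line number, to a separate file for inspection. Assuming the sample row above is tab-separated, its second field (`"opt out" service`) is flagged because text follows the closing quote.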

janmichaelyu commented 6 years ago