mafintosh / csv-parser

Streaming csv parser inspired by binary-csv that aims to be faster than everyone else
MIT License

double quote in data makes parsing exit early without error #236

Open 1mike12 opened 4 months ago

1mike12 commented 4 months ago

Expected Behavior

Parse the whole file even if the data contains a double quote, or at least emit an error.

Actual Behavior

The parser silently stops partway through the file.

How Do We Reproduce?

I kept getting fewer rows streamed than I expected from the file at https://download.geonames.org/export/dump/admin2Codes.txt. This is a tab-delimited file of 45,784 rows.

I realized this was because one of the entries contains a double quote: RU.45.517838 Novotor”yal’skiy Rayon Novotor"yal'skiy Rayon 517838. If I delete it, parsing works properly: RU.45.517838 Novotor[DELETED]yal’skiy Rayon Novotor"yal'skiy Rayon 517838

import {Transform, Writable} from "node:stream";
import https from "node:https";
import csvParser from "csv-parser";

const repro = async () => {

  let lineCount = 0
  return new Promise<void>((resolve, reject) => {

    https.get("https://download.geonames.org/export/dump/admin2Codes.txt", (response) => {
      response
        .pipe(csvParser({separator: "\t", headers: ["id", "name", "nameAscii", "geonameId"]}))
        .pipe(new Transform({
          objectMode: true,
          transform(chunk, encoding, callback) {
            lineCount++
            this.push(chunk);
            callback();
          },
        }))
        .pipe(new Writable({
          objectMode: true,
          write(chunk, encoding, callback) {
            callback();
          }
        }))
        .on('finish', () => {
          console.log("total lines should be ~45k", lineCount)
          resolve()
        })
        .on('error', reject)
    }).on('error', reject)
  })
}

(async () => {
  await repro()
})()
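
For illustration, here is a toy model of why a single unbalanced double quote could make the row count drop. It is only a sketch under the assumption that the parser toggles a "quoted" state on every quote character and ignores newlines while that state is on; it is not csv-parser's actual code, and `countRows` and the sample rows (other than the Novotor"yal'skiy entry) are made up for the example.

```typescript
// Toy model (NOT csv-parser's implementation): count rows the way a
// quote-aware line splitter might. Passing quote = null disables quote handling.
function countRows(text: string, quote: string | null): number {
  let inQuotes = false;
  let rows = 0;
  let pending = false; // true while the current row has unflushed characters

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (quote !== null && ch === quote) {
      // Every quote character flips the quoted state.
      inQuotes = !inQuotes;
      pending = true;
    } else if (ch === "\n" && !inQuotes) {
      // A newline only ends a row outside of quotes.
      rows++;
      pending = false;
    } else {
      pending = true;
    }
  }
  // Whatever is left over (including everything swallowed by an
  // unclosed quote) counts as at most one more row.
  return pending ? rows + 1 : rows;
}

// Hypothetical sample: four tab-separated rows, one with a lone ASCII double quote.
const sample =
  [
    "A.1\tAlpha\tAlpha\t1",
    "RU.45.517838\tNovotor”yal’skiy Rayon\tNovotor\"yal'skiy Rayon\t517838",
    "C.3\tGamma\tGamma\t3",
    "D.4\tDelta\tDelta\t4",
  ].join("\n") + "\n";

const withQuotes = countRows(sample, '"');
const withoutQuotes = countRows(sample, null);
console.log("rows with quote handling:", withQuotes);
console.log("rows without quote handling:", withoutQuotes);
```

In this model the lone quote opens a quoted field that never closes, so every later newline is treated as field content and the remaining rows collapse into one. With quote handling turned off, all four rows are counted.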