Open 1mike12 opened 4 months ago
Expected Behavior
The parser should parse the whole file even if the data contains a double quote, or at least emit an error.
Actual Behavior
It silently stops early: fewer rows are emitted than the file contains, and no error is raised.
How Do We Reproduce?
I kept getting fewer rows streamed than expected from the file at https://download.geonames.org/export/dump/admin2Codes.txt, a tab-delimited file of 45,784 rows.
I realized this was because one of the entries contains a double quote:
RU.45.517838 Novotor”yal’skiy Rayon Novotor"yal'skiy Rayon 517838
If I delete it, the file parses properly:
RU.45.517838 Novotor[DELETED]yal’skiy Rayon Novotor"yal'skiy Rayon 517838
Reproduction script:

```ts
import { Transform, Writable } from "node:stream";
import { pipeline } from "node:stream/promises";
import https from "node:https";
import type { IncomingMessage } from "node:http";
import csvParser from "csv-parser";

const DATA_URL = "https://download.geonames.org/export/dump/admin2Codes.txt";

// Resolve with the HTTP response so it can be fed into pipeline().
const fetchStream = (url: string) =>
  new Promise<IncomingMessage>((resolve, reject) => {
    https.get(url, resolve).on("error", reject);
  });

const repro = async () => {
  let lineCount = 0;
  const response = await fetchStream(DATA_URL);

  // pipeline() rejects if ANY stage emits 'error'; with chained .pipe()
  // calls, the .on('error') handler was only attached to the last stream.
  await pipeline(
    response,
    csvParser({ separator: "\t", headers: ["id", "name", "nameAscii", "geonameId"] }),
    new Transform({
      objectMode: true,
      transform(chunk, _encoding, callback) {
        lineCount++; // one increment per parsed row
        callback(null, chunk);
      },
    }),
    // Sink that simply drains the parsed rows.
    new Writable({
      objectMode: true,
      write(_chunk, _encoding, callback) {
        callback();
      },
    }),
  );

  console.log("total lines should be ~45k", lineCount);
};

repro().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```
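The following minimal TypeScript sketch shows how a single unmatched `"` could produce exactly this symptom (fewer rows, no error). It assumes a simplified, hypothetical quote-aware reader, `countQuoteAwareRows`, in which every ASCII `"` toggles a "quoted" state and a newline only ends a row outside quotes. That is a common CSV convention, but this is not csv-parser's actual code. For comparison, `countPlainRows` splits on newlines only, as a plain tab-delimited reader would. The RU.45.517838 row is taken from the data above; the other rows are made up for illustration.

```typescript
// Hypothetical helpers illustrating row loss; NOT csv-parser's implementation.

// Counts non-empty rows by splitting on "\n" only (quotes are ordinary data).
function countPlainRows(text: string): number {
  return text.split("\n").filter((line) => line.length > 0).length;
}

// Simplified quote-aware model: each ASCII '"' flips the quoted state,
// and "\n" only ends a row while we are outside quotes.
function countQuoteAwareRows(text: string): number {
  let quoted = false;
  let rows = 0;
  let rowHasData = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === '"') {
      quoted = !quoted;
      rowHasData = true;
    } else if (ch === "\n" && !quoted) {
      if (rowHasData) rows++;
      rowHasData = false;
    } else {
      // Ordinary character, or a newline swallowed inside quotes.
      rowHasData = true;
    }
  }
  // A final row with no trailing newline (or one left open by an
  // unmatched quote) still counts as a single row.
  if (rowHasData) rows++;
  return rows;
}

// Four tab-delimited rows; only the RU.45.517838 row is real data.
// Its name uses U+201D (”), which is not a quote here, but its ASCII
// name contains a single, unmatched ASCII '"'.
const sample =
  [
    "XX.01.1\tFirst Rayon\tFirst Rayon\t1",
    'RU.45.517838\tNovotor”yal’skiy Rayon\tNovotor"yal\'skiy Rayon\t517838',
    "XX.01.3\tThird Rayon\tThird Rayon\t3",
    "XX.01.4\tFourth Rayon\tFourth Rayon\t4",
  ].join("\n") + "\n";

console.log(countPlainRows(sample), countQuoteAwareRows(sample)); // 4 2
```

In this model the unmatched `"` opens a quoted field that never closes. Every later newline is treated as cell content, so the last three rows collapse into one, and nothing signals an error. On the real file the same effect would make the parser stop emitting rows partway through.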