zemirco / json2csv

Convert json to csv with column titles
http://zemirco.github.io/json2csv
MIT License

Does generating a CSV from NDJSON use only the first record's field structure? #504

Closed ronycohen closed 3 years ago

ronycohen commented 3 years ago

Hello,

I convert several XML files to NDJSON, and the resulting records do not all have the same fields, for example:

    { prop1 : xxx, prop2 : xxx, prop3 : xxx }
    { prop2 : xxx, prop3 : xxx, prop4 : xxx }
    ...

By default, the CSV appears to use only the field structure of the first record: prop1, prop2, prop3.

Is there a way to get prop1, prop2, prop3 and prop4?

Best regards,

juanjoDiaz commented 3 years ago

How are you using json2csv?

As stated in the docs, the sync API gets the fields from all the records, whereas the async API only gets the fields from the first record. The CLI uses the async API by default unless you pass the --no-streaming flag.

The sync API loops through the whole object twice and loads both the input JSON and the resulting CSV entirely in memory, so it is suboptimal if the object is large. You can still get all the fields while using the async API by passing the fields option (fields: ['prop1', 'prop2', 'prop3', 'prop4'], or in the CLI --fields prop1,prop2,prop3,prop4).
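To illustrate why an explicit field list works with streaming, here is a minimal sketch in plain JavaScript. It is not json2csv's implementation; `toRow` is a hypothetical helper, and it skips CSV quoting and escaping for brevity. Because the columns are known up front, each record can be formatted on its own, with an empty cell wherever a property is missing.

```javascript
// Hypothetical helper (not part of json2csv): format one record as a CSV row
// using a field list that is known before any record is read.
// Note: no quoting or escaping is done here; real CSV output needs both.
function toRow(record, fields, delimiter = ';') {
  return fields
    .map((field) => (record[field] === undefined ? '' : String(record[field])))
    .join(delimiter);
}

// The full column list is supplied by the caller, as with the fields option.
const fields = ['prop1', 'prop2', 'prop3', 'prop4'];
const header = fields.join(';');

// Each record is converted independently, so no look-ahead is required.
const rows = [
  { prop1: 'a', prop2: 'b', prop3: 'c' },
  { prop2: 'd', prop3: 'e', prop4: 'f' },
].map((record) => toRow(record, fields));

console.log([header, ...rows].join('\n'));
```

With the field list fixed in advance, the second record keeps its prop4 value and simply leaves the prop1 column empty.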

ronycohen commented 3 years ago

Hi @juanjoDiaz ,

I use the json2csv streaming API. I have several XML source files that I transform and concatenate into a single JSON file (source.json). I don't know which XML file produces which fields, and that is my issue. That's why I do not use the fields option.

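    const fs = require('fs');
    const path = require('path');
    const { Transform } = require('json2csv');

    // target_path is defined elsewhere; it is the directory that holds source.json.
    const json = fs.createReadStream(path.join(target_path, 'source.json'), { encoding: 'utf8' });
    const csv = fs.createWriteStream(path.join(target_path, 'target.csv'), { encoding: 'utf8' });

    const unwind = ['PROP1', 'PROP5', 'PROP2'];

    // No fields option is passed here, so the columns come from the first record only.
    const json2csv = new Transform({
        unwind,
        flatten: true,
        flattenSeparator: "_",
        ndjson: true,
        delimiter: ';'
    }, { highWaterMark: 16384, objectMode: false, encoding: "utf-8" });

    // Read the NDJSON file, convert it, and write the CSV file.
    json
        .pipe(json2csv)
        .pipe(csv);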

Do I have to define fields?

Best regards,

juanjoDiaz commented 3 years ago

Yes, you do. 🙂 Streams are unbounded by definition; i.e. you process data as it comes (you can't process it twice) and you cannot tell when a stream will end (until it actually ends). That's why json2csv cannot know all the possible fields until the whole stream is consumed, and by then it's too late, because the header row has already been written.
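The constraint can be shown with a small sketch in plain JavaScript. This is an illustration only, not json2csv's code: `streamToCsvLines` and `incoming` are hypothetical names, and quoting is omitted. The converter sees one record at a time, so it has to emit the header as soon as the first record arrives.

```javascript
// Hypothetical streaming-style converter: it receives records one by one and
// yields CSV lines immediately, without buffering the input.
function* streamToCsvLines(records, delimiter = ';') {
  let fields = null;
  for (const record of records) {
    if (fields === null) {
      // The header must be emitted now, before any later record is known.
      fields = Object.keys(record);
      yield fields.join(delimiter);
    }
    // Properties that are not in the frozen field list are silently dropped.
    yield fields
      .map((field) => (record[field] === undefined ? '' : String(record[field])))
      .join(delimiter);
  }
}

// Simulated unbounded source: records become available one at a time.
function* incoming() {
  yield { prop1: 'a', prop2: 'b', prop3: 'c' };
  yield { prop2: 'd', prop3: 'e', prop4: 'f' };
}

const lines = [...streamToCsvLines(incoming())];
console.log(lines.join('\n'));
```

By the time the second record shows up with prop4, the header line has already been yielded, so prop4 has no column to go into.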

So you have the two options that I outlined in my previous message: use the sync API (or the --no-streaming flag in the CLI), or pass the fields option explicitly.

ronycohen commented 3 years ago

Thank you very much :)

I think I'll read the whole stream twice: a first pass to collect the field structure, and a second pass that passes those fields to json2csv.
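A first pass like the one described could look roughly like the sketch below. It uses only Node's built-in fs and readline modules; `addFieldsFromLine` and `collectFields` are hypothetical names. It assumes one JSON object per line and collects top-level keys only, so with flatten or unwind enabled the collected names would have to be derived after applying the same transformations.

```javascript
const fs = require('fs');
const readline = require('readline');

// Add the top-level keys of one NDJSON line to the running set of field names.
function addFieldsFromLine(fields, line) {
  const trimmed = line.trim();
  if (trimmed === '') return fields; // skip blank lines
  for (const key of Object.keys(JSON.parse(trimmed))) {
    fields.add(key);
  }
  return fields;
}

// First pass: stream the file line by line so that only the set of field
// names is kept in memory, never the whole file.
async function collectFields(filePath) {
  const fields = new Set();
  const rl = readline.createInterface({
    input: fs.createReadStream(filePath, { encoding: 'utf8' }),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  for await (const line of rl) {
    addFieldsFromLine(fields, line);
  }
  return [...fields]; // insertion order: first appearance in the file
}
```

The resulting array could then be passed as the fields option when the file is opened a second time and piped through the json2csv Transform.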