Closed ronycohen closed 3 years ago
How are you using json2csv?
As stated in the docs, the sync API gets the fields from all the records, whereas the async API only gets the fields from the first record. If you are using the CLI, it uses the async API by default unless you pass the --no-streaming flag.
The sync API loops through the whole object twice and loads both the input JSON and the resulting CSV entirely in memory, so it's suboptimal if the object is large.
You can still get all the fields while using the async API by passing the fields option (fields: ['prop1', 'prop2', 'prop3', 'prop4'], or in the CLI --fields prop1,prop2,prop3,prop4).
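To illustrate why the field list matters, here is a minimal sketch in plain JavaScript. It is not json2csv's implementation: toCsv is a hypothetical helper that does no quoting or escaping, and the sample records are made up. It only shows what happens when the header comes from the first record versus an explicit list.

```javascript
// Illustration only: a naive CSV renderer, not json2csv's code.
// `toCsv` is a hypothetical helper; it does no quoting or escaping.
function toCsv(records, fields, delimiter = ',') {
  const header = fields.join(delimiter);
  const rows = records.map((record) =>
    fields
      .map((field) => (record[field] === undefined ? '' : String(record[field])))
      .join(delimiter)
  );
  return [header, ...rows].join('\n');
}

// Made-up sample data: the second record has a property the first one lacks.
const records = [
  { prop1: 'a', prop2: 'b', prop3: 'c' },
  { prop2: 'd', prop3: 'e', prop4: 'f' },
];

// Header taken from the first record only: prop4 never gets a column.
const inferred = toCsv(records, Object.keys(records[0]));

// Header given explicitly: every property gets a column, and missing
// values are rendered as empty cells.
const explicit = toCsv(records, ['prop1', 'prop2', 'prop3', 'prop4']);

console.log(inferred);
console.log(explicit);
```

With the inferred header, the value of prop4 in the second record is silently dropped; with the explicit list it is kept.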
Hi @juanjoDiaz ,
I use the json2csv streaming API. I have several sources of XML files that I transform and concatenate into a JSON file (source.json). I don't know which XML source produces which fields, and that's my issue. That's why I do not use the fields option.
const fs = require('fs');
const path = require('path');
const { Transform } = require('json2csv');

// target_path is defined elsewhere
let json = fs.createReadStream(path.join(target_path, `source.json`), { encoding: 'utf8' });
let csv = fs.createWriteStream(path.join(target_path, `target.csv`), { encoding: 'utf8' });
let unwind = ['PROP1', 'PROP5', 'PROP2'];
let json2csv = new Transform({
  unwind,
  flatten: true,
  flattenSeparator: "_",
  ndjson: true,
  delimiter: ';'
}, { highWaterMark: 16384, objectMode: false, encoding: "utf-8" });
json
  .pipe(json2csv)
  .pipe(csv);
Do I have to define fields ?
Best regards,
Yes, you do. 🙂 Streams are unbounded by definition; i.e. you process data as it comes (you can't process it twice) and you cannot tell when a stream will end (until it actually ends). That's why json2csv cannot know all the possible fields until the whole stream is consumed, and by then, it's too late.
So you have the 2 options that I outlined in my previous message: use the sync API (--no-streaming in the CLI), or pass the fields option.
Thank you very much :)
I think I'll read the whole stream twice: a first pass to collect the field structure, and a second pass to convert using those fields.
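A minimal sketch of what that first pass could look like, using only Node's built-in modules. All function names here are hypothetical, and flattenedKeys is only an approximation of the flattened column names: it joins nested object keys with the separator and treats arrays as plain values, so it does not reproduce what unwind does.

```javascript
const fs = require('fs');
const readline = require('readline');

// Hypothetical helper: lists the keys of one record, joining nested object
// keys with `separator`. Arrays are treated as plain values (no unwinding).
function flattenedKeys(record, separator = '_', prefix = '') {
  const keys = [];
  for (const [key, value] of Object.entries(record)) {
    const name = prefix ? `${prefix}${separator}${key}` : key;
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      keys.push(...flattenedKeys(value, separator, name));
    } else {
      keys.push(name);
    }
  }
  return keys;
}

// Adds every key found in one NDJSON line to the `fields` set.
function addFieldsFromLine(fields, line, separator = '_') {
  const trimmed = line.trim();
  if (trimmed === '') return; // skip blank lines
  for (const key of flattenedKeys(JSON.parse(trimmed), separator)) {
    fields.add(key);
  }
}

// Union of all keys across the given lines, in first-seen order.
function collectFieldsFromLines(lines, separator = '_') {
  const fields = new Set();
  for (const line of lines) addFieldsFromLine(fields, line, separator);
  return [...fields];
}

// Same as above, but reads an NDJSON file line by line instead of
// loading it into memory.
async function collectFieldsFromFile(filePath, separator = '_') {
  const rl = readline.createInterface({
    input: fs.createReadStream(filePath, { encoding: 'utf8' }),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  const fields = new Set();
  for await (const line of rl) addFieldsFromLine(fields, line, separator);
  return [...fields];
}
```

The array returned by collectFieldsFromFile could then be passed as the fields option when source.json is read a second time for the actual conversion.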
Hello,
After converting several XML files to NDJSON, I get records whose properties differ, like:
{ prop1 : xxx, prop2 : xxx, prop3 : xxx }\r\n { prop2 : xxx, prop3 : xxx, prop4 : xxx, }\r\n ...
By default, it appears that the first record defines the field structure: prop1, prop2, prop3.
Is there a way to get prop1, prop2, prop3 and prop4?
Best regards,