Closed: vismayshah90 closed this issue 5 years ago.
I'm experiencing similar issues while converting a stream with just under 45k records. Approximately 0.1% of the records are not coming back (about 45 missing).
Additionally, it does not appear to be a problem with any specific records; something is skipping them intermittently. If I run the parser on the same file multiple times, records that are skipped in one run are present in another.
There is a bug in the module, but I haven't tested it recently.
If someone can provide a sample dataset that encounters the issue, I could have a look.
I'll send a mock data file that's close to what I'm using. (Still getting the errors with it)
@knownasilya Did you receive the file? It was too large to attach.
No, think you could paste it into a gist?
Any update?
I haven't worked with streams much, but I found the problem and a potential (not great) solution. The problem is that JSON objects are getting caught between chunks.

I saw there's an objectMode option (not sure what side effects this would have) that could probably be used: https://nodejs.org/api/stream.html#stream_object_mode

The other solution I had was to keep a variable that stores the remnants of the last chunk and prepend it to the new chunk before trying to match. I'll create a fork and link the diff here.
Edit: attached a link to a diff showing one solution to the problem: https://github.com/jstephens7/json2csv-stream/commit/14ba745a3b24dff581a25eb0d095c2bf8836b84b
I'm sure there's a better way to do the same thing using a similar pattern with that._data, but since I'm unfamiliar with the code I figured I would just show a proof of concept and have someone more familiar clean it up.
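For anyone skimming, here is a stripped-down sketch of the leftover-buffer idea (not the actual diff linked above; `ObjectSplitter` and the flat-object regex are simplifications for illustration only):

```js
const { Transform } = require('stream');

// Sketch only: carry any partial JSON object left at the end of a chunk
// over to the next chunk instead of dropping it.
class ObjectSplitter extends Transform {
  constructor(options) {
    super(options);
    this._leftover = '';
  }

  _transform(chunk, encoding, callback) {
    const data = this._leftover + chunk.toString();
    // Naive matching: assumes flat objects with no nested braces or braces in strings.
    const objectPattern = /\{[^{}]*\}/g;
    let lastIndex = 0;
    let match;
    while ((match = objectPattern.exec(data)) !== null) {
      this.push(match[0]);
      lastIndex = objectPattern.lastIndex;
    }
    // Whatever didn't form a complete object yet waits for the next chunk.
    this._leftover = data.slice(lastIndex);
    callback();
  }
}

module.exports = ObjectSplitter;
```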
I think the solution would be to use another module (https://www.npmjs.com/package/JSONStream) that handles JSON streams, instead of the home-baked regex solution.
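For example, something along these lines (untested sketch; `JSONStream.parse('*')` emits each element of a top-level array as a separate object, regardless of how the chunks are split):

```js
const fs = require('fs');
const JSONStream = require('JSONStream');

fs.createReadStream('Demo.json')
  .pipe(JSONStream.parse('*')) // '*' selects every element of the root array
  .on('data', (record) => {
    // record is a fully parsed object; CSV conversion would happen here
    console.log(record);
  })
  .on('error', (err) => console.error(err));
```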
Like I said, my fix works, although it's not ideal; it just illustrates the problem (JSON objects being split across multiple chunks). I can work on another solution tonight once I'm home and make a PR.
Please use the above module when you work on your PR. I think it should solve the issue that you reproduced.
Do we know for sure whether that module solves the problem we're looking at?
That module is battle tested, and its sole purpose is to parse JSON from a stream, so it should solve the issue.
Alright, I'll leave it to you then.
@jstephens7 @knownasilya I just added support for streams in jsonexport v2.0.0 (https://github.com/kauegimenes/jsonexport). I ran some tests and it looks like it's working well with big collections. Can you give it a try?
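Basic streaming usage looks roughly like this (sketch from memory of the docs; the exact streaming API may differ between versions):

```js
const fs = require('fs');
const jsonexport = require('jsonexport');

// Calling jsonexport() with no arguments returns a Transform stream
// that converts incoming JSON into CSV rows.
fs.createReadStream('Demo.json')
  .pipe(jsonexport())
  .pipe(fs.createWriteStream('Demo.csv'));
```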
Thanks @kauegimenes, I'll take a look later on today.
It seems quite a bit heavier and less efficient than the current module (even with my rough fix). My CPU was churning on my stream (approximately 45k records) for over 20 seconds before I stopped it.
@jstephens7 I just made some changes for better performance. Benchmark using a 50k collection and jsonexport v2.0.1:
| Module | Count | Cycles | Elapsed (sec) | Hz (ops/sec) |
| --- | --- | --- | --- | --- |
| json2csv-stream | 1 | 1 | 7.024 | 2.8066241073013356 |
| json2csv | 1 | 1 | 6.117 | 5.5762221530032985 |
| jsonexport | 1 | 1 | 6.309 | 3.461989382421796 |
| jsonexport-stream | 1 | 1 | 7.411 | 1.93111416159437 |
I'm guessing the churning is due to the fact that my data is not proper JSON. Instead of an array, it is more of a stream of objects (separated by spaces). I'll try formatting it tomorrow and see if that fixes it.
@jstephens7 That would explain it, but it should give up if it's not able to parse a single object after 3 chunks of data from the stream.
@jstephens7 I had the same issue you had using mongoexport; in my case the problem was the _id: ObjectId("..."). I updated jsonexport to print the error instead of appearing stuck.
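For illustration (the ObjectId value here is made up), this is the kind of line mongoexport can emit that a strict JSON parser rejects:

```js
// Extended JSON from mongoexport is not valid input for JSON.parse:
const line = '{ "_id": ObjectId("5a9427648b0beebeb69579e7"), "name": "xyz" }';

try {
  JSON.parse(line);
} catch (err) {
  // Surfacing this error is much more useful than silently looking stuck.
  console.error('Not valid JSON:', err.message);
}
```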
Ahh, thanks. I'll look again.
https://github.com/zemirco/json2csv now has a streaming API; please use that module, as this one is deprecated/unmaintained.
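A rough sketch of what that could look like (untested; the exact import and option names depend on the json2csv version you install, and the field list is just an example matching the sample data below):

```js
const fs = require('fs');
const { Transform } = require('json2csv');

const input = fs.createReadStream('Demo.json');
const output = fs.createWriteStream('Demo.csv');

// Streaming converter: records are parsed incrementally, so objects
// split across chunks are handled by the library.
const json2csvTransform = new Transform({ fields: ['name', 'subject'] });

input.pipe(json2csvTransform).pipe(output);
```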
Hi Team,
I'm trying to use the json2csv-stream module to convert JSON to CSV, but I have observed that when I increase the size of my JSON to around 10k records, some of the records are missed during conversion without any error being thrown. To cross-verify, I used JSLint to validate my JSON and it is valid JSON.
Sample JSON: `[{"name":"xyz","subject":"CS"}, {"name":"abc","subject":"Maths"}, .... 10k records]`
Sample Code:

```js
var fs = require('fs');
const jsonStream = require('json2csv-stream');

const parser = new jsonStream();

const file = fs.createWriteStream('Demo.csv')
  .on('close', () => {
    console.log('File Done');
    return;
  })
  .on('error', (err) => {
    console.log('File NOT Done');
    return;
  });

fs.createReadStream('Demo.json').pipe(parser).pipe(file);
```
@knownasilya, please suggest a solution or the root cause of what I'm doing wrong.