uhop / stream-json

A micro-library of Node.js stream components for creating custom JSON processing pipelines with a minimal memory footprint. It can parse JSON files far exceeding available memory, streaming individual primitives using a SAX-inspired API.

How to use it in browser? I know it is for node.js now #66

Closed: hoogw closed this issue 4 years ago

hoogw commented 4 years ago

I need it to run in the browser, as client-side JS.

Can I run npm run build to generate a ./dist/ folder and use it in the browser?

uhop commented 4 years ago

Unlikely. This is a library based on streams. While there are browser-based streams, they are not widely implemented and their API is still experimental. There are some stubs to bring Node streams to a browser, but I have no first-hand experience with them and I don't know how good they are.

Just out of curiosity: what is your use case? The library was built to process mounds of data and is used mostly in applications (e.g., packed with Electron), command-line utilities, and in some rare cases on a server, but I haven't heard of anybody wanting to use it on a client.

hoogw commented 4 years ago

I recommend that you go ahead and build a browser version of it.

There will be a huge market for it in the future. Why? Because the trend is moving toward REST APIs, and JSON is the standard data exchange format. For large JSON, from hundreds of MB up to 1 GB, stream-json is a MUST.

My website handles JSON files from 100 MB to 2 GB per page. On every page, 90% of the core engine is built on https://developer.mozilla.org/en-US/docs/Web/API/Streams_API

my site: https://transparentgov.net/cleargov1/

a small sample:

http://transparentgov.net:3000/socrata/dataset/default?url=https://data.lacity.org/api/id&layer=Building%20Permits%20over%20100K%20Valuation&layer_id=y5ik-mwat

hoogw commented 4 years ago

Currently, I use oboe.js for client-side JSON streaming. But I think your library is newer and may remove all the inconveniences of oboe.js.

I need to stream a JSON file (100 MB - 1 GB) to the browser and store it in IndexedDB. I must have a JSON streaming API to avoid caching the whole 1 GB of data in memory.

This is why a client-side stream API is greatly needed.
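
As a rough illustration of the memory argument above, here is a minimal sketch of consuming a Streams API ReadableStream one chunk at a time instead of buffering the whole body. The names forEachChunk and handleChunk are hypothetical, the URL in the usage comment is a placeholder, and the parsing and IndexedDB steps are left out; this is not part of stream-json or oboe.js.

```javascript
// Sketch: walk a WHATWG ReadableStream chunk by chunk without collecting it.
// forEachChunk and handleChunk are hypothetical names used for illustration.
async function forEachChunk(stream, handleChunk) {
  const reader = stream.getReader();
  let total = 0;
  try {
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      total += value.byteLength;
      // Awaiting the handler before the next read() means the next chunk is
      // requested only after this one has been dealt with (parsed, stored...).
      await handleChunk(value);
    }
  } finally {
    reader.releaseLock();
  }
  return total; // number of bytes that passed through
}

// Browser usage (placeholder URL):
// const response = await fetch('/big.json');
// const bytes = await forEachChunk(response.body, chunk => { /* parse + store */ });
```

Only the chunk currently being handled has to stay in memory, provided the handler does not keep references to earlier chunks.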

uhop commented 4 years ago

stream-json will support browsers eventually, but there are some issues that should be resolved besides API stability and general support. One of them is the lack of default implementations that can be reused. For example, the document referenced above defines only interfaces but not pressure-related algorithms, there are no helpers to create Transform streams (most streams in stream-json are Transform ones), and so on.

I am sure that we will get there, but not at the present moment.

hoogw commented 4 years ago

Another reason why I need this:

Oboe.js does NOT have a function that lets me abort the stream before it reaches the end!

For example:

jQuery ajax has xhr.abort
fetch has fetch abort

Oboe uses XHR, so it should have an abort function, but it is NOT implemented, and I am not sure why. There is a big use case for it, like mine. 90% of my mapping web pages need to abort ajax, abort fetch, etc. Otherwise, when the user pans the map continuously, each pan or zoom fires a stream request, so a chain of streaming requests piles up. The user has to wait a long time until all the other streams end one by one before the latest, useful stream starts. It is a horrible user experience. We want to abort all the other streams and keep only the latest stream alive.

Can you think about adding a function to abort a stream before it reaches the end?
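
The "keep only the latest stream alive" behavior described above can be sketched with AbortController, the mechanism fetch uses for cancellation. This is only an illustration under assumptions: createLatestOnly and startRequest are hypothetical names, the URLs in the usage comment are placeholders, and nothing here is an oboe.js or stream-json API.

```javascript
// Sketch: start a new request and abort the previous one, so that only the
// latest request stays alive. createLatestOnly and startRequest are
// hypothetical names used for illustration.
function createLatestOnly(startRequest) {
  let current = null; // AbortController of the request in flight, if any

  return function run(...args) {
    if (current) current.abort(); // cancel the previous, now stale, request
    const controller = new AbortController();
    current = controller;
    // startRequest receives the signal and must hand it on, e.g. to fetch().
    return startRequest(controller.signal, ...args);
  };
}

// Browser usage (placeholder URLs):
// const load = createLatestOnly((signal, url) => fetch(url, { signal }));
// load('/data?view=1').catch(() => {}); // rejects once the next call aborts it
// load('/data?view=2');
```

When the signal is passed to fetch, aborting also stops the response body stream, so a reader that is still consuming it sees an error rather than more data.
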
uhop commented 4 years ago

> Can you think about adding a function to abort a stream before it reaches the end?

This is outside of stream-json. Usually, when the data is no longer needed, the pipe is disconnected and discarded. Eventually, it will be garbage-collected. To be clean, it is advisable to disconnect event handlers. To process a new request, a new pipe can be constructed and hooked to the event machinery like the previous pipe was.
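
A minimal sketch of that teardown with plain Node.js streams follows. The helper name teardown is hypothetical, and PassThrough merely stands in for a parser stage; the calls used (unpipe, removeAllListeners, destroy) are standard Node.js stream and EventEmitter methods, not stream-json specifics.

```javascript
const { Readable, PassThrough } = require('node:stream');

// Sketch: disconnect and discard a pipe that is no longer needed.
// teardown is a hypothetical helper; PassThrough stands in for a parser stage.
function teardown(source, stage) {
  source.unpipe(stage);             // disconnect the pipe
  stage.removeAllListeners('data'); // disconnect our event handlers
  source.destroy();                 // release the underlying resource
  stage.destroy();
}

// A source that would otherwise keep producing data indefinitely.
let counter = 0;
const source = new Readable({
  read() { this.push(String(counter++)); }
});
const stage = new PassThrough();

let received = 0;
stage.on('data', () => {
  received++;
  if (received === 3) teardown(source, stage); // the data is no longer needed
});
source.pipe(stage);
```

For a new request, a fresh source and stage would be created and wired up the same way.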

hoogw commented 4 years ago

About aborting a stream: thanks for explaining, that makes sense to me.

Another issue I found with oboe that stops me from using it is performance. It is too slow.

I tested a 60K (350 MB) JSON file, and oboe is 100 times slower than the browser's native stream API.

Only for small amounts of data is the difference trivial.

Oboe must have some overhead on top of XHR.

Now, for large data, I can't use oboe because it is way too slow. Instead, I just use the browser's native stream API and write a simple parser on my own.

I am not sure about stream-json's performance. I hope you can avoid the slowness that oboe has.

uhop commented 4 years ago

The whole point of stream-json is performance. It targets huge streams, where every millisecond helps: in real life they add up to hours and even days. So when I port it to a browser, I'll keep that in mind.

hoogw commented 4 years ago

My guess, not verified, is that the slowness of oboe is caused by Blob, FileReader, BlobBuilder, etc.

To make it fast, you need to use var string = new TextDecoder(encoding).decode(uint8array);
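
When the bytes arrive in several chunks, as they do with a stream, a multi-byte character can be split across two chunks. Below is a small sketch of handling that with the stream option of TextDecoder.decode. The helper name decodeChunks is hypothetical, and the sample bytes are made up for illustration.

```javascript
// Sketch: decode bytes that arrive in chunks, reusing one TextDecoder.
// decodeChunks is a hypothetical helper name used for illustration.
function decodeChunks(chunks, encoding = 'utf-8') {
  const decoder = new TextDecoder(encoding);
  let text = '';
  for (const chunk of chunks) {
    // { stream: true } keeps an incomplete multi-byte sequence buffered
    // inside the decoder until a later chunk completes it.
    text += decoder.decode(chunk, { stream: true });
  }
  text += decoder.decode(); // flush anything still buffered
  return text;
}

// "é" is encoded as the two bytes 0xC3 0xA9; here it is split across chunks.
const parts = [
  new Uint8Array([0x5b, 0x22, 0xc3]), // [ " and the first byte of é
  new Uint8Array([0xa9, 0x22, 0x5d])  // the second byte of é, then " ]
];
const json = decodeChunks(parts); // '["é"]'
```

In a real pipeline each decoded piece would be handed to the parser right away rather than concatenated into one string; the concatenation here only keeps the example short.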

There are several articles discussing this slowness:

https://developers.google.com/web/updates/2012/06/How-to-convert-ArrayBuffer-to-and-from-String

https://stackoverflow.com/questions/6965107/converting-between-strings-and-arraybuffers

Just FYI: this is something to be aware of when you do the browser version.

uhop commented 4 years ago

Thank you for the info! Archiving for now.