logzio / sawmill

Sawmill is a JSON transformation Java library
Apache License 2.0

Does Sawmill Stream Large Documents? #170

Closed: adamfisher closed this issue 4 years ago

adamfisher commented 5 years ago

Does Sawmill stream data into memory as the processors do their work, or does it load each file all at once? Some of my files can be really large, so I'm wondering whether it buffers the data while performing the transformations.

barakm commented 4 years ago

A bit overdue, but better late than never, I suppose. Sawmill loads each JSON document into memory in full and then processes it. I can't say that I have seen a use case where a single document was so large that this posed a problem. Because Sawmill pipelines are often long, composed of many processors, having to 'page' portions of a single document in and out of the Java heap would be costly in terms of CPU.

If this is still relevant, do you have a specific use case in mind?

adamfisher commented 4 years ago

I was thinking of using this with Apache NiFi to transform JSON flowfiles. Sometimes the files can be very large, and if it's a JSON Lines file it would be ideal to page it in. I no longer need this; it was so long ago that I can't remember exactly what I was going to use it on. I'll close for now.
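
For anyone landing here with the same JSON Lines question: since each line is its own document, the file can be paged in by the caller, one line at a time, even though each individual document is still held in memory in full. Below is a minimal sketch of that idea using only the Java standard library. The class name `JsonLinesStreamer` and the `perDocument` function are hypothetical; `perDocument` is a stand-in for whatever code runs a single document through a pipeline, and is not part of Sawmill's API.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of Sawmill: streams a JSON Lines source
// so that only one document (one line) is held in memory at a time.
public class JsonLinesStreamer {

    /**
     * Reads the input line by line, applies perDocument to every non-blank
     * line, and writes each result followed by a newline.
     *
     * @param perDocument assumed stand-in for a single-document transformation
     * @return the number of documents processed
     */
    public static long transform(Reader in, Writer out, UnaryOperator<String> perDocument)
            throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        long count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines between documents
            }
            writer.write(perDocument.apply(line));
            writer.write('\n');
            count++;
        }
        writer.flush(); // the caller owns the streams and closes them
        return count;
    }
}
```

Memory use is then bounded by the largest single line rather than by the file size. This does nothing for a single very large document, which is the case discussed above.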