Closed · nemanja-tosic closed this 1 year ago
@ml054 I've updated the pull request as best I could to support JSONL. I've added a JSONL flag to the conventions so that the change is non-breaking.
Can you take a look at the TODOs I've left for the conventions? I'm not sure what to do there. Thanks!
Hi, @nemanja-tosic
Thanks for the contribution. I've scheduled a review for the beginning of next week.
@ml054 do you have any availability in the next couple of days? I'd really like to get these changes in. Thanks!
I admit we are in a busy period (end of year), but I will try to find spare time for the review. I'm sorry for the inconvenience.
@nemanja-tosic
objectKeysTransform is used to convert the casing of an object in one shot:
Under the hood it uses ObjectUtil.transformObjectKeys, which accepts an object and options (target casing, usually camel, plus paths to ignore).
I assume the performance of those might be improved by providing a reviver to JSON.parse, since we do synchronous parsing here. When we are dealing with documents and detect that we don't need to touch the casing, we can reuse the existing objects, which saves memory allocations.
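The reviver idea could look something like this. A minimal sketch only: `toCamel` and `parseWithCasingReviver` are hypothetical names for illustration; in the client the real transform lives in ObjectUtil.transformObjectKeys. Since JSON.parse calls the reviver bottom-up, each object can be rewritten (or reused, when no key needs renaming) as it is built:

```javascript
// Hypothetical helper: PascalCase -> camelCase for a single key.
function toCamel(key) {
  return key.length ? key[0].toLowerCase() + key.slice(1) : key;
}

// Sketch: rename keys during JSON.parse itself via a reviver,
// avoiding a second full pass over the parsed object graph.
function parseWithCasingReviver(json) {
  return JSON.parse(json, (key, value) => {
    if (value && typeof value === "object" && !Array.isArray(value)) {
      const out = {};
      let changed = false;
      for (const k of Object.keys(value)) {
        const ck = toCamel(k);
        if (ck !== k) changed = true;
        out[ck] = value[k];
      }
      // Reuse the existing object when no key needed renaming,
      // saving an allocation (the idea mentioned above).
      return changed ? out : value;
    }
    return value;
  });
}
```

For example, `parseWithCasingReviver('{"Name":"a","Tags":[{"Id":1}]}')` yields `{ name: "a", tags: [{ id: 1 }] }` in a single parse.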
For requests which don't operate on full objects we use TransformKeysJsonStream. Indeed, once we introduce JSONL we won't need this kind of transformation, since we can read a line, operate on a single object, and parse it synchronously. However, TransformKeysJsonStream has tightly coupled logic responsible for choosing the proper casing transformation strategy based on the conventions and the current position in the parsing stack.
A first step would be to emit one object from the stream at a time and apply the casing conventions using the objectKeysTransform approach (by converting the profile, see TransformJsonKeysProfile, into an objectKeysTransform call).
Thanks for that. Unfortunately I need some help here: the casing transforms are making it difficult for me to finish the PR, and I would appreciate it if you could do that part.
As a side note, I wasn't sure if conventions are the best way to go about this; I'm guessing that's the right place for it?
Yes, the conventions are the centralized point where you can customize entity casing. That's the reason we need them during serialization/deserialization.
Moved to: #391
(Work in progress; I'm trying to capture all the details right now and am looking for feedback on the points below.)
Scope:
Parsing JSONL
In stream-json it is possible to parse JSONL either through the default Parser with the `jsonStreaming` flag, or with a dedicated JSONL Parser. In general, as the docs say, the JSONL Parser is more performant; for me it was at least twice as fast locally. To realize the potential gain from JSONL, I believe the JSONL Parser must be used: testing locally (my results might be wrong for some unknown reason), enabling the `jsonStreaming` flag was actually slower than not enabling it.
The JSONL Parser, on the other hand, does not support the same filters and keeps an entire line in memory. I believe this is a good thing, given that clients will always want to have at least one whole document in memory, and that the default Parser is very slow when dealing with a large document set.
However, having a whole object in memory means that we need to use ObjectKeyCaseTransformStream for casing conventions. The library currently uses one of two stream Transforms to apply casing conventions: ObjectKeyCaseTransformStream (readable-stream) and TransformKeysJsonStream (stream-json). TODO: I'm not sure if the two are kept in sync, so if we need both, I'd probably figure out a way to use only one.
That being said, I'm not sure where the need for parseAsyncJson comes from when not streaming: even for a very large document, clients indicate that they will keep the entire document in memory when not using streaming, so even if we assemble the document bit by bit while parsing the JSON, the end result is still the entire object. My question would be: could we parse the whole body synchronously in that case? That would negate the need for two key transformers, because we would always be dealing with an entire document.
At any rate, we need to optimize applying the casing conventions for JSONL, because according to my tests it increases the time needed to process a request by at least 50%.
If easier, I'm always up for a Zoom call, so let me know if that would help move the process along.