spring-projects / spring-framework

Spring Framework
https://spring.io/projects/spring-framework
Apache License 2.0
56.72k stars 38.15k forks source link

AbstractJackson2Decoder doesn't support decoding NDJSON arrays into Flux<List> #32579

Closed yuflyud closed 7 months ago

yuflyud commented 7 months ago

Affects: 6.1.5

I'm attempting to implement a Reactive RestController that accepts NDJSON (Newline Delimited JSON) where each row may contain a JSON array. An example of such NDJSON payload is as follows:

[{ "recordType":"Patient", "name":"John Doe"},{ "recordType":"Visit", "state":"complete"}]
[{ "recordType":"Practitioner", "name":"Stephen Smith"}]

The desired behavior is to transform the request body into a Flux<List<Record>>, where each emitted list corresponds to the JSON arrays within each row. However, the current implementation, which is completely expected according to the documentation, only produces a Flux<Record>.

Below is the endpoint code I would like to be supported:

  @PostMapping(value = "/submit", consumes = {MediaType.APPLICATION_NDJSON_VALUE},
      produces = MediaType.APPLICATION_JSON_VALUE)
  public Mono<Void> submit(@RequestBody Flux<List<Record>> rows) {
    return rows.log()
        .then();
  }

Upon investigation, I've identified that the issue lies within the AbstractJackson2Decoder class. Specifically, the tokenizeArrays parameter is set to true in the following line:

Flux<TokenBuffer> tokens = Jackson2Tokenizer.tokenize(processed, mapper.getFactory(), mapper, true,
        forceUseOfBigDecimal, getMaxInMemorySize());

This configuration causes the NDJSON payload with nested arrays to be tokenized in a way that produces Flux<Record> instead of the desired Flux<List<Record>>. At present, I cannot find a straightforward way to modify this behavior without resorting to copying and altering the entire source code of the AbstractJackson2Decoder class.

rstoyanchev commented 7 months ago

Indeed, Jackson2Tokenizer.tokenize has a boolean flag tokenizeArrays that determines whether to flatten the array or not, and there are tests that parse the array both ways. However, we always pass true from AbstractJackson2Dedcoder and there is no way to change that.

It should be an easy change then, just a matter of how to decide when to set the flag. We could perhaps make it conditional on whether the media type is known as a streaming type. We have similar checks on the encoder side.

rstoyanchev commented 7 months ago

On second though, NDJSON implies a stream, and that should be streamed rather than collected to a List. So I'm not sure we can do anything transparently for your case. At best, we could provide some way to control it through a protected method.

Could you explain a little better your case and why you are collecting NDJSON stream to a list?

yuflyud commented 7 months ago

I'm not aggregating multiple NDJSON lines into a single list. Rather, each line represents a distinct item, which could be structured as a JSON array itself. This maintains the NDJSON format as a continuous stream of data.

In my use case, I need to handle multiple objects within the same NDJSON line as a single transaction. Therefore, I need the framework to automatically group these objects in the same way they are structured within an NDJSON format. For example:

[{"name": "Alice", "age": 25, "city": "San Francisco"}]
[{"name": "John", "age": 30, "city": "New York"},{"name": "Bob", "age": 35, "city": "Chicago"}]
[{"name": "Eve", "age": 28, "city": "Los Angeles"}]

I reviewed the NDJSON specification and did not find any explicit statement that prohibits a line from containing a JSON array instead of a single object. However, I wouldn't be surprised if this is indeed the case. Therefore, my request might deviate from the specification and may be considered invalid.

rstoyanchev commented 7 months ago

Okay, I see now. We can decide then whether to set tokenize based on the target type. If the element type is an array or collection, then false (don't flatten), or true otherwise.