spring-projects / spring-data-mongodb

Provides support to increase developer productivity in Java when using MongoDB. Uses familiar Spring concepts such as template classes for core API usage and lightweight repository-style data access.
https://spring.io/projects/spring-data-mongodb/
Apache License 2.0
1.59k stars, 1.07k forks

Behavior of `$project` for nested documents #4704

Open bithazard opened 1 month ago

bithazard commented 1 month ago

Hi. I'm trying to understand the overloaded `project` methods in an aggregation. My initial understanding was that `project("x")` is just the short form of `project(Fields.from(field("x")))` or even `project(Fields.from(field("x", "x")))`. The latter variant is of course only really needed if you want to project a field onto a model with a different structure. This assumption holds for top-level fields. However, when used for nested documents, the resulting queries look different and don't work as expected. See the following example:

```java
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.aggregation.Fields;
import org.springframework.data.mongodb.core.mapping.Document;

import java.util.List;

import static org.springframework.data.mongodb.core.aggregation.Aggregation.newAggregation;
import static org.springframework.data.mongodb.core.aggregation.Aggregation.project;
import static org.springframework.data.mongodb.core.aggregation.Fields.field;

/* Example document in database "test", collection "test":
      {
          "_id" : ObjectId("51a4da9b292904caffcff6eb"),
          "levelOneDocument" : {
              "levelOneField" : "levelOneFieldValue"
          }
      }
 */
@SpringBootApplication
public class MongodbProjectSscce implements CommandLineRunner {
    private final MongoTemplate mongoTemplate;

    public MongodbProjectSscce(MongoTemplate mongoTemplate) {
        this.mongoTemplate = mongoTemplate;
    }

    public static void main(String[] args) {
        SpringApplication.run(MongodbProjectSscce.class, args);
    }

    @Override
    public void run(String... args) {
        //results in: {"aggregate": "test", "pipeline": [{"$project": {"levelOneField": "$levelOneDocument.levelOneField"}}]}
        List<RootDocument> projection1 = mongoTemplate.aggregate(newAggregation(project(
                "levelOneDocument.levelOneField"
        )), "test", RootDocument.class).getMappedResults();
        System.out.println(projection1);
        //-> doesn't work as expected: levelOneField in LevelOneDocument is not filled, instead levelOneField in RootDocument is filled

        //also results in: {"aggregate": "test", "pipeline": [{"$project": {"levelOneField": "$levelOneDocument.levelOneField"}}]}
        List<RootDocument> projection2 = mongoTemplate.aggregate(newAggregation(project(
                Fields.from(
                        field("levelOneDocument.levelOneField")
                )
        )), "test", RootDocument.class).getMappedResults();
        System.out.println(projection2);
        //-> also doesn't work as expected: levelOneField in LevelOneDocument is not filled, instead levelOneField in RootDocument is filled

        //results in: {"aggregate": "test", "pipeline": [{"$project": {"levelOneDocument.levelOneField": 1}}]}
        List<RootDocument> projection3 = mongoTemplate.aggregate(newAggregation(project(
                Fields.from(
                        field("levelOneDocument.levelOneField", "levelOneDocument.levelOneField")
                )
        )), "test", RootDocument.class).getMappedResults();
        System.out.println(projection3);
        //-> works as expected: levelOneField in LevelOneDocument is filled, levelOneField in RootDocument is not
    }

    @Document
    public record RootDocument(
            @Id
            String id,
            String levelOneField,   //Field should not be here - for demonstration purposes only
            LevelOneDocument levelOneDocument) {}

    public record LevelOneDocument(String levelOneField) {}
}
```

At the bottom there is a record `RootDocument` that contains a nested document `LevelOneDocument` with a single field, `levelOneField`. `RootDocument` also contains the id, which is irrelevant here, and another field `levelOneField`. That one should not be there; I only added it to demonstrate the problem. The first two aggregations (`project("...")` and `project(Fields.from(field("...")))`) both produce the same query, which leads to the wrong result: `"$levelOneDocument.levelOneField"` is projected to `"levelOneField"`. Only when you explicitly state that you want to project `"levelOneDocument.levelOneField"` to `"levelOneDocument.levelOneField"` (the third aggregation, `project(Fields.from(field("...", "...")))`) do you get the expected query and result.

The reason for the resulting query in the first two aggregations is a check in `org.springframework.data.mongodb.core.aggregation.Fields:238`. It checks whether `name` contains a period and `target` is null. In that case, only the substring after the first period is used as the name. I'm not sure what the intention of this code is. Maybe a period has a special meaning in a projection that I'm not aware of. If so, this should be documented somewhere. Otherwise, looking only at the overloaded methods, you would assume that they all behave the same way, regardless of whether you project a top-level field or a field of a nested document.
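The check described above can be reproduced without Spring on the classpath. The following is a minimal sketch, not the library's code: the class `FieldNameSketch` and its methods are hypothetical names, and they mimic only the behaviour reported in this issue (when no target is given, the name becomes the substring after the first period and the full path becomes the target).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in that mimics the name/target derivation reported in this issue. */
final class FieldNameSketch {

    /** Returns {name, target} for a field declared as field(name) or field(name, target). */
    static String[] derive(String name, String target) {
        if (target == null && name.contains(".")) {
            // Reported behaviour: everything after the FIRST period becomes the name,
            // while the full dotted path is kept as the target.
            return new String[] { name.substring(name.indexOf('.') + 1), name };
        }
        // Assumption for this sketch: without an explicit target, the name is its own target.
        return new String[] { name, target == null ? name : target };
    }

    /** Renders the body of a $project stage the way the queries in this issue look. */
    static Map<String, Object> renderProject(String name, String target) {
        String[] field = derive(name, target);
        // Same name and target -> plain inclusion (1); otherwise a "$target" reference.
        Object value = field[0].equals(field[1]) ? Integer.valueOf(1) : ("$" + field[1]);
        Map<String, Object> stage = new LinkedHashMap<>();
        stage.put(field[0], value);
        return stage;
    }

    public static void main(String[] args) {
        String path = "levelOneDocument.levelOneField";
        System.out.println(renderProject(path, null)); // name only: path is trimmed
        System.out.println(renderProject(path, path)); // explicit target: path is kept
    }
}
```

With only a name, the sketch yields `levelOneField` mapped to `$levelOneDocument.levelOneField`; with an explicit target equal to the name, the full path is kept and rendered as a plain inclusion, matching the three queries in the example.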

mp911de commented 1 month ago

Another data point: `andInclude("a.b.c")` renders `{"b.c" : "$a.b.c"}`, see #76

As far as we can tell from the investigation, the goal was to derive the field name from a property. Therefore, paths use the segment after the dot. We never tested against paths with more than two segments; such a test would have revealed the flaw that we split at the first dot.
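To illustrate why two-segment paths hide the flaw: cutting at the first dot and taking the last segment give the same result for `a.b` and only diverge from three segments on. This is a small sketch with hypothetical helper names; treating "the last segment" as the intended property name is an assumption, not confirmed library behaviour.

```java
/** Hypothetical helpers comparing two ways of deriving a field name from a dotted path. */
final class DotSplitComparison {

    /** Behaviour described in this issue: keep everything after the FIRST dot. */
    static String afterFirstDot(String path) {
        int index = path.indexOf('.');
        return index < 0 ? path : path.substring(index + 1);
    }

    /** Assumed intent: keep only the LAST segment, i.e. the property name. */
    static String lastSegment(String path) {
        int index = path.lastIndexOf('.');
        return index < 0 ? path : path.substring(index + 1);
    }

    public static void main(String[] args) {
        for (String path : new String[] { "a", "a.b", "a.b.c" }) {
            // For "a" and "a.b" both strategies agree; for "a.b.c" they yield "b.c" vs "c".
            System.out.println(path + " -> first dot: " + afterFirstDot(path)
                    + ", last segment: " + lastSegment(path));
        }
    }
}
```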

We should update this behavior in our next major release to derive the field name correctly and also verify the functionality against placeholder paths (`a.$.b`). `andInclude`/`andExclude` should accept paths as-is and not trim them, to correctly mimic MongoDB behavior.
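As a rough sketch of what "accept paths as-is" could look like for inclusion and exclusion, the snippet below passes dotted paths through untouched, which is the shape MongoDB itself accepts in a `$project` stage. The class `AsIsProjectionSketch` and its methods are hypothetical and exist only for illustration; this is not the planned implementation or an existing Spring Data API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of include/exclude that keeps dotted paths untouched. */
final class AsIsProjectionSketch {

    /** Builds {"$project": {path: 1, ...}} without trimming any path. */
    static Map<String, Object> include(String... paths) {
        return render(1, paths);
    }

    /** Builds {"$project": {path: 0, ...}} without trimming any path. */
    static Map<String, Object> exclude(String... paths) {
        return render(0, paths);
    }

    private static Map<String, Object> render(int flag, String... paths) {
        Map<String, Object> spec = new LinkedHashMap<>();
        for (String path : paths) {
            spec.put(path, flag); // the full path is the key; no segment is cut off
        }
        Map<String, Object> stage = new LinkedHashMap<>();
        stage.put("$project", spec);
        return stage;
    }

    public static void main(String[] args) {
        System.out.println(include("a.b.c"));
        System.out.println(exclude("levelOneDocument.levelOneField"));
    }
}
```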