spring-projects / spring-data-mongodb

Provides support to increase developer productivity in Java when using MongoDB. Uses familiar Spring concepts such as a template classes for core API usage and lightweight repository style data access.
https://spring.io/projects/spring-data-mongodb/
Apache License 2.0
1.61k stars 1.08k forks source link

Aggregations should be able to handle `Sort.unsorted()` #4703

Open bithazard opened 4 months ago

bithazard commented 4 months ago

Hi. When you create an aggregation pipeline and use a sort operation with Sort.unsorted() an exception is thrown (UncategorizedMongoDbException). The reason for the exception is, that the sort stage is created but it is simply an empty object. You can reproduce the behavior with the following example:

import org.bson.Document;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.domain.Sort;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Query;

import static org.springframework.data.mongodb.core.aggregation.Aggregation.newAggregation;
import static org.springframework.data.mongodb.core.aggregation.Aggregation.sort;

/* Example documents in database "test", collection "test" (not strictly required to reproduce the exception):
      {
          "_id" : ObjectId("51a4da9b292904caffcff6eb"),
          "x" : 0
      }

      {
          "_id" : ObjectId("51a4da9b292904caffcff6ec"),
          "x" : 1
      }

      {
          "_id" : ObjectId("51a4da9b292904caffcff6ed"),
          "x" : 2
      }
 */
@SpringBootApplication
public class MongodbSortSscce implements CommandLineRunner {
    private final MongoTemplate mongoTemplate;

    public MongodbSortSscce(MongoTemplate mongoTemplate) {
        this.mongoTemplate = mongoTemplate;
    }

    public static void main(String[] args) {
        SpringApplication.run(MongodbSortSscce.class, args);
    }

    @Override
    public void run(String... args) {
        //results in: {"find": "test", "filter": {}}
        mongoTemplate.find(new Query().with(Sort.unsorted()), Document.class, "test");             //works

        //results in: {"aggregate": "test", "pipeline": [{"$sort": {"x": 1}}]}
        mongoTemplate.aggregate(newAggregation(sort(Sort.by("x"))), "test", Document.class);       //works

        //results in: {"aggregate": "test", "pipeline": [{"$sort": {}}]}
        mongoTemplate.aggregate(newAggregation(sort(Sort.unsorted())), "test", Document.class);    //fails with "$sort stage must have at least one sort key"
    }
}

You can see that it works when you sort by an arbitrary field (x in the example above). It also works when using find instead of aggregate. In this case the sort operation never reaches the MongoDB(-driver) but is removed somewhere in between. This should also happen for an aggregation in my opinion.

Of course a developer can make sure that an unsorted sort operation is never added to the pipeline (in a real world scenario the sort field will probably come from a URL parameter or something like that - so it's not as obvious as in the example) but that implies that the developer is aware that an exception will be thrown otherwise.

christophstrobl commented 4 months ago

Thank you @bithazard for getting in touch. We'll look into this.

christophstrobl commented 4 months ago

I've been looking at the mongodb documentation and related tickets but there's no way to pass in something similar to $natural sorting when using aggregations. We cannot silently ignore the stage if unsorted is given to it nor would I want to put in a hard guard that raises an error. Updating the documentation might help raising awareness.

bithazard commented 4 months ago

I thought into this direction as well. Sorting by id would probably come as close as possible to a "$natural sorting". But that could lead to some unwanted side effects elsewhere (not to mention that it would fail for documents without id). So this is clearly not an option.

Of course the issue could be solved on the MongoDB side. They could simply allow sort: {} / sort: null as valid syntax for an aggregation. It is valid for a find as well (e.g. {"find": "test", "sort": {}} or even db.test.find({}, {}, { "sort": null })). But then again that would not really help us here, as it would only work with MongoDB versions after the change.

Regarding the "silently ignore the stage": I think it would be more like skipping the stage because there is simply nothing to do. I understand that it's not great to add a special case for such a situation but I also looked a bit into the code and I think it would be possible to add a check to https://github.com/spring-projects/spring-data-mongodb/blob/main/spring-data-mongodb/src/main/java/org/springframework/data/mongodb/core/aggregation/AggregationOperationRenderer.java#L56 to handle this.

There is a loop that iterates over all AggregationOperations. There are even already some instanceof checks. If we check if operation is an instance of SortOperation we could conditionally add or not add it to operationDocuments. But for this we would need some way to figure out whether the SortOperation is unsorted. Currently SortOperation is very private. We could either add an isUnsorted() method which simply passes the call through to sort.isUnsorted() or we could override equals so that we can check sortOperation.equals(new SortOperation(Sort.unsorted())).

Just a thought. I don't know the code well enough to say if that would cause issues elsewhere.

christophstrobl commented 4 months ago

Thanks for the feedback - the checks in AggregationOperationRenderer relate to the context the next stage might operate on, not so much of handling specifics of a stage and we do not want to scatter logic throughout different code paths. While we can decide to skip stages for annotated aggregations on repository level, those explicitly provided by the user and fed to the template should be treated as given with all the implications (like the server error) this may have. Maybe it's worth asking the MongoDB team to consider adding something like a $natural to aggregations.

bithazard commented 4 months ago

I've added a suggestion to the "MongoDB Feedback Engine" that you can find under https://feedback.mongodb.com/forums/924280-database/suggestions/48473684-aggregations-should-allow-an-empty-sort-stage-inst. Feel free to give it a vote.

If this gets implemented, the way that you currently generate the sort stage, will start working. I think that it also makes sense for MongoDB to handle an empty sort stage like this. A sort object with n keys will sort by these n fields, so an empty object should not sort by any field. Adding another special keyword (like $natural) would just add more (unnecessary) complexity. And you still would have to check if the sort object is empty and only in this case add a {"$natural": 1} (or something like that) to the object.

I also found this section of the MongoDB sort aggregation documentation, where they recommend to add the _id field to the sort object if you want to achieve sort consistency. I think sort consistency is always a good thing and if you get it almost for free (as the _id is always indexed), I don't see any reason not to add it. So as a workaround one could always add the _id as sort field like this:

    //Some sort object you get from somewhere (could be Sort.unsorted())
    Sort sort = ...;

    mongoTemplate.aggregate(newAggregation(sort(sort.and(Sort.by("_id")))), "test", Document.class);