spring-projects / spring-data-mongodb

Provides support to increase developer productivity in Java when using MongoDB. Uses familiar Spring concepts such as template classes for core API usage and lightweight repository-style data access.
https://spring.io/projects/spring-data-mongodb/
Apache License 2.0

Custom converters for complete documents slow on large result sets #4614

Closed: dirkbolte closed this issue 9 months ago

dirkbolte commented 9 months ago

I have a collection with more than 10k documents, and one use case requires fetching all of them. To minimize the data that is processed, I already added a projection along with a `@ReadingConverter` for the whole document in order to optimize entity creation (following https://docs.spring.io/spring-data/mongodb/reference/mongodb/mapping/custom-conversions.html#mongo.custom-converters.reader ). The code looks similar to this:

    @ReadingConverter
    class TargetEntityReadConverter : Converter<Document, TargetEntity> {
        override fun convert(source: Document): TargetEntity {
            return TargetEntity()
        }
    }

This is registered via

    @Bean
    fun mongoCustomConversions(): MongoCustomConversions =
      MongoCustomConversions(listOf(TargetEntityReadConverter()))

I still found the query slow on a result set of this size, and I was able to narrow it down to converter resolution: GenericConversionService calls getConverter twice per conversion. The expensive part is the handling of the converter cache, which compares ConverterCacheKey instances with org.bson.Document as the source type. Calling org.springframework.core.convert.TypeDescriptor#equals for org.bson.Document ~20k times takes about 50% of the overall processing time (the same number of comparisons for my target entity costs significantly less), as measured with the IntelliJ profiler. The main contributors are the checks for `isCollection` and `isArray` and the logic within `isMap`:

// extract from TypeDescriptor.java

    @Override
    public boolean equals(@Nullable Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof TypeDescriptor otherDesc)) {
            return false;
        }
        if (getType() != otherDesc.getType()) {
            return false;
        }
        if (!annotationsMatch(otherDesc)) {
            return false;
        }
        if (isCollection() || isArray()) { // evaluation seems to be contributing
            return ObjectUtils.nullSafeEquals(getElementTypeDescriptor(), otherDesc.getElementTypeDescriptor());
        }
        else if (isMap()) { // code in block seems to be contributing
            return (ObjectUtils.nullSafeEquals(getMapKeyTypeDescriptor(), otherDesc.getMapKeyTypeDescriptor()) &&
                    ObjectUtils.nullSafeEquals(getMapValueTypeDescriptor(), otherDesc.getMapValueTypeDescriptor()));
        }
        else {
            return true;
        }
    }
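
To make the hot spot concrete, here is a dependency-free Kotlin sketch of the cost pattern (`SlowDescriptor` and `simulateLookups` are hypothetical stand-ins, not Spring classes): the cache lookup itself is O(1), but because GenericConversionService builds a fresh cache key for every getConverter() call, each lookup still pays at least one equals() invocation on a cached key.

```kotlin
// Hypothetical stand-in for a type descriptor with an expensive equals();
// the real TypeDescriptor additionally walks isCollection()/isArray()/isMap().
class SlowDescriptor(private val type: Class<*>) {
    companion object { var equalsCalls = 0 }
    override fun hashCode() = type.hashCode()
    override fun equals(other: Any?): Boolean {
        equalsCalls++
        return other is SlowDescriptor && other.type == type
    }
}

fun simulateLookups(documents: Int): Int {
    SlowDescriptor.equalsCalls = 0
    val cache = HashMap<SlowDescriptor, String>()
    cache[SlowDescriptor(Map::class.java)] = "documentConverter"
    repeat(documents) {
        // A fresh key per lookup means HashMap's reference-equality shortcut
        // never fires, so every get() falls through to equals().
        cache[SlowDescriptor(Map::class.java)] // first resolution
        cache[SlowDescriptor(Map::class.java)] // second resolution
    }
    return SlowDescriptor.equalsCalls
}

fun main() {
    println(simulateLookups(10_000)) // 20000: two equals() calls per document
}
```

With a cheap equals() this pattern is harmless; it only becomes visible when, as in TypeDescriptor, equals() itself does non-trivial work.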

As a mitigation, I created a custom repository implementation and built the query object myself. This approach returns the plain document, so I can call the converter myself (same converter code). It took somewhere between 5% and 10% of the CPU time of the initial approach.
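
A dependency-free sketch of that mitigation pattern (the real implementation would fetch `org.bson.Document` via a MongoTemplate-based custom repository; `RawDocument`, the converter lambda, and the entity shape here are stand-ins): resolve the converter once and apply it directly to each raw document, bypassing per-document converter resolution through the ConversionService.

```kotlin
// Hypothetical stand-ins for org.bson.Document and the mapped entity.
typealias RawDocument = Map<String, Any?>
data class TargetEntity(val name: String)

// Resolved exactly once, up front, instead of being looked up through
// GenericConversionService for every document in the result set.
val targetEntityReadConverter: (RawDocument) -> TargetEntity =
    { doc -> TargetEntity(doc["name"] as? String ?: "") }

fun convertAll(rawResults: List<RawDocument>): List<TargetEntity> =
    rawResults.map(targetEntityReadConverter)

fun main() {
    val raw = listOf(mapOf("name" to "a"), mapOf("name" to "b"))
    println(convertAll(raw)) // [TargetEntity(name=a), TargetEntity(name=b)]
}
```

The conversion logic is unchanged; only the repeated lookup of which converter to use is eliminated.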

Is there a way for me to improve converter resolution to avoid the repeated lookups, or for the MappingMongoConverter to optimize the comparison or conversion?

christophstrobl commented 9 months ago

Thank you for getting in touch. The ConversionService and its components are part of the core framework. I think it would make sense to raise the performance concern there.

dirkbolte commented 9 months ago

Not sure whether it can be completely addressed there, as both the source type and the processing live outside the core framework, but I will definitely do so.