spring-projects-issues closed this issue 6 years ago
Oliver Drotbohm commented
Wondering whether it makes sense to expose a configurable CursorPreparer
(which is already used internally) to allow tweaking the cursor setup. Do you think these settings should be applied globally (per MongoTemplate), per domain type, per query, or per query execution?
Sylvain LAURENT commented
Hello,
I'm also interested in support for a cursor batch size, for the following case: I retrieve documents as a stream and iterate over them. For various reasons I keep only between 15 and 40 documents (more or less), depending on the content of each document (and on some external data, which is why I cannot filter directly in MongoDB). Then I close the cursor.
I noticed poor performance because the documents are quite big and, by default, the first batch retrieves 100 (actually 101) documents, which is a waste in my case. To answer Oliver's question, I think such a setting should be exposed at least at the query level, and maybe at the entity level. In any case, this would be useful essentially for queries that stream, since queries that return a List retrieve all matching documents anyway.
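Sylvain's pattern (stream with a small batch size, keep a handful of documents, then close the cursor early) can be sketched with the plain MongoDB Java driver. The connection string, collection name, batch size of 20, and the `isInteresting` predicate are illustrative assumptions, not from the thread:

```java
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

public class SmallBatchStream {

    public static void main(String[] args) {
        MongoCollection<Document> collection = MongoClients
                .create("mongodb://localhost:27017")   // illustrative connection
                .getDatabase("test")
                .getCollection("people");

        List<Document> kept = new ArrayList<>();

        // batchSize(20): the first server reply carries at most 20 documents
        // instead of the default 101, so stopping after ~15-40 kept documents
        // wastes far less transfer for large documents.
        try (MongoCursor<Document> cursor = collection.find().batchSize(20).iterator()) {
            while (cursor.hasNext() && kept.size() < 40) {
                Document doc = cursor.next();
                if (isInteresting(doc)) {   // application-specific filter (placeholder)
                    kept.add(doc);
                }
            }
        } // try-with-resources closes the cursor, as in the description above
    }

    static boolean isInteresting(Document doc) {
        return doc.containsKey("relevant"); // placeholder predicate
    }
}
```

The early `close()` (implicit in try-with-resources) is what makes the small batch size pay off: batches that were never fetched are never transferred.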
Rob Moore commented
I am using streaming and am running into an issue currently that seems to be related to the batch size. The operation being performed for each stream result takes some time and I'm seeing errors like the following:
java.util.concurrent.CompletionException: org.springframework.dao.DataAccessResourceFailureException: Query failed with error code -5 and error message 'Cursor 43827425629 not found on server xxx:11001' on server xxx:11001; nested exception is com.mongodb.MongoCursorNotFoundException: Query failed with error code -5 and error message 'Cursor 43827425629 not found on server xxx:11001' on server xxx:11001
I believe that the problem might be resolved if we could make the batch size smaller than the server default (100, I believe) as it would keep the connection active. This thinking is motivated by a suggestion made on the mongod-user group: https://groups.google.com/forum/#!msg/mongodb-user/n1OAHPJ5FNA/oBIxevjA2ewJ
Mark Paluch commented
Rob Moore, your issue could probably be solved by using smaller batch sizes or by disabling cursor timeouts; see DATAMONGO-1480.
Rob Moore commented
@mpaluch
I'm not sure I follow you, but I think we're on the same page. I was hoping to have a batch size option on repository methods so I could configure it in an attempt to address the issue I'm seeing.
Mark Paluch commented
The issue you described can be solved in two ways: either decrease the batch size to interact more often with the cursor, or disable the cursor timeout. The first keeps the cursor alive through more frequent interaction; the latter disables cursor timeouts, so your cursor stays available no matter how long your process remains active, but it requires explicit resource cleanup.
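The first remedy (a smaller batch size) can be sanity-checked with a back-of-envelope calculation: MongoDB drops idle cursors after a default timeout of about 10 minutes, so each batch must be fully consumed, and the next fetch issued, within that window. The class name, method names, and the safety factor below are illustrative, not from the thread:

```java
// Rough upper bound on cursor batch size so that each batch is consumed
// (and the next getMore sent) before the server's cursor idle timeout.
// MongoDB's default idle timeout is 10 minutes; all names are illustrative.
public class BatchSizeEstimator {

    static final long DEFAULT_CURSOR_TIMEOUT_MS = 10 * 60 * 1000; // 10 minutes

    /**
     * @param perDocumentMillis average time spent processing one document
     * @param safetyFactor      fraction of the timeout window to use, e.g. 0.5
     * @return largest batch size that should keep the cursor alive
     */
    static int maxSafeBatchSize(long perDocumentMillis, double safetyFactor) {
        long budget = (long) (DEFAULT_CURSOR_TIMEOUT_MS * safetyFactor);
        return (int) Math.max(1, budget / perDocumentMillis);
    }

    public static void main(String[] args) {
        // 5 seconds of work per document, using at most half the timeout window:
        System.out.println(maxSafeBatchSize(5_000, 0.5)); // prints 60
    }
}
```

With Rob's long-running per-document operation, this kind of estimate shows why the default batch of ~100 documents can easily outlast the cursor timeout.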
Rob Moore commented
Agreed but are you suggesting this ticket is unnecessary? Are you suggesting that we break the query into smaller queries instead (that is, manage the batching outside of the repository method)?
Mark Paluch commented
My message is a different one: controlling the batch size allows fine-grained control over chunks and fetching when consuming results through a stream. Especially with reactive APIs, the batch size is derived from the subscriber demand, and this can lead to a lot of getMore calls.
I'm not sure whether it makes sense to tweak defaults per repository, per query, or per execution. Setting the batch size per query would be a first step, and for Template API usage we could leverage Query metadata as we already do for e.g. execution time. A possible usage could look like:
interface PersonRepository extends CrudRepository&lt;Person, String&gt; {

  @Meta(batchSize = 512)
  Stream&lt;Person&gt; findAllBy();
}
Additionally, we could consider accepting org.springframework.data.mongodb.core.query.Meta arguments in query methods to apply query hints per execution.
Christian Schneider opened DATAMONGO-1311 and commented
It would be great if you provided an option to set cursor.batchSize() for streaming query results.
In ETL scenarios where you process many gigabytes, streaming results is already heaven on earth compared to paging. In the MongoDB cursor's default implementation the batchSize is set to 0, which means the database chooses it. In my configuration the batchSize seems to be very small; I could observe that when fetching data from a remote database. (See the Java MongoDB driver's batchSize option.)
Sidenote: I couldn't verify that overriding the batchSize gives the expected performance boost.
Issue Links:
Referenced from: pull request https://github.com/spring-projects/spring-data-mongodb/pull/575
2 votes, 6 watchers