twisted / txmongo

asynchronous python driver for mongo
https://txmongo.readthedocs.io
Apache License 2.0
338 stars 102 forks source link

Added support for batchsize parameter in collection.py/ #226

Closed evert2410 closed 6 years ago

evert2410 commented 6 years ago

Added optional batchsize parameter in find_with_cursor (collection.py) Related: cursors may time out (cursorTimeout ...) Added handling for CursorNotFound in handle_REPLY (protocol.py) Added test for batchsize: test_FindWithCursorBatchsize (tests.test_queries.TestMongoQueries) test_FindWithCursorBatchsize ... ok

The proposed changed builds on the default behavior of Collection.find_with_cursor. When issuing the initial request 101 documents are returned, the remainder of the documents are returned upon the next iteration (yielding on the deferred returned in the initial request)

This change allows one to configure the number of documents to be returned per iteration. It helps heavily loaded clients to control their resources, when working with large collections.

An alternative approach would be to control the documents to be returned by playing with the parameters "skip" and "limit" in Collection.find, which is much less efficient than using the proposed change and puts a heavier burden on mongodb.

coveralls commented 6 years ago

Coverage Status

Coverage decreased (-0.1%) to 95.198% when pulling 152e5361ae5785a356e72ef4850609368f5e99b8 on evert2410:master into 311644d27d71902f4dce55b86c707225a42fa82d on twisted:master.

psi29a commented 6 years ago

Thanks for the PR, looks like a few tests.test_aggregate.TestAggregate tests are failing. Likely just a keyword?

Have a look here: https://travis-ci.org/twisted/txmongo/jobs/349566061

psi29a commented 6 years ago

What is the performance impact on this change?

evert2410 commented 6 years ago

Done :-)

On Tue, Mar 6, 2018 at 9:55 PM, Ilya Skriblovsky notifications@github.com wrote:

@IlyaSkriblovsky requested changes on this pull request.

Thanks! You changes seem reasonable and useful. Could you please address a couple of comments mostly related to tests?

In docs/source/NEWS.rst https://github.com/twisted/txmongo/pull/226#discussion_r172653930:

@@ -1,6 +1,16 @@ Changelog

+ +Release UNVERSIONED +--------------------------- + +Features +^^^^^^^^ + +- Added support for paged request: implementation of batchsize parameter in Collection.find_with_cursor

Could you please move it to Release 18.1.0 (UNRELEASED)?

In txmongo/collection.py https://github.com/twisted/txmongo/pull/226#discussion_r172654323:

@@ -367,7 +367,7 @@ def __apply_find_filter(spec, c_filter):

 @timeout
 def find_with_cursor(self, *args, **kwargs):
  • """find_with_cursor(filter=None, projection=None, skip=0, limit=0, sort=None, **kwargs)
  • """find_with_cursor(filter=None, projection=None, skip=0, limit=0, sort=None, batchSize=0, **kwargs)

Should be batchsize, not batchSize. By the way, it should probably be batch_size to conform to PEP8.

In txmongo/collection.py https://github.com/twisted/txmongo/pull/226#discussion_r172655394:

@@ -449,6 +466,8 @@ def after_reply(reply, protocol, this_func, fetched=0): to_fetch = limit - fetched if to_fetch <= 0: to_fetch = None # close cursor

  • elif batchsize:

We are trying hard to cover as much code branches as possible to avoid regressions. Could you please add some tests covering all these branches in __real_find_with_cursor?

In txmongo/protocol.py https://github.com/twisted/txmongo/pull/226#discussion_r172656119:

@@ -381,6 +381,11 @@ def handle_REPLY(self, request): df.errback(err) if fail_conn: self.transport.loseConnection()

  • elif request.response_flags & REPLY_CURSOR_NOT_FOUND:

Is it possible to simulate cover this in unit tests? If it is hard to achieve with real MongoDB, is it possible to monkeypatch it with mock.patch?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/twisted/txmongo/pull/226#pullrequestreview-101708995, or mute the thread https://github.com/notifications/unsubscribe-auth/ARUCXPDKJH8YpRi1XX2pQa8ELDo5oW5Hks5tbve9gaJpZM4Sd2N3 .

IlyaSkriblovsky commented 6 years ago

One more minor issue: we are still supporting Twisted ≥ 14.0.0. But assertRaises can only be used as context manager starting with Twisted 15.

Could you please rewrite this code:

with self.assertRaises(CursorNotFound):
    next_reply = yield protocol.send_GETMORE(Getmore(collection = str(self.coll),cursor_id = cursor_id,n_to_return = 10))

like this:

self.assertFailure(protocol.send_GETMORE(Getmore(collection=str(self.coll), cursor_id=cursor_id, n_to_return=10)),
                   CursorNotFound)
evert2410 commented 6 years ago

A little confused by the failed tests.

I reran the full test-suite locally with different versions of twisted, and they all ran ok.

Testresults are attached, filename indicates version of twisted used.

testresults_t14.txt testresults_t15.txt testresults_t17.txt

Interestingly, I had to reboot for the tests related to replica_set to succeed.

Haven't figured out the cause here .... yet.

Evert

IlyaSkriblovsky commented 6 years ago

All tests seem to be ok now: https://travis-ci.org/twisted/txmongo/builds/350602612

IlyaSkriblovsky commented 6 years ago

LGTM @psi29a, what is your opinion?

psi29a commented 6 years ago

I kicked the test to restart it, seems fine to me too. Let's merge it.

Thanks @evert2410

IlyaSkriblovsky commented 6 years ago

Seems like changelog entry for this PR was mistakenly added to 18.0.0 instead of unreleased 18.1.0. I'm going to fix it.