jur9526 / couchdb-python

Automatically exported from code.google.com/p/couchdb-python

Upon querying a view, couchdb-python reads the entire result, ... #162

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
...then parses it, then returns it to the user. All the while, both the raw and
the parsed representations are kept in memory. How about some love for people
whose data is too big to fit in main memory? Even once?

:(

Original issue reported on code.google.com by andreas.kloeckner@gmail.com on 23 Jan 2011 at 11:16

GoogleCodeExporter commented 9 years ago
It's a great idea to support this. However, it's not a straightforward issue. A
streaming JSON parser is required in order to deliver an iterable stream of
rows without holding the whole response in memory. Solving this problem
probably requires using YAJL in combination with a Python binding like
ijson[1]. A parser like ijson has a very different API from simplejson or the
standard library parser, meaning the code paths must diverge and larger pieces
of CouchDB-Python would have to be rewritten to handle it. I also suspect that
making it a hard dependency would be untenable. The best approach would perhaps
be to use the http and client modules of CouchDB-Python directly, subclassing
Resource, Server, Database, etc. I don't see a straightforward way to just
graft a streaming parser onto the system without lots of new code.

[1] https://github.com/kennethreitz/ijson#readme

Original comment by randall....@gmail.com on 24 Jan 2011 at 12:26
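The streaming approach described above can be sketched with the standard library alone. This is a swapped-in stand-in for ijson/YAJL, not what CouchDB-Python does: it reads the response in chunks and uses `json.JSONDecoder.raw_decode` to decode one row object at a time, so only the current row and an unconsumed chunk are held in memory. It assumes the response has CouchDB's usual view shape `{"total_rows": ..., "offset": ..., "rows": [...]}`, that the text `"rows"` first appears as that key, and that every row is a JSON object; `iter_view_rows` and its `chunk_size` parameter are hypothetical names.

```python
import io
import json
import re
from typing import IO, Iterator

_DECODER = json.JSONDecoder()
# Assumption: the rows array is introduced by the key "rows" (whitespace allowed).
_ROWS_START = re.compile(r'"rows"\s*:\s*\[')


def iter_view_rows(stream: IO[str], chunk_size: int = 8192) -> Iterator[dict]:
    """Hypothetical helper: yield view rows one at a time from a text stream.

    Only the current row plus at most one unread chunk stays in memory,
    instead of the whole raw body and its fully parsed form.
    """
    buf = ""
    # Phase 1: read until just past the opening bracket of the "rows" array.
    while True:
        match = _ROWS_START.search(buf)
        if match:
            buf = buf[match.end():]
            break
        chunk = stream.read(chunk_size)
        if not chunk:
            raise ValueError('no "rows" array found in response')
        buf += chunk

    # Phase 2: decode one row object at a time.
    while True:
        # Skip the whitespace and commas separating rows.
        stripped = buf.lstrip(" \t\r\n,")
        if not stripped:
            chunk = stream.read(chunk_size)
            if not chunk:
                raise ValueError("response ended inside the rows array")
            buf = chunk
            continue
        buf = stripped
        if buf[0] == "]":
            return  # end of the rows array; anything after it is ignored
        if buf[0] != "{":
            raise ValueError(f"expected a row object, got {buf[0]!r}")
        try:
            row, end = _DECODER.raw_decode(buf)
        except json.JSONDecodeError:
            # The row is split across chunks: read more and retry.
            chunk = stream.read(chunk_size)
            if not chunk:
                raise ValueError("response ended inside a row") from None
            buf += chunk
            continue
        yield row
        buf = buf[end:]  # drop consumed text so memory stays bounded


if __name__ == "__main__":
    sample = ('{"total_rows":2,"offset":0,"rows":['
              '{"id":"a","key":1,"value":null},{"id":"b","key":2,"value":null}]}')
    for row in iter_view_rows(io.StringIO(sample), chunk_size=4):
        print(row["id"], row["key"])
```

For an HTTP response that yields bytes, wrapping it in `io.TextIOWrapper(response, encoding="utf-8")` should give the text stream this function expects. Retrying `raw_decode` after each extra chunk is quadratic for a single very large row, which a true incremental parser like ijson avoids.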

GoogleCodeExporter commented 9 years ago
I wrote some code to support iterating over rows:

https://github.com/openlibrary/openlibrary/blob/master/openlibrary/core/couch.py

Original comment by anandol...@gmail.com on 24 Jan 2011 at 9:32

GoogleCodeExporter commented 9 years ago
@Matt implemented iterative views a long time ago:
http://code.google.com/r/mattgoodall-couchdb-python-iterview/source/browse/couchdb/client.py#829

After some time in production, I can say that they work perfectly. The only
thing left is to use them by default for database iteration and ViewField, with
some constant batch size: a batch as small as 100 produces too many requests,
while 10K is a good balance between memory use and request count, but it should
be tweakable.

Why not get this feature into the mainline?

Original comment by kxepal on 25 Feb 2012 at 9:13
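The batching idea behind iterative views can be sketched as keyset pagination: request `batch + 1` rows, yield the first `batch`, and use the extra row's `key` and `id` as the start of the next request (in CouchDB terms, `startkey` and `startkey_docid`, with `limit` bounding each page). This is a hedged illustration of the technique, not Matt's code; `iter_in_batches` and the `fetch` callback, which stands in for one view query, are hypothetical, and the default of 10,000 follows the batch size suggested in the comment above.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Row = Dict[str, Any]
# Hypothetical callback: return at most `limit` view rows sorted by (key, id),
# starting at the (key, docid) pair in `start`, or at the beginning if None.
# Against CouchDB this would map to limit / startkey / startkey_docid.
FetchPage = Callable[[int, Optional[Tuple[Any, str]]], List[Row]]


def iter_in_batches(fetch: FetchPage, batch: int = 10_000) -> Iterator[Row]:
    """Yield every row of a view while holding at most batch + 1 rows in memory."""
    if batch < 1:
        raise ValueError("batch must be at least 1")
    start: Optional[Tuple[Any, str]] = None
    while True:
        # Request one extra row: it marks where the next batch begins.
        rows = fetch(batch + 1, start)
        yield from rows[:batch]
        if len(rows) <= batch:
            return  # a short page means the view is exhausted
        following = rows[batch]
        start = (following["key"], following["id"])
```

For a non-empty view of N rows this makes ceil(N / batch) requests, so one million rows take 10,000 requests at a batch size of 100 but only 100 at 10,000, which is the trade-off described above. Rows that change between requests may be missed or repeated, and a document that emits the same key more than once can straddle a batch boundary and be yielded twice.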

GoogleCodeExporter commented 9 years ago
Adding Matt's iterview feature as a folded patch.

Original comment by kxepal on 24 Apr 2013 at 5:53


GoogleCodeExporter commented 9 years ago
Nice one. I did a slightly less squashed version as r4f4166f23558 and 
follow-ups, and pushed that into the repository. This way, we get to see some 
of the progress (though not some of the more trivial changes made along the 
way).

Original comment by djc.ochtman on 25 Apr 2013 at 10:10