No check for transfer-encoding==chunked before creating ResponseBody instance

GoogleCodeExporter commented 8 years ago

When fetching documents or attachments larger than 8192 bytes  
(couchdb.http.CHUNK_SIZE) the request returns a couchdb.http.ResponseBody 
instance, which in its __iter__ method has an assert for transfer-encoding == 
'chunked'.

The problem is that CouchDB can return data larger than 8192 bytes in a 
response that is not chunked, and iterating over the ResponseBody object will 
then trigger the assert.

I'm using couchdb-python version 0.8 on Ubuntu 11.04 (python-couchdb 
0.8-0ubuntu1), with CouchDB version 1.0.1 (couchdb 1.0.1-0ubuntu15).

My solution was to change the method request in the couchdb.http.Session class 
to check for transfer-encoding == 'chunked' before creating the ResponseBody 
object, otherwise read the data as it was smaller than 8192 bytes.

Original issue reported on code.google.com by joachim.pileborg on 11 Jul 2011 at 12:33

GoogleCodeExporter commented 8 years ago

Can you attach your patch, please?

Original comment by djc.ochtman on 11 Jul 2011 at 12:49

GoogleCodeExporter commented 8 years ago

The thing here is that I'm not sure the __iter__ method is part of the 
interface, i.e. something that *should* be used to get chunks, but I might be 
remembering wrong.

Original comment by djc.ochtman on 11 Jul 2011 at 12:50

GoogleCodeExporter commented 8 years ago

Patch file attached.

Original comment by joachim.pileborg on 11 Jul 2011 at 12:59

Attachments:

couchdb-http-py.patch

GoogleCodeExporter commented 8 years ago

How about just to remove assertion and for responses without 
"Transfer-Encoding: chunked" header just make single read call on iteration or 
multiple per CHUNK_SIZE value? This should make similar behavior of 
ResponseBody and StringIO object for iteration. Perfectly would be if they will 
share same methods.

For example:
- assert self.resp.msg.get('transfer-encoding') == 'chunked'
+ if self.resp.msg.get('transfer-encoding') != 'chunked':
+   yield self.read()
+   return
or
- assert self.resp.msg.get('transfer-encoding') == 'chunked'
+ if self.resp.msg.get('transfer-encoding') != 'chunked':
+   chunks = []
+   while not self.resp.isclosed():
+     data = self.resp.read(CHUNK_SIZE)
+     chunks.extend(data.splitlines())
+     for chunk in chunks[:-1]:
+       yield chunk
+     chunks = [chunks[-1]]
+   for chunk in chunks:
+     yield chunk
+   return

Original comment by kxepal on 11 Jul 2011 at 2:10

GoogleCodeExporter commented 8 years ago

I don't think __iter__ should be part of ResponseBody's API. The original idea 
was that it was file-like in that it had a read() and close() method. That's 
all.

__iter__ was added to support the changes feed and it's *very* specific to 
chunked encoding. I actually renamed __iter__ to _iterchunks at one point to 
avoid confusion and better describe its purpose, but it got changed back. I 
can't remember if there was a good reason for that now, but it sure doesn't 
seem right to me.

Original comment by matt.goo...@gmail.com on 12 Jul 2011 at 3:03

GoogleCodeExporter commented 8 years ago

Matt, file-like object have native support iteration, so if you'll remove it 
from ResponseBody you will have to always check how returned data was wrapped 
(by StringIO or ResponseBody) and changing CHUNK_SIZE constant may break a lot 
of code.
For current state, you are not able to pass ResponseBody instance to csv.reader 
due to it requires iteration support. Real use case: your list function 
returning csv formatted data.

Original comment by kxepal on 12 Jul 2011 at 4:38

GoogleCodeExporter commented 8 years ago

OK, file-like was too broad - but that's why I specifically said ResponseBody 
"... had a read() and close() method. That's all." ;-)

So, here's my opinion ...

* We don't need ResponseBody to be anything other than a way to efficiently 
read bytes from the server and tell couchdb-python when you've finished reading.
* __iter__ doesn't do what's expected, i.e. yield lines, so we should rename it 
again to make it clear it's *only* useful for iterating chunks.

If we later decide to make it more file-like then it needs doing properly and I 
suggest we use io.RawIOBase 
(http://docs.python.org/library/io.html#io.RawIOBase) as a guide.

Original comment by matt.goo...@gmail.com on 13 Jul 2011 at 11:20

GoogleCodeExporter commented 8 years ago

I'm with Matt on this. (I guess I'm the person who renamed it back; not sure 
why.)

Original comment by djc.ochtman on 13 Jul 2011 at 4:12

GoogleCodeExporter commented 8 years ago

But why not just to justify StringIO and ResponseBody by common API? That's not 
a hard, but will remove any needs to check show/list/update_doc returned result 
type just to ensure, that if we've got ResponseBody instance we must act in 
other way? In light of fact that this result strongly depended from data size I 
don't count this checks as CouchDB `relax` way of development.

Also, at least my code expects ResponseBody.__iter__ method existed and 
functioned at least in current way. I don't count myself as unique developer 
and hope this changes will break not only my code.

Original comment by kxepal on 13 Jul 2011 at 5:20

GoogleCodeExporter commented 8 years ago

This issue was closed by revision e260990c44ff.

Original comment by djc.ochtman on 21 Sep 2012 at 7:52

Changed state: Fixed

lanto03 / couchdb-python

No check for transfer-encoding==chunked before creating ResponseBody instance #190