Closed — bradwood closed this issue 5 years ago
Update... I've validated that this is indeed the behaviour: if I do `cat file | gunzip` I get the data back no problem. Same with `stream=True`.
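(For reference, the whole-payload behaviour described above is easy to confirm from Python as well; this is a sketch with stand-in data rather than the actual downloaded file.)

```python
import gzip

# gzip.decompress is happy as long as it is handed the *complete* stream,
# which is effectively what `cat file | gunzip` does.
original = b"<tv>example payload</tv>"  # stand-in for the real file contents
payload = gzip.compress(original)
print(gzip.decompress(payload) == original)  # True
```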
Hi. The callback is just to allow you to do whatever you like with the raw bytes coming in. There is no default callback function, so really the "default behavior" is to do literally nothing at all!
On your question of whether on-the-fly decompression is even possible: the answer is "sort of". It's possible in the way you see the `stream` argument doing it. You can reimplement that in a callback if you want, or just use the `stream` arg.
Just to be clear, a callback is a function to be run on a certain event. You supply the function. The one given in the example does nothing more than write whatever comes in to file.
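A file-writing callback along those lines might look like this (a hedged sketch, not the exact code from the docs; `make_file_callback` and the demo data are made up):

```python
import asyncio
import io

# Hypothetical sketch: a factory that builds a callback which appends each
# raw chunk of the response body to a file-like object.
def make_file_callback(fileobj):
    async def write_chunk(chunk):
        # The callback receives the raw bytes as they come in; all this
        # one does is write them out.
        fileobj.write(chunk)
    return write_chunk

# Quick demonstration with an in-memory "file":
buf = io.BytesIO()
cb = make_file_callback(buf)
asyncio.run(cb(b"some raw "))
asyncio.run(cb(b"bytes"))
print(buf.getvalue())  # b'some raw bytes'
```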
OK... I think I follow you. How can I tell if the payload is gzipped or not before attempting to unzip it? And, as a side question, don't you think it would be cleaner if the method returned the unzipped content by default, as the other invocation does?
Ok, lemme give you a silly example.
```python
async def totally_useless_callback(chunk):
    print('NOM NOM NOM')
```
If you pass this as a callback, it will totally work, but just print NOM NOM NOM every time we read in bytes and do nothing useful. You can pass whatever function you want as long as it takes at least one argument.
The callback argument is to allow people to do whatever crazy shit they'd like, without restriction. It can be as useless or as useful as you make it.
Even if I did want to enforce something like decompression, I couldn't, as I don't control what functions people pass as a callback. You are probably best served by the `stream` param, as it's what you want in like 99.99999999% of use cases :)
You can tell if the payload is compressed by checking the headers for `content-encoding`:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding
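A sketch of that header check (the helper name is made up; the lowercase keys match the header dump later in this thread, but a defensive lookup covers both casings):

```python
# Hypothetical helper: decide whether a response body is gzipped by
# inspecting the Content-Encoding header of a plain headers dict.
def is_gzipped(headers):
    value = headers.get("content-encoding") or headers.get("Content-Encoding") or ""
    # "gzip" and the legacy "x-gzip" token both indicate gzip compression.
    return value.lower() in ("gzip", "x-gzip")

print(is_gzipped({"content-type": "application/xml",
                  "content-encoding": "gzip"}))  # True
print(is_gzipped({"content-type": "text/html"}))  # False
```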
Hrm... So I tried the stream option like so:
```python
resp = await asks.get(str(self._url), stream=True)
async with await trio.open_file(newfile, 'ab') as output_file:
    async with resp.body:
        async for bytechunk in resp.body:
            await output_file.write(bytechunk)
```
Still seems to be barfing on the gzip?
```
Traceback (most recent call last):
  File "src/pyskyq/examples/cli_epg.py", line 70, in <module>
    trio.run(main, sys.argv[1:])
  File "/Users/brad/.virtualenvs/pyskyq-4vSEKDfZ/lib/python3.7/site-packages/trio/_core/_run.py", line 1337, in run
    raise runner.main_task_outcome.error
  File "src/pyskyq/examples/cli_epg.py", line 49, in main
    nursery.start_soon(all_72_hour.fetch)
  File "/Users/brad/.virtualenvs/pyskyq-4vSEKDfZ/lib/python3.7/site-packages/trio/_core/_run.py", line 397, in __aexit__
    raise combined_error_from_nursery
  File "/Users/brad/Code/pyskyq/src/pyskyq/xmltvlisting.py", line 189, in fetch
    async for bytechunk in resp.body:
  File "/Users/brad/.virtualenvs/pyskyq-4vSEKDfZ/lib/python3.7/site-packages/async_generator/_impl.py", line 366, in step
    return await ANextIter(self._it, start_fn, *args)
  File "/Users/brad/.virtualenvs/pyskyq-4vSEKDfZ/lib/python3.7/site-packages/async_generator/_impl.py", line 197, in __next__
    return self._invoke(first_fn, *first_args)
  File "/Users/brad/.virtualenvs/pyskyq-4vSEKDfZ/lib/python3.7/site-packages/async_generator/_impl.py", line 209, in _invoke
    result = fn(*args)
  File "/Users/brad/.virtualenvs/pyskyq-4vSEKDfZ/lib/python3.7/site-packages/asks/response_objects.py", line 130, in __aiter__
    event.data = decompressor.send(event.data)
  File "/Users/brad/.virtualenvs/pyskyq-4vSEKDfZ/lib/python3.7/site-packages/asks/http_utils.py", line 36, in decompress
    data = _compression_mapping[compression](data)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/gzip.py", line 532, in decompress
    return f.read()
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/gzip.py", line 276, in read
    return self._buffer.read(size)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/gzip.py", line 482, in read
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
```
Any further thoughts?
I could certainly write some code to decompress this myself via the callback mechanism if needed, but if `stream=True` is meant to decode on the fly, then this is a bug, no?
Cheers
Brad
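(The EOFError above is consistent with `gzip.decompress` being handed an incomplete stream. A minimal reproduction with synthetic data, independent of asks:)

```python
import gzip

# gzip.decompress needs the full stream, including the end-of-stream
# marker, so truncating a gzip payload mid-stream triggers the error.
whole = gzip.compress(b"x" * 10_000)
try:
    gzip.decompress(whole[: len(whole) // 2])  # only half the stream
except EOFError:
    print("EOFError, as in the traceback")
```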
Hm. Might be a bug. Can you show the output of `print(resp.headers)`?
Headers are on line 2 of the dump below.
```
[2018-11-01 21:05:27,016] DEBUG:pyskyq.xmltvlisting:Fetch(<XMLTVListing: url='http://www.xmltv.co.uk/feed/6715', path='.epg_data', filename='42a4b30993795c4efc92cdc93d5c10d5e5968baa255a8d85d8cee691b7319cbf.xml'>) call started.
{'server': 'nginx/1.11.10', 'date': 'Thu, 01 Nov 2018 21:05:27 GMT', 'content-type': 'application/xml', 'transfer-encoding': 'chunked', 'connection': 'keep-alive', 'last-modified': 'Thu, 01 Nov 2018 02:35:57 GMT', 'etag': '"112fe98-57991474d8431-gzip"', 'accept-ranges': 'bytes', 'vary': 'Accept-Encoding', 'content-encoding': 'gzip'}
Traceback (most recent call last):
  File "src/pyskyq/examples/cli_epg.py", line 70, in <module>
    trio.run(main, sys.argv[1:])
  File "/Users/brad/.virtualenvs/pyskyq-4vSEKDfZ/lib/python3.7/site-packages/trio/_core/_run.py", line 1337, in run
    raise runner.main_task_outcome.error
  File "src/pyskyq/examples/cli_epg.py", line 49, in main
    nursery.start_soon(all_72_hour.fetch)
  File "/Users/brad/.virtualenvs/pyskyq-4vSEKDfZ/lib/python3.7/site-packages/trio/_core/_run.py", line 397, in __aexit__
    raise combined_error_from_nursery
  File "/Users/brad/Code/pyskyq/src/pyskyq/xmltvlisting.py", line 219, in fetch
    async for bytechunk in resp.body:
  File "/Users/brad/.virtualenvs/pyskyq-4vSEKDfZ/lib/python3.7/site-packages/async_generator/_impl.py", line 366, in step
    return await ANextIter(self._it, start_fn, *args)
  File "/Users/brad/.virtualenvs/pyskyq-4vSEKDfZ/lib/python3.7/site-packages/async_generator/_impl.py", line 197, in __next__
    return self._invoke(first_fn, *first_args)
  File "/Users/brad/.virtualenvs/pyskyq-4vSEKDfZ/lib/python3.7/site-packages/async_generator/_impl.py", line 209, in _invoke
    result = fn(*args)
  File "/Users/brad/.virtualenvs/pyskyq-4vSEKDfZ/lib/python3.7/site-packages/asks/response_objects.py", line 130, in __aiter__
    event.data = decompressor.send(event.data)
  File "/Users/brad/.virtualenvs/pyskyq-4vSEKDfZ/lib/python3.7/site-packages/asks/http_utils.py", line 36, in decompress
    data = _compression_mapping[compression](data)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/gzip.py", line 532, in decompress
    return f.read()
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/gzip.py", line 276, in read
    return self._buffer.read(size)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/gzip.py", line 482, in read
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
```
I did this; it's ugly, but it works: https://gitlab.com/bradwood/pyskyq/blob/master/src/pyskyq/xmltvlisting.py#L189
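(The general technique for gunzipping chunk by chunk — as opposed to the linked code, which this is not — is zlib's stateful decompressor; a minimal sketch:)

```python
import gzip
import zlib

# A stateful decompressor can be fed a gzip stream in arbitrary chunks.
# wbits = 16 + zlib.MAX_WBITS tells zlib to expect a gzip header/trailer.
def gunzip_chunks(chunks):
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        yield decomp.decompress(chunk)
    yield decomp.flush()  # drain any remaining buffered output

# Simulate a chunked download of a gzipped body:
body = gzip.compress(b"<tv>hello</tv>")
chunks = [body[i:i + 5] for i in range(0, len(body), 5)]
print(b"".join(gunzip_chunks(chunks)))  # b'<tv>hello</tv>'
```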
Nicely caught. I've opened a new issue for this.
Hi @theelous3
I suspect I'm getting output from this that is still compressed... Is this a bug or a feature?
My code (copied from your docs, pretty much verbatim)