viernullvier closed this issue 4 years ago
This was actually a design choice: if the file object has a fileno, it is assumed that the descriptor can be used directly instead of calling into Python for each chunk to read.
However, it seems that gzip file objects also expose a fileno, which is not how I understand the docs. I could add an option to disable this optimization, but I would prefer a canonical way of determining whether it is safe to use the fileno.
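To illustrate why the descriptor is not safe to use here, the following sketch compares what `read()` returns with what comes straight off the descriptor of a gzip file object. It uses only the standard library and does not call splitstream; the helper name `raw_vs_decoded` and the temporary file are made up for the illustration.

```python
import gzip
import os
import tempfile


def raw_vs_decoded(payload):
    """Return (first two bytes read via fileno(), bytes returned by read())."""
    fd, path = tempfile.mkstemp(suffix=".gz")
    os.close(fd)
    try:
        # Write a small gzip file to compare against.
        with gzip.open(path, "wb") as out:
            out.write(payload)
        with gzip.open(path, "rb") as f:
            # read() goes through the gzip layer and returns decompressed data.
            decoded = f.read()
            # fileno() is the descriptor of the underlying compressed file,
            # so reading from it bypasses decompression entirely.
            os.lseek(f.fileno(), 0, os.SEEK_SET)
            raw = os.read(f.fileno(), 2)
    finally:
        os.remove(path)
    return raw, decoded
```

Under these assumptions, `raw` holds the gzip magic bytes `\x1f\x8b` while `decoded` holds the original payload, which would explain a reader that trusts the descriptor seeing compressed data.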
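A workaround that does not require reading the entire file into a string would be to wrap the file object in an object that does not expose the fileno() method, e.g.

from splitstream import splitfile

class Wrapper(object):
    """Expose only read(), so that fileno() stays hidden from splitstream."""
    def __init__(self, f):
        self.__f = f
    def read(self, *n):
        return self.__f.read(*n)

# f is the gzip file object that has already been opened
for obj in splitfile(Wrapper(f), "json"):
    print(obj)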
Fixed in PR #10, now that I have finally migrated to GitHub Actions.
The splitstream module doesn't work with gzip file streams. I've been unable to trace the reason for this issue because no exception is raised; it just doesn't work at all.
Workaround: read the entire file into a StringIO stream. Wrapping the file in a BufferedReader doesn't work either; it seems to pass the unprocessed gzip data to splitstream.
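A likely reason the BufferedReader attempt does not help is that `io.BufferedReader` forwards `fileno()` to the object it wraps, so the descriptor of the compressed file stays visible. The sketch below checks this with the standard library only; the helper name `buffered_reader_exposes_fd` is hypothetical, and nothing here calls splitstream itself.

```python
import gzip
import io
import os
import tempfile


def buffered_reader_exposes_fd(payload=b'{"a": 1}'):
    """Return (whether both objects report the same fileno, decoded bytes)."""
    fd, path = tempfile.mkstemp(suffix=".gz")
    os.close(fd)
    try:
        with gzip.open(path, "wb") as out:
            out.write(payload)
        gz = gzip.open(path, "rb")
        buffered = io.BufferedReader(gz)
        try:
            # BufferedReader.fileno() delegates to the wrapped object,
            # which in turn reports the descriptor of the compressed file.
            same_fd = buffered.fileno() == gz.fileno()
            # Reading through the Python-level API still decompresses.
            decoded = buffered.read()
        finally:
            buffered.close()  # also closes the wrapped gzip file
    finally:
        os.remove(path)
    return same_fd, decoded
```

So the wrapper reads correctly at the Python level, but any consumer that prefers `fileno()` still ends up on the raw compressed file.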
Tested with Python 2.7.10 on OS X