Open nancykapur opened 6 years ago
Here is the code:
`import html2text
import lzw
import pandas as pd

# Note: striprtf() is assumed to be a helper defined elsewhere that strips RTF markup.


def decompress_without_eoi(buf):
    """Decompress buf, tolerating a missing End of Information (EOI) code."""

    def gen():
        try:
            for byte in lzw.decompress(buf):
                yield byte
        except ValueError as exc:
            # Swallow only the missing-EOI error; re-raise anything else.
            if 'End of information code' not in repr(exc):
                raise

    try:
        return b''.join(gen())
    except Exception:
        # Fall back to collecting bytes one at a time so that whatever was
        # decoded before the failure is still returned.
        deblob = []
        try:
            for byte in gen():
                deblob.append(byte)
        except Exception:
            pass
        return b''.join(deblob)


def deblob3(row):
    if pd.notnull(row[0]):
        blob = row[0]
        h = html2text.HTML2Text()
        h.ignore_links = True
        h.ignore_images = True
        # Drop the last 10 bytes (the trailing tag) before decoding.
        if not isinstance(blob, bytes):
            blobbytes = blob.read()[:-10]
        else:
            blobbytes = blob[:-10]
        if row[1] == 361:
            # If compressed, return up to EOI-257 code, which is last non-null code before tag
            return h.handle(striprtf(decompress_without_eoi(blobbytes)))
        elif row[1] == 360:
            # If uncompressed, return up to tag
            return h.handle(striprtf(blobbytes))`
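For anyone who wants to see the idea in isolation, here is a minimal sketch that runs without the lzw package or any database blobs. `fake_decompress` and `salvage` are hypothetical names made up for this illustration, and the error text is only assumed to resemble what a real decoder raises; the point is just that the bytes yielded before the missing-EOI error are kept, while any other error is still raised.

```python
def fake_decompress(data):
    """Hypothetical stand-in for a streaming decoder: yields one-byte
    chunks, then fails as if the stream ended without its final code."""
    for i in range(len(data)):
        yield data[i:i + 1]
    # Assumed message, chosen only to match the substring check below.
    raise ValueError("End of information code not found")


def salvage(chunks, marker="End of information code"):
    """Collect chunks until the iterator ends or raises a ValueError whose
    message contains `marker`; any other error is re-raised."""
    out = []
    try:
        for chunk in chunks:
            out.append(chunk)
    except ValueError as exc:
        if marker not in str(exc):
            raise
    return b"".join(out)


recovered = salvage(fake_decompress(b"hello world"))
print(recovered)  # b'hello world'
```

Collecting into a list and joining at the end means a failure partway through the stream does not throw away what was already decoded, which seems to be what the code above is relying on.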
I know this is 4 years old, but I ran into the same issue and created a different package that fixes this error.
Hello,
We are using this package to decompress blob files out of a database.
For longer text fields, the end of the field is corrupted and unreadable.
Could this be an issue with the end-of-file reader or the byte limit? If so, can you recommend a workaround?