Longer text files corrupted

nancykapur commented 6 years ago

Hello,

We are using this package to decompress blob files out of a database.

For longer text fields, the end of the field is corrupted and unreadable.

Could this be an issue with the end-of-file reader or the byte limit? If so, can you recommend a workaround?
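One way to narrow this down: in LZW the codes get wider as the dictionary fills, so short fields may never reach the first width change while long ones do. If the encoder and decoder disagree about exactly when the width grows (the "early change" convention; PDF's LZWDecode filter, for example, exposes it as an `EarlyChange` parameter), every code after that point is misread, and only the tail of a long field comes out garbled. End-of-input handling, by contrast, only decides what happens to the last few padding bits. The toy codec below is a minimal sketch that reproduces this symptom. The 256/257 control codes (257 as EOI, as in the posted code's comment), the 9- to 12-bit width range, MSB-first bit packing, and the names `lzw_encode`, `lzw_decode`, `code_width` and `early_change` are all assumptions for illustration, not python-lzw's actual internals or the database's confirmed format.

```python
# Toy LZW codec: hypothetical constants and names, not python-lzw's real internals.
CLEAR, EOI, FIRST_FREE = 256, 257, 258   # assumed control codes
MIN_WIDTH, MAX_WIDTH = 9, 12              # assumed code-width range


def code_width(next_code: int, early_change: int) -> int:
    """Bits per code when `next_code` is the next dictionary slot to fill.

    early_change=1 (a hypothetical flag) switches to a wider code one code sooner.
    """
    width = MIN_WIDTH
    while width < MAX_WIDTH and next_code + early_change >= (1 << width):
        width += 1
    return width


def lzw_encode(data: bytes, early_change: int = 0) -> bytes:
    out = bytearray()
    bitbuf = nbits = 0

    def emit(code: int, width: int) -> None:
        nonlocal bitbuf, nbits
        bitbuf = (bitbuf << width) | code          # pack codes MSB-first
        nbits += width
        while nbits >= 8:
            nbits -= 8
            out.append((bitbuf >> nbits) & 0xFF)
        bitbuf &= (1 << nbits) - 1

    table = {bytes([i]): i for i in range(256)}
    next_code = FIRST_FREE
    emit(CLEAR, code_width(next_code, early_change))
    w = b""
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc
            continue
        emit(table[w], code_width(next_code, early_change))
        if next_code == 1 << MAX_WIDTH:            # dictionary full: start over
            emit(CLEAR, code_width(next_code, early_change))
            table = {bytes([i]): i for i in range(256)}
            next_code = FIRST_FREE
        else:
            table[wc] = next_code
            next_code += 1
        w = bytes([byte])
    if w:
        emit(table[w], code_width(next_code, early_change))
        next_code += 1   # the decoder adds one more entry before it reads EOI
    emit(EOI, code_width(next_code, early_change))
    if nbits:
        out.append((bitbuf << (8 - nbits)) & 0xFF)  # zero-pad the final byte
    return bytes(out)


def lzw_decode(data: bytes, early_change: int = 0) -> bytes:
    it = iter(data)
    bitbuf = nbits = 0

    def read(width: int):
        nonlocal bitbuf, nbits
        while nbits < width:
            byte = next(it, None)
            if byte is None:
                return None                      # only padding bits remain
            bitbuf = (bitbuf << 8) | byte
            nbits += 8
        nbits -= width
        code = bitbuf >> nbits
        bitbuf &= (1 << nbits) - 1
        return code

    table = [bytes([i]) for i in range(256)] + [b"", b""]  # 256/257: control codes
    out = bytearray()
    prev = None
    while True:
        # The decoder's table lags the encoder's by one entry, hence the +1.
        width = code_width(len(table) + (prev is not None), early_change)
        code = read(width)
        if code is None or code == EOI:
            break
        if code == CLEAR:
            del table[FIRST_FREE:]
            prev = None
            continue
        if code < len(table):
            entry = table[code]
        elif code == len(table) and prev is not None:
            entry = prev + prev[:1]                # code defined by this very step
        else:
            raise ValueError(f"invalid code {code} (table size {len(table)})")
        out += entry
        if prev is not None:
            table.append(prev + entry[:1])
        prev = entry
    return bytes(out)
```

With matching settings, both short and long inputs round-trip. Decoding a long input with `early_change=1` when it was encoded with `early_change=0` either raises or returns a garbled tail, while a short input is unaffected.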

nancykapur commented 6 years ago

Here is the code:

```python
import html2text
import lzw
import pandas as pd

# `striprtf` (RTF -> plain text) is defined elsewhere and not shown here.


def decompress_without_eoi(buf):
    """LZW-decompress `buf`, treating the End-of-Information code as end of data."""
    def gen():
        try:
            for byte in lzw.decompress(buf):
                yield byte
        except ValueError as exc:
            # python-lzw raises ValueError when it hits the EOI code; stop quietly there.
            if 'End of information code' not in repr(exc):
                raise

    try:
        return b''.join(gen())
    except Exception:
        # Any other decoding error: run again and keep the bytes decoded before it.
        deblob = []
        try:
            for byte in gen():
                deblob.append(byte)
        except Exception:
            pass
        return b''.join(deblob)


def deblob3(row):
    if pd.notnull(row[0]):
        blob = row[0]

        h = html2text.HTML2Text()
        h.ignore_links = True
        h.ignore_images = True

        # Read the blob (file-like object or raw bytes) and drop the last 10 bytes (the trailing tag).
        if not isinstance(blob, bytes):
            blobbytes = blob.read()[:-10]
        else:
            blobbytes = blob[:-10]

        if row[1] == 361:
            # If compressed, return up to EOI-257 code, which is last non-null code before tag
            return h.handle(striprtf(decompress_without_eoi(blobbytes)))
        elif row[1] == 360:
            # If uncompressed, return up to tag
            return h.handle(striprtf(blobbytes))
```
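The posted `decompress_without_eoi` decodes the blob a second time after a failure and discards the exception, which hides where and why decoding stopped. A single-pass alternative keeps the partial output and hands back the exception instead. This is a minimal sketch that assumes `lzw.decompress` yields byte strings lazily, as the `b''.join(gen())` call in the posted code implies; `collect_until_error` is a hypothetical helper name.

```python
from typing import Iterable, Optional, Tuple


def collect_until_error(chunks: Iterable[bytes]) -> Tuple[bytes, Optional[Exception]]:
    """Join byte chunks from a lazy iterator, stopping at the first exception.

    Returns the bytes produced before the failure plus the exception (or None),
    so the caller can inspect why decoding stopped instead of discarding it.
    """
    parts = []
    try:
        for chunk in chunks:
            parts.append(chunk)
    except Exception as exc:  # deliberately broad: keep partial output on any error
        return b"".join(parts), exc
    return b"".join(parts), None
```

For example, `data, err = collect_until_error(lzw.decompress(blobbytes))` followed by a check such as `'End of information code' not in repr(err)` separates a normal EOI stop from a genuine decoding error. Logging `len(data)` for the rows that do raise would show whether long fields stop at a consistent offset.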

plessbd commented 2 years ago

I know this is four years old, but I ran into the same issue and created a different package that fixes this error:

https://github.com/plessbd/ocflzw-decompress

joeatwork / python-lzw

Longer text files corrupted #3