joshy / striprtf

Stripping rtf to plain old text
http://striprtf.dev
BSD 3-Clause "New" or "Revised" License
94 stars, 27 forks

NoneType error #6

Closed: janeshrapnel closed this issue 4 years ago

janeshrapnel commented 4 years ago

I am getting a TypeError in the code, and it's breaking at line 116: c = int(arg)

I am getting the following error: TypeError: ("int() argument must be a string, a bytes-like object or a number, not 'NoneType'", 'occurred at index 165970')

The code used is below and is used to decompress BLOBs in a database. It works fine when compress is 361 but not 360.

This is the code that is running:

def deblob3(row):
    if pd.notnull(row[0]):
        blob = row[0]
        h = html2text.HTML2Text()
        h.ignore_links=True
        if type(blob) != bytes:
            blobbytes = blob.read()[:-10]
        else:
            blobbytes = blob[:-10]
        if row[1]==361:            
            return h.handle(striprtf(decompress_without_eoi(blobbytes)))
        elif row[1]==360: 
            return h.handle(striprtf(blobbytes))

joshy commented 4 years ago

Hi, thanks for the report. Could you maybe provide the string or better the rtf file (the content of decompress_without_eoi(blobbytes))?

janeshrapnel commented 4 years ago

Thanks Joshy. It is actually the 360 compression that isn't working. I am also aware that it may not be this program that is causing the error, this just happens to be where it is coming up with the error.

For reference 360 is for uncompressed blobs and 361 is compressed blobs.

The file looks like the attached example. I haven't included the full IS_BLOB, but it's in byte format: Example_File.txt

Robabrown commented 4 years ago

Hello, I'm getting this exact same error. On line 126 it's throwing "argument must be a string type... not NoneType". Here's a copy of the RTF that's causing the error: cf1\fs22\par\plain\f1\fs22\u*

-- It blows up when it gets to "u"

joshy commented 4 years ago

Hi Robabrown, can you send me the full RTF file? Is \u* even valid?

Robabrown commented 4 years ago

here is the complete line of rtf that causes the error: 'r\plain\f1\fs22 - continue home citric acid/sodium citrate, folic acid, B vitamin complex\plain\f2\fs22\lang1033\hich\f2\dbch\f2\loch\f2\cf1\fs22\par\plain\f1\fs22 - Strict I/Os\plain\f2\fs22\lang1033\hich\f2\dbch\f2\loch\f2\cf1\fs22\par\plain\f1\fs22\u*'

joshy commented 4 years ago

Hmm, I guess the star in \u* is the problem. This is not covered in the RTF Spec. I will see what I can do to fix it.
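To illustrate the diagnosis above, here is a minimal sketch of how a `\u` control word without a numeric argument can lead to `int(None)`. The pattern and the helper name `parse_unicode_escape` are assumptions made for illustration; they are not taken from striprtf's source and are not necessarily how v0.0.9 fixes the problem. The idea is that the optional digit group simply does not match when `\u` is followed by `*`, so the captured argument is `None` and must be checked before calling `int()`.

```python
import re

# Illustrative control-word pattern (an assumption, not striprtf's actual regex):
# backslash, 1-32 letters, then an OPTIONAL signed integer argument.
CONTROL_WORD = re.compile(r"\\([a-z]{1,32})(-?\d{1,10})?[ ]?", re.IGNORECASE)


def parse_unicode_escape(fragment):
    """Decode a leading \\uN control word.

    Returns the decoded character, or None when the fragment does not
    start with a \\u control word that carries a usable numeric argument.
    """
    match = CONTROL_WORD.match(fragment)
    if match is None:
        return None
    word, arg = match.groups()
    # For input like "\u*" the digit group does not participate in the
    # match, so arg is None. Calling int(arg) here would raise TypeError.
    if word != "u" or arg is None:
        return None
    code = int(arg)
    # RTF writes \uN as a signed 16-bit number, so negative values wrap.
    if code < 0:
        code += 0x10000
    # Skip values outside the range that chr() accepts.
    if code > 0x10FFFF:
        return None
    return chr(code)
```

With this guard, `parse_unicode_escape(r"\u8364?")` decodes the euro sign, while `parse_unicode_escape(r"\u*")` returns `None` instead of raising, which leaves the caller free to skip the malformed control word or emit it as literal text.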

joshy commented 4 years ago

Fixed with https://github.com/joshy/striprtf/releases/tag/v0.0.9