joniles / rtfparserkit

Primary repository for RTF Parser Kit library
Apache License 2.0

NullPointerException: charsetName #16

Closed petergaultney closed 7 years ago

petergaultney commented 7 years ago

rtfparserkit's StandardRtfParser apparently can't handle the RTF file I'm parsing because (for reasons that probably make perfect sense to the maintainers) no character encoding ever gets set. As a result, the first call to processCharacterBytes fails: it calls currentEncoding, which returns null from the ParserState, and that null charset name then causes the NPE when the new String is constructed.
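The failure mode can be reproduced without rtfparserkit at all. The sketch below is a hypothetical demo class (not library code) in which a null charset name stands in for what currentEncoding returned. On older JDKs such as the one in the trace, the `String(byte[], String)` constructor throws `NullPointerException("charsetName")`. Newer JDKs still throw an NPE, but the message may differ, so the sketch only checks the exception type.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical demo class; not part of rtfparserkit.
final class NullCharsetDemo {

    // Returns true if decoding with a null charset name throws a NullPointerException,
    // which is what happens when the parser state has no encoding set.
    static boolean nullCharsetThrowsNpe(byte[] bytes) {
        String charsetName = null; // stands in for currentEncoding() returning null
        try {
            new String(bytes, charsetName);
            return false;
        } catch (NullPointerException e) {
            return true;
        } catch (UnsupportedEncodingException e) {
            // Checked exception declared by the constructor; not thrown for a null name.
            return false;
        }
    }
}
```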

I was able to "fix" it by adding a default to the currentEncoding method that returns "UTF-8" when both currentEncoding and currentFontEncoding are null in the parser state. This doesn't seem like a very robust fix.
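That workaround amounts to roughly the following. This is a hypothetical illustration, not rtfparserkit's ParserState or currentEncoding code. The two parameters stand in for the parser-state fields named above, and the assumption that a font encoding takes precedence over the document encoding is mine. A hard-coded UTF-8 fallback also doesn't match the RTF default character set (\ansi, discussed further down in this thread), which is one reason it isn't robust.

```java
// Hypothetical sketch of the workaround; names mirror the issue text only.
final class EncodingFallbackSketch {

    static final String FALLBACK_ENCODING = "UTF-8";

    // Assumption: a font-specific encoding, when set, wins over the document-level one.
    static String currentEncoding(String documentEncoding, String fontEncoding) {
        if (fontEncoding != null) {
            return fontEncoding;
        }
        if (documentEncoding != null) {
            return documentEncoding;
        }
        // Neither encoding was ever set: fall back instead of returning null.
        return FALLBACK_ENCODING;
    }
}
```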

Are RTFs expected to always provide a Command that declares the encoding? In other words, should the parser never have to handle a case where no encoding has been set by the time the first processCharacterBytes call is made?

In any case, here's the stack trace.

```
Exception in thread "main" java.lang.NullPointerException: charsetName
    at java.lang.String.<init>(String.java:424)
    at com.rtfparserkit.parser.standard.StandardRtfParser.processCharacterBytes(StandardRtfParser.java:86)
    at com.rtfparserkit.parser.raw.RawRtfParser.handleCharacterData(RawRtfParser.java:292)
    at com.rtfparserkit.parser.raw.RawRtfParser.handleGroupEnd(RawRtfParser.java:324)
    at com.rtfparserkit.parser.raw.RawRtfParser.parse(RawRtfParser.java:81)
    at com.rtfparserkit.parser.standard.StandardRtfParser.parse(StandardRtfParser.java:50)
    at com.rtfparserkit.converter.text.AbstractTextConverter.convert(AbstractTextConverter.java:41)
    at com.rtfparserkit.converter.text.StringTextConverter.convert(StringTextConverter.java:34)
```

joniles commented 7 years ago

Hi, could you provide me with a sample RTF file which triggers this issue? Thanks!

petergaultney commented 7 years ago

Sure. It's attached as a .zip (GitHub apparently won't accept a raw RTF file).

I suspect (but have not confirmed) that the issue is in the very first block. There is no \f command to set the font/encoding, so when the parser reaches "Normal;", it treats that as character data and tries to decode it as such.

rtf.zip

joniles commented 7 years ago

I had a quick look at the RTF spec:

After specifying the RTF version, you must declare the default character set used in the document unless it is \ansi (the default).

I wasn't handling the default case correctly. I've just committed a change which fixes this. I'll add a new release jar shortly. Thanks for reporting the bug!
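The commit itself isn't shown here, so what follows is only a sketch of the idea, not the actual fix. It scans the control words, and if none of the RTF character-set words (\ansi, \mac, \pc, \pca) appears, it falls back to ANSI instead of leaving the encoding null. The class name and the mapping of ANSI to the Java charset name "windows-1252" are assumptions. The sketch also ignores \ansicpgN code-page overrides and escaped backslashes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not rtfparserkit's fix: picks a Java charset name from the
// document's character-set control word, defaulting to ANSI when none is present.
final class RtfCharsetSketch {

    // A control word: backslash, a run of ASCII letters, an optional signed number.
    private static final Pattern CONTROL_WORD = Pattern.compile("\\\\([a-zA-Z]+)(-?\\d+)?");

    // Assumption: ANSI is treated as Windows code page 1252.
    static final String DEFAULT_ENCODING = "windows-1252";

    static String documentEncoding(String rtf) {
        Matcher m = CONTROL_WORD.matcher(rtf);
        while (m.find()) {
            switch (m.group(1)) {
                case "ansi": return DEFAULT_ENCODING;
                case "mac":  return "x-MacRoman";
                case "pc":   return "IBM437";
                case "pca":  return "IBM850";
                default:     break; // not a character-set word; keep scanning
            }
        }
        // No character set declared: use the \ansi default rather than null.
        return DEFAULT_ENCODING;
    }
}
```

For a header with no character-set word at all, such as `{\rtf1{\stylesheet{Normal;}}...}`, the sketch returns the ANSI default, so a later `new String(bytes, encoding)` never receives null.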

petergaultney commented 7 years ago

awesome - thanks for the quick response!

joniles commented 7 years ago

Looks like this is fixed; I just didn't get around to closing it!