petergaultney closed this issue 7 years ago
Hi, could you provide me with a sample RTF file which triggers this issue? Thanks!
Sure. It's attached in a .zip (apparently GitHub won't accept a raw RTF).
I suspect (but have not confirmed) that the issue is with the very first block. There is no \f command to set the font/encoding, and when it tries to parse the "Normal;", it considers that to be character data and tries to parse it as such.
I had a quick look at the RTF spec:
After specifying the RTF version, you must declare the default character set used in the document unless it is \ansi (the default).
I wasn't handling the default case correctly. I've just committed a change which fixes this. I'll add a new release jar shortly. Thanks for reporting the bug!
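For readers following along, here is a minimal sketch of what handling the default case could look like. It is not the actual rtfparserkit change: the class `EncodingFallback`, the method `resolve`, and the lookup order (font encoding first, then document encoding) are hypothetical, and mapping `\ansi` to windows-1252 is an assumption.

```java
class EncodingFallback {
    // Assumption: the \ansi default corresponds to the Windows-1252 code page.
    static final String ANSI_DEFAULT = "windows-1252";

    // Hypothetical lookup order: font encoding, then document encoding,
    // then the ANSI default when the file declared neither.
    static String resolve(String fontEncoding, String documentEncoding) {
        if (fontEncoding != null) {
            return fontEncoding;
        }
        if (documentEncoding != null) {
            return documentEncoding;
        }
        return ANSI_DEFAULT;
    }

    public static void main(String[] args) {
        // No \f and no character set command seen yet: fall back to the default.
        System.out.println(resolve(null, null));
    }
}
```

With a fallback like this, the encoding name is never null by the time character data is decoded.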
awesome - thanks for the quick response!
Looks like this is fixed; I just didn't get around to closing it!
rtfparserkit's StandardRtfParser apparently can't handle the RTF file I'm parsing because (for reasons that probably make perfect sense to the maintainers) no character encoding ever gets set. The first call to processCharacterBytes fails: it calls currentEncoding, which returns null from the ParserState, and constructing the new String with that null encoding name throws the NPE.
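The failure mode can be reproduced outside the parser. This is a small sketch, assuming the parser decodes bytes with the `String(byte[], String)` constructor; the class `NullEncodingDemo` and its methods are hypothetical and not part of rtfparserkit.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class NullEncodingDemo {
    // Reports whether decoding with the given encoding name throws an NPE.
    static boolean throwsNpe(byte[] data, String encodingName) {
        try {
            // String(byte[], String) rejects a null charset name.
            new String(data, encodingName);
            return false;
        } catch (NullPointerException e) {
            return true;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("unknown encoding: " + encodingName, e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "Normal;".getBytes(StandardCharsets.US_ASCII);
        System.out.println("null encoding throws NPE: " + throwsNpe(data, null));
        System.out.println("US-ASCII throws NPE: " + throwsNpe(data, "US-ASCII"));
    }
}
```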
I was able to "fix" it by adding a new default to the currentEncoding method which returns "UTF-8" when both the currentEncoding and the currentFontEncoding are null in the parser state. This doesn't seem like a very robust fix.
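One way to see why a UTF-8 default may not be robust: plain ASCII text decodes identically under UTF-8 and an ANSI code page, but bytes above 0x7F do not. The sketch below assumes the file's bytes are really windows-1252 (a common reading of `\ansi`, though an assumption here); `FallbackComparison` and `decode` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FallbackComparison {
    // Decodes raw character bytes with the given charset.
    static String decode(byte[] data, Charset charset) {
        return new String(data, charset);
    }

    public static void main(String[] args) {
        Charset ansi = Charset.forName("windows-1252");
        byte[] ascii = "Normal;".getBytes(StandardCharsets.US_ASCII);
        byte[] accented = {(byte) 0xE9}; // 'é' in windows-1252

        // ASCII-only data hides the difference between the two defaults.
        System.out.println(decode(ascii, StandardCharsets.UTF_8).equals(decode(ascii, ansi)));

        // A lone 0xE9 is malformed UTF-8, so the two defaults disagree.
        System.out.println(decode(accented, StandardCharsets.UTF_8).equals(decode(accented, ansi)));
    }
}
```

So a UTF-8 default can appear to work on a file that happens to contain only ASCII and still garble other documents.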
Are RTFs expected to always provide a Command that will declare the encoding? Meaning that the logic never should have to handle a case where no encoding is provided by the time the first processCharacterBytes call is made?
In any case, here's the stack trace.