Closed: jrideout closed this issue 5 years ago
What does this mean exactly? Same as tnefparse --htmlbody but for RTF bodies? I seem to remember from a long time ago that RTF is indeed embedded/wrapped in some funky way in TNEF...
Exactly, we'll want to support tnefparse --rtfbody
The one thing I'm not certain about is whether we need to decompress the RTF, or whether the RTF data is valid even when compressed. https://github.com/delimitry/compressed_rtf seems to do what we need: it just decompresses the data without fully parsing it.
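For what it's worth, the question above can be partly answered by looking at the stream header. As far as I understand the compressed-RTF format (MS-OXRTFCP), the body starts with a 16-byte little-endian header whose third field says whether the payload is LZFu-compressed or stored raw. Below is a minimal sketch under that assumption; `inspect_rtf_stream` is a hypothetical helper name, not part of tnefparse or compressed_rtf, and it does not decompress anything itself.

```python
import struct

# Magic values as I understand them from MS-OXRTFCP (assumption, little-endian).
COMPRESSED = 0x75465A4C    # appears on the wire as b"LZFu"
UNCOMPRESSED = 0x414C454D  # appears on the wire as b"MELA"
HEADER_SIZE = 16           # COMPSIZE, RAWSIZE, COMPTYPE, CRC (4 bytes each)


def inspect_rtf_stream(data):
    """Return (is_compressed, raw_rtf_or_None) for an RTF body stream.

    Hypothetical helper: for an uncompressed stream the raw RTF bytes are
    returned directly; for a compressed one, None is returned and a real
    decompressor is still needed.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("too short for a compressed-RTF header")
    comp_size, raw_size, comp_type, crc = struct.unpack_from("<IIII", data, 0)
    if comp_type == UNCOMPRESSED:
        # Payload is stored as-is right after the header.
        return False, data[HEADER_SIZE:HEADER_SIZE + raw_size]
    if comp_type == COMPRESSED:
        return True, None
    raise ValueError("unknown COMPTYPE: 0x%08X" % comp_type)
```

If this reading of the format is right, a compressed stream is not valid RTF on its own, so a decompression step would be needed before handing the body to the user.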
Nice! That's a small dependency well worth it I'd think.
This does what I want for RTF parsing: https://gist.github.com/gilsondev/7c1d2d753ddb522e7bc22511cfb08676
I'd rather not add a dependency for a full RTF parser. Should we include this file in our source, or just leave it outside the scope of the project?
Hm. I consider document format conversions to be outside the scope, but some limited use cases might fall on the borderline. If I may ask, what's the goal - just support extraction of plaintext words for indexing, or something else?
> what's the goal - just support extraction of plaintext words for indexing
just that
> I consider document format conversions to be outside the scope
I agree. Let's stop here. Users can do their own RTF parsing if desired.
I gave this some more thought. Conversions of TNEF body content in general are out of scope for tnefparse.
But I am pretty sure I remember the RTF/HTML bodies are to some extent specific to the MS TNEF implementations, with some quirks and deviations. That makes me think extraction of plaintext is something that's within the scope here.
So that can be revisited.
Parse the RTF in some way, perhaps via an optional dependency.
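One way the optional-dependency idea could look, as a rough sketch only: import the helper library lazily and fail with a clear message when it is missing. `load_optional` and `decompress_rtf_body` are hypothetical names; the only third-party call assumed here is `compressed_rtf.decompress`, from the package linked earlier in this thread.

```python
import importlib


def load_optional(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def decompress_rtf_body(raw, module_name="compressed_rtf"):
    """Decompress an RTF body using an optional dependency (hypothetical helper).

    Assumes the module exposes decompress(bytes) -> bytes, as compressed_rtf does.
    """
    mod = load_optional(module_name)
    if mod is None:
        raise RuntimeError(
            "RTF body support needs the optional '%s' package" % module_name
        )
    return mod.decompress(raw)
```

This keeps the core install free of extra dependencies, and users who never ask for the RTF body would not notice the difference.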