lolo101 / MsgViewer

MsgViewer is an email viewer for .msg e-mail messages, implemented in pure Java. It runs on Windows, Linux and Mac. It also provides a Java API for reading mail messages (.msg files) programmatically.
The Unlicense
174 stars, 22 forks

msg2eml: lists are not correctly converted in the HTML version #126

Closed: datsteves closed this issue 2 years ago

datsteves commented 2 years ago

While testing the other issue, I found that .msg files containing lists are not converted correctly in the HTML version. Either the entire content of a list item is missing, or only part of it.

The plain-text version in the .eml looks like this:

Hello,

this is an email with formatted text

1.  sadasdasda
2.  sdasadas sd
3.  asdasdas

Another normal line

*   dasdasdasdasd
*   asdasd sdadsa
*   sdsad asdasda

The HTML version, however, is just:

<div dir="ltr">
  Hello,
  <div><br /></div>
  <div>this is an email with formatted text</div>
  <div>
    <ol>
      <li></li>
      <li></li>
      <li></li>
    </ol>
    <div>Another normal line</div>
    <div>
      <ul>
        <li></li>
        <li></li>
        <li></li>
      </ul>
    </div>
    <div><br /></div>
    -- <br />
    <div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">
      <div>Regards</div>
    </div>
  </div>
</div>
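When checking converted files for this symptom, a simple scan for empty list items is enough to flag the fully empty case shown above. The following is a hypothetical check, not part of MsgViewer; the class and method names are made up, and it only matches bare `<li>` tags without attributes, so partially truncated items would still need to be compared by hand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyListItemCheck {

    // Matches an <li> element that contains nothing but whitespace.
    private static final Pattern EMPTY_LI = Pattern.compile("<li>\\s*</li>");

    // Counts the empty list items in a converted HTML body.
    static int countEmptyListItems(String html) {
        Matcher m = EMPTY_LI.matcher(html);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String broken = "<ol><li></li><li></li><li></li></ol>";
        String expected = "<ol><li>sadasdasda</li><li>sdasadas sd</li><li>asdasdas</li></ol>";
        System.out.println(countEmptyListItems(broken));   // prints 3
        System.out.println(countEmptyListItems(expected)); // prints 0
    }
}
```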

Online Conversion Tool

To rule out a problem with the .msg file itself, I converted it with the first online "msg to eml" tool I could find (Zamzar). There, the HTML part looked like this after base64 decoding:

<div dir="ltr">
  Hello,
  <div><br /></div>
  <div>this is an email with formatted text</div>
  <div>
    <ol>
      <li>sadasdasda</li>
      <li>sdasadas sd</li>
      <li>asdasdas</li>
    </ol>
    <div>Another normal line</div>
    <div>
      <ul>
        <li>dasdasdasdasd</li>
        <li>asdasd sdadsa</li>
        <li>sdsad asdasda</li>
      </ul>
    </div>
    <div><br /></div>
    -- <br />
    <div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">
      <div>Regards</div>
    </div>
  </div>
</div>
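The base64 decoding step mentioned above can be reproduced with the JDK alone. This is a minimal sketch: the class and method names are illustrative, and it assumes the HTML part is UTF-8, whereas a real tool should take the charset from the part's Content-Type header. `Base64.getMimeDecoder()` is used because .eml bodies are wrapped into lines of at most 76 characters, and the MIME decoder ignores those line breaks.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class DecodeHtmlPart {

    // Decodes a base64 MIME body; CRLF line breaks inside the body are ignored.
    static String decodeBody(String base64Body) {
        byte[] bytes = Base64.getMimeDecoder().decode(base64Body);
        // Assumption: UTF-8. In practice, read the charset from Content-Type.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "<li>" encoded as base64 and split across two lines, as in an .eml body
        System.out.println(decodeBody("PGxp\r\nPg==")); // prints <li>
    }
}
```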

Files

list-test.msg.zip list-test-zamzar.eml.zip

lolo101 commented 2 years ago

I suspect the RTF grammar in MsgViewer is incomplete. Maybe there are better alternatives than reinventing an RTF parser... I'll look for an existing RTF parsing library.

lolo101 commented 2 years ago

The current implementation of the [MS-OXRTFEX] specification is far from perfect. I believe the issue is mishandling of the \htmlrtf switch.
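To illustrate what mishandling the \htmlrtf switch can do, here is a deliberately simplified de-encapsulation sketch. It is not MsgViewer's code and covers only a small part of [MS-OXRTFEX]: text between \htmlrtf and \htmlrtf0 is treated as RTF-only and dropped, everything else (including the contents of \*\htmltag groups) is emitted as-is, and all other control words are ignored. It also assumes the suppression flag is saved at `{` and restored at `}`, like ordinary RTF group state. The class name and the RTF fragment in `main` are made up for illustration and are not taken from the attached file.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class HtmlRtfSketch {

    // Simplified model: only \htmlrtf / \htmlrtf0 affect output.
    static String deencapsulate(String rtf) {
        StringBuilder out = new StringBuilder();
        Deque<Boolean> saved = new ArrayDeque<>(); // suppression state per open group
        boolean suppressed = false;
        int i = 0;
        while (i < rtf.length()) {
            char c = rtf.charAt(i);
            if (c == '{') {
                saved.push(suppressed);           // assumption: state is group-scoped
                i++;
            } else if (c == '}') {
                if (!saved.isEmpty()) suppressed = saved.pop();
                i++;
            } else if (c == '\\') {
                i++;
                if (i >= rtf.length()) break;
                char next = rtf.charAt(i);
                if (next == '{' || next == '}' || next == '\\') {
                    if (!suppressed) out.append(next); // escaped literal
                    i++;
                } else if (Character.isLetter(next)) {
                    int start = i;
                    while (i < rtf.length() && Character.isLetter(rtf.charAt(i))) i++;
                    String word = rtf.substring(start, i);
                    int paramStart = i;
                    if (i < rtf.length() && rtf.charAt(i) == '-') i++;
                    while (i < rtf.length() && Character.isDigit(rtf.charAt(i))) i++;
                    String param = rtf.substring(paramStart, i);
                    if (i < rtf.length() && rtf.charAt(i) == ' ') i++; // delimiter space
                    if (word.equals("htmlrtf")) {
                        // \htmlrtf turns suppression on, \htmlrtf0 turns it off
                        suppressed = !param.equals("0");
                    }
                } else {
                    i++; // control symbol such as \* : skipped in this sketch
                }
            } else {
                // Raw CR/LF in RTF source is not content
                if (!suppressed && c != '\r' && c != '\n') out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String rtf = "{\\rtf1\\fromhtml1 {\\*\\htmltag <ol>}"
                + "{\\*\\htmltag <li>}\\htmlrtf {\\pntext 1.}\\htmlrtf0 sadasdasda"
                + "{\\*\\htmltag </li>}{\\*\\htmltag </ol>}}";
        System.out.println(deencapsulate(rtf)); // prints <ol><li>sadasdasda</li></ol>
    }
}
```

If a reader in this model missed the \htmlrtf0 that re-enables output, or restored the wrong state at the end of a group, the list-item text would be swallowed while the surrounding tags still came through, which would be consistent with the empty `<li></li>` elements in the report.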

lolo101 commented 2 years ago

I know it would probably be better to look for an RTF parsing library, but this is a great opportunity to learn about RTF :yum:

@datsteves, please let me know if this fix works for you.

ThomasChr commented 2 years ago

I like it :-)