mazira / rtf-stream-parser

Contains native Node classes for transforming an RTF byte stream into tokens, and de-encapsulating HTML
MIT License

Issue with links in rtf #1

Closed TheElementalOfDestruction closed 6 years ago

TheElementalOfDestruction commented 6 years ago

I have a line in my file that goes as follows (with sensitive information replaced):

{\*\htmltag84 <a href="mailto:address@emailhost.net">}\htmlrtf {\field{\*\fldinst{HYPERLINK "mailto:address@emailhost.net"}}{\fldrslt\cf1\ul \htmlrtf0 address@emailhost.net\htmlrtf }\htmlrtf0 \htmlrtf }\htmlrtf0 {\*\htmltag92 </a>}

The extracted html SHOULD be <a href="mailto:address@emailhost.net">address@emailhost.net</a>

but instead it comes out as

<a href="mailto:address@emailhost.net"></a>

so the link ends up wrapping no text at all. This happens everywhere that text is supposed to carry a link, regardless of the other formatting applied to the text. The formatting DOES stay, but the text itself is dropped.

TheElementalOfDestruction commented 6 years ago

I fixed it myself. To fix it, locate the file called "de-encapsulate.js" and change the function _doText from

_doText(data) {
  const state = this._state;
  const inside = state.destination === 'htmltag';
  const outside = state.destination === 'rtf' && !state.htmlrtf;
  // Skip if not inside html tag or directly to rtf
  if (!inside && !outside)
    return;

  if (typeof data === 'string') {
    this.push(data);
  } else {
    const cpg = inside ? this._cpg : this._getFontCpg();
    this.push(iconv.decode(data, cpg));
  }
}

to

_doText(data) {
  const state = this._state;
  const inside = state.destination === 'htmltag';
  const outside = state.destination === 'rtf' && !state.htmlrtf;
  const fldrslt = state.destination === 'fldrslt';
  // Skip unless inside html tag, directly in rtf, or inside a field result
  if (!inside && !outside && !fldrslt)
    return;

  if (typeof data === 'string') {
    this.push(data);
  } else {
    const cpg = inside ? this._cpg : this._getFontCpg();
    this.push(iconv.decode(data, cpg));
  }
}
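
To show why the extra check brings the link text back, here is a minimal standalone sketch of just the gating logic. It is not the library's code: emitsBefore, emitsAfter, render and the tokens array are hypothetical names made up for illustration, and the destination/htmlrtf values assigned to each token (including the 'fldinst' destination name) are my assumption about what the parser state looks like for the example line above.

```javascript
// Standalone sketch; these helpers are hypothetical and not part of rtf-stream-parser.

// Gate equivalent to the check in the original _doText.
function emitsBefore(state) {
  const inside = state.destination === 'htmltag';
  const outside = state.destination === 'rtf' && !state.htmlrtf;
  return inside || outside;
}

// Gate equivalent to the check in the patched _doText.
function emitsAfter(state) {
  return emitsBefore(state) || state.destination === 'fldrslt';
}

// Assumed parser state for each text token in the example line.
const tokens = [
  { text: '<a href="mailto:address@emailhost.net">', destination: 'htmltag', htmlrtf: false },
  { text: 'HYPERLINK "mailto:address@emailhost.net"', destination: 'fldinst', htmlrtf: true },
  { text: 'address@emailhost.net', destination: 'fldrslt', htmlrtf: false },
  { text: '</a>', destination: 'htmltag', htmlrtf: false },
];

// Concatenate the text of every token the given gate lets through.
function render(gate) {
  return tokens.filter(gate).map((t) => t.text).join('');
}

console.log(render(emitsBefore)); // <a href="mailto:address@emailhost.net"></a>
console.log(render(emitsAfter));  // <a href="mailto:address@emailhost.net">address@emailhost.net</a>
```

One thing worth checking: the patched gate lets fldrslt text through even while \htmlrtf is on. Changing the check to state.destination === 'fldrslt' && !state.htmlrtf might be safer, but I have not tested that against real files.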

If anyone encounters any errors with this new version, please comment on this thread so I can correct it. Thank you.

P.S.: Has the dev completely abandoned this project?

rossj commented 6 years ago

Hi, thank you for the update! I will take a look and push a new version.

We are still maintaining and using this library. Sorry, I just missed the initial issue notification.

TheElementalOfDestruction commented 6 years ago

No worries. Since I thought you might have left the project, I did spend a few weeks tracking down the location of the problem and building a fix :P It was a good learning experience, though. I learned a lot more about JavaScript in the process.

rossj commented 6 years ago

Sorry for the delay here, but I've finally fixed the issue and published a new version. I've also improved the decoding of text from different multi-byte code pages, and added the ability to de-encapsulate plain text (RTF's `\fromtext` feature) in addition to HTML.