Open nestoru opened 7 years ago
HTML support would be fantastic :-)
Hi. HTML support can be easily added by adding the line:
'1013': 'htmlBody',
below line 78 in the msg.reader.js file
getFileData() will then also return htmlBody. To convert to String use:
new TextDecoder("utf-8").decode(fileData.htmlBody);
Maybe this can be added to the code?
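A minimal sketch of the decoding step described above, assuming the '1013' mapping has been added and that getFileData() exposes htmlBody as a byte array. The helper name decodeHtmlBody and the null fallback are my own, not part of msg.reader:

```javascript
// Hypothetical helper, not part of msg.reader.
// `fileData` stands for the object returned by getFileData().
function decodeHtmlBody(fileData, encoding = 'utf-8') {
  const raw = fileData && fileData.htmlBody;
  if (!raw || raw.length === 0) {
    return null; // property missing or empty
  }
  // Accept either a typed array or a plain array of byte values.
  const bytes = raw instanceof Uint8Array ? raw : Uint8Array.from(raw);
  return new TextDecoder(encoding).decode(bytes);
}
```

The encoding is a parameter because a message is not guaranteed to store its HTML as UTF-8; "utf-8" is only the default taken from the comment above.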
'1013': 'htmlBody'
I tried it already @0sander, and this is not working as expected in my tests. I thought it could be that simple, but it's not... :(
(I tried with HTML emails coming from Outlook)
I did some more tests and you are right: It only works in some cases - sometimes the htmlBody Array only contains the first 64 elements / characters, while in other cases it contains the whole HTML message.
I have tried with Outlook 2010.
@0sander See PR #7 for a fix affecting HTML embedded as binary.
is this fixed now that #7 is included ?
my body html is appearing like - bodyHTML: "���9buT"
any ideas?
@visgotti same here. bodyHTML seems to not parse correctly. In my case it only reads 16 bytes from the corresponding block. The same would happen if #7 wasn't applied so that is not it.
maybe @ykarpovich has some idea where that goes wrong ?
Example mail with some attachments: https://drive.google.com/file/d/1Qt1I0w1TTP6H-Z4ZrGnbWcBoehEgEHtS/view?usp=sharing
...
bodyHTML: "yz�buT.��( B"
...
Try changing this line https://github.com/ykarpovich/msg.reader/blob/master/msg.reader.js#L396 From:
if (fieldName) {
To:
if (fieldName && !fields[fieldName]) {
There's something weird with the full HTML being extracted and then getting overwritten as the parser continues traversing through the .msg file's data structure.
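To illustrate what that guard is meant to do (keep the first value seen for a field instead of letting a later one overwrite it), here is a standalone sketch. setFieldOnce is a made-up name for illustration only, not a function in msg.reader.js:

```javascript
// Hypothetical illustration of the "first write wins" guard.
function setFieldOnce(fields, fieldName, value) {
  // Only assign when the name is known and nothing truthy is stored yet.
  if (fieldName && !fields[fieldName]) {
    fields[fieldName] = value;
  }
  return fields;
}
```

Note that an empty string counts as "not set" with this check, so a later non-empty value would still replace it.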
@ashsearle I did try that just now. No change. The change only triggers on the body field anyway not on the bodyHtml field.
At this moment the issue still exists. In some cases the .msg file contains a valid body as HTML, but that is really rare.
It looks like the HTML body is stored as an RTF body attachment (or something like that).
Anyway, we need to investigate how to retrieve the HTML body in 100% of cases. Based on the specification, it doesn't work for now.
@ykarpovich It sounds likely that it is RTF (Outlook should use HTML by default, but maybe it converts the body internally in some cases). I will try to read the raw data as RTF.
PS: I see there are some JS converters already: https://github.com/iarna/rtf-to-html
PPS: Sometimes Outlook uses a compressed RTF body format, which might complicate things, but that should be stored in another property (https://docs.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagrtfcompressed-canonical-property)
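For anyone who wants to check whether a property value really is compressed RTF, here is a rough sketch that only reads the header. It assumes the layout from Microsoft's compressed RTF specification (four little-endian 32-bit fields: compressed size, raw size, compression type, CRC) and the two magic values "LZFu" and "MELA". It does not decompress anything, and readCompressedRtfHeader is a name I made up:

```javascript
// Magic values of the compression-type field (little-endian uint32).
const RTF_COMPRESSED = 0x75465a4c;   // bytes "LZFu": LZ-compressed payload
const RTF_UNCOMPRESSED = 0x414c454d; // bytes "MELA": payload stored as-is

// Hypothetical helper: parse the 16-byte header of a compressed RTF blob.
function readCompressedRtfHeader(bytes) {
  if (!(bytes instanceof Uint8Array) || bytes.length < 16) {
    throw new RangeError('need at least 16 header bytes');
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const type = view.getUint32(8, true);
  let compType = 'unknown';
  if (type === RTF_COMPRESSED) compType = 'LZFu';
  else if (type === RTF_UNCOMPRESSED) compType = 'MELA';
  return {
    compSize: view.getUint32(0, true), // size of the data after this field
    rawSize: view.getUint32(4, true),  // size once decompressed
    compType,
    crc: view.getUint32(12, true),
  };
}
```

If compType comes back as 'unknown', the bytes are probably not a compressed RTF stream at all (or they were truncated, like the 16-byte and 64-byte reads reported earlier in this thread).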
In some cases Outlook doesn't even parse its own format correctly...
Any news on this one and RTF ? This might help https://github.com/HiraokaHyperTools/msgreader
Excellent work I would say. Supporting msg format from JS is a must!
Supporting html is very important. Since I could not reopen #1 and I do not know if closed issues are monitored I am opening a new issue now. The demo page fails to parse the msg in full when html is included. Is it difficult to enhance the library to support that? Perhaps the author can share some ideas?
In the meantime here is a workaround:
- Use msgconvert linux command to go from msg to eml:
    sudo apt install -y libemail-outlook-message-perl
    cd /tmp
    msgconvert test\ with\ html\ content.msg # creates test\ with\ html\ content.eml
- Use https://github.com/nodemailer/mailparser to get the information from the eml, for example:
    git clone https://github.com/nodemailer/mailparser.git
    cd mailparser
    npm install
    cd examples
    node extractInfoFromEml.js /tmp/test\ with\ html\ content.eml
- Below is the code for extractInfoFromEml.js
    /* eslint no-console:0 */
    'use strict';

    const util = require('util');
    const fs = require('fs');
    const simpleParser = require('../lib/simple-parser.js');

    const args = process.argv.slice(2);
    const filePath = args[0];
    let input = fs.createReadStream(filePath);

    simpleParser(input)
        .then(mail => {
            console.log(util.inspect(mail, false, 22));
        })
        .catch(err => {
            console.log(err);
        });
Best regards, - Nestor
How can I use these commands in code? Kindly help, @nestoru.
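One possible way to drive the msgconvert step from Node, as a sketch only. It assumes msgconvert is installed and on the PATH, and that it writes the .eml next to the input with the same base name, as in the workaround above. convertMsgToEml and its optional run parameter are hypothetical names; the parameter exists only so the command runner can be swapped out:

```javascript
const path = require('path');
const { execFileSync } = require('child_process');

// Hypothetical wrapper around the msgconvert command from the workaround.
// Returns the path where the .eml file is expected to appear.
function convertMsgToEml(msgPath, run = execFileSync) {
  const dir = path.dirname(path.resolve(msgPath));
  const base = path.basename(msgPath);
  // execFileSync passes arguments without a shell, so spaces in the
  // file name need no escaping.
  run('msgconvert', [base], { cwd: dir });
  return path.join(dir, base.replace(/\.msg$/i, '.eml'));
}
```

The returned path can then be fed to the mailparser script above, for example through fs.createReadStream.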
In my case I am reading the file locally on the server from a folder, and it didn't give me the HTML body, but if I parse it through the example given by you, the content shows up there. I have attached an image of the response after parsing on the server side and an image of the example as well. What am I doing wrong? I need a little help. Much appreciated in advance, thanks.
how to get the image having "cid" link src from the html body. any idea?
Is there any way to open .msg file automatically with this html page ? Without having to browse for a file. Something like 'start firefox.exe http://localhost/example c:\file.msg' Or a javascript cmd to show it in a browser ?
Thank you
how to get the image having "cid" link src from the html body. any idea?
For that I added the following line to the NAME_MAPPING section:
'3701': 'embeddedImage',
EDIT: Though I just realized that there is already this section, but I don't know what it does:
CLASS_MAPPING: {
ATTACHMENT_DATA: '3701'
},
I can access embedded images this way. The next step is to replace the cid URLs with image blobs.
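The cid replacement step could look roughly like this. It is a sketch under the assumption that you already have a Map from each image's content ID to a URL you can display; how to get the content IDs out of the .msg file is not covered here, and replaceCidUrls is a made-up name:

```javascript
// Hypothetical helper: swap "cid:..." references in an HTML string for
// real URLs. `urlByContentId` is a Map of content ID -> URL string.
function replaceCidUrls(html, urlByContentId) {
  // A content ID ends at a quote, whitespace, ">" or ")".
  return html.replace(/cid:([^"'\s>)]+)/g, (match, contentId) => {
    // Leave unknown references untouched so nothing is silently lost.
    return urlByContentId.has(contentId) ? urlByContentId.get(contentId) : match;
  });
}
```

In a browser the URLs in the map could be created with URL.createObjectURL(new Blob([bytes], { type: mimeType })), where bytes is the attachment data and mimeType is whatever type the image has.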
About RTF-encoded HTML: I was curious, so I looked at all the extracted information that does not get a name and so is dropped.
I stumbled upon 1009 and called it bodyRTF.
After fighting for hours to get npm packages to work that are meant to run using node.js (I need a browser solution), I used the following code for stripping the RTF annotation:
https://stackoverflow.com/a/188877/3989858
I did have to modify it a little to make it work, and I still have to fiddle with embedded images, but it looks promising so far.
I could provide this as an angular project if someone is interested.
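Before stripping anything, it may help to check whether the RTF text actually wraps HTML. As far as I know, Outlook marks RTF-encapsulated HTML with the \fromhtml1 control word in the RTF header, so a heuristic check could look like the sketch below. looksLikeEncapsulatedHtml is a hypothetical name, and the input must already be decompressed RTF text:

```javascript
// Hypothetical heuristic: does this RTF text claim to encapsulate HTML?
function looksLikeEncapsulatedHtml(rtf) {
  // The control word belongs to the RTF header, so only look at the start.
  const head = rtf.slice(0, 1024);
  // Match \fromhtml1 but not a longer number such as \fromhtml10.
  return /\\fromhtml1(?![0-9])/.test(head);
}
```

If this returns false, the body is plain RTF and converting it to HTML is a different job than unwrapping HTML that was there from the start.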
@FROGGS , do you have a fork of this project with your changes/fixes?