markviews / GmailWebhook

Sends new emails to a discord webhook
MIT No Attribution

Some email bodies returning as raw html #2

Open FunkedelicBob opened 2 years ago

FunkedelicBob commented 2 years ago

Do you have any idea why certain emails return as raw HTML and not the properly formatted text? You can see in these images what I'm referencing: imgur

It's weird because they are from the same source, structured the same way.

Super appreciate your help btw!

markviews commented 2 years ago

hmm this is strange :/ I'm not home to test right now, but I think this should work:

Install jsdom with npm i jsdom

Add these lines to the top of the file

const jsdom = require("jsdom");
const { JSDOM } = jsdom;

Add this line above line 105

body = new JSDOM(body).innerText

markviews commented 2 years ago

i'm unsure if this will work. might need to be in a try, catch or something in case the body element is already plain text and not valid html. I can test this when I get home in a few days, I don't think node.js will run on my chromebook 😅

FunkedelicBob commented 2 years ago

Thanks for the response! Yeah it does seem to cause an issue. But all good on whenever you can have a look - as you can tell I'm a bit of a noob, not quite sure what's wrong:

TypeError: Cannot read properties of undefined (reading 'substring')

    //trim messages to not go over discord's message limit
    body = new JSDOM(body).innerText;
    from = from.substring(0,50);
    subject = subject.substring(0,100);
    //body = body.replace(/(?:https?|ftp):\/\/[\n\S]+/g, '[Link]');
    body = body.substring(0,600);

markviews commented 2 years ago

hm. maybe this instead

body = new JSDOM(body).window.document.body.textContent
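
For anyone landing here later: as far as I can tell, the earlier error happened because a `JSDOM` instance has no `innerText` property, so `body` became `undefined` and the later `body.substring(...)` call threw. Below is a minimal sketch of the `textContent` approach wrapped in the try/catch mentioned earlier. `htmlToText` is a hypothetical helper name, not something in the repository, and the optional `require` is an extra assumption so the snippet still loads when jsdom is not installed.

```javascript
// Optional dependency: fall back gracefully when jsdom is not installed.
let JSDOM = null;
try {
  ({ JSDOM } = require("jsdom"));
} catch (err) {
  // jsdom is missing: htmlToText() will return its input unchanged.
}

// Hypothetical helper (not part of the repository): returns the text of an
// HTML email body, or the original value if it cannot be converted.
function htmlToText(body) {
  // Leave non-string values alone, and skip parsing if jsdom is unavailable.
  if (typeof body !== "string" || JSDOM === null) return body;
  try {
    // Parse the string as an HTML document and read the text of <body>.
    return new JSDOM(body).window.document.body.textContent;
  } catch (err) {
    // Parsing failed: keep the raw body rather than crash the webhook.
    return body;
  }
}
```

It would be called where the one-liner went, e.g. `body = htmlToText(body);`, before the `substring` trimming.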

FunkedelicBob commented 2 years ago

That did it!

It did create some issues: images get removed and there are a ton of line breaks. But I used some more regex to collapse runs of blank lines into a single blank line.

body = body.replace(/\n\s*\n\s*\n/g, '\n\n');
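
A small sketch of how that cleanup could be combined with the existing trimming step. `tidyBody` and its `limit` parameter are hypothetical names for illustration, and the `\r\n` normalization is an extra assumption (some mail bodies use Windows line endings). The default of 600 matches the `substring(0,600)` call in the snippet above.

```javascript
// Hypothetical helper: collapse blank-line runs, then trim to a length limit.
function tidyBody(text, limit = 600) {
  return text
    // Normalize Windows line endings so the next regex sees plain "\n".
    .replace(/\r\n/g, "\n")
    // Three or more line breaks (with optional whitespace between them)
    // become exactly one blank line.
    .replace(/\n\s*\n\s*\n/g, "\n\n")
    // Drop leading and trailing whitespace left over from the HTML.
    .trim()
    // Keep the message under the chosen length limit.
    .substring(0, limit);
}
```

Usage would be `body = tidyBody(body);` in place of the separate `replace` and `substring` lines. A single blank line between paragraphs is left as it is, since the pattern only matches three or more line breaks in a row.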
