quentez / talonjs

JavaScript port of the Talon email quote parsing library.
MIT License
15 stars 9 forks source link

PB 17319 — Don't crop middle of message #28

Closed cunisia closed 4 years ago

cunisia commented 4 years ago

JIRA

⛳ State the problem being solved.

In the following message, the lines starting by > were identified as quotations and were mistakenly cropped from the message's body. So they are hidden by default, unless the user clicks on the ... button.

Kapture 2019-12-12 at 16 32 44

Here is the underlying reason behind this bug:

To find quotation, we first use markMessageLines to identify the nature of its of each line. It returns a string where each char represents the nature of the line at its index in the message using the following code:

Example of output: tesemmet.

Then in processMarkedLines, we look for quotations pattern in this string using (among others) this regexp: ((?:s|(?:me*){2,}).*me*)[te]*$. It looks for:

This means that indeed, a group of quotations line in the middle of a message (ie: preceded and followed by new text), will be identified as a quotation and will be cropped from the message body.

💭 Explain your solution and thought process.

Although it sounded reasonable to consider everything following a splitter like a quotation, it sounded a bit violent when it came to what follows a bunch of quotes line.

In order to fix that I split this regexp into two regexp:

EmptyQuotationRegexp was suffering from the same issue and I split it into two regexps following the same logic.

I added a test for the situation that caused the bug and all previously existing tests are still passing.

🌟 Highlight anything that is likely to raise questions, or that you want reviewers to pay special attention to.

This solution is not perfect as it will fail in the following case:

Hello,
I like to add quotes in the middle of my message:
> this is 
> a quote on two lines

This is a very important sentence in my message.
> and I like to finish my message with a quote. 

But we have a specific test case for this situation that specifically expects this pattern to be identified as a quote: should find quote with multi-line link 2. So I'm tempted to leave it as it is for now considering that this a rare use case (don't know if we can rely on such gut feeling when it comes to mailing though).

evonck commented 4 years ago

@cunisia Looks good to me. Why we made that choice, it was probably set up that way in talon.
This is simply a ts version of https://github.com/mailgun/talon. As there is no one way of setting quote we usually improve this project when we find issues in front. But we need to wait for @quentez review

cunisia commented 4 years ago

thanks @evonck ! @quentez the bug stroke again, do you think you could review this PR anytime soon?