Closed khennes closed 4 years ago
@khennes Does that mean we can remove the 2nd part of the preprocess function altogether? And to confirm, OnDateSomebodyWroteRegexp is still being applied as part of the splitter regexes?
Not quite, because preprocess is still called as part of extractFromPlain, and we pass text/plain as the content type in that case. And yep, the regex is still applied as part of the splitter regexes.
Ah, I see. The GitHub search was failing me... Yeah, that sounds good.
This change ensures that we pass the correct content type to preprocess when extracting a reply from HTML.

When preprocessing a plain-text document, we search for OnDateSomebodyWroteRegexp anywhere in the message body instead of matching it only at the beginning of a line. This makes false positives in the reply content more likely: any sentence that matches the pattern "On ..., ... wrote/sent ..." can be mistaken for a splitter. When processing an HTML document, we can afford to be a bit stricter and only match that regexp at the beginning of a line. (Incidentally, this is equivalent to what mailgun/talon does.)

As a consequence of this change, however, a Nylas test started failing on the fixture email_15.html. This is because we previously expected to find a splitter in the middle of a line made up of two blockquote tags, where the text of an inline quote ("Some text in an inline quote") is followed directly by the splitter "On Jan 1 2020, at 12:34 pm, user@example.com <user@example.com> wrote:". Now that we only match OnDateSomebodyWroteRegexp at the start of a line, that's no longer the case.

Instead, this PR adds <blockquote> to the list of tags that we automatically append a newline character to when converting an XML document to text. The same line is then split into two, which fixes the failing test.
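To illustrate the behaviour described above, here is a minimal sketch in Python. It is not the project's code: ON_DATE_WROTE is a simplified, hypothetical stand-in for OnDateSomebodyWroteRegexp, to_text is a toy tag stripper rather than the real XML-to-text converter, and SAMPLE is made-up markup, not the contents of email_15.html.

```python
import re

# Simplified, hypothetical stand-in for OnDateSomebodyWroteRegexp.
ON_DATE_WROTE = r"On\s.+?,.+?(?:wrote|sent):"

# Plain-text mode: the pattern may match anywhere in the body.
LOOSE = re.compile(ON_DATE_WROTE)
# HTML mode: the pattern must start at the beginning of a line.
STRICT = re.compile(r"^" + ON_DATE_WROTE, re.MULTILINE)


def to_text(html, newline_tags=()):
    """Toy converter: append a newline after each listed closing tag, then strip all tags."""
    for tag in newline_tags:
        html = html.replace(f"</{tag}>", f"</{tag}>\n")
    return re.sub(r"<[^>]+>", "", html)


# Made-up markup: an inline quote immediately followed by a splitter line.
SAMPLE = (
    "<blockquote>Some text in an inline quote</blockquote>"
    "On Jan 1 2020, at 12:34 pm, user@example.com wrote:"
    "<blockquote>Earlier message</blockquote>"
)

# Without the extra newline, the splitter sits mid-line and STRICT misses it.
before = STRICT.search(to_text(SAMPLE))
# With a newline after </blockquote>, the splitter starts its own line.
after = STRICT.search(to_text(SAMPLE, newline_tags=("blockquote",)))

# An ordinary sentence that only the unanchored pattern picks up.
SENTENCE = "I agree. On Monday, Alice wrote: see attached."
false_positive = LOOSE.search(SENTENCE)
```

In this sketch, before is None and after is a match, which mirrors why the test failed and why the extra newline fixes it. The last line shows the kind of false positive that unanchored matching allows in plain-text mode.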
Alternatives considered
OnDateSomebodyWroteRegexp is fairly loosely defined; we could instead (or also) work on making it stricter.