kisli / vmime

VMime Mail Library
http://www.vmime.org
GNU General Public License v3.0

vmime: prevent loss of a space during text::createFromString #306

Closed jengelh closed 6 months ago

jengelh commented 6 months ago
mailbox(text("Test München West", charsets::UTF_8), "a@b.de").generate();

produces

=?us-ascii?Q?Test_?= =?utf-8?Q?M=C3=BCnchen?= =?us-ascii?Q?West?= <a@b.de>

The first space between Test and München is encoded as an underscore along with the first word: Test_. The second space between München and West is encoded with neither of the two words and thus lost. Decoding the text results in Test MünchenWest instead of Test München West.

This is caused by how vmime::text::createFromString() handles transitions between 7-bit and 8-bit words: if an 8-bit word follows a 7-bit word, a space is appended to the previous word. The opposite case, a 7-bit word following an 8-bit word, lacks this handling, so no space is appended there.

When one fixes this problem, a follow-up issue appears:

text::createFromString("a b\xFFc d") tokenizes the input into m_words={word("a "), word("b\xFFc ", utf8), word("d")}. This "right-side alignment" nature of the whitespace is a problem for word::generate():

As per RFC 2047, spaces between adjacent encoded words are just separators and are not meant to be displayed. A space between an encoded word and regular ASCII text is not just a separator; it is also meant to be displayed.
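
To make the RFC 2047 rule concrete, here is a minimal standalone sketch of a decoder that applies it. It is not vmime code: decodeQ, decodeEncodedWord and decodeHeaderText are hypothetical helpers written for this illustration, only the Q encoding is handled, and no charset conversion is done (the bytes are passed through as-is).

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: decode a Q-encoded payload
// ('_' stands for a space, "=XX" is a hex-escaped byte).
std::string decodeQ(const std::string& payload) {
    std::string out;
    for (std::size_t i = 0; i < payload.size(); ++i) {
        if (payload[i] == '_') {
            out += ' ';
        } else if (payload[i] == '=' && i + 2 < payload.size()) {
            out += static_cast<char>(std::stoi(payload.substr(i + 1, 2), nullptr, 16));
            i += 2;  // skip the two hex digits just consumed
        } else {
            out += payload[i];
        }
    }
    return out;
}

// Hypothetical helper: returns true and fills `decoded` if `token`
// has the shape =?charset?Q?payload?= (the charset is ignored here).
bool decodeEncodedWord(const std::string& token, std::string& decoded) {
    if (token.size() < 8 || token.compare(0, 2, "=?") != 0 ||
        token.compare(token.size() - 2, 2, "?=") != 0) {
        return false;
    }
    const std::size_t q1 = token.find('?', 2);  // end of the charset name
    if (q1 == std::string::npos || q1 + 2 >= token.size() - 2) {
        return false;
    }
    if ((token[q1 + 1] != 'Q' && token[q1 + 1] != 'q') || token[q1 + 2] != '?') {
        return false;
    }
    decoded = decodeQ(token.substr(q1 + 3, token.size() - 2 - (q1 + 3)));
    return true;
}

// Decode a whitespace-separated sequence of encoded words and plain text.
std::string decodeHeaderText(const std::string& header) {
    std::istringstream in(header);
    std::string token, out;
    bool first = true;
    bool prevEncoded = false;
    while (in >> token) {
        std::string decoded;
        const bool encoded = decodeEncodedWord(token, decoded);
        // Whitespace between two adjacent encoded words is only a
        // separator and is dropped; in every other case it is displayed.
        if (!first && !(prevEncoded && encoded)) {
            out += ' ';
        }
        out += encoded ? decoded : token;
        prevEncoded = encoded;
        first = false;
    }
    return out;
}
```

Feeding it three adjacent encoded words, where only the first carries an encoded space, yields "Test MünchenWest". An encoded word followed by plain text, as in "=?utf-8?Q?M=C3=BCnchen?= West", keeps its space.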

When word::generate() outputs the b-word, it would have to strip one space, but only when there is a transition from an encoded word to an unencoded word. At that point, word::generate() does not know whether d will be encoded or unencoded.

The idea now is that we could change the tokenization of text::createFromString such that whitespace is at the start of words rather than at the end. With that, word::generate() need not know anything about the next word, but rather only the previous one.
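
The difference between the two alignments can be sketched with a toy tokenizer. This is not the vmime implementation: Word, SpaceAlign and tokenize are hypothetical names, words are assumed to be separated by single spaces, and the charset is not tracked (only a 7-bit/8-bit flag).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for vmime::word: the text plus a 7-bit/8-bit flag.
struct Word {
    std::string text;
    bool eightBit;
};

enum class SpaceAlign { Right, Left };

// True if any byte has its high bit set.
static bool hasEightBit(const std::string& s) {
    for (unsigned char c : s) {
        if (c >= 0x80) return true;
    }
    return false;
}

// Split `input` at single spaces, attach each space either to the
// chunk before it (Right) or to the chunk after it (Left), and merge
// neighbouring chunks that belong to the same 7-bit/8-bit class.
std::vector<Word> tokenize(const std::string& input, SpaceAlign align) {
    std::vector<std::string> chunks;
    std::string cur;
    for (char c : input) {
        if (c == ' ') {
            chunks.push_back(cur);
            cur.clear();
        } else {
            cur += c;
        }
    }
    chunks.push_back(cur);

    std::vector<Word> words;
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        std::string text = chunks[i];
        if (align == SpaceAlign::Right && i + 1 < chunks.size()) {
            text += ' ';        // the space trails the word
        } else if (align == SpaceAlign::Left && i > 0) {
            text = ' ' + text;  // the space leads the word
        }
        const bool eightBit = hasEightBit(chunks[i]);
        if (!words.empty() && words.back().eightBit == eightBit) {
            words.back().text += text;
        } else {
            words.push_back(Word{text, eightBit});
        }
    }
    return words;
}
```

For the input bytes 'a', ' ', 'b', 0xFF, 'c', ' ', 'd', the Right mode gives "a ", "b\xFFc ", "d" as described above, while the Left mode gives "a", " b\xFFc", " d". Note that in a real C++ string literal the escape has to be split, for example "a b\xFF" "c d", because \xFFc would be read as a single (out-of-range) hex escape.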

Thus, in this patch,

  1. The tokenization of text::createFromString is changed to left-align spaces and the function is fixed to account for the missing space on transition.
  2. word::generate learns how to steal a space character.
  3. Testcases are adjusted to account for the shifted position of the space.
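
One plausible reading of "stealing a space" is sketched below as a toy generator over left-aligned words. This is an illustration under stated assumptions, not the code in this patch: Word, encodeQ and generate are hypothetical, every encoded word is emitted as utf-8 with Q encoding, line folding is ignored, and each word after the first is assumed to start with exactly one space.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Toy stand-in for vmime::word: text plus whether it must be encoded.
struct Word {
    std::string text;
    bool encoded;
};

// Hypothetical helper: wrap `text` in a utf-8 Q-encoded word.
std::string encodeQ(const std::string& text) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out = "=?utf-8?Q?";
    for (unsigned char c : text) {
        if (c == ' ') {
            out += '_';
        } else if (c < 0x80 && std::isalnum(c)) {
            out += static_cast<char>(c);
        } else {
            out += '=';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out + "?=";
}

// Generate header text from left-aligned words. Each step only looks
// at the current word and at whether the previous one was encoded.
std::string generate(const std::vector<Word>& words) {
    std::string out;
    bool prevEncoded = false;
    for (std::size_t i = 0; i < words.size(); ++i) {
        const Word& w = words[i];
        if (!w.encoded) {
            // Plain text: its leading space is the separator and is displayed.
            out += w.text;
        } else {
            std::string text = w.text;
            if (i > 0) {
                if (!prevEncoded && !text.empty() && text[0] == ' ') {
                    // After plain text the separator is displayed, so the
                    // leading space is taken out of the encoded payload.
                    text.erase(0, 1);
                }
                // After an encoded word the separator is not displayed,
                // so the leading space stays inside the payload as '_'.
                out += ' ';
            }
            out += encodeQ(text);
        }
        prevEncoded = w.encoded;
    }
    return out;
}
```

With the words "Test", " München" (encoded) and " West", this sketch produces "Test =?utf-8?Q?M=C3=BCnchen?= West", which decodes with both spaces intact. Two adjacent encoded words keep the space inside the second payload instead.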

Fixes: #283, #284

Cc @RichardSteele