ykaliuta / fidogate

FidoGate
GNU General Public License v2.0

ftn2rfc RFC2047 compliance with UTF-8 and split headers encoded to base64 #8

Closed evs38 closed 4 years ago

evs38 commented 4 years ago

When ftn2rfc gates messages from the Cyrillic CP866 charset into UTF-8 with base64-encoded headers, long headers are split across several lines. That is fine in itself, but FidoGate often splits these lines in the wrong place: in the middle of a UTF-8 multibyte sequence, so the next line does not begin with a complete UTF-8 sequence once it is decoded from base64. This causes some email clients and newsreaders to truncate headers such as Subject or Organization when decoding them. The splitting algorithm should be fixed to follow RFC 2047 and not break header lines in the middle of a character. RFC 2047 specifically forbids this:

Each 'encoded-word' MUST encode an integral number of octets. The 'encoded-text' in each 'encoded-word' must be well-formed according to the encoding specified; the 'encoded-text' may not be continued in the next 'encoded-word'. (For example, "=?charset?Q?=?= =?charset?Q?AB?=" would be illegal, because the two hex digits "AB" must follow the "=" in the same 'encoded-word'.)

Each 'encoded-word' MUST represent an integral number of characters. A multi-octet character may not be split across adjacent 'encoded-word's.

Examples:

[Image: photo_2020-02-13_13-51-50]
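
To illustrate the rule quoted from RFC 2047, here is a minimal Python sketch of splitting a header value into base64 encoded-words on character boundaries only. It is not FidoGate's code (FidoGate is written in C), and the function name `encode_header_words` and its parameters are hypothetical, chosen for this example. It assumes UTF-8 output and the 75-character limit that RFC 2047 places on a single encoded-word.

```python
import base64


def encode_header_words(text, charset="utf-8", max_word_len=75):
    """Split text into RFC 2047 'B' encoded-words without cutting a character.

    Hypothetical helper for illustration; not part of FidoGate.
    """
    prefix = "=?" + charset + "?B?"
    suffix = "?="
    # Base64 produces 4 output characters for every 3 input bytes, so derive
    # how many raw bytes fit into one encoded-word of at most max_word_len.
    max_b64_chars = max_word_len - len(prefix) - len(suffix)
    max_bytes = (max_b64_chars // 4) * 3

    words = []
    chunk = b""
    for ch in text:
        encoded = ch.encode(charset)
        # Flush the current chunk before adding a character that would
        # overflow it, so every chunk holds only whole characters.
        if chunk and len(chunk) + len(encoded) > max_bytes:
            words.append(prefix + base64.b64encode(chunk).decode("ascii") + suffix)
            chunk = b""
        chunk += encoded
    if chunk:
        words.append(prefix + base64.b64encode(chunk).decode("ascii") + suffix)
    return words


if __name__ == "__main__":
    subject = "Привет, мир! " * 5
    # Fold the header: each continuation line starts with a space.
    print("Subject: " + "\r\n ".join(encode_header_words(subject)))
```

Because each encoded-word is built from whole characters, every word can be base64-decoded and converted from UTF-8 independently of its neighbours, which is what the quoted paragraphs require. The sketch works per Unicode code point; it does not try to keep combining sequences together.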