Open markov2 opened 1 year ago
Question: what makes your charset encoding/decoding so special that Perl cannot handle it? I really have no idea what more you could do.
We have routines that detect the character set encoding of a message part body and decode it. They use a number of heuristics and rules to make sense of commonly mislabeled cases, with fallback options for when decoding fails. These problems are surprisingly common in email. To give you a few examples:
So we don't rely on Mail-Message to do the charset decoding. We just ask it to decode the content-transfer-encoding and then we decode the charset ourselves. It's all Perl code, but with lots of additional logic.
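To illustrate that split (this is not our actual code), here is a minimal sketch that undoes a base64 content-transfer-encoding with the core MIME::Base64 module and then decodes the charset separately with Encode. The helper name decode_with_fallback and the fallback order (declared label, then UTF-8, then cp1252, then Latin-1) are assumptions made for the example, not the real heuristics.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);
use MIME::Base64 qw(decode_base64);

# Hypothetical helper: try the declared charset first, then a few fallbacks.
# Returns (decoded text, charset label that worked).
sub decode_with_fallback {
    my ($bytes, $declared) = @_;
    my @candidates = grep { defined && length } ($declared, 'UTF-8', 'cp1252');
    for my $name (@candidates) {
        my $enc = find_encoding($name) or next;    # skip unknown labels
        my $copy = $bytes;    # decode may modify its input when CHECK is set
        my $text = eval { $enc->decode($copy, Encode::FB_CROAK()) };
        return ($text, $name) if defined $text;
    }
    # Last resort: Latin-1 maps every byte to a character, so it cannot fail.
    return (decode('iso-8859-1', $bytes), 'iso-8859-1');
}

# Step 1: undo the content-transfer-encoding (base64 here).
my $raw = decode_base64("Y2Fmw6k=");
# Step 2: decode the charset ourselves; the part is mislabeled as us-ascii.
my ($text, $charset) = decode_with_fallback($raw, 'us-ascii');
```

In this sketch the us-ascii attempt fails on the non-ASCII bytes, so the UTF-8 candidate is the one that succeeds.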
Then for writing the output, it's a similar story. We encode the charset ourselves, and when we create a Mail::Message::Body object we ask Mail-Message to do the content-transfer-encoding, but not the charset encoding. Again, it's Perl code, but with additional logic for necessary clean-up, e.g., removing characters that can't be encoded or selecting a different encoding, selecting the content-transfer-encoding, or adding byte order marks where necessary. For encoding, it's not so much that Mail-Message can't do the charset encoding for us, but rather that we built our code around a Mail-Message 2.x that gave us the option of doing our own charset encoding.
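A minimal sketch of the "selecting a different encoding" case, again using only the core Encode module. The helper name encode_with_fallback and the choice of UTF-8 as the fallback are assumptions for illustration. The other clean-up steps mentioned above (stripping characters, byte order marks, picking the content-transfer-encoding) are not shown.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper: encode text in the preferred charset if every
# character fits, otherwise fall back to UTF-8.
# Returns (encoded bytes, charset label actually used).
sub encode_with_fallback {
    my ($text, $preferred) = @_;
    my $enc = find_encoding($preferred);
    if ($enc) {
        my $copy  = $text;    # encode may modify its input when CHECK is set
        my $bytes = eval { $enc->encode($copy, Encode::FB_CROAK()) };
        return ($bytes, $preferred) if defined $bytes;
    }
    # Preferred charset is unknown or cannot represent the text.
    return (Encode::encode('UTF-8', $text), 'UTF-8');
}

# Latin-1 can hold this text, so the preferred charset is kept.
my ($latin_bytes, $latin_label) = encode_with_fallback("caf\x{e9}", 'iso-8859-1');
# These characters are outside Latin-1, so the helper switches to UTF-8.
my ($wide_bytes, $wide_label) = encode_with_fallback("\x{4f60}\x{597d}", 'iso-8859-1');
```

The returned label is what would go into the Content-Type charset parameter, so the header always matches the bytes that were actually produced.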
(Was issue #8 question 3)
In Mail-Message 2.x, you could call encode() with a transfer_encoding option but no charset option as a way of decoding or encoding the content transfer encoding, while not decoding or encoding the character set encoding. This functionality didn’t work in Mail-Message 3.x. Our code frequently makes use of this when we want to do our own character set decoding or encoding, outside what Mail-Message provides.