Open toberndo opened 9 years ago
I think there's a gap in the docs about this: for 8-bit data you should pass either a Uint8Array/ArrayBuffer or a pseudo-binary string, not a Unicode string. For example, you can use the TextEncoder API to convert a Unicode string to a Uint8Array:
var buf = new TextEncoder('utf-8').encode('õäöü');
// [0xC3, 0xB5, 0xC3, 0xA4, 0xC3, 0xB6, 0xC3, 0xBC]
Which is something we might as well fix in here? Just put everything into a Uint8Array internally? It wouldn't make a difference then for Thomas...
The pseudobinary input comes from browserbox, probably doesn't make sense to convert all output from strings to typed arrays and then back again when parsing.
fair enough (replying by email on Mar 5, 2015 to Andris Reinman's comment above)
Incoming data from TCPSocket to BrowserBox is an ArrayBuffer. BrowserBox converts this to pseudo-binary (it can't use ASCII, because the data might include 8-bit bytes, or UTF-8, because the data might be in some other charset), does its stuff and passes it on as is. MimeParser on the other end receives the pseudo-binary data, detects the correct charset and outputs valid Unicode strings. So all string data between TCPSocket input and MimeParser output (which also includes mailreader objects) is in pseudo-binary format by default, to minimize conversions from one type to another.
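To make the flow above concrete, here is a minimal sketch of the round trip from raw bytes to a pseudo-binary string and on to a decoded Unicode string. The function names (`bytesToPseudoBinary`, `pseudoBinaryToBytes`, `decodePseudoBinary`) are hypothetical and are not the BrowserBox or MimeParser APIs. The charset is simply passed in here, whereas the real parser would detect it from the MIME headers.

```javascript
// Hypothetical helpers for illustration; not BrowserBox or MimeParser code.

// Map each byte to the code point with the same value (0x00-0xFF).
function bytesToPseudoBinary(buffer) {
  var bytes = new Uint8Array(buffer);
  var out = '';
  for (var i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

// Inverse mapping: recover the original bytes from a pseudo-binary string.
function pseudoBinaryToBytes(str) {
  var bytes = new Uint8Array(str.length);
  for (var i = 0; i < str.length; i++) {
    bytes[i] = str.charCodeAt(i) & 0xff;
  }
  return bytes;
}

// Decode to a Unicode string once the charset is known.
function decodePseudoBinary(str, charset) {
  return new TextDecoder(charset).decode(pseudoBinaryToBytes(str));
}

// The UTF-8 bytes of 'õäöü', as they might arrive from a socket.
var incoming = new Uint8Array([0xC3, 0xB5, 0xC3, 0xA4, 0xC3, 0xB6, 0xC3, 0xBC]).buffer;
var wire = bytesToPseudoBinary(incoming);     // 8 characters, one per byte
var text = decodePseudoBinary(wire, 'utf-8'); // decoded only at the very end
```

Because the intermediate string carries the bytes unchanged, no information is lost if the charset later turns out to be something other than UTF-8.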
Just to be clear, pseudo-binary is what you get with this:
var str = unescape(encodeURIComponent('õäöü'));
// "ÃµÃ¤Ã¶Ã¼" (char codes 0xC3, 0xB5, 0xC3, 0xA4, 0xC3, 0xB6, 0xC3, 0xBC)
It looks like an 8-bit string, while it is actually a Unicode string that only uses the first 256 code points.
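For reference, `unescape` is a legacy function, so the same conversion can also be written with TextEncoder. The sketch below uses two hypothetical helpers, `toPseudoBinary` and `isPseudoBinary`, which are not part of mailreader. Note that `isPseudoBinary` only checks the code point range: a plain Unicode string such as 'õäöü' also passes it, so it cannot tell whether a string has already been converted.

```javascript
// Hypothetical helpers for illustration only; not part of mailreader.

// True when every UTF-16 code unit of the string fits in one byte.
function isPseudoBinary(str) {
  for (var i = 0; i < str.length; i++) {
    if (str.charCodeAt(i) > 0xff) {
      return false;
    }
  }
  return true;
}

// UTF-8 encode the string, then map each byte to the code point
// with the same value. Intended as an equivalent of
// unescape(encodeURIComponent(str)) without the legacy function.
function toPseudoBinary(unicodeStr) {
  var bytes = new TextEncoder().encode(unicodeStr);
  var out = '';
  for (var i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

var sample = toPseudoBinary('õäöü'); // 8 characters, one per UTF-8 byte
```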
Thanks for the quick response. I tried unescape(encodeURIComponent('õäöü')) with my test content, and yes, that would lead to the correct result.
A little background on how we are currently using mailreader: the idea is to simply throw the output of https://github.com/openpgpjs/openpgpjs/blob/master/src/openpgp.js#L139 at mailreader.parse and get the MIME nodes as a result.
I did some more testing and the following works fine:
rawText = unescape(encodeURIComponent(rawText));
that.mailreader.parse([{raw: rawText}], function(parsed) {
Does mailreader support Content-Transfer-Encoding:8bit?
I'm getting reports (https://github.com/mailvelope/mailvelope/issues/6#issuecomment-74582621) about wrong decoding of umlauts if transfer encoding 8bit was used.
Not sure if this is the right test setup, but I created a rawText:
and after
mailreader.parse([{raw: rawText}], function(parsed) {
the result of textParts[0].content with mailreader v0.4.2 is: