use `unsigned char` for char classification

diegoiast commented 2 years ago

RFC 822 says that email should be ASCII/Latin1, but in reality I see cp1255 coming from Gmail, and probably other 8-bit encodings as well, which are compatible with Latin1... so this does happen in the field. I am unsure if this is the best way to do this; I could not find a way to get a uchar from a std::string.
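A minimal sketch of one way to get unsigned bytes out of a std::string, assuming C++11 or later: cast each element to `unsigned char`, or copy the string into a `std::vector<unsigned char>`. The helper names `byte_at` and `to_bytes` are hypothetical and are not part of mailio.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: read one byte of a std::string as an unsigned char
// (0..255), whether or not plain char is signed on this platform.
unsigned char byte_at(const std::string& s, std::size_t i) {
    return static_cast<unsigned char>(s.at(i));
}

// Hypothetical helper: copy the whole string into a buffer of unsigned bytes.
// The char -> unsigned char conversion is well defined (modulo 256).
std::vector<unsigned char> to_bytes(const std::string& s) {
    return std::vector<unsigned char>(s.begin(), s.end());
}
```

The cast does not change the stored bits; it only changes how the byte is interpreted once it is promoted to `int`.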

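The C standard does not define how isalpha() behaves when it is passed a negative number: the argument must be EOF or a value representable as an unsigned char, and anything else is undefined behavior. In the default "C" locale it only classifies ASCII. glibc tolerates negative values and classifies them according to the current locale, which is not something the standard demands. MSVC is stricter: its debug runtime fails an assertion.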

So all these calls need to receive an unsigned char value. It is an ugly but simple solution.
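A minimal sketch of that pattern, assuming the `<cctype>` functions are called on bytes taken from a std::string. The wrapper names (`is_alpha`, `is_digit`, `is_space`, `all_alpha`) are hypothetical and are not the code in this PR; they only show the conversion through `unsigned char` before the value reaches the classification function.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical wrappers: std::isalpha and friends take an int that must be
// EOF or representable as unsigned char, so convert through unsigned char first.
inline bool is_alpha(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

inline bool is_digit(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

inline bool is_space(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Check a whole string: the lambda parameter is unsigned char, so every char
// is converted to 0..255 before it is passed to std::isalpha.
inline bool all_alpha(const std::string& s) {
    return std::all_of(s.begin(), s.end(),
                       [](unsigned char c) { return std::isalpha(c) != 0; });
}
```

With these wrappers a byte such as 0xE0 from a cp1255 header is passed as 224 instead of a negative value, so the call is well defined on every libc. Whether it counts as a letter still depends on the current locale; in the default "C" locale it does not.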

Some RTFM: https://news.ycombinator.com/item?id=28703525 https://drewdevault.com/2020/09/25/A-story-of-two-libcs.html

diegoiast commented 2 years ago

(ignoring the conflict)

Is this PR still valid? I fixed some crashes on my side.

karastojko commented 2 years ago

Sorry for the late reply. The idea of the latest commits is to be encoding agnostic (by storing the string received over the socket together with its encoding) and not to assume ASCII or UTF-8. Let me try your PR with the internal tests and see how it fits the current state of the code. The topic is not trivial, especially when different platforms are considered.

karastojko commented 1 year ago

Considering char8_t and u8string, and since there have been no further reports of similar failures, I will skip merging this PR.

karastojko / mailio

use `unsigned char` for char classification #91