Zeros embedded in subject and issuer fields when using certificates generated with makecert (e.g. self-signed for test-signing during driver development)

hugmyndakassi commented 5 months ago

It appears as if makecert writes Windows-style wchar_t* into these fields verbatim instead of encoding them as, say, UTF-8. This is an issue as it will lead to the strings being interpreted as single-character string, given every other character is a zero byte.

I worked around this locally by creating a function that filters these:

static inline std::string filter_escaped_zero_bytes(char const* instr)
{
    auto input = std::string(instr);
    static std::string const needle = "\\x00";
    std::ostringstream os;
    std::size_t currpos = 0, oldpos{};
    do
    {
        oldpos = currpos;
        currpos = input.find(needle, currpos);
        if (std::string::npos == currpos)
        {
            break;
        }
        os << input.substr(oldpos, currpos - oldpos);
        currpos += needle.size();
    } while (true);
    os << input.substr(oldpos); // remainder
    return os.str();
}

... and using that in the ctor (Certificate::Certificate):

subject_ = filter_escaped_zero_bytes(subject.get());
issuer_ = filter_escaped_zero_bytes(issuer.get());

Not sure this is an issue you'll want to address at all, since it appears that MS and its makecert tool deviate from the standard here. Just wanted to bring this to your attention as well as to the attention of those watching the project.

Feel free to close this pretty much immediately.

PS: this isn't particularly optimized, because I only ever encountered this with those certificates created by makecert and we only use these in testing scenarios. But I'm sure there is still some room for improvement.

woodruffw commented 5 months ago

Thanks for the report! Yeah, I'm not sure if there's a great option for us here:

The main relevant standard is probably RFC 5280, which says that individual RDN attribute values have to be either PrintableString or UTF8String. So mkcert is just plain wrong to use UTF-16 or UCS-2 or whatever it's doing. OTOH, they may be following RFC 3280, which is much looser about value type encodings.
The subject and issuer are both exposed as std::string intentionally, since consumers should treat them as a (ptr, len) pair that may have all kinds of weird bytes internally (such as embedded NULLs).

So, two ideas I have:

Tweak the documentation slightly to make it clear that get_subject() and get_issuer() may return all kinds of nonsense, including NULL bytes.
Change get_subject() and get_issuer() to return std::vector<std::byte> instead, which will emphasize that they aren't necessarily a naive printable encoding. This will be a breaking change, but IMO an acceptable one.

Thoughts?

woodruffw commented 5 months ago

The subject and issuer are both exposed as std::string intentionally, since consumers should treat them as a (ptr, len) pair that may have all kinds of weird bytes internally (such as embedded NULLs).

Ah, never mind -- I forgot that we call X509_NAME_oneline which boils it down to char *, causing all this pain. I think we need to replace that call with X509_NAME_print_ex and pass in ASN1_STRFLGS_UTF8_CONVERT, which will perform the UTF-8 conversion.

trailofbits / uthenticode

Zeros embedded in subject and issuer fields when using certificates generated with makecert (e.g. self-signed for test-signing during driver development) #95