pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org
Apache License 2.0
3.35k stars 640 forks source link

Artifact fails to download with `invalid certificate: BadDER` #12000

Open cjntaylor opened 3 years ago

cjntaylor commented 3 years ago

My company uses SSL inspection as part of its security envelope. This system requires the equivalent of a self-signed intermediate certificate to be added to the system certificate store as a root certificate authority in order for any SSL based connection to function. Any SSL based connection will be intercepted and rewritten; the new connection will be signed by this CA instead of its original authority and packets will be rewritten to match.

I've been unable to use pants in this context, which, when attempting to download it's artifacts, reports the following error:

Exception: Error downloading file: error sending request for url (https://github.com/pantsbuild/pex/releases/download/v2.1.35/pex): error trying to connect: invalid certificate: BadDER

With the help of some very nice folks on slack, we've already traced this issue to the rust part of the system, in the rustls crate used by reqwest. I've attatched the relevant logs, and I'd be happy to generate / capture anything else you'd find helpful. Long story short, it looks like I'm running into the issue reported here: https://github.com/ctz/rustls/issues/127. Rustls diverges from openssl behaviour; when attempting to process the certificate, it attempts to read the subjectAltName from it. In my case, the final dynamic inspection certificate has one, but the two previous signing CA certificates in chain only have a common name. Since this extension is missing, when rustls attempts to read the field, it fails, implodes, and the request fails with the above error (or so we currently think).

The certificates probably should have subjectAltName extensions set. However, they're generated dynamically, and not really in my control - they're handled at a much higher level by a different department in a large organization; it will take months if not years to get something like that fixed (if I even can get someone to care). Most people don't run into this issue because openssl is more permissive in this case and doesn't fault when the subjectAltName is missing. So if it's possible, it would be really helpful if a pants side fix could be implemented to address this.

SSL Intercept systems like this are becoming increasingly common at large corporations, to prevent in-network malicious actors from messaging home in SSL packets. I promise you this isn't a one-off issue.

I'm a full stack developer, and normally I'd just figure out the solution, and issue an MR with a proposed fix (since this affects me, it seems like the right thing to do). However, I'm not at all familiar with rust, and this is a security issue - I'm not the person you want poking around in the source code trying to fix this. I'd be happy to help in any other way I can, and I can test / debug / log anything that would be helpful.

I have captured the SSL exchange at the packet level, and I have confirmed that the certificates in the exchange are fully formed and valid. As the certificates are authorities, and could be used for signing purposes, I can't actually share any of these, but, I can report back any information that would be helpful, or try to capture any details that might be useful.

Thanks in advance for any help :smile:

20210501194000_pants_debug.log

benjyw commented 3 years ago

To clarify, you have set a custom CA bundle like this: https://www.pantsbuild.org/docs/proxies#setting-up-a-certificate-authority but those aren't working because of the subjectAltName issue?

cjntaylor commented 3 years ago

To clarify, you have set a custom CA bundle like this: https://www.pantsbuild.org/docs/proxies#setting-up-a-certificate-authority but those aren't working because of the subjectAltName issue?

Yes I have, and no, it doesn't fix this (it actually makes things worse 😅). I should have lead with that; with the setup my company uses, I'm well accustomed to having to configure an alternate certificate bundle for tooling on my system. So much so, that I've written a rather nice set of scripts similar to update-ca-certificates on debian systems (if anyone is familiar) that pulls all of the certificates from the system keychains, dumps them as pem files, and bundles them into a single pem in the location the homebrew openssl expects to find them (/usr/local/etc/openssl@1.1/cert.pem). This makes many things intrinsically find the bundle as the homebrew openssl knows to find the bundle at that location, and things work as expected - however, I do have several environment variables set for sanity (NODE_EXTRA_CA_CERTS, REQUESTS_CA_BUNDLE, PIP_CERT, etc).

I've verified that this works with a number of different tools (openssl s_client, curl), including things that are pip (pip, pipenv) and requests based (poetry) and everything has been working without issue. My company cut over to SSL inspection nearly a year ago and I needed these tools right away, so I figured a lot of this out ages ago.

All this is to say, I'm getting very accustomed to this song and dance. So much so that I'm used to everything up to and including the packet capture I described (to make sure the client is behaving / the SSL request is well formed / to steal the target URL so I can test with openssl s_client or curl). The very first thing I did was dig through the docs and find the page you referenced and was pleased that I could just set a simple environment variable, and that it wasn't more complicated than that (props to the team for having the option available 😄).

However, this only seems to make things worse: 20210502084200_pants_debug.log

The rust engine only seems to crash sooner - even with debugging turned on I can't seem to capture anything that specifies why (other than the cryptic "BadDER" output - more on this later). I believe this is due to the rust engine attempting to parse the bundle and failing on certificates in the bundle. Now, keep in mind this bundle consists of:

Per the issue I linked above (https://github.com/ctz/rustls/issues/127), rustls will fail for ANY certificate that only has a commonName set without subjectAltName extensions (we think: https://github.com/ctz/rustls/issues/127#issuecomment-691718644). This could be any of the certificates in the bundle, including ones from the system keychain. In that very issue, in fact, a user reports this exact problem on a linux system, attempting to use the system bundle (https://github.com/ctz/rustls/issues/127#issuecomment-683486583). The issue is that rustls diverges in behaviour from openssl for validating these certificates. Believe me, I completely understand the merits of using a codebase that doesn't suffer from the potential memory issues that openssl has - but the problem is that it also doesn't follow the de-facto standard here, especially for developers (who seem to primarily use unix systems where openssl/libressl is the baseline). OpenSSL may have its issues, and it may not be the best thing, but, it's used in everything, and certificates are generated with and verified against it.

As I mentioned before, I can't share these certificates because they are authorities so they could be used for signing and therefore malicious purposes. However, what I can tell you is that they're (1) autogenerated by standard microsoft tooling - not by me, by software controlled by other people in my very large organization and (2) work just fine with every other tool that uses SSL on my system. They are not poorly formed certificates causing parsing errors - they're valid and fully formed. I've even verified that they come over the wire in the SSL packets fully formed (along with the SSL Inspection dynamically generated cert, also valid and fully formed).

Sorry for the long answer, just trying to be thorough. I genuinely want to figure out why this isn't working, and help however I can. I'll try anything you all suggest.

cjntaylor commented 3 years ago

To clarify, you have set a custom CA bundle like this: https://www.pantsbuild.org/docs/proxies#setting-up-a-certificate-authority but those aren't working because of the subjectAltName issue?

Oops, forgot one last detail. The python that pants finds and uses to create its virtual environment (3.8.9) is installed via asdf, which is a wrapper around pyenv and is installed by building it from source via python-build. This system uses the openssl provided by homebrew when building the openssl module; the one I mentioned intrinsically finds the bundle at /usr/local/etc/openssl@1.1/cert.pem. So with REQUESTS_CA_BUNDLE set (and PIP_CERT, although this isn't necessary as it just uses the openssl module), any python I install this way natively uses the CA bundle my system generates. So at least for the python side (and I think pex?), SSL requests should work without needing to set PANTS_CA_CERTS_PATH. I think this is why the first log I sent gets as far as it does even though the variable isn't set.

benjyw commented 3 years ago

Hmm, this is a tricky one. I appreciate the thorough debugging and all the information you've provided.

To be clear - it's a deep part of our project philosophy that things should work as users expect. So we wouldn't treat "you have to regenerate your certificates a certain way" as an acceptable solution, if those certs work in practice for every other tool. As you say, for better or worse, "what openssl supports" is the de-facto standard.

Will have to think through this, but to debug and fix it would be super helpful to have an example of a cert that exhibits this behavior. How sure are we that the crux is "commonName set without subjectAltName"? Could you identify a cert in your bundle that has that property? And can you provide any more detail as to how that cert was generated? We can then try and generate a similar cert to test on.

cjntaylor commented 3 years ago

How sure are we that the crux is "commonName set without subjectAltName"?

We're not. As a rust novice, I'm just entirely going off of what @tdyas was suggesting in slack when I posed this question (which spawned this issue); he's the one who found the issue from rustls linked several times above. That assumption is entirely based on the explanation here: https://github.com/ctz/rustls/issues/127#issuecomment-691718644

Could you identify a cert in your bundle that has that property?

After a little bit of digging, tada!

-----BEGIN CERTIFICATE-----
MIIFDjCCA/agAwIBAgIMDulMwwAAAABR03eFMA0GCSqGSIb3DQEBCwUAMIG+MQsw
CQYDVQQGEwJVUzEWMBQGA1UEChMNRW50cnVzdCwgSW5jLjEoMCYGA1UECxMfU2Vl
IHd3dy5lbnRydXN0Lm5ldC9sZWdhbC10ZXJtczE5MDcGA1UECxMwKGMpIDIwMDkg
RW50cnVzdCwgSW5jLiAtIGZvciBhdXRob3JpemVkIHVzZSBvbmx5MTIwMAYDVQQD
EylFbnRydXN0IFJvb3QgQ2VydGlmaWNhdGlvbiBBdXRob3JpdHkgLSBHMjAeFw0x
NTEwMDUxOTEzNTZaFw0zMDEyMDUxOTQzNTZaMIG6MQswCQYDVQQGEwJVUzEWMBQG
A1UEChMNRW50cnVzdCwgSW5jLjEoMCYGA1UECxMfU2VlIHd3dy5lbnRydXN0Lm5l
dC9sZWdhbC10ZXJtczE5MDcGA1UECxMwKGMpIDIwMTIgRW50cnVzdCwgSW5jLiAt
IGZvciBhdXRob3JpemVkIHVzZSBvbmx5MS4wLAYDVQQDEyVFbnRydXN0IENlcnRp
ZmljYXRpb24gQXV0aG9yaXR5IC0gTDFLMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A
MIIBCgKCAQEA2j+W0E25L0Tn2zlem1DuXKVh2kFnUwmqAJqOV38pa9vH4SEkqjrQ
jUcj0u1yFvCRIdJdt7hLqIOPt5EyaM/OJZMssn2XyP7BtBe6CZ4DkJN7fEmDImiK
m95HwzGYei59QAvS7z7Tsoyqj0ip/wDoKVgG97aTWpRzJiatWA7lQrjV6nN5ZGhT
JbiEz5R6rgZFDKNrTdDGvuoYpDbwkrK6HIiPOlJ/915tgxyd8B/lw9bdpXiSPbBt
LOrJz5RBGXFEaLpHPATpXbo+8DX3Fbae8i4VHj9HyMg4p3NFXU2wO7GOFyk36t0F
ASK7lDYqjVs1/lMZLwhGwSqzGmIdTivZGwIDAQABo4IBDDCCAQgwDgYDVR0PAQH/
BAQDAgEGMBIGA1UdEwEB/wQIMAYBAf8CAQAwMwYIKwYBBQUHAQEEJzAlMCMGCCsG
AQUFBzABhhdodHRwOi8vb2NzcC5lbnRydXN0Lm5ldDAwBgNVHR8EKTAnMCWgI6Ah
hh9odHRwOi8vY3JsLmVudHJ1c3QubmV0L2cyY2EuY3JsMDsGA1UdIAQ0MDIwMAYE
VR0gADAoMCYGCCsGAQUFBwIBFhpodHRwOi8vd3d3LmVudHJ1c3QubmV0L3JwYTAd
BgNVHQ4EFgQUgqJwdN28Uz/Pe9T3zX+nYMYKTL8wHwYDVR0jBBgwFoAUanImetAe
733nO2lR1GyNn5ASZqswDQYJKoZIhvcNAQELBQADggEBADnVjpiDYcgsY9NwHRkw
y/YJrMxp1cncN0HyMg/vdMNY9ngnCTQIlZIv19+4o/0OgemknNM/TWgrFTEKFcxS
BJPok1DD2bHi4Wi3Ogl08TRYCj93mEC45mj/XeTIRsXsgdfJghhcg85x2Ly/rJkC
k9uUmITSnKa1/ly78EqvIazCP0kkZ9Yujs+szGQVGHLlbHfTUqi53Y2sAEo1GdRv
c6N172tkw+CNgxKhiucOhk3YtCAbvmqljEtoZuMrx1gL+1YQ1JH7HdMxWBCMRON1
exCdtTix9qrKgWRs6PLigVWXUX/hwidQosk8WwBD9lu51aX8/wdQQGcHsFXwt35u
Lcw=
-----END CERTIFICATE-----

This is the Entrust L1K intermediate certificate. I realized that this is a fully public certificate that is available elsewhere but it has similar properties to the certificates used to sign the SSL Inspection certificates in my usecase. My company has cerificates derived from this certificate for other purposes, so my bundle also contains this one.

And can you provide any more detail as to how that cert was generated?

The SSL Inspection system essentially dynamically generates a new cert for any outgoing SSL packet using its own CA, but using all the details from the certificate from the original request. You could simulate this by creating your own self-signed CA, and then generating a self-signed certificate with the details of the original request? Looks like squid can potentially simulate the same type of proxying our system does? https://wiki.squid-cache.org/ConfigExamples/Intercept/SslBumpExplicit

I think what @tdyas is suggesting has merit, mostly because when I set PANTS_CA_CERTS_PATH to my bundle, pantsd just straight up fails to start altogether (see the second log); it reports like its having trouble parsing the CA bundle. It doesn't even get as far as trying to make requests, as far as I can tell.

benjyw commented 3 years ago

We have some infrastructure for generating certs for testing (https://github.com/pantsbuild/pants/tree/main/src/python/pants/engine/internals/fs_test_data/tls) and I'm playing around with those to see if I can reproduce.

benjyw commented 3 years ago

In case anyone else wants to take a concurrent swing at it:

1) Have two shells open in the pants repo, one with the cwd at the repo root and the other with the cwd at src/python/pants/engine/internals/fs_test_data, since the cert generation script needs to run in that dir.

2) Tweak generate_certs.sh and/or openssl.cnf and then run the script in that second shell.

3) Run ./pants test src/python/pants/engine/fs_test.py -- -k test_download_https in the first shell to see the effect.

tdyas commented 3 years ago

There is some code linked from the rustls issue that may solve the issue by supplying a custom certificate verification step to rustls. https://github.com/paritytech/x509-signature/issues/4#issuecomment-691729509

cjntaylor commented 3 years ago

There is some code linked from the rustls issue that may solve the issue by supplying a custom certificate verification step to rustls. paritytech/x509-signature#4 (comment)

While this is helpful (and keep in mind I can’t fully trace the rust execution), it also feels really heavy handed. It basically solves the problem by disabling SSL altogether, other than checking that the certificate is in the CA bundle. The comments about not checking expiration don’t inspire confidence - it sort of feels like it defeats the purpose of using SSL in the first place.

I think what you’d want would be an implementation of this that strives for parity with OpenSSL. I’m still unclear exactly why the two diverge here, or if we’ve even fully bisected the issue. I think it’s fair that our running assumption about subjectAltName is valid, but, based on the discussion in the rustls issue, it sounds like a bug on their side. Regardless of if the specification says that the SAN extension should be included, the majority of systems use OpenSSL which apparently ignores it if it’s missing (if that’s even the issue). It’s great and all for them to stick to their principles / the spec here, but not in the face of practicality (especially when the majority of users won’t know or have control over their CA bundle. It’s important to keep in mind we’re all power users here).

benjyw commented 3 years ago

I'm pushing up against my limited knowledge of TLS here, but isn't subjectAltName relevant only to the end-entity cert? The CA certs we test against don't set it (see above), but the server cert does. and as far as I can tell (again, from limited understanding) it isn't necessary for the CA certs, as those aren't being verified against a hostname.

When I look at the certs we test against, only the server cert has a subjectAltName, the others do not.

benjyw commented 3 years ago

Meanwhile, naively appending that Entrust L1K intermediate certificate you pasted above into the test's server.chain does not reproduce the issue.

benjyw commented 3 years ago

@cjntaylor I hate to impose, but if it's not too much trouble, this would be super-helpful in diagnosis: Could you do the following?

It'll be instructive to see if this fails on loading the certs, and if so which cert is the issue.

The idea is to isolate whether this has to do with some property of the custom certs, or whether it's specifically to do with the rewritten ssl packets.

There might be more than one thing going on, of course... In your experimentation, the fact that pantsd got the error on startup would indicate that it's an issue with reading the custom cert bundle. But if you don't set it, I'd expect an error because the end-entity cert has been rewritten to something for which the client has no chain of trust, but I'm not sure I'd expect that error to be "Bad DER"... So I'm trying to get a tighter handle on this.

Again, sorry to impose, but hopefully this will help reproduce the error without access to the real certs, so we can then work on, and verify, a fix.

cjntaylor commented 3 years ago

I'm pushing up against my limited knowledge of TLS here, but isn't subjectAltName relevant only to the end-entity cert?

I'm even less knowledgeable than you, but that's what I thought too. I've just echoed what I have so far as it's all I've had to go on

I hate to impose, but if it's not too much trouble, this would be super-helpful in diagnosis

You're never imposing, I'm happy to help. I'd been thinking after you posted the reproduction method that I should get a testing environment setup and I have.

Aha! I did a bit of certificate "bijecting" and found the culprit. My non-working bundle contains my public signing key for my apple developer account :man_facepalming: Apologies for a bit of a goose chase; this breaks because the certificate in question isn't actually a CA, so the bundle isn't actually valid from rustls's perspective. I've updated my personal bundle generation script to fix this, to filter out "non-CA" certificates from the bundle. The resulting bundle appears to work perfectly fine now - I'll do some additional testing but I believe it's functioning as expected.

However, this wasn't entirely pointless. The reason this works with everywhere else is that openssl silently ignores any non-ca certificates in the bundle, so this hasn't been an issue until now. This will affect anyone else running homebrew on macOS - my fancy bundle generator is just a derived rewrite of a ruby function that the openssl formula runs on install. It'll affect anyone trying to use that bundle or anything built from it the same way, as the script doesn't distinguish which certificates are CAs and which aren't.

I'll leave this open for now but feel free to mark this wontfix if you feel like this requirement is reasonable (I think it's fine actually, I'm more comfortable arguing that a CA bundle should only contain CA certificates and erroring when it doesn't is fair). I'd just ask that we maybe consider making the error message a little bit less cryptic, at least?

benjyw commented 3 years ago

First of all, great news that it's now working! Phew!

As for automatically filtering out non-CA certs. We could certainly do that in order to emulate openssl. Or we could, as you say, at least make the error message more useful. So I'll leave this open for now.

Thanks for persevering and debugging!

cjntaylor commented 3 years ago

Well...I spoke too soon. It's only half working. My company (for reasons I won't get into) hard cuts our internet mainline every night so there are limits on the testing I can do against external targets.

Removing the culprit certificates does let me set PANTS_CA_CERTS_PATH to my custom ca bundle without pantsd just immediately crashing, so we've fixed that (yay!). However, I'm still getting the same error:

20210503094100_pants_debug.log

This seems to have brought things to parity wether or not I set PANTS_CA_CERTS_PATH at this point 🤔. It still seems to crash down in rustls in a way I can't quite understand and doesn't seem to get captured by the logs (I've grabbed a snippet):

09:40:55.45 [DEBUG] (reqwest::connect) starting new connection: https://github.com/
09:40:55.45 [DEBUG] (hyper::client::connect::dns) resolving host="github.com" 
09:40:55.45 [DEBUG] (process_execution) Running Searching for `bash` on PATH=/usr/bin:/bin:/usr/local/bin under semaphore with concurrency id: 1
09:40:55.45 [DEBUG] (process_execution::local) Obtaining exclusive spawn lock for process with argv ["./find_binary.sh", "bash"] since we materialized its executable RelativePath("find_binary.sh").
09:40:55.46 [DEBUG] (process_execution::local) spawned local process as Some(8415) for Process { argv: ["./find_binary.sh", "bash"], env: {"PATH": "/usr/bin:/bin:/usr/local/bin"}, working_directory: None, input_files: Digest { hash: Fingerprint<124495cfdd1160d88a797da1b5de65f0b9d57642b553d7ff5195d5375411b492>, size_bytes: 91 }, output_files: {}, output_directories: {}, timeout: None, execution_slot_variable: None, description: "Searching for `bash` on PATH=/usr/bin:/bin:/usr/local/bin", level: Debug, append_only_caches: {}, jdk_home: None, platform_constraint: None, is_nailgunnable: false, cache_scope: PerRestart }
09:40:55.47 [DEBUG] (hyper::client::connect::http) connecting to 140.82.114.4:443 
09:40:55.50 [DEBUG] (hyper::client::connect::http) connected to 140.82.114.4:443 
09:40:55.50 [DEBUG] (rustls::client::hs) No cached session for DNSNameRef("github.com")
09:40:55.50 [DEBUG] (rustls::client::hs) Not resuming any session
09:40:55.54 [DEBUG] (rustls::client::hs) ALPN protocol is None
09:40:55.54 [DEBUG] (rustls::client::hs) Using ciphersuite TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
09:40:55.54 [DEBUG] (rustls::client::tls12) ECDHE curve is ECParameters { curve_type: NamedCurve, named_group: secp384r1 }
09:40:55.54 [DEBUG] (rustls::client::tls12) Server DNS name is DNSName("github.com")
09:40:55.54 [WARN] (rustls::session) Sending fatal alert DecodeError
09:40:55.54 [DEBUG] (workunit_store) Completed: Downloading: DownloadFile(url='https://github.com/pantsbuild/pex/releases/download/v2.1.35/pex', expected_digest=FileDigest(fingerprint='3b1c97ebd79a8b650c81f0afbdbefef17c3d1a128219a8e95b5f35c2c9dfd53... (37 characters truncated)
09:40:55.54 [DEBUG] (workunit_store) Completed: pants.core.util_rules.external_tool.download_external_tool
09:40:55.54 [DEBUG] (workunit_store) Canceled: Find Python interpreter to bootstrap PEX
09:40:55.54 [DEBUG] (workunit_store) Completed: Find Python interpreter for constraints
09:40:55.54 [DEBUG] (workunit_store) Canceled: Finding the `bash` binary
09:40:55.55 [DEBUG] (workunit_store) Completed: `dependencies` goal
09:40:55.55 [DEBUG] (pants.engine.internals.scheduler) computed 1 nodes in 0.156098 seconds. there are 250 total nodes.
09:40:55.55 [ERROR] (pants.base.exception_sink) Exception caught: (pants.engine.internals.scheduler.ExecutionError)

Regardless of logging level, there doesn't seem to ever be any additional information regarding the warning 09:40:55.54 [WARN] (rustls::session) Sending fatal alert DecodeError, but that seems to be the crux of the issue. Thoughts? I'm still setup for testing if there is more you'd like me to try...

tdyas commented 3 years ago

You can enable trace level logging for rustls and see if that gets more out of the logging. Try adding --log-levels-by-target='{"rustls": "trace"}' to your invocation. I believe specifying "rustls" will affect sub-targets like "rustls::client::tls12" but if doesn't you may have to specify the more-specific targets. (-ltrace would be another way to enable trace-level logging but would do so for all log targets and so would be very verbose.)

cjntaylor commented 3 years ago

-ltrace would be another way to enable trace-level logging but would do so for all log targets and so would be very verbose

I suspected this might work and I gave it a shot; no dice. It outputs exactly the same session warning DecodeError in the same place. All that is added is additional information regarding the setup and negotiation of the connection to the server over SSL. It does contain a dump of the certificate bytes returned by the exchange - but for that reason, I'm not comfortable sharing the logs from this level of debugging (yes, I'm aware these are "public keys", but I'm 99% sure I'd still have to get authorization to share the contents anyways - they're technically "private" to our intranet - they have no utility or exposure elsewhere).

I did another packet capture with the fixed CA bundle in place (and PANTS_CA_CERTS_PATH set), and I dumped and hand checked the bytes of each certificate. They parse (via openssl) as valid certificates, and match what I would expect. The intermediate CA and root CA are exact matches for two of the certificates I'm required to add to my CA bundle. I suppose it's possible that rustls or reqwests is doing something odd when parsing the actual SSL packet, but, as best I can tell, everything does respond in a "valid" way.

Next things to try?

tdyas commented 3 years ago

rustls uses the webpki crate to verify certificates. webpki has an open issue that appears to be on point entitled "During parsing, accept v1 certificates and allow certificates without subjectAltName": https://github.com/briansmith/webpki/issues/219

So webpki needs to be fixed to support these certificates. The description of the fix seems straight-forward enough to accomplish (although I have not examined the webpki source to really scope out that issue).

cjntaylor commented 3 years ago

More details to share. I did a bit more digging and I found something interesting. As mentioned, when -ltrace is enabled, rustls dumps the certificates it receives to the log as what appears to be a python byte string wrapped in a Certificate class/struct/something (rust byte string? buler?) On a whim I used python to dump it to file.

First thought was to try to parse it with openssl as a DER (openssl x509 -in <file> -inform DER -noout -text). No luck, openssl reports:

unable to load certificate
4654472640:error:0D06B08E:asn1 encoding routines:asn1_d2i_read_bio:not enough data:crypto/asn1/a_d2i_fp.c:198:

Hmm 🤔. Dumping it with hexdump -C, its 1283 bytes long. I had the one that did parse from pulling the certificate out of the packets generated by the request that produced that byte string. Dumping that one, its 1288 bytes long. That can't be a coincidence. Comparison time:

63_—_62

Parsing cert/packet capture/good on the left, rustls/byte string/short one on the right. They *almost* line up, but not quite - right up until that 0x30 byte. If you look carefully, however, it's actually just shifted by one byte. Any guesses how many bytes are missing between the two files? :smile:

There's a consistent problem between the good and the rustls side of things that looks like a parsing error, and its two-fold. First, some sort of parsing error is causing bytes to get missed. I believe it's an issue with zero bytes, because it seems to have to do with hex 0x30, which is 48 in decimal or literal 0 in ascii. In some cases, it appears to have caused the byte to get missed outright. Further down in the cert, theres a more subtle issue, but I think it's still the same bug. There are byte errors, but they're consistent - several of the bytes in the byte string are incorrect by an upper byte of 0x3, that is to say:

{incorrect byte} | 0x30 => {correct byte}

(That operation is bitwise OR, for clarity). So what I think is happening is actually that somehow, zero bytes are getting decoded improperly. In some cases, they're being omitted and dropped from the certificate. In others, bytes that should be hex 0x30 are getting mangled. I've found all three of these cases:

  1. A missing byte 0x30
  2. A missing byte 0x00
  3. A incorrect byte by 0x30 (e.g. a byte 0x03 that should be 0x33, a byte 0x05 that should be 0x35)

I actually think the first case might be a sort of "combination" of the latter two (the byte both gets mangled, and misinterpreted?) All this said, I've had one red herring already so I'm trying to be extra extra diligent to make sure I don't get down the rabbit hole too far this time too quickly. Thoughts?

The only reason I don't blame the injection server (which definitely plays a role here) is because I've run packet captures CONCURRENTLY between these two. The comparisons I'm making are from certificates I've hand extracted the bytes from pcap captures of the SSL exchange via wireshark that were generated by running the pants test. So I know for sure that at a minimum, the binary string in the log and the certificate I dump are at least correlated. Also, as the packet capture is done at the level of response from the server, I'm hard pressed to come up with any reason why the server could be the fault point here, because I believe I've captured the "on the wire" data.

One step further: I can curl the URL that pants tries to access (https://github.com/pantsbuild/pex/releases/download/v2.1.35/pex). It involves the same SSL injection server. The same certs. I've done the same capture. It produces the same dumped certificates. Byte for byte. It works perfectly, no issues. So I'm 99% sure that the issue is on the rustls side somewhere.

Turns out fatal alert DecodeError may not be that bad of an error report after all; just not all that helpful in diagnosing it. The question is, if I haven't made some sort of logic error somewhere, what now? How do I diagnose what is causing this to happen? If it's not the server, and I can't get even trace level logging to help me find more detail (other than helping me get this far), where do we go from here?

cjntaylor commented 3 years ago

rustls uses the webpki crate to verify certificates. webpki has an open issue that appears to be on point entitled "During parsing, accept v1 certificates and allow certificates without subjectAltName": briansmith/webpki#219

So webpki needs to be fixed to support these certificates. The description of the fix seems straight-forward enough to accomplish (although I have not examined the webpki source to really scope out that issue).

I think I may have actually determined that this is a bit of a red-herring, or at least lower priority. Now that I've fixed my bundle and I've been able to pull apart the certificates used in a few different places, I can tell you things I didn't know for sure before:

pants_san_extensions

Anyways, I think the issue may actually be the byte issues I found. As for why there are byte issues, I couldn't tell you, but I'm still actively trying to figure that out.

benjyw commented 3 years ago

Ooooof, thanks for the excellent debugging. This is frustrating. If we can narrow this down to that bytestream decoding issue, and produce a test that proves this, then we can submit that bug to rustls or webpki, possibly with a fix.

benjyw commented 3 years ago

And yes, this subjectAltName thing seems like a red herring. If we can reproduce this bytestream parsing issue in a standalone way then we've got something to work with. I assume the cert in question is private?

benjyw commented 3 years ago

Does this specific cert have some property that is different from the others? Presumably ~all certs have zero bytes in them, so it must be some other thing that causes the parser to fail (or possibly causes a different parser to be called).

For example, could it be ecdsa vs rsa or something?

benjyw commented 3 years ago

It may or may not be coincidence that 0x30 has special significance in the DER-encoding:

A correct DER-encoded signature has the following form:

0x30: a header byte indicating a compound structure.
A 1-byte length descriptor for all what follows.
0x02: a header byte indicating an integer.
A 1-byte length descriptor for the R value
The R coordinate, as a big-endian integer.
0x02: a header byte indicating an integer.
A 1-byte length descriptor for the S value.
The S coordinate, as a big-endian integer.
Where initial 0x00 bytes for R and S are not allowed, except when their first byte would otherwise be above 0x7F (in which case a single 0x00 in front is required). Also note that inside transaction signatures, an extra hashtype byte follows the actual signature data.
cjntaylor commented 3 years ago

And yes, this subjectAltName thing seems like a red herring. If we can reproduce this bytestream parsing issue in a standalone way then we've got something to work with. I assume the cert in question is private?

Ooooof

Since this has gotten so complicated, I've decided to do the following though; this is the "correct" certificate I pulled from the packet dump:

-----BEGIN CERTIFICATE-----
MIIFBDCCA+ygAwIBAgIQwD4lN4b7UyimO6kNfplrVDANBgkqhkiG9w0BAQsFADBr
MQswCQYDVQQGEwJVUzE8MDoGA1UEChMzSm9obnMgSG9wa2lucyBVbml2ZXJzaXR5
IEFwcGxpZWQgUGh5c2ljcyBMYWJvcmF0b3J5MR4wHAYDVQQDExVKSFVBUEwgU1NM
IEluc3BlY3Rpb24wHhcNMjEwMzI1MDAwMDAwWhcNMjIwMzMwMjM1OTU5WjBmMQsw
CQYDVQQGEwJVUzETMBEGA1UECBMKQ2FsaWZvcm5pYTEWMBQGA1UEBxMNU2FuIEZy
YW5jaXNjbzEVMBMGA1UEChMMR2l0SHViLCBJbmMuMRMwEQYDVQQDEwpnaXRodWIu
Y29tMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAoATthEHhkqMLCHII
idByDiMqz/Vc0+9A56YcyR81J9dcduP+sDSfgpHzlgEKyi/+zTDeGwJ6I1s/6fcE
K2zcTKGUP7PSZq89i+TGDOB4HvwGS3pSrNzSR/mzqcIm0UsnWbMn9jK918S/VJle
2cmzVWJYQ5LdrVdbhpnMj38h99w/jtGmMIAaNIXWfNEiaQTU2+W6wAW/IOp7XInf
+dK3mJmPWYNO0dJBWjDFjxfbMtYwnCo4mDLtC0AOJUADCb3T5fDOoNTsU9iNnJAp
WcBP0wjTmTZlG9p03vxT/Oh6NRHpQ0GrJzvsE873O13xPVNi6qCUsyX8tWKLJ+nc
0wyiewIDAQABo4IBpzCCAaMwJQYDVR0RBB4wHIIKZ2l0aHViLmNvbYIOd3d3Lmdp
dGh1Yi5jb20wDgYDVR0PAQH/BAQDAgOoMBMGA1UdJQQMMAoGCCsGAQUFBwMBMAwG
A1UdEwEB/wQCMAAwHwYDVR0jBBgwFoAUGnQk3SOpf9ERxkX8Uedo+tQB3b8wHQYD
VR0OBBYEFIXPrUGLy/JgfMGM4wrUjHhHtaQIMIIBBQYKKwYBBAHWeQIEAgSB9gSB
8wDxAHYAKXm+8J45OSHwVnOfY6V35b5XfZxgCvj5TV0mXCVdx4QAAAF4ar+9GgAA
BAMARzBFAiEAnuaIRH/8NEWcMk2fq5SGBq7dYy3i9V9jl0aKC6U52NcCIEhUJ9HG
MrW/gXfX6xVorPLI7skBrR/MNAzuyRByRJhZAHcAIkVFB1lVJFaWP6Ev8fdthuAj
JmOtwEt/XcaDXG7iDwIAAAF4ar+9OQAABAMASDBGAiEAmAASSglBGK8GXCjvHrve
hWx/WKnT3payFmqZEK4v8mkCIQDdxfitvfBosMurgLjw1KhSZzDno/A7+ba7CdCm
tv7KHTANBgkqhkiG9w0BAQsFAAOCAQEAXBxuh2MwqK583SXm6SdGPOX80N+W0f7r
/LHIjIJtUDLIpxV5lC7BjqUAJl0bqtQE7v3PEaEuqp99EVOjjNFInESLZmCYLe5F
nBa5MzXbS1wB9FB7hqn+Ux07KFaQ+ZloPsCeOt1zBL7pEseCS12LZB/VeGVyZLLh
8gH+a0jqPJn58W01XgxRyLtdIIpgImbdhLm7Uxb7NrCSjMp9qnc36F+EKKOk/ZYk
eGHg2j372FMzsKk7GogRRNyOsOEVzm3xIP495O4J0vYp57MAMy5UTd0v4fuGdunm
upJtYiu/SzH2B8WdpAnyudOo9boOWCRERSDwdUfBLW2kADVUEw0FXw==
-----END CERTIFICATE-----

These are "public" of sorts, and I don't think it's going to be possible to reasonably debug this at this point without some access to the relevant certificates. So I've decided to share the server certificate only - mostly because this is almost entirely derived from the public github certificate. I'll hold back the intermediate and root CA as private (just know that there are two more certificates, that form the complete chain for this one).

Does this specific cert have some property that is different from the others?

Relevant here: Both of the other certificates in the chain also have missing bytes. Specifically, they were both missing a handful of 0x30 bytes, but had no other byte corruption issues. Once added, they matched exactly (all it took was adding the 0x30 bytes in the missing positions and everything shifted into place).

This is the corrupt certificate I pulled from the byte string. @benjyw , can you confirm that my decoding method is valid - I lifted this:

Certificate(b"0\x82\x05\x040\x82\x03\xec\xa0\x03\x02\x01\x02\x02\x10\xc0>%7\x86\xfbS(\xa6;\xa9\r~\x99kT0\r\x06\t*\x86H\x86\xf7\r\x01\x01\x0b\x05\00k1\x0b0\t\x06\x03U\x04\x06\x13\x02US1<0:\x06\x03U\x04\n\x133Johns Hopkins University Applied Physics Laboratory1\x1e0\x1c\x06\x03U\x04\x03\x13\x15JHUAPL SSL Inspection0\x1e\x17\r210325000000Z\x17\r220330235959Z0f1\x0b0\t\x06\x03U\x04\x06\x13\x02US1\x130\x11\x06\x03U\x04\x08\x13\nCalifornia1\x160\x14\x06\x03U\x04\x07\x13\rSan Francisco1\x150\x13\x06\x03U\x04\n\x13\x0cGitHub, Inc.1\x130\x11\x06\x03U\x04\x03\x13\ngithub.com0\x82\x01\"0\r\x06\t*\x86H\x86\xf7\r\x01\x01\x01\x05\0\x03\x82\x01\x0f\00\x82\x01\n\x02\x82\x01\x01\0\xa0\x04\xed\x84A\xe1\x92\xa3\x0b\x08r\x08\x89\xd0r\x0e#*\xcf\xf5\\\xd3\xef@\xe7\xa6\x1c\xc9\x1f5'\xd7\\v\xe3\xfe\xb04\x9f\x82\x91\xf3\x96\x01\n\xca/\xfe\xcd0\xde\x1b\x02z#[?\xe9\xf7\x04+l\xdcL\xa1\x94?\xb3\xd2f\xaf=\x8b\xe4\xc6\x0c\xe0x\x1e\xfc\x06KzR\xac\xdc\xd2G\xf9\xb3\xa9\xc2&\xd1K'Y\xb3'\xf62\xbd\xd7\xc4\xbfT\x99^\xd9\xc9\xb3UbXC\x92\xdd\xadW[\x86\x99\xcc\x8f\x7f!\xf7\xdc?\x8e\xd1\xa60\x80\x1a4\x85\xd6|\xd1\"i\x04\xd4\xdb\xe5\xba\xc0\x05\xbf \xea{\\\x89\xdf\xf9\xd2\xb7\x98\x99\x8fY\x83N\xd1\xd2AZ0\xc5\x8f\x17\xdb2\xd60\x9c*8\x982\xed\x0b@\x0e%@\x03\t\xbd\xd3\xe5\xf0\xce\xa0\xd4\xecS\xd8\x8d\x9c\x90)Y\xc0O\xd3\x08\xd3\x996e\x1b\xdat\xde\xfcS\xfc\xe8z5\x11\xe9CA\xab';\xec\x13\xce\xf7;]\xf1=Sb\xea\xa0\x94\xb3%\xfc\xb5b\x8b'\xe9\xdc\xd3\x0c\xa2{\x02\x03\x01\0\x01\xa3\x82\x01\xa70\x82\x01\xa30%\x06\x03U\x1d\x11\x04\x1e0\x1c\x82\ngithub.com\x82\x0ewww.github.com0\x0e\x06\x03U\x1d\x0f\x01\x01\xff\x04\x04\x03\x02\x03\xa80\x13\x06\x03U\x1d%\x04\x0c0\n\x06\x08+\x06\x01\x05\x05\x07\x03\x010\x0c\x06\x03U\x1d\x13\x01\x01\xff\x04\x020\00\x1f\x06\x03U\x1d#\x04\x180\x16\x80\x14\x1at$\xdd#\xa9\x7f\xd1\x11\xc6E\xfcQ\xe7h\xfa\xd4\x01\xdd\xbf0\x1d\x06\x03U\x1d\x0e\x04\x16\x04\x14\x85\xcf\xadA\x8b\xcb\xf2`|\xc1\x8c\xe3\n\xd4\x8cxG\xb5\xa4\x080\x82\x01\x05\x06\n+\x06\x01\x04\x01\xd6y\x02\x04\x02\x04\x81\xf6\x04\x81\xf3\0\xf1\0v\0)y\xbe\xf0\x9e99!\xf0Vs\x9fc\xa5w\xe5\xbeW}\x9c`\n\xf8\xf9M]&\\%]\xc7\x84\0\0\x01xj\xbf\xbd\x1a\0\0\x04\x03\0G0E\x02!\0\x9e\xe6\x88D\x7f\xfc4E\x9c2M\x9f\xab\x94\x86\x06\xae\xddc-\xe2\xf5_c\x97F\x8a\x0b\xa59\xd8\xd7\x02 HT'\xd1\xc62\xb5\xbf\x81w\xd7\xeb\x15h\xac\xf2\xc8\xee\xc9\x01\xad\x1f\xcc4\x0c\xee\xc9\x10rD\x98Y\0w\0\"EE\x07YU$V\x96?\xa1/\xf1\xf7m\x86\xe0#&c\xad\xc0K\x7f]\xc6\x83\\n\xe2\x0f\x02\0\0\x01xj\xbf\xbd9\0\0\x04\x03\0H0F\x02!\0\x98\0\x12J\tA\x18\xaf\x06\\(\xef\x1e\xbb\xde\x85l\x7fX\xa9\xd3\xde\x96\xb2\x16j\x99\x10\xae/\xf2i\x02!\0\xdd\xc5\xf8\xad\xbd\xf0h\xb0\xcb\xab\x80\xb8\xf0\xd4\xa8Rg0\xe7\xa3\xf0;\xf9\xb6\xbb\t\xd0\xa6\xb6\xfe\xca\x1d0\r\x06\t*\x86H\x86\xf7\r\x01\x01\x0b\x05\0\x03\x82\x01\x01\0\\\x1cn\x87c0\xa8\xae|\xdd%\xe6\xe9'F<\xe5\xfc\xd0\xdf\x96\xd1\xfe\xeb\xfc\xb1\xc8\x8c\x82mP2\xc8\xa7\x15y\x94.\xc1\x8e\xa5\0&]\x1b\xaa\xd4\x04\xee\xfd\xcf\x11\xa1.\xaa\x9f}\x11S\xa3\x8c\xd1H\x9cD\x8bf`\x98-\xeeE\x9c\x16\xb935\xdbK\\\x01\xf4P{\x86\xa9\xfeS\x1d;(V\x90\xf9\x99h>\xc0\x9e:\xdds\x04\xbe\xe9\x12\xc7\x82K]\x8bd\x1f\xd5xerd\xb2\xe1\xf2\x01\xfekH\xea<\x99\xf9\xf1m5^\x0cQ\xc8\xbb] \x8a`\"f\xdd\x84\xb9\xbbS\x16\xfb6\xb0\x92\x8c\xca}\xaaw7\xe8_\x84(\xa3\xa4\xfd\x96$xa\xe0\xda=\xfb\xd8S3\xb0\xa9;\x1a\x88\x11D\xdc\x8e\xb0\xe1\x15\xcem\xf1 \xfe=\xe4\xee\t\xd2\xf6)\xe7\xb3\03.TM\xdd/\xe1\xfb\x86v\xe9\xe6\xba\x92mb+\xbfK1\xf6\x07\xc5\x9d\xa4\t\xf2\xb9\xd3\xa8\xf5\xba\x0eX$DE \xf0uG\xc1-m\xa4\05T\x13\r\x05_")

I pulled out the byte string, put treated it as a literal in python, and dumped it as a binary file in python:

cert = b"0\x82\x05\x040\x82\x03\xec\xa0\x03\x02\x01\x02\x02\x10\xc0>%7\x86\xfbS(\xa6;\xa9\r~\x99kT0\r\x06\t*\x86H\x86\xf7\r\x01\x01\x0b\x05\00k1\x0b0\t\x06\x03U\x04\x06\x13\x02US1<0:\x06\x03U\x04\n\x133Johns Hopkins University Applied Physics Laboratory1\x1e0\x1c\x06\x03U\x04\x03\x13\x15JHUAPL SSL Inspection0\x1e\x17\r210325000000Z\x17\r220330235959Z0f1\x0b0\t\x06\x03U\x04\x06\x13\x02US1\x130\x11\x06\x03U\x04\x08\x13\nCalifornia1\x160\x14\x06\x03U\x04\x07\x13\rSan Francisco1\x150\x13\x06\x03U\x04\n\x13\x0cGitHub, Inc.1\x130\x11\x06\x03U\x04\x03\x13\ngithub.com0\x82\x01\"0\r\x06\t*\x86H\x86\xf7\r\x01\x01\x01\x05\0\x03\x82\x01\x0f\00\x82\x01\n\x02\x82\x01\x01\0\xa0\x04\xed\x84A\xe1\x92\xa3\x0b\x08r\x08\x89\xd0r\x0e#*\xcf\xf5\\\xd3\xef@\xe7\xa6\x1c\xc9\x1f5'\xd7\\v\xe3\xfe\xb04\x9f\x82\x91\xf3\x96\x01\n\xca/\xfe\xcd0\xde\x1b\x02z#[?\xe9\xf7\x04+l\xdcL\xa1\x94?\xb3\xd2f\xaf=\x8b\xe4\xc6\x0c\xe0x\x1e\xfc\x06KzR\xac\xdc\xd2G\xf9\xb3\xa9\xc2&\xd1K'Y\xb3'\xf62\xbd\xd7\xc4\xbfT\x99^\xd9\xc9\xb3UbXC\x92\xdd\xadW[\x86\x99\xcc\x8f\x7f!\xf7\xdc?\x8e\xd1\xa60\x80\x1a4\x85\xd6|\xd1\"i\x04\xd4\xdb\xe5\xba\xc0\x05\xbf \xea{\\\x89\xdf\xf9\xd2\xb7\x98\x99\x8fY\x83N\xd1\xd2AZ0\xc5\x8f\x17\xdb2\xd60\x9c*8\x982\xed\x0b@\x0e%@\x03\t\xbd\xd3\xe5\xf0\xce\xa0\xd4\xecS\xd8\x8d\x9c\x90)Y\xc0O\xd3\x08\xd3\x996e\x1b\xdat\xde\xfcS\xfc\xe8z5\x11\xe9CA\xab';\xec\x13\xce\xf7;]\xf1=Sb\xea\xa0\x94\xb3%\xfc\xb5b\x8b'\xe9\xdc\xd3\x0c\xa2{\x02\x03\x01\0\x01\xa3\x82\x01\xa70\x82\x01\xa30%\x06\x03U\x1d\x11\x04\x1e0\x1c\x82\ngithub.com\x82\x0ewww.github.com0\x0e\x06\x03U\x1d\x0f\x01\x01\xff\x04\x04\x03\x02\x03\xa80\x13\x06\x03U\x1d%\x04\x0c0\n\x06\x08+\x06\x01\x05\x05\x07\x03\x010\x0c\x06\x03U\x1d\x13\x01\x01\xff\x04\x020\00\x1f\x06\x03U\x1d#\x04\x180\x16\x80\x14\x1at$\xdd#\xa9\x7f\xd1\x11\xc6E\xfcQ\xe7h\xfa\xd4\x01\xdd\xbf0\x1d\x06\x03U\x1d\x0e\x04\x16\x04\x14\x85\xcf\xadA\x8b\xcb\xf2`|\xc1\x8c\xe3\n\xd4\x8cxG\xb5\xa4\x080\x82\x01\x05\x06\n+\x06\x01\x04\x01\xd6y\x02\x04\x02\x04\x81\xf6\x04\x81\xf3\0\xf1\0v\0)y\xbe\xf0\x9e99!\xf0Vs\x9fc\xa5w\xe5\xbeW}\x9c`\n\xf8\xf9M]&\\%]\xc7\x84\0\0\x01xj\xbf\xbd\x1a\0\0\x04\x03\0G0E\x02!\0\x9e\xe6\x88D\x7f\xfc4E\x9c2M\x9f\xab\x94\x86\x06\xae\xddc-\xe2\xf5_c\x97F\x8a\x0b\xa59\xd8\xd7\x02 HT'\xd1\xc62\xb5\xbf\x81w\xd7\xeb\x15h\xac\xf2\xc8\xee\xc9\x01\xad\x1f\xcc4\x0c\xee\xc9\x10rD\x98Y\0w\0\"EE\x07YU$V\x96?\xa1/\xf1\xf7m\x86\xe0#&c\xad\xc0K\x7f]\xc6\x83\\n\xe2\x0f\x02\0\0\x01xj\xbf\xbd9\0\0\x04\x03\0H0F\x02!\0\x98\0\x12J\tA\x18\xaf\x06\\(\xef\x1e\xbb\xde\x85l\x7fX\xa9\xd3\xde\x96\xb2\x16j\x99\x10\xae/\xf2i\x02!\0\xdd\xc5\xf8\xad\xbd\xf0h\xb0\xcb\xab\x80\xb8\xf0\xd4\xa8Rg0\xe7\xa3\xf0;\xf9\xb6\xbb\t\xd0\xa6\xb6\xfe\xca\x1d0\r\x06\t*\x86H\x86\xf7\r\x01\x01\x0b\x05\0\x03\x82\x01\x01\0\\\x1cn\x87c0\xa8\xae|\xdd%\xe6\xe9'F<\xe5\xfc\xd0\xdf\x96\xd1\xfe\xeb\xfc\xb1\xc8\x8c\x82mP2\xc8\xa7\x15y\x94.\xc1\x8e\xa5\0&]\x1b\xaa\xd4\x04\xee\xfd\xcf\x11\xa1.\xaa\x9f}\x11S\xa3\x8c\xd1H\x9cD\x8bf`\x98-\xeeE\x9c\x16\xb935\xdbK\\\x01\xf4P{\x86\xa9\xfeS\x1d;(V\x90\xf9\x99h>\xc0\x9e:\xdds\x04\xbe\xe9\x12\xc7\x82K]\x8bd\x1f\xd5xerd\xb2\xe1\xf2\x01\xfekH\xea<\x99\xf9\xf1m5^\x0cQ\xc8\xbb] \x8a`\"f\xdd\x84\xb9\xbbS\x16\xfb6\xb0\x92\x8c\xca}\xaaw7\xe8_\x84(\xa3\xa4\xfd\x96$xa\xe0\xda=\xfb\xd8S3\xb0\xa9;\x1a\x88\x11D\xdc\x8e\xb0\xe1\x15\xcem\xf1 \xfe=\xe4\xee\t\xd2\xf6)\xe7\xb3\03.TM\xdd/\xe1\xfb\x86v\xe9\xe6\xba\x92mb+\xbfK1\xf6\x07\xc5\x9d\xa4\t\xf2\xb9\xd3\xa8\xf5\xba\x0eX$DE \xf0uG\xc1-m\xa4\05T\x13\r\x05_"
with open("injection_cert.der", "wb") as f:
    f.write(cert)

This *should* be a binary encoded DER certificate, but of course it doesn't decode properly. Manually adding only the missing bytes yields this:

-----BEGIN CERTIFICATE-----
MIIFBDCCA+ygAwIBAgIQwD4lN4b7UyimO6kNfplrVDANBgkqhkiG9w0BAQsFADBr
MQswCQYDVQQGEwJVUzE8MDoGA1UEChMzSm9obnMgSG9wa2lucyBVbml2ZXJzaXR5
IEFwcGxpZWQgUGh5c2ljcyBMYWJvcmF0b3J5MR4wHAYDVQQDExVKSFVBUEwgU1NM
IEluc3BlY3Rpb24wHhcNMjEwMzI1MDAwMDAwWhcNMjIwMzMwMjM1OTU5WjBmMQsw
CQYDVQQGEwJVUzETMBEGA1UECBMKQ2FsaWZvcm5pYTEWMBQGA1UEBxMNU2FuIEZy
YW5jaXNjbzEVMBMGA1UEChMMR2l0SHViLCBJbmMuMRMwEQYDVQQDEwpnaXRodWIu
Y29tMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAoATthEHhkqMLCHII
idByDiMqz/Vc0+9A56YcyR81J9dcduP+sDSfgpHzlgEKyi/+zTDeGwJ6I1s/6fcE
K2zcTKGUP7PSZq89i+TGDOB4HvwGS3pSrNzSR/mzqcIm0UsnWbMn9jK918S/VJle
2cmzVWJYQ5LdrVdbhpnMj38h99w/jtGmMIAaNIXWfNEiaQTU2+W6wAW/IOp7XInf
+dK3mJmPWYNO0dJBWjDFjxfbMtYwnCo4mDLtC0AOJUADCb3T5fDOoNTsU9iNnJAp
WcBP0wjTmTZlG9p03vxT/Oh6NRHpQ0GrJzvsE873O13xPVNi6qCUsyX8tWKLJ+nc
0wyiewIDAQABo4IBpzCCAaMwJQYDVR0RBB4wHIIKZ2l0aHViLmNvbYIOd3d3Lmdp
dGh1Yi5jb20wDgYDVR0PAQH/BAQDAgOoMBMGA1UdJQQMMAoGCCsGAQUFBwMBMAwG
A1UdEwEB/wQCMAAwHwYDVR0jBBgwFoAUGnQk3SOpf9ERxkX8Uedo+tQB3b8wHQYD
VR0OBBYEFIXPrUGLy/JgfMGM4wrUjHhHtaQIMIIBBQYKKwYBBAHWeQIEAgSB9gSB
8wDxAHYAKXm+8J45OSHwVnOfY6V35b5XfZxgCvj5TV0mXCVdx4QAAAF4ar+9GgAA
BAMARzBFAiEAnuaIRH/8NEWcMk2fq5SGBq7dYy3i9V9jl0aKC6U52NcCIEhUJ9HG
MrW/gXfX6xVorPLI7skBrR/MNAzuyRByRJhZAHcAIkVFB1lVJFaWP6Ev8fdthuAj
JmOtwEt/XcaDXG7iDwIAAAF4ar+9OQAABAMASDBGAiEAmAASSglBGK8GXCjvHrve
hWx/WKnT3payFmqZEK4v8mkCIQDdxfitvfBosMurgLjw1KhSZzDno/A7+ba7CdCm
tv7KHTANBgkqhkiG9w0BAQsFAAOCAQEAXBxuh2MwqK583SXm6SdGPOX80N+W0f7r
/LHIjIJtUDLIpxV5lC7BjqUAJl0bqtQE7v3PEaEuqp99EVOjjNFInESLZmCYLe5F
nBa5MzXbS1wB9FB7hqn+Ux07KFaQ+ZloPsCeOt1zBL7pEseCS12LZB/VeGVyZLLh
8gH+a0jqPJn58W01XgxRyLtdIIpgImbdhLm7Uxb7NrCSjMp9qnc36F+EKKOk/ZYk
eGHg2j372FMzsKk7GogRRNyOsOEVzm3xIP495O4J0vYp57MAAy5UTd0v4fuGdunm
upJtYiu/SzH2B8WdpAnyudOo9boOWCRERSDwdUfBLW2kAAVUEw0FXw==
-----END CERTIFICATE-----

It's nearly identical except for two bytes near the end, which are missing an upper 0x3 byte (byte 0x4d5 is 0x03 and should be 0x33 and byte 0x503 is 0x05 and should be 0x35).

Does this specific cert have some property that is different from the others? Presumably ~all certs have zero bytes in them, so it must be some other thing that causes the parser to fail (or possibly causes a different parser to be called).

For example, could it be ecdsa vs rsa or something?

I don't see anything out of the ordinary. Do you? It looks like a standard RSA certificate to me...

It may or may not be coincidence that 0x30 has special significance in the DER-encoding:

Interesting 🤔. We've escaped the limits of my understanding of TLS 😅 , but that certainly does seem relevant. Maybe an issue with escaping 0x30 bytes / not interpreting the escape properly? It's odd though, that this wouldn't have shown up until now. I feel like I'm grasping at straws as to why the bytestream gets mangled, but it definitely seems to, if the Certificate byte string I lifted is accurate...

benjyw commented 3 years ago

I'm looking at the bytestring in python, which bytes in the original 1283-length byte sequence are malformed? The indexes you mention (0x4d5 and 0x503) appear not to correspond to those bytes (in fact the latter is out of range by 1 byte: 1283==0x503).

benjyw commented 3 years ago

In other words, what are the exact transformations I would need to do to that cert = b'...' byte sequence to get a byte sequence that, when written to a file as you did, would be identical to the real cert?

benjyw commented 3 years ago

Oh NM I can produce this myself, I missed that you posted the "good" cert.

cjntaylor commented 3 years ago

In other words, what are the exact transformations I would need to do to that cert = b'...' byte sequence to get a byte sequence that, when written to a file as you did, would be identical to the real cert?

Eh, it's a good exercise anyways:

Using incremental offsets (so the count is relative after each modification is made), counting from 0x00: It needs a 0x30 inserted after 0x2d, 0x30 after 0x13a, 0x30 after 0x2aa, 0x00 after 0x4d2, 0x4d4 is missing the upper 0x3 (so it should be 0x33 instead of 0x03), 0x00 after 0x500, 0x502 is missing the upper 0x3 (so it should be 0x35 instead of 0x05):

cert = b"0\x82\x05\x040\x82\x03\xec\xa0\x03\x02\x01\x02\x02\x10\xc0>%7\x86\xfbS(\xa6;\xa9\r~\x99kT0\r\x06\t*\x86H\x86\xf7\r\x01\x01\x0b\x05\00k1\x0b0\t\x06\x03U\x04\x06\x13\x02US1<0:\x06\x03U\x04\n\x133Johns Hopkins University Applied Physics Laboratory1\x1e0\x1c\x06\x03U\x04\x03\x13\x15JHUAPL SSL Inspection0\x1e\x17\r210325000000Z\x17\r220330235959Z0f1\x0b0\t\x06\x03U\x04\x06\x13\x02US1\x130\x11\x06\x03U\x04\x08\x13\nCalifornia1\x160\x14\x06\x03U\x04\x07\x13\rSan Francisco1\x150\x13\x06\x03U\x04\n\x13\x0cGitHub, Inc.1\x130\x11\x06\x03U\x04\x03\x13\ngithub.com0\x82\x01\"0\r\x06\t*\x86H\x86\xf7\r\x01\x01\x01\x05\0\x03\x82\x01\x0f\00\x82\x01\n\x02\x82\x01\x01\0\xa0\x04\xed\x84A\xe1\x92\xa3\x0b\x08r\x08\x89\xd0r\x0e#*\xcf\xf5\\\xd3\xef@\xe7\xa6\x1c\xc9\x1f5'\xd7\\v\xe3\xfe\xb04\x9f\x82\x91\xf3\x96\x01\n\xca/\xfe\xcd0\xde\x1b\x02z#[?\xe9\xf7\x04+l\xdcL\xa1\x94?\xb3\xd2f\xaf=\x8b\xe4\xc6\x0c\xe0x\x1e\xfc\x06KzR\xac\xdc\xd2G\xf9\xb3\xa9\xc2&\xd1K'Y\xb3'\xf62\xbd\xd7\xc4\xbfT\x99^\xd9\xc9\xb3UbXC\x92\xdd\xadW[\x86\x99\xcc\x8f\x7f!\xf7\xdc?\x8e\xd1\xa60\x80\x1a4\x85\xd6|\xd1\"i\x04\xd4\xdb\xe5\xba\xc0\x05\xbf \xea{\\\x89\xdf\xf9\xd2\xb7\x98\x99\x8fY\x83N\xd1\xd2AZ0\xc5\x8f\x17\xdb2\xd60\x9c*8\x982\xed\x0b@\x0e%@\x03\t\xbd\xd3\xe5\xf0\xce\xa0\xd4\xecS\xd8\x8d\x9c\x90)Y\xc0O\xd3\x08\xd3\x996e\x1b\xdat\xde\xfcS\xfc\xe8z5\x11\xe9CA\xab';\xec\x13\xce\xf7;]\xf1=Sb\xea\xa0\x94\xb3%\xfc\xb5b\x8b'\xe9\xdc\xd3\x0c\xa2{\x02\x03\x01\0\x01\xa3\x82\x01\xa70\x82\x01\xa30%\x06\x03U\x1d\x11\x04\x1e0\x1c\x82\ngithub.com\x82\x0ewww.github.com0\x0e\x06\x03U\x1d\x0f\x01\x01\xff\x04\x04\x03\x02\x03\xa80\x13\x06\x03U\x1d%\x04\x0c0\n\x06\x08+\x06\x01\x05\x05\x07\x03\x010\x0c\x06\x03U\x1d\x13\x01\x01\xff\x04\x020\00\x1f\x06\x03U\x1d#\x04\x180\x16\x80\x14\x1at$\xdd#\xa9\x7f\xd1\x11\xc6E\xfcQ\xe7h\xfa\xd4\x01\xdd\xbf0\x1d\x06\x03U\x1d\x0e\x04\x16\x04\x14\x85\xcf\xadA\x8b\xcb\xf2`|\xc1\x8c\xe3\n\xd4\x8cxG\xb5\xa4\x080\x82\x01\x05\x06\n+\x06\x01\x04\x01\xd6y\x02\x04\x02\x04\x81\xf6\x04\x81\xf3\0\xf1\0v\0)y\xbe\xf0\x9e99!\xf0Vs\x9fc\xa5w\xe5\xbeW}\x9c`\n\xf8\xf9M]&\\%]\xc7\x84\0\0\x01xj\xbf\xbd\x1a\0\0\x04\x03\0G0E\x02!\0\x9e\xe6\x88D\x7f\xfc4E\x9c2M\x9f\xab\x94\x86\x06\xae\xddc-\xe2\xf5_c\x97F\x8a\x0b\xa59\xd8\xd7\x02 HT'\xd1\xc62\xb5\xbf\x81w\xd7\xeb\x15h\xac\xf2\xc8\xee\xc9\x01\xad\x1f\xcc4\x0c\xee\xc9\x10rD\x98Y\0w\0\"EE\x07YU$V\x96?\xa1/\xf1\xf7m\x86\xe0#&c\xad\xc0K\x7f]\xc6\x83\\n\xe2\x0f\x02\0\0\x01xj\xbf\xbd9\0\0\x04\x03\0H0F\x02!\0\x98\0\x12J\tA\x18\xaf\x06\\(\xef\x1e\xbb\xde\x85l\x7fX\xa9\xd3\xde\x96\xb2\x16j\x99\x10\xae/\xf2i\x02!\0\xdd\xc5\xf8\xad\xbd\xf0h\xb0\xcb\xab\x80\xb8\xf0\xd4\xa8Rg0\xe7\xa3\xf0;\xf9\xb6\xbb\t\xd0\xa6\xb6\xfe\xca\x1d0\r\x06\t*\x86H\x86\xf7\r\x01\x01\x0b\x05\0\x03\x82\x01\x01\0\\\x1cn\x87c0\xa8\xae|\xdd%\xe6\xe9'F<\xe5\xfc\xd0\xdf\x96\xd1\xfe\xeb\xfc\xb1\xc8\x8c\x82mP2\xc8\xa7\x15y\x94.\xc1\x8e\xa5\0&]\x1b\xaa\xd4\x04\xee\xfd\xcf\x11\xa1.\xaa\x9f}\x11S\xa3\x8c\xd1H\x9cD\x8bf`\x98-\xeeE\x9c\x16\xb935\xdbK\\\x01\xf4P{\x86\xa9\xfeS\x1d;(V\x90\xf9\x99h>\xc0\x9e:\xdds\x04\xbe\xe9\x12\xc7\x82K]\x8bd\x1f\xd5xerd\xb2\xe1\xf2\x01\xfekH\xea<\x99\xf9\xf1m5^\x0cQ\xc8\xbb] \x8a`\"f\xdd\x84\xb9\xbbS\x16\xfb6\xb0\x92\x8c\xca}\xaaw7\xe8_\x84(\xa3\xa4\xfd\x96$xa\xe0\xda=\xfb\xd8S3\xb0\xa9;\x1a\x88\x11D\xdc\x8e\xb0\xe1\x15\xcem\xf1 \xfe=\xe4\xee\t\xd2\xf6)\xe7\xb3\03.TM\xdd/\xe1\xfb\x86v\xe9\xe6\xba\x92mb+\xbfK1\xf6\x07\xc5\x9d\xa4\t\xf2\xb9\xd3\xa8\xf5\xba\x0eX$DE \xf0uG\xc1-m\xa4\05T\x13\r\x05_"
fixed_cert = cert[:0x2e] + b"\x30" + cert[0x2e:]
fixed_cert = fixed_cert[:0x13b] + b"\x30" + fixed_cert[0x13b:]
fixed_cert = fixed_cert[:0x2ab] + b"\x30" + fixed_cert[0x2ab:]
fixed_cert = fixed_cert[:0x4d3] + b"\x00" + fixed_cert[0x4d3:]
fixed_cert = fixed_cert[:0x4d4] + b"\x33" + fixed_cert[0x4d5:]
fixed_cert = fixed_cert[:0x501] + b"\x00" + fixed_cert[0x501:]
fixed_cert = fixed_cert[:0x502] + b"\x35" + fixed_cert[0x503:]
with open("server.der", "wb") as f:
    f.write(fixed_cert)

Thats all the modifications. The other certs in the chain also have missing 0x30 and follow similar logic (if you insert them incrementally, the remaining bytes shift and everything falls into place where it's *supposed* to be). So the bytes are mostly getting there, something is just getting mangled.

benjyw commented 3 years ago

Cool, so the one thing left to eliminate is whether the differences are some artifact of the logging output rather than a true representation of a mis-parsed cert. It's unlikely though, this does seem likely to be the underlying problem.

cjntaylor commented 3 years ago

Cool, so the one thing left to eliminate is whether the differences are some artifact of the logging output

Agreed. I've been wondering about this and I don't know how to rule that out

benjyw commented 3 years ago

unfortunately it does look like this is a red herring due to an issue with the debug string of the object, which is what's logged: I wrote a little rust code to load the good cert from a file in to a Certificate object and then writes both its individual bytes (as an array of ints) and its debug string form to stdout. The list of ints is good, but the stdout copy is bad once you read it back in via Python.

benjyw commented 3 years ago

Yeah, I looked at the debug output code, and it emits \0 for the nul byte and 0 for the digit zero. If those appear consecutively then python sees it as \00 and reads that as a nul byte.

So, alas, this is not it.

benjyw commented 3 years ago

OK, regrouping and trying to think of how else to debug.

benjyw commented 3 years ago

To be sure I have this right: the cert posted above is the one with the BadDER error, and we know this because it was logged on error?

If so then I note that this cert has a "Subject Alternative Name" set, so that whole thing is definitely not the problem.

cjntaylor commented 3 years ago

To be sure I have this right: the cert posted above is the one with the BadDER error, and we know this because it was logged on error?

If so then I note that this cert has a "Subject Alternative Name" set, so that whole thing is definitely not the problem.

Correct. The first cert is the exact response the “server” replies with when the code attempts to download pex. The reason I say it that way is that it’s a dynamically generated certificate, from the SSL inspection server. Normally you wouldn’t be able to snag them easily but I yanked it out of a packet capture of the request. I don’t know if the inspection server regenerates certificates for each request - I can try to look into this if we think it’d be helpful.

But yeah, it has a SAN, and it’s well formed on the wire in the response from the server: https://github.com/pantsbuild/pants/issues/12000#issuecomment-831375307

And I can curl the same URL just fine. So that’s the same bundle, just OpenSSL instead of rustls. The cert is okay. It’s something with the rust side itself 🤷

cjntaylor commented 3 years ago

To be sure I have this right: the cert posted above is the one with the BadDER error, and we know this because it was logged on error?

If so then I note that this cert has a "Subject Alternative Name" set, so that whole thing is definitely not the problem.

To clarify. The BadDER error was logged along with a poorly formatted version of that cert at trace level. I simultaneously packet captured the request in wireshark, and then dumped the certificate segment as a DER. We know they match, and we know they’re the same request (I’ve verified it targeted the url).

benjyw commented 3 years ago

Cool. I gotta do other stuff for a bit but will return to this tomorrow. We can't actually test by using this cert as a server cert in the unit test, because we don't have the private key it was signed with. But maybe we can generate a key with the same set of attributes.

benjyw commented 3 years ago

I've tweaked the test cert to have the exact same X509v3 extension values as the problematic one, with the exception of the custom "CT Precertificate SCTs" extension, and things still work. So it may be that the issue is those Precertificate SCTs. I don't see an easy way to generate those in a test cert.

Another angle to try is - do you have information on the vendor of the SSL inspection system that is generating these certs? We might be able to set up a demo system for testing.

benjyw commented 3 years ago

@cjntaylor just checking in to see if we can get any info on the vendor of the SSL inspection system, so we can attempt to reproduce that another way. Right now I may have narrowed the problem down to the Precertificate SCTs, but those are a little tricky to generate and inject into a cert for testing. And we wouldn't have confidence that if we saw an error it was the same error as you're getting, rather than a mistake in encoding the cert.

cjntaylor commented 3 years ago

Apologies for the delay. Right now I have very little information about the SSL Inspection system that is in use, but I will try to find out more. As you might imagine, this is a rather large system as it needs to handle all SSL traffic for the entire network, so it's deployed at a level and by folks I don't normally have access to. That said, I'll see if I can get some more information.

Is there something about the Precertificate SCTs that is causing an issue? I don't know that I've seen them before on other requests, and I'm curious as to why they'd be giving rustls hangups (or why you'd think that'd be the case).

cjntaylor commented 3 years ago

Doing a little bit of digging by backtracing the error (and I'm sure you've done the same), would you agree that the problem I'm getting is that it's erroring right here:

https://github.com/ctz/rustls/blob/ba164d3ba1755fed683885219b5cdb469220dc6c/rustls/src/client/tls12.rs#L709

If I traced it correctly, the BadDer error is caught by this function (chained there):

https://github.com/ctz/rustls/blob/ba164d3ba1755fed683885219b5cdb469220dc6c/rustls/src/client/hs.rs#L773

Which is translating it into the "fatal alert DecodeError" right here

Do I have that right?

cjntaylor commented 3 years ago

@cjntaylor just checking in to see if we can get any info on the vendor of the SSL inspection system

https://www.forcepoint.com/product/ngfw-next-generation-firewall

That doesn't look to be all that helpful. Very enterprise, very inaccessible (and expensive). But I've verified that's actually what is in use. At a minimum though, I think we can be fairly certain that it "does the right thing" for most applications. At least, as much as I'll be able to expect a system/restriction like this to provide.

I think in my case, I'll have to live with the environment it creates. Frustratingly, this is only an issue in the deep internals of what appears to be rustls / webpki for now, so it's hard for me to argue that it's a fundamental issue when anything openssl based works fine.

benjyw commented 3 years ago

Thanks! I'll do a little more poking.

I don't have strong evidence that this has to do with the Precertificate SCTs, other than:

benjyw commented 3 years ago

I suspect you're right about the backtrace and where the error is coming from. But I cannot reproduce the error so it's hard to be certain. But this does give me a couple of ideas.

Eric-Arellano commented 3 years ago

Would it be helpful to ask the Rustls folks to take a look at this thread? They may have ideas for things we're missing.

benjyw commented 3 years ago

I've realized that this is probably not due to Precertificate SCTs as those have been copied verbatim from the original github cert (I checked).

benjyw commented 3 years ago

Plus I can now reproduce the failure in a rustls test which ignores the SCTs.

benjyw commented 3 years ago

Was able to file an issue with a repro against webpki: https://github.com/briansmith/webpki/issues/232

Hopefully they can take it from there, I am now definitely up against my limited knowledge of Rust and of TLS.