Open sainslie opened 7 years ago
This is an interesting idea. I hadn't considered this leak vector before you also mentioned it in the eotk issue.
Are you thinking about situations in which a site serves different certificates for a clearnet domain and onion domain, and includes the hashes for both in a single HPKP header which then gets served to any client regardless of origin?
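To make the scenario concrete, here is a minimal sketch of pulling the pin values out of a `Public-Key-Pins` header so they can be compared across origins. The function name `extract_pins` and the sample header are hypothetical and for illustration only; the pins in the sample are placeholder strings, not real key hashes.

```python
import re

# pin-sha256="<base64>" directives, per the header syntax in RFC 7469.
PIN_RE = re.compile(r'pin-sha256\s*=\s*"([A-Za-z0-9+/]+={0,2})"', re.IGNORECASE)


def extract_pins(header_value):
    """Return the base64 pin values found in a Public-Key-Pins header value."""
    return PIN_RE.findall(header_value)


# Placeholder header: two fake pins (43 characters plus one '=' of padding,
# the length of a base64-encoded SHA-256 digest) and the usual directives.
example = (
    'pin-sha256="' + "A" * 43 + '="; '
    'pin-sha256="' + "B" * 43 + '="; '
    "max-age=5184000; includeSubDomains"
)

pins = extract_pins(example)
```

If the same pin list comes back from both the onion address and a clearnet hostname, that is the linkage being discussed here.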
Maybe I have too much faith in how people configure their webservers, but I'm guessing that anyone who decides to do the above will be aware that they are potentially linking the clearnet and onion site's identities together by doing it.
So I expect it would be a really small subset of hosts that this revealed anything interesting for. You'd need a few conditions in place for it to occur:
I guess there's also some potentially identifying information in the HPKP header if the admin has pinned one of the intermediate CAs rather than the clearnet site's leaf cert itself, but it would be much harder to draw any definite conclusions from that, as many clearnet sites probably have the same CA pinned.
Would be interesting to see some data though, if only just to see how many sites are being served with such an odd configuration.
My gut feeling on this is that if a host is vulnerable to hostname hacking then it will likely have other configuration issues.
That being said, the number of hosts using TLS is really small, and the number of cases where that is compromising is even smaller.
Not that we shouldn't extract and check for this data (we should, as we do for all TLS certificate data), but I have a feeling that hostname hacking is a far more likely vector than HPKP correlation.
@ajhaydock It has the potential for identification regardless of whether @apache or @nginx includes the data for the @TheTorProject domain name, as the clear-text data alone is insightful enough for correlation if data-mining is used against the public databases @google maintains.
@ajhaydock @s-rah I agree that the circumstances and the specific configuration are so uncommon that the likelihood of it happening is almost non-existent, but since the data is unique it still has potential as actionable data. I'm curious about @alecmuffett's thoughts too.
@ietf RFC 7469 is a useful mechanism to stipulate X.509 credentials for a specific domain name and its X.509 certificate chain in @apache @nginx or @Microsoft Internet Information Services.
I'm curious whether it might also permit identification of a clear-text host-name when the same host serves both clear-text material and material through @TheTorProject, by comparing the Secure Hash Algorithm 2 output in the header against a database of clear-text domain names and their corresponding Secure Hash Algorithm 2 output from @apache.
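A rough sketch of that comparison, assuming the SubjectPublicKeyInfo (SPKI) DER bytes for each clear-text host have already been obtained somehow. RFC 7469 defines a pin as the base64 of the SHA-256 digest of the SPKI; the function names and the sample byte strings below are hypothetical stand-ins, not real keys.

```python
import base64
import hashlib


def spki_pin(spki_der):
    """Base64 of SHA-256 over the DER-encoded SubjectPublicKeyInfo (RFC 7469)."""
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode("ascii")


def correlate(onion_pins, clearnet_spki):
    """Map each pin seen on the onion site to clear-text hosts using that key.

    clearnet_spki: dict of hostname -> SPKI DER bytes (assumed already collected).
    """
    index = {}
    for host, der in clearnet_spki.items():
        index.setdefault(spki_pin(der), set()).add(host)
    return {pin: sorted(index[pin]) for pin in onion_pins if pin in index}


# Stand-in byte strings in place of real SPKI structures.
clearnet = {"example.com": b"key-one", "example.org": b"key-two"}
seen_on_onion = [spki_pin(b"key-one"), "pin-not-in-database"]
matches = correlate(seen_on_onion, clearnet)
```

As noted elsewhere in this thread, a match on an intermediate CA key would be far less conclusive than a match on a leaf key.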
It might be useful to extract it and add it to a local database if it exists. I'm unsure whether @google has accessible public databases that can be searched for subsequent matches, but it might be useful to add this. The relevant specification is @ietf RFC 6962 (Certificate Transparency).
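As a sketch of the local-database part only, pins could be recorded per onion address with SQLite. The table name `hpkp_pins`, the function names, and the sample addresses are assumptions made for illustration and do not reflect any existing schema in the scanner.

```python
import sqlite3


def open_pin_db(path=":memory:"):
    """Open (or create) a small store of pins keyed by onion address."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hpkp_pins ("
        " onion TEXT NOT NULL,"
        " pin TEXT NOT NULL,"
        " PRIMARY KEY (onion, pin))"
    )
    return conn


def record_pins(conn, onion, pins):
    """Store each pin once per onion address; repeated scans are ignored."""
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO hpkp_pins (onion, pin) VALUES (?, ?)",
            [(onion, pin) for pin in pins],
        )


def onions_sharing_pin(conn, pin):
    """Return every onion address that has served the given pin."""
    rows = conn.execute(
        "SELECT onion FROM hpkp_pins WHERE pin = ? ORDER BY onion", (pin,)
    ).fetchall()
    return [row[0] for row in rows]


conn = open_pin_db()
record_pins(conn, "aaaa.onion", ["pin-1", "pin-2"])
record_pins(conn, "bbbb.onion", ["pin-2"])
shared = onions_sharing_pin(conn, "pin-2")
```

Keeping the pins locally would allow later matching against whatever external source turns out to be searchable.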