ndmitchell / hoogle

Haskell API search engine
http://hoogle.haskell.org/

Insecure downloading of data #78

Open abacabadabacaba opened 10 years ago

abacabadabacaba commented 10 years ago

Hoogle downloads data from the Internet to build its databases. Currently, it does so in such a way that an attacker capable of performing a MITM attack can use it to access sensitive data, elevate privileges or cause denial of service.

See here, here and here for details about the attacks.

Blaisorblade commented 9 years ago

From this point of view, hoogle should first refuse to run as root — that's totally a bad idea. As you pointed out, that's not enough, but that's an important mitigation.

Right now, haskell.org serves (some of) the relevant data through HTTPS (although Hoogle does not ask for it). Would sticking to HTTPS address your concerns?

abacabadabacaba commented 9 years ago

Using HTTPS is a good idea, but it is not enough: even a fully compromised server should not be able to compromise the user's account, not even in collusion with another (unprivileged) local user.

Blaisorblade commented 9 years ago

To be sure: if I understand correctly, you're implying that Hoogle should be more secure than wget + tar, right? I can imagine reasons (the user is not supposed to use wget + tar without extra precautions), but I'm curious to check my understanding. I imagine a solution would be to either sandbox tar (argh) or use a tar library and implement checks (which ones?), possibly switching to another format if appropriate.

At the same time, even using HTTPS isn't easy — it is using wget, but in #92 we're discussing switching to a Haskell library; however, it's hard to see an obviously secure and convenient Haskell TLS option (the binding to OpenSSL is harder to install, especially on Windows, while the native Haskell implementation hasn't been audited for the vulnerabilities which are still possible in Haskell — timing attacks seem the biggest problem, since the impact of compiler optimizations isn't obvious). Since you have some security competence, can you join us over there if you have an opinion?

abacabadabacaba commented 9 years ago

IMO, tar is not well designed security-wise. For example, just unpacking two untrusted archives one after another into the same empty directory is enough to let an attacker overwrite any file (the first archive would contain a symlink to a file that is to be overwritten, and the second would contain the new contents of that file). So right, the user does need extra precautions when using wget + tar.

As far as I know, it is safe to extract an untrusted tar archive if some precautions are taken:

Tar's info page contains a section on security which lists all the points above and some more. Again, I think it is a consequence of bad design that such precautions have to be taken.
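As a rough illustration of the kind of check a Haskell-side extractor could apply, here is a minimal sketch using only `base`. The `Entry` and `EntryKind` types and all function names are hypothetical and do not correspond to any real tar library. The sketch assumes POSIX-style `/` separators and covers only two precautions: rejecting paths that could escape the target directory, and rejecting link entries, which is what the two-archive symlink trick relies on. It is not a complete list of what safe extraction requires; in particular, it does nothing about a symlink already planted in the destination by another local user.

```haskell
import Data.List (isPrefixOf)

-- | Hypothetical, simplified view of an archive entry (not a real library type).
data EntryKind = RegularFile | Directory | SymbolicLink | HardLink | Other
  deriving (Eq, Show)

data Entry = Entry
  { entryPath :: FilePath
  , entryKind :: EntryKind
  } deriving (Eq, Show)

-- | Split a POSIX-style path on '/'.
splitOnSlash :: FilePath -> [String]
splitOnSlash p = case break (== '/') p of
  (seg, [])       -> [seg]
  (seg, _ : rest) -> seg : splitOnSlash rest

-- | Accept only non-empty relative paths with no ".." component.
isSafePath :: FilePath -> Bool
isSafePath p =
  not (null p)
    && not ("/" `isPrefixOf` p)
    && ".." `notElem` splitOnSlash p

-- | Accept only regular files and directories with safe paths.
-- Links of any kind are rejected outright.
checkEntry :: Entry -> Either String Entry
checkEntry e
  | not (isSafePath (entryPath e)) =
      Left ("unsafe path: " ++ entryPath e)
  | entryKind e `notElem` [RegularFile, Directory] =
      Left ("unsupported entry type: " ++ entryPath e)
  | otherwise = Right e

-- | Validate the whole listing before anything is written to disk.
checkArchive :: [Entry] -> Either String [Entry]
checkArchive = mapM checkEntry
```

Validating the full listing up front, rather than entry by entry during extraction, means a rejected archive leaves nothing behind on disk.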

Instead of using tar, it may be possible to distribute the data in a format that can be parsed entirely from Haskell. The tar format is itself pretty simple, but a specialized format can be even simpler.
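To make the idea of a specialized format concrete, here is a small sketch of a parser for an invented container format. The format itself is purely an assumption for illustration and is not something Hoogle uses: each record is a name line, a decimal length line, and then exactly that many characters of body. The `Record` type and function names are hypothetical too. The parser only builds values in memory, so names are just labels and never touch the filesystem.

```haskell
import Data.Char (isDigit)

-- | One named blob in the hypothetical format.
data Record = Record { recName :: String, recBody :: String }
  deriving (Eq, Show)

-- | Take one newline-terminated line, failing if the terminator is missing.
takeLine :: String -> Either String (String, String)
takeLine s = case break (== '\n') s of
  (line, _ : rest) -> Right (line, rest)
  (_, [])          -> Left "unexpected end of input in header"

-- | Parse a non-negative decimal length, bounded to at most 9 digits
-- so that it always fits in an Int.
parseLength :: String -> Either String Int
parseLength str
  | null str || not (all isDigit str) = Left ("bad length: " ++ show str)
  | length str > 9                    = Left "length too large"
  | otherwise                         = Right (read str)

-- | Parse a sequence of records: name line, length line, then the body.
parseRecords :: String -> Either String [Record]
parseRecords "" = Right []
parseRecords s = do
  (name, afterName)  <- takeLine s
  (lenStr, afterLen) <- takeLine afterName
  n <- parseLength lenStr
  let (body, rest) = splitAt n afterLen
  if length body /= n
    then Left ("truncated body for " ++ name)
    else (Record name body :) <$> parseRecords rest
```

For example, `parseRecords "a\n3\nxyzb\n0\n"` yields two records, `a` with body `xyz` and `b` with an empty body, while malformed or truncated input produces a `Left` with a short message instead of partial results.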

If wget doesn't work with some HTTPS website which works fine with other clients, this is most probably caused by some configuration problem such as an outdated list of trusted root certificates. For me, wget can download https://www.haskell.org/ just fine.

I'd strongly advise against using any uncommon TLS implementations. The TLS protocol is notoriously difficult to implement securely, and new vulnerabilities are discovered now and then. In particular, this implementation doesn't seem to be protected against timing attacks even without taking compiler optimizations into account.

vincenthz commented 9 years ago

@abacabadabacaba Can you be specific (line numbers, issues that you found, possibly with a benchmark showing some timing skew, vague code looking dodgy, etc.)?

Blaisorblade commented 9 years ago

@vincenthz Your question is of course legitimate, but let's please agree that the burden of proof of security is on you, and it's an extraordinary burden in general.

And from the outside, security against timing attacks in the face of GHC's optimizations looks like a research project — maybe even a PhD thesis, for all I know (which isn't much) — one needs to examine all compiler optimizations + runtime system.

To be sure, Hoogle doesn't need confidentiality of the traffic, only integrity. But unless hs-tls guarantees confidentiality of the (symmetric) key material, I believe even integrity can be compromised.

(Arguably, vincenthz/hs-tls#89 or similar could be a better place, so that we don't spam Hoogle developers with notifications).

vincenthz commented 9 years ago

@Blaisorblade I haven't made any claims of security here (nor am I here to convince you to use one particular library). The fact that you (and @abacabadabacaba) are focusing on timing attacks [1] whereas there are so many things that can go fundamentally wrong (think Heartbleed, think Apple crypto, basic logic errors, denial of service, etc.) tells me that you're not thinking honestly about this. I'm all for skepticism in the face of cryptography and security, and anyone should doubt security stuff (that includes OpenSSL, which seems as usual to get a free pass here), but there's no point repeating common "knowledge" from the internet about cryptography and trying to pass it off as an educated thought.

For further comments, please move to the tls issue. Sorry if I sound harsh, but despite this, I'm also very interested in audits (which don't have to come only from experts; I welcome them at any level, i.e. they don't have to be right, but they should at least point to actual things that may look wrong or exploitable, etc.).

sorry, @ndmitchell about the spam.

[1] which are extremely hard to exploit, especially in real-world conditions. Also, not all timing attacks are created equal: a symmetric padding issue is very different (impact-wise) from a Bleichenbacher one, or even worse, an RSA decryption skew.

ndmitchell commented 9 years ago

In Hoogle 5 I get everything over https and don't pass anything through tar, so I think I'm pretty safe. If someone injects malicious code they could return invalid Hoogle results, and potentially induce someone to visit a website, but that's the extent of it. And I'm doing the https download using the conduit-tls package, which I think ultimately backs on to the Haskell tls :)