Closed larskanis closed 11 years ago
Original comment by Michael Granger (Bitbucket: ged, GitHub: ged).
I moved your (excellently formatted, and awesome) report to the ticket body myself.
I do Github, at least as a mirror, but until it supports Mercurial for version control, it's not an option for the main project site.
Anyway, thanks for the excellent report. I'll take a look this weekend.
Original comment by Lars Kanis (Bitbucket: larskanis, GitHub: larskanis).
After a longer debugging session, I was able to track down the cause for this wired behavior. My test environment is:
The OpenSSL library maintains a thread local error queue that is filled on any error that occurs within the library. Ruby's openssl extension and libpq both use the same error queue, when running in the same thread. Most errors from in the openssl extension are read out and sent as an Exception, but this is not true for Certificate#verify
. It only returns true or false and leaves OpenSSL's error queue untouched.
On the other hand libpq uses SSL_get_error()
to retrieve the state of the SSL connection. See [1] and [3]. This call does not inspect the thread error queue in case of blocking behavior. So PG::Connection#exec
does not raise an error (at least for the simple test case). When using the async call the error queue becomes inspected and an Exception is raised for the previous verification error.
Simple solution is to use OpenSSL.errors
after each failed certificate verification. Since the comment to OpenSSL.errors
[2] says, that this is probably an error in the openssl extension, I'm not sure, whether something should be fixed there.
Possibly libpq should be changed in a way, that OpenSSL's thread local error queue is cleared, before every SSL operation, as described in [1].
I don't think that anything is wrong in ruby-pg.
What do you think?
[1] http://www.openssl.org/docs/ssl/SSL_get_error.html
[2] https://github.com/ruby/ruby/blob/trunk/ext/openssl/ossl.c#L349
Original comment by Caio Chassot (Bitbucket: kch, GitHub: kch).
Thanks for taking a look.
Matt Zimmermann has looked into this on our end too and arrived at the same conclusion, albeit his suggestion would be to fix this in ruby's openssl library.
I think as far as ruby-pg is concerned you can close this. Let me know if you'd like to be kept in the loop as we file reports with ruby and/or postgres.
Original comment by Caio Chassot (Bitbucket: kch, GitHub: kch).
To be clear, we haven't filed any reports yet.
It seems you understand the issue in more detail than we do, and could do a more thorough report. If you could please go ahead and file the issues, that would be great.
I'll then ask Zimmermann to comment on them if he has anything to add, and if need be, talk to our postgres and ruby-core contacts to make sure it gets some attention.
Original comment by Lars Kanis (Bitbucket: larskanis, GitHub: larskanis).
You sad that Matt Zimmermann did a suggestion to fix this in ruby's openssl library. Did he talk about how it should be fixed? I currently see these variants:
Certificate#verify
instead of true/false - This will change the API in an incompatible way, so it will be no real option.Certificate#verify
- So there will be no way to get the particular error text. Maybe add another method in the way as 1.OpenSSL.errors
after a failed call to Certificate#verify
.I don't really like any of these, right now.
Original comment by Lars Kanis (Bitbucket: larskanis, GitHub: larskanis).
OK, thank you! I don't know, if you tested the Matt's patch, but it should have no effect to your sample, since that code path isn't used there. Nevertheless the change to libpq would be similar - to add ERR_clear_error()
in fe-secure.c
.
Original comment by Matt Zimmerman (Bitbucket: mdz, GitHub: mdz).
@larskanis my preference is to always check the return code of the call to find out if there was an error, and then use SSL_get_error to get more detail iff the call failed. Relying on the error queue to determine success or failure seems more complex to me, and creates the possibility of bugs like this one (which are impossible if the return code is checked).
Original comment by Michael Granger (Bitbucket: ged, GitHub: ged).
While I generally agree that checking the return code is better, clearing the queue is less change for all of the cases in question, and I think the heuristic of changing as little logic as possible when submitting patches for other peoples' code trumps the error-check. Just my 2 cents.
Original comment by Lars Kanis (Bitbucket: larskanis, GitHub: larskanis).
OK, there is a bug report on ruby-redmine, too .
Original comment by Michael Granger (Bitbucket: ged, GitHub: ged).
PostgreSQL folks rejected the patch, but the Ruby issue is still open.
Original comment by Eric Woodruff (Bitbucket: ericwoodruff, GitHub: ericwoodruff).
larskanis, is "Add a note in the documentation that suggest the user should call OpenSSL.errors after a failed call to Certificate#verify" still valid? Just to be sure, it is not OpenSSL.errors.clear or something?
Original comment by kritik (Bitbucket: kritik, GitHub: kritik).
the same with ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-linux] And what I have found it's present only in one method (I cannot read from database). Everything works in previus method.
UPD: Ok after deep testing I've found that async_exec (error: #<PG::ConnectionBad: PQconsumeInput() SSL error: wrong signature length>) is not working here, however exec works ok
Original comment by Lars Kanis (Bitbucket: larskanis, GitHub: larskanis).
@kritik This issue is still open and still not fixed in ruby-openssl: https://bugs.ruby-lang.org/issues/7215
This is not an issue in ruby-pg, so there is no reason to reopen this ticket. async_exec() is effected by this issue and exec() not, but this is only an implementation detail.
Without a change in ruby-openssl, you should generally call OpenSSL.errors after each Certificate#verify. This is also wise in order to retrieve the reason, why the certificate isn't valid.
Original comment by darkcloud (Bitbucket: darkcloud, GitHub: darkcloud).
We were invoking Dnsruby::Dnssec.validate within Rails and encountering this issue. Invoking OpenSSL.errors when #validate raises exceptions solved it for us. Thank-you everyone for the deep debugging, elegant explanation and of course the workaround.
Original comment by Lars Kanis (Bitbucket: larskanis, GitHub: larskanis).
Peter Geoghegan from heroku.com finally convinced the PostgreSQL team to fix this issue on their side. So it is on master branch, in 9.6-beta1 and will be backported to all maintained versions: https://commitfest.postgresql.org/9/520/
Original comment by Lars Kanis (Bitbucket: larskanis, GitHub: larskanis).
It took some time, but in the end this issue has been solved on both the Ruby side and the PostgreSQL side .
Original report by Caio Chassot (Bitbucket: kch, GitHub: kch).
Where cert is an instance of OpenSSL::X509::Certificate, and conn an instance of PGconn connected with sslmode:'require', calling conn.async_exec after a call to cert.verify(key) will break with an SSL Error from PG, but only if the verify call returned false.
Some sample code here: https://gist.github.com/9f341c7a8662cf6d9c84 (There's a comment at the bottom with the output I get when running on ubuntu.)
This sample code will always fail with the following eror:
Here are a few other messages I got earlier before reducing this to a test script:
The varying messages lead me to believe the particular message is unimportant, but that there's some state/data corruption going on.