Closed mikedilger closed 8 years ago
This sounds pretty bad - does it seem like the connections are never returned to the pool at all? The Debug implementation for Pool should print the number of connections it thinks it has - what does it say?
It could also be that something broke in Condvar::wait_timeout
- does it seem like Pool::get
is blocking for as long as you'd expect it to?
Pool::get is blocking for the 5 seconds or so before it gives the GetTimeout error.
How do I get the debug output? I'm compiling and running the debug binary, and also setting RUST_LOG=r2d2=trace,r2d2_postgres=trace
but I'm not seeing anything.
Sorry, debug as in println!("{:?}", pool)
.
Could you also try running the r2d2 tests on nightly? They pass for me, and it'd be good to see if that's the case for you as well.
Yeah, the idle_connections keep dropping. Not every time though.
Tests pass on my machine too.
Also I watch my threads return and they definitely complete.
if I count the postgres processes from ps
output that say "idle", they remain at 30 (what I started with), e.g. 30 like this:
postgres 12178 2780 0 16:01 ? 00:00:00 postgres: psf_php UPSF_core ::1(58506) idle
This is hard to reproduce. I've got a simpler version of my program (a project it forked from) compiled on nightly where this does not happen. So I can narrow this down by moving those projects towards each other. Still might be my code somewhere somehow, so don't stress about this yet.
Ok here is what I have found out so far.
let conn = try2!(row.get());
I never seem to have an issue.let conn = &*try2!(row.get());
then some functions seem to fail to drop the resource properly when they return.... not all functions, and I haven't determined what conditions exactly differentiate these cases.Here is an example function that was causing the problem:
fn accessible_sites(app: &App, psuser: &Psuser) -> Result<Vec<i32>, Error>
{
let sql = "SELECT site,name FROM person_site JOIN site ON person_site.site=site.id \
JOIN person ON person_site.person=person.id \
WHERE person_site.status='active' AND person.user=$1 \
AND person_site.site IN (SELECT id FROM site WHERE status='active') \
AND person_site.approved=true \
UNION \
SELECT id,name FROM site WHERE \"user\"=$2 AND status='active' \
ORDER BY name";
debug!("CONN +d");
let conn = &*try2!(app.conn());
let stmt = try2!(conn.prepare( sql ));
let rows = try2!(stmt.query( &[&psuser.id, &psuser.id] ));
let mut sites: Vec<i32> = Vec::new();
for row in &rows {
if row.len() < 1 { continue; }
sites.push( row_get!(row, 0) );
};
debug!("CONN -d");
return Ok(sites);
}
app.conn() is defined as
fn conn(&self) -> Result<PgConn, GetTimeout> {
println!("POOL: {:?}", self.pool);
self.pool.get()
}
where PgConn was defined as
// Shorthand for ridiculously long types
pub type PgPool = Pool<PostgresConnectionManager>;
pub type PgConn = PooledConnection<PostgresConnectionManager>;
I can work around this issue, but this appears to be a compiler bug.
Yep, that seems like a codegen bug. I wonder if it's caused by the MIR switch?
How is try2
defined?
Pretty sure the same thing would happen with try!(), but here is how it is defined:
/// A macro for trying an operation, and on failure, extending the error with file,
/// line and message information
///
/// Call with try2!( expression-yielding-result , ErrorType, "message")
#[macro_export]
macro_rules! try2 {
($expr:expr, $msg:expr ) => (
match $expr {
Ok(val) => val,
Err(err) => fail2!(err, $msg)
}
);
($expr:expr) => (
try2!($expr, "")
)
}
/// A macro for extending an error with file, line, and message
///
/// Call with fail2!(e, ErrorType, "message")
#[macro_export]
macro_rules! fail2 {
($err:expr, $msg:expr) => ({
let e: ::joist_error::Error = From::from($err);
return Err( e.file(file!()).line(line!()).prefix_message($msg) )
});
($err:expr) => (
fail2!($err, "")
)
}
Yep, verified with try!(), issue remains.
If you know which version of rust introduced the MIR switch, I can test the version prior to that.
I'm not sure off the top of my head, but I think it was some time in the last month or so?
Since this seems like a codegen issue, we should probably move it to rust-lang/rust where the compiler folks will see it.
I believe the upstream miscompilation issue has been fixed.
I'm getting some very strange behavior. My code works fine on a particular commit (pool used thousands of times, reclaims resources as connections fall out of scope). Then when I make the minimal changes necessary to upgrade to rust nightly (from 6-14 to 8-10) the (nearly) same code runs out of database connections quickly, even if I jack up the number of connections from my default 5 to any larger number, it will eventually consume them all.
The only changes I made to cause this to happen are:
serde
,serde_macros
,serde_json
(from 7 to 8)regex_macros
(0.1.37 to 0.1.38)maud_macros
(some commit to 0.9.2)bincode
(0.5 to 0.6)handlebars
(0.17 to 0.20)Notice, no upgrades to
r2d2
,postgres
, orr2d2-postgres
I've spent about 6 hours on this today. Tomorrow I'll see if a simple example exhibits the same behavior, then try bisecting rust-nightly versions.