lib / pq

Pure Go Postgres driver for database/sql
https://pkg.go.dev/github.com/lib/pq
MIT License
9.17k stars 913 forks source link

Unchecked range error on slice. #177

Closed leehambley closed 11 years ago

leehambley commented 11 years ago

I've refactored my code to work around this, but there's an uncheked range error sometimes raising a panic(), when for some reason the list is empty:

https://github.com/lib/pq/blob/master/conn.go#L736

I've added a logging line, and I see that typically something like this:

=== RUN Test_CreatingANewUserSuccessfully
About to read index 0 of [25 25 1043 1043]
About to read index 1 of [25 25 1043 1043]
About to read index 2 of [25 25 1043 1043]
About to read index 3 of [25 25 1043 1043]

About to read index 0 of [2950]

I have seen (but after refactoring, and tidying up on my side, I can''t now replicate) an issue where I see a panic raised, after output like:

=== RUN Test_CreatingANewUserSuccessfully
About to read index 0 of []

I don't quite understand how I could get an empty slice here, I'm passing something like:

    query := "INSERT INTO users (uuid, name, email, password_hash) VALUES (CASE WHEN $1::text = '' THEN uuid_generate_v4() ELSE $1::uuid END, $2, $3, $4) RETURNING uuid"
  err = store.tx.QueryRow(query, u.Uuid, u.Name, u.Email, password_hash).Scan(&uuid)
    if err != nil {
        return "", err
    }
johto commented 11 years ago

I don't quite understand how I could get an empty slice here, I'm passing something like:

Yeah, me neither unless something was terribly wrong or the protocol state was off.

That said, I think it might be a good idea to check the len(v) against len(st.paramTyps) anyway and throw an error if they don't match. I'm actually quite surprised we don't do that already.

kisielk commented 11 years ago

@leehambley can you include the full panic trace?

leehambley commented 11 years ago

Yeah, me neither unless something was terribly wrong or the protocol state was off.

That's what I assume was happening, I had weird situations sharing a transaction between functions (same goroutine, etc)

@kisielk I will do as soon as I am able to reproduce it.

In the meantime (I really, really don't want to try and reproduce this, as I think it was a result of my bad design decisions) - but I was surprised to be able to make the driver panic.

Would a solution to be to raise a *pq.ProtocolWtf error or something in extreme cases, it might also be a solution to #148 where, if I understood that thread properly it was also an invalid protocol state on the connection?

Throughout debugging this I had a situation where every other (i.e odd live lock, even panic) would lock up (waiting for a deferred *sql.Tx.Commit() which never came)

The former architecture on my side was something like this (if this helps), in the example linked it ALWAYS worked perfectly:

However in a testing environment, with the test calling (on the same *sql.DB, without a Tx) helpers.TeardownDB(), TeardownDB() was issuing a TRUNCATE a, b, c CASCADE; which was often blocking waiting for a SELECT in a transaction (which never was committed, or rolled back). See Below. I believe this TRUNCATE ... CASCADE on the sql.DB in conjunction with a BEGIN... INSERT INTO .... RETURNING uuid; SELECT * FROM .... WHERE ... = $1 .... (defer) COMMIT was the problem. Without the truncation, I cannot reproduce the issue.

In the time that it was failing, here's a bit of an insight into what the pg stats looked like (I don't know how much help this will be):

My test cases, and a demo of how my API loos is now more like this:

func Test_UpdatingAUserSuccessfully(t *testing.T) {

    tx := helpers.GetDbTx(t)
    defer tx.Rollback()

    store := NewDbUserStore(tx, helpers.GetConfig(t))

  u := &domain.User{
    Name:     "Max Mustermann",
    Email:    "max@musterma.nn",
    Password: "changeme123",
  }

  uuid, err := store.Create(u)
  if err != nil {
    t.Fatal(err)
  }

  u, err = store.FindByUuid(uuid)
  if err != nil {
    t.Fatal(err)
  }

  u.Name = "Anne Mustermann"
  err = store.Update(u)
  if err != nil {
    t.Fatalf("failed to save first user: %#v", err)
  }

  u, err = store.FindByUuid(uuid)
  if err != nil {
    t.Fatal(err)
  }
  if u.Name != "Anne Mustermann" {
    t.Fatal("Name was not saved")
  }

}

Notice that now Create() doesn't re-pass the Tx to another function, and simply returns the uuid of the created item, rather than returning it, and accepting a pointer of where to put everything.

func (store DbUserStore) Create(u *domain.User) (string, error) {

    var (
        uuid string
        err  error
    )

    if u == nil {
        return "", errors.New("null pointer")
    }

    if err = domain.ValidateUser(u); err != nil {
        return "", err
    }

    password_hash, err := store.hashPassword(u.Password)
    if err != nil {
        return "", err
    }

    query := "INSERT INTO users (uuid, name, email, password_hash) VALUES (CASE WHEN $1::text = '' THEN uuid_generate_v4() ELSE $1::uuid END, $2, $3, $4) RETURNING uuid"
    rows, err := store.tx.Query(query, u.Uuid, u.Name, u.Email, password_hash)

    if err != nil {
        return "", resolveErrType(err)
    }
    defer rows.Close()

    if !rows.Next() {
        return "", resolveErrType(rows.Err())
    }

    if err = rows.Scan(&uuid); err != nil {
        return "", resolveErrType(err)
    }

    return uuid, nil
}

The error was more prevalent and easier to reproduce (I think) with a straight QueryRow(), but owing to #77 I'm using Query exclusively now.

For what it's worth, from my understanding of the code, there is no way for pq to return an err from Query(), according to the protocol, and driver it's not possible to return errors until Next() is called on the *Rows (see discussion in #77 for background), so maybe that error check after Query is superfluous.

Lots of info here, incase it can't be fixed for anyone else that might stumble upon the issue, that they might have a shot at working around it in lieu of decent advice/docs on how to correctly use transactions and connections in tests with Go.

As a side perk of refactoring my stores to operate on Tx's rather than accepting the handle, my model has ended up cleaner, and my tests 10x faster (!!!) - so I'm not opposed to just closing the issue, and coming away from things wiser; but I do think it'd be cool if the driver did a range check on the field that is causing this problem, that said - seems weird to add a range check for something when it's not (easily) reproducible.

kisielk commented 11 years ago

Would a solution to be to raise a *pq.ProtocolWtf error or something in extreme cases, it might also be a solution to #148 where, if I understood that thread properly it was also an invalid protocol state on the connection?

How would that help? You are already getting a panic. I think catching any panics from inside the driver code and then returning a generic error would just discard valuable stack trace info and make it harder to track down the bugs.

I'm not sure I understand how transaction stuff in the latter part of your last comment relates to the original problem.

johto commented 11 years ago

How would that help? You are already getting a panic. I think catching any panics from inside the driver code and then returning a generic error would just discard valuable stack trace info and make it harder to track down the bugs.

The problem is that I'm pretty sure panicking from the driver code deadlocks the goroutine if there's a deferred txn.Rollback/Commit; I believe that's what happened to @leehambley, and I'm pretty sure it has happened to me before (unfortunately I've forgotten the details and haven't looked into it after that). So you would still not get a nice stack trace in all cases, you would just completely destroy some poor guy's application.

leehambley commented 11 years ago

The problem is that I'm pretty sure panicking from the driver code deadlocks the goroutine if there's a deferred txn.Rollback/Commit; I believe that's what happened to @leehambley, and I'm pretty sure it has happened to me before (unfortunately I've forgotten the details and haven't looked into it after that). So you would still not get a nice stack trace in all cases, you would just completely destroy some poor guy's application.

That's certainly how it looked, the postgresql query logs shows the INSERT as the last query, (i.e BEGIN.. INSERT ...), but the pg stat activity indicated that the follow-up select had completed, and was waiting for the transaction to be committed.

@kisielk I mention the transaction/truncate stuff, as I believe that the TRUNCATE and transaction stuff were interacting weirdly, I was often seeing (owing to the speed of tests, I assume), on a ~400ms test run, ~3-6 TRUNCATES blocking waiting for the COMMIT that never came on the INSERT INTO .....; SELECT ...... transaction.

I'm very happy to have been able to get away from the issue by rearchitecting, fortunately I'm early enough in my development that I can rework the whole storage layer.

kisielk commented 11 years ago

@johto That is indeed a problem, but I think that could be fixed as well. It seems the error handling just is not very robust right now as we shouldn't be able to get in to that situation.

johto commented 11 years ago

FWIW, I've reproduced the problem I described above, and the stack trace looks like the one I had when I happened to stumble upon this problem: https://gist.github.com/johto/7145291

Deferring a transaction rollback can look quite tempting, so I'm quite sure there's code in the wild that does that. If we ever panic, those people are in for a bad time. I think we ought to find a way to not panic (maybe that's not even possible?), or publicly discourage deferring transaction rollbacks. This also doesn't seem to only affect us, but all database/sql drivers in general.

kisielk commented 11 years ago

I'm not sure why your other goroutine shows it's on line 13? It's still executing the QueryRow?

johto commented 11 years ago

I'm not sure why your other goroutine shows it's on line 13? It's still executing the QueryRow?

I might be wrong here, but the way I interpreted that stack is that the deferred actions are shown to happen on the line the panic() happened, i.e. before unwinding the stack. So the panic causes us to skip the unlock in database/sql here (line 727 in sql.go on my 1.1):

        dc.Lock()
        rowsi, err := queryer.Query(query, dargs)
        dc.Unlock()

and then Tx.Rollback() tries to re-grab the lock on the driver connection, deadlocking the goroutine. I tried to verify my theory by changing database/sql to do something like:

        dc.Lock()
        func() {
            defer dc.Unlock()
            rowsi, err := queryer.Query(query, dargs)
        }()

but I have no idea how to compile my Go program with the modified standard library.

johto commented 11 years ago

OK, with a little help I managed to build it. It does, indeed, make the problem go away, so I think my interpretation of the stack is correct.

kisielk commented 11 years ago

Can you see if it still happens on go tip? I know there were some fixes for race conditions and deadlocks that went in to database/sql in recent times.

kisielk commented 11 years ago

Btw, just noting that most of the conversation seems entirely off topic with the original issue description, which was about st.paramTyps having the wrong length ;)

johto commented 11 years ago

Can you see if it still happens on go tip? I know there were some fixes for race conditions and deadlocks that went in to database/sql in recent times.

Sure, I'll try that.

Btw, just noting that most of the conversation seems entirely off topic with the original issue description, which was about st.paramTyps having the wrong length ;)

Sort of, yes. I didn't mean to divert the attention away from the original problem, but I think this is still at least somewhat on-topic as the way we deal with these problems is the reason we don't have a stack trace to work with in the first place.

johto commented 11 years ago

@leehambley: I've looked through the code you shared, and I don't see anything obviously wrong with it. If you don't mind, it would be great if you could reproduce it, but I understand if you're not up for it.

However (again not related to the issue at hand, sorry about that), this paragraph caught my attention:

Without sticking to a single transaction, there's no guarantee that follow-up calls to QueryRow() or Query() or
Exec() will happen on the same connection, so therefore one is unable to reliably call: INSERT INTO ....
RETURNING uuid; followed shortly by a SELECT * FROM .... WHERE uuid = $1, as on a separate connectiontion
from the pool, the indexes are not guaranteed to have been updated, and thus the SELECT * may fail.

That's not true, at least not for postgres. So long as the database has acknowledged that the INSERTing transaction has committed and nobody has DELETEd (or UPDATEd the row's UUID), the row is guaranteed to be visible to other sessions, with the exception of any SERIALIZABLE/REPEATABLE READ transactions who took their snapshot before the commit.

leehambley commented 11 years ago

I've looked through the code you shared, and I don't see anything obviously wrong with it.

That's reassuring!

That's not true, at least not for postgres. So long as the database has acknowledged that the INSERTing transaction has committed and nobody has DELETEd (or UPDATEd the row's UUID), the row is guaranteed to be visible to other sessions, with the exception of any SERIALIZABLE/REPEATABLE READ transactions who took their snapshot before the commit.

Thanks, then it makes my random failure to SELECT something that successfully that INSERTed all the more strange, lets chalk that up to user error.

Thanks to both of you for taking the issue seriously, and helping figure out the root causes.

leehambley commented 11 years ago
    query := "INSERT INTO users (uuid, name, email, password_hash) VALUES (CASE WHEN $1::text = '' THEN uuid_generate_v4() ELSE $1::uuid END, $2, $3, $4) RETURNING uuid"
    rows, err := store.tx.Query(query, u.Uuid, u.Name, u.Email, password_hash)
  defer rows.Close()

This locks when failing on a uniqueness violation when the third line (defer rows.Close() is executed) - without the defer rows.Close() - I see this panic:

# The `go test` output for this
=== RUN Test_SessionCreateHandlerStatus400OnMalformedInput
--- PASS: Test_SessionCreateHandlerStatus400OnMalformedInput (0.03 seconds)
=== RUN Test_SessionCreateHandlerStatus201OnSuccess
panic: runtime error: index out of range [recovered]
    panic: runtime error: index out of range [recovered]
    panic: runtime error: index out of range

goroutine 10 [running]:
--- FAIL: Test_SessionCreateHandlerStatus201OnSuccess (0.00 seconds)
testing.func·004()
    /Users/leehambley/Projects/go/src/pkg/testing/testing.go:348 +0xcd
github.com/lib/pq.errRecover(0x5eda98)
    ~/code//api/src/github.com/lib/pq/error.go:448 +0xda
github.com/lib/pq.(*stmt).exec(0xc2000d0ab0, 0xc200108fc0, 0x4, 0x4)
    ~/code//api/src/github.com/lib/pq/conn.go:736 +0x7b4
github.com/lib/pq.(*conn).Query(0xc2000d16c0, 0x376e70, 0x9a, 0xc200108fc0, 0x4, ...)
    ~/code//api/src/github.com/lib/pq/conn.go:466 +0x1ae
database/sql.(*DB).queryConn(0xc2000ee240, 0xc2000ee3c0, 0x371ba8, 0x376e70, 0x9a, ...)
    /Users/leehambley/Projects/go/src/pkg/database/sql/sql.go:728 +0x196
database/sql.(*Tx).Query(0xc20010bc30, 0x376e70, 0x9a, 0x5edc98, 0x4, ...)
    /Users/leehambley/Projects/go/src/pkg/database/sql/sql.go:1005 +0xbc
myproj/stores.DbUserStore.Create(0xc20010bc30, 0xc2000bbfc0, 0xc2000ee9c0, 0xc2000b7960, 0x0, ...)
    ~/code//api/src/myproj/stores/user_store.go:114 +0x386
myproj/http.setupTestUser(0xc20010bc30, 0xc2000d0900)
    ~/code//api/src/myproj/http/session_handler_test.go:23 +0xf1
myproj/http.Test_SessionCreateHandlerStatus201OnSuccess(0xc2000d0900)
    ~/code//api/src/myproj/http/session_handler_test.go:62 +0x63
testing.tRunner(0xc2000d0900, 0x4be618)
    /Users/leehambley/Projects/go/src/pkg/testing/testing.go:353 +0x8a
created by testing.RunTests
    /Users/leehambley/Projects/go/src/pkg/testing/testing.go:433 +0x86b

goroutine 1 [chan receive]:
testing.RunTests(0x371d88, 0x4be600, 0x5, 0x5, 0x2d5501, ...)
    /Users/leehambley/Projects/go/src/pkg/testing/testing.go:434 +0x88e
testing.Main(0x371d88, 0x4be600, 0x5, 0x5, 0x4c3320, ...)
    /Users/leehambley/Projects/go/src/pkg/testing/testing.go:365 +0x8a
main.main()
    myproj/http/_test/_testmain.go:51 +0x9a

goroutine 2 [syscall]:

goroutine 11 [runnable]:
net/http.(*Server).Serve(0xc2000b79b0, 0xc200108500, 0xc20010be10, 0xc20012f310, 0xf, ...)
    /Users/leehambley/Projects/go/src/pkg/net/http/server.go:1538
created by net/http/httptest.(*Server).Start
    /Users/leehambley/Projects/go/src/pkg/net/http/httptest/server.go:109 +0x21a
exit status 2
FAIL    myproj/http 0.081s

And, the corresponding PostgreSQL logs:

# The First Test (PostgreSQL Log)
postgresql : LOG:  statement: BEGIN
postgresql : LOG:  execute <unnamed>: INSERT INTO users (uuid, name, email, password_hash) VALUES (CASE WHEN $1::text = '' THEN uuid_generate_v4() ELSE $1::uuid END, $2, $3, $4) RETURNING uuid
postgresql : DETAIL:  parameters: $1 = '11111111-1111-4111-a111-111111111111', $2 = 'Max Mustermann', $3 = 'max@musterma.nn', $4 = '$2a$04$1vTBuEbIEoO/hOJwj7JfuuWNvK/3Vr03601GrfWlgLxaoIGQukTSq'
postgresql : LOG:  statement: ROLLBACK

# The Second Test (PostgreSQL Log)
postgresql : LOG:  statement: BEGIN
postgresql : LOG:  unexpected EOF on client connection with an open transaction

My test suite is deferring rollbacks, but it's been working flawlessly, right up until the point where I actually hit a uniqueness constraint violation.

For better, or worse the method in question looks like this:

func (store DbUserStore) Create(u *domain.User) (string, error) {

    var (
        uuid string
        err  error
    )

    if u == nil {
        return "", errors.New("null pointer")
    }

    if err = domain.ValidateUser(u); err != nil {
        return "", err
    }

    password_hash, err := store.hashPassword(u.Password)
    if err != nil {
        return "", err
    }

    query := "INSERT INTO users (uuid, name, email, password_hash) VALUES (CASE WHEN $1::text = '' THEN uuid_generate_v4() ELSE $1::uuid END, $2, $3, $4) RETURNING uuid"
    rows, err := store.tx.Query(query, u.Uuid, u.Name, u.Email, password_hash)
    //defer rows.Close()

    if err != nil {
        return "", resolveErrType(err)
    }

    if !rows.Next() {
        return "", resolveErrType(rows.Err())
    }

    if err = rows.Scan(&uuid); err != nil {
        return "", resolveErrType(err)
    }

    return uuid, nil
}

I have no idea if that's the correct order for which error checking should be done in this driver, as there's no complete examples of how to use Query safely (i.e given that I can't use QueryRow).

Is the problem here that the deferred ROLLBACK is somehow breaking the driver so that the subsequent Query will not work? I'm totally at a loss as to why I keep running into this.

Thanks for all the support.

leehambley commented 11 years ago

For what it's worth, with all Commit() and Rollback() removed from anywhere in my codebase, the error now bubbles up to the restful (go-restful) layer where it gets caught:

=== RUN Test_UserCreateHandlerStatus201OnSuccess
Creating First User
leehambley: decding params got 0, [25 25 1043 1043]
leehambley: decding params got 1, [25 25 1043 1043]
leehambley: decding params got 2, [25 25 1043 1043]
leehambley: decding params got 3, [25 25 1043 1043]
Creating Second User
leehambley: decding params got 0, []
2013/10/27 15:47:59 [restful] recover from panic situation: - runtime error: index out of range
    /Users/leehambley/Projects/go/src/pkg/runtime/panic.c:229
    /Users/leehambley/Projects/myproj/api/src/github.com/lib/pq/error.go:448
    /Users/leehambley/Projects/go/src/pkg/runtime/panic.c:229
    /Users/leehambley/Projects/go/src/pkg/runtime/panic.c:487
    /Users/leehambley/Projects/go/src/pkg/runtime/panic.c:442
    /Users/leehambley/Projects/myproj/api/src/github.com/lib/pq/conn.go:737
    /Users/leehambley/Projects/myproj/api/src/github.com/lib/pq/conn.go:466
    /Users/leehambley/Projects/go/src/pkg/database/sql/sql.go:728
    /Users/leehambley/Projects/go/src/pkg/database/sql/sql.go:1005
    /Users/leehambley/Projects/myproj/api/src/myproj/stores/user_store.go:115
    /Users/leehambley/Projects/myproj/api/src/myproj/http/user_handler.go:48
    /Users/leehambley/Projects/myproj/api/src/myproj/http/user_handler.go:20
    /Users/leehambley/Projects/myproj/api/src/github.com/emicklei/go-restful/container.go:179
    /Users/leehambley/Projects/myproj/api/src/github.com/emicklei/go-restful/filter.go:21
    /Users/leehambley/Projects/myproj/api/src/myproj/http/http.go:80
    /Users/leehambley/Projects/myproj/api/src/github.com/emicklei/go-restful/filter.go:19
    /Users/leehambley/Projects/myproj/api/src/github.com/emicklei/go-restful/container.go:181
    /Users/leehambley/Projects/myproj/api/src/github.com/emicklei/go-restful/container.go:68
    /Users/leehambley/Projects/go/src/pkg/net/http/server.go:1149
    /Users/leehambley/Projects/go/src/pkg/net/http/server.go:1416
    /Users/leehambley/Projects/myproj/api/src/github.com/emicklei/go-restful/container.go:199
    /Users/leehambley/Projects/myproj/api/src/github.com/emicklei/go-restful/compress.go:0
    /Users/leehambley/Projects/go/src/pkg/net/http/httptest/server.go:200
    /Users/leehambley/Projects/go/src/pkg/net/http/server.go:1517
    /Users/leehambley/Projects/go/src/pkg/net/http/server.go:1096
    /Users/leehambley/Projects/go/src/pkg/runtime/proc.c:1223

--- FAIL: Test_UserCreateHandlerStatus201OnSuccess (0.02 seconds)
    user_handler_test.go:59: Expected StatusCreated, got: 500 &{0xc200135000 {0 0} false <nil> 0xd7f00 0xd7eb0}
FAIL
exit status 1
FAIL    myproj/http 0.060s

That output is from the following modification to pq's conn.go:

index 7af8188..7d8ab4b 100644
--- a/conn.go
+++ b/conn.go
@@ -733,6 +733,7 @@ func (st *stmt) exec(v []driver.Value) {
                if x == nil {
                        w.int32(-1)
                } else {
+                       fmt.Printf("leehambley: decding params got %d, %v\n", i, st.paramTyps)
                        b := encode(x, st.paramTyps[i])
                        w.int32(len(b))
                        w.bytes(b)

The PostgreSQL log looks like this:

postgresql : LOG:  statement: BEGIN
postgresql : LOG:  execute <unnamed>: INSERT INTO users (uuid, name, email, password_hash) VALUES (CASE WHEN $1::text = '' THEN uuid_generate_v4() ELSE $1::uuid END, $2, $3, $4) RETURNING uuid
postgresql : DETAIL:  parameters: $1 = '', $2 = '', $3 = 'max@musterma.nn', $4 = '$2a$04$BoLamgkZZOJKwILUSq3ijOZUDP72Q0.JBgGh8/NYVfaMu12G5ecSK'
postgresql : LOG:  unexpected EOF on client connection with an open transaction

If someone can tell me what I am doing wrong, or how to work around this, I'd be very grateful!

johto commented 11 years ago

Can you show us a capture of the network traffic between the application and the postgres server?

leehambley commented 11 years ago

I'd be happy to, how do you recommend capturing it (tcpdump, any special/recommended settings you can suggest?, I'm on OSX.)

johto commented 11 years ago

tcpdump -i lo0 -s 65535 -w capture tcp port 5432 should work, I think

leehambley commented 11 years ago

@johto Binary file at http://cl.ly/40252y0B3t11

I can't get a useful text output without missing important information, I'm afraid. Also since upgrading my machine, I haven't needed X11, so I can't verify that there's anything useful in the file.

I'm installing X11 (for wireshark to understand the capture) and I'll compare the results with a manual session in the pgsql REPL in the morning.

johto commented 11 years ago

@leehambley Thanks, this is exactly what I needed. I'll have a closer look at it a bit later.

leehambley commented 11 years ago

Cheers @johto - don't hesitate to let me know if there's anything else I can do/provide.

johto commented 11 years ago

@leehambley: The network trace looks exactly as it should. I can't form any theory on how this could possibly happen.

If you look at the contents of the ParameterDescription message in prepareToSimpleStmt (case 't', line 413 in current HEAD), what does the data look like?

leehambley commented 11 years ago

For what it's worth I was able to distil this to the following example code, I'll report back in a few minutes with the values you asked for @johto:

package main

// Given the following table structure, the following code snippet raises
// a `panic: runtime error: index out of range [recovered]' on the second
// insert.

// CREATE TABLE users (
//   uuid                 uuid                     CONSTRAINT users_pkey PRIMARY KEY DEFAULT uuid_generate_v4(),
//   email                varchar(254)             NOT NULL UNIQUE,
//   name                 text                     NOT NULL,
//   password_hash        varchar(60)              NOT NULL DEFAULT '',
// )

// Some sample output:

// 2013/10/28 20:32:35 Opening Connection
// 2013/10/28 20:32:35 Truncating Users Table
// 2013/10/28 20:32:35 Opening Transaction
// 2013/10/28 20:32:35 Inserting User
// 2013/10/28 20:32:35 Inserting User
// panic: runtime error: index out of range [recovered]
//   panic: runtime error: index out of range

// goroutine 1 [running]:
// github.com/lib/pq.errRecover(0x3cbaf0)
//   /Users/leehambley/.go/src/github.com/lib/pq/error.go:448 +0xda
// github.com/lib/pq.(*stmt).exec(0xc20009e120, 0xc200089870, 0x3, 0x3)
//   /Users/leehambley/.go/src/github.com/lib/pq/conn.go:736 +0x7b4
// github.com/lib/pq.(*conn).Query(0xc20009f240, 0x216050, 0x52, 0xc200089870, 0x3, ...)
//   /Users/leehambley/.go/src/github.com/lib/pq/conn.go:466 +0x1ae
// database/sql.(*DB).queryConn(0xc2000cf060, 0xc2000cf1e0, 0x216f68, 0x216050, 0x52, ...)
//   /Users/leehambley/Projects/go/src/pkg/database/sql/sql.go:728 +0x196
// database/sql.(*Tx).Query(0xc2000897b0, 0x216050, 0x52, 0x3cbc68, 0x3, ...)
//   /Users/leehambley/Projects/go/src/pkg/database/sql/sql.go:1005 +0xbc
// main.insertUser(0xc2000897b0, 0xc200089840, 0x24)
//   /Users/leehambley/Projects/harrow/api/reproduce_panic.go:22 +0x21d
// main.main()
//   /Users/leehambley/Projects/harrow/api/reproduce_panic.go:61 +0x56e

// goroutine 2 [syscall]:
// exit status 2

// The PostgreSQL Query Log

// postgresql : LOG:  statement: TRUNCATE users CASCADE;
// postgresql : LOG:  statement: BEGIN
// postgresql : LOG:  execute <unnamed>: INSERT INTO users (name, email, password_hash) VALUES ($1, $2, $3) RETURNING uuid;
// postgresql : DETAIL:  parameters: $1 = 'Max Mustermann', $2 = 'max@musterma.nn', $3 = '$2a$04$V0d4OvP/W.xYiUdF33eLpenYn7IaPy283aoHV5cIFFA7pXHgNyZIe'
// postgresql : LOG:  unexpected EOF on client connection with an open transaction

import (
    "database/sql"
    _ "github.com/lib/pq"
    "log"
)

const (
    name          string = "Max Mustermann"
    email         string = "max@musterma.nn"
    password_hash string = "$2a$04$V0d4OvP/W.xYiUdF33eLpenYn7IaPy283aoHV5cIFFA7pXHgNyZIe"
)

var insert string = "INSERT INTO users (name, email, password_hash) VALUES ($1, $2, $3) RETURNING uuid;"

func insertUser(tx *sql.Tx) string {

    var uuid string = ""

    log.Println("Inserting User")
    rows, err := tx.Query(insert, name, email, password_hash)

    if err != nil {
        log.Fatal("Error Inserting User", err)
    }

    if !rows.Next() {
        log.Fatal("No Rows Returned From User Insert")
    }

    if err = rows.Scan(&uuid); err != nil {
        log.Fatal("Couldn't Read Returned UUID", err)
    }

    return uuid

}

func main() {

    log.Println("Opening Connection")
    db, err := sql.Open("postgres", "dbname=go_harrow_test sslmode=disable")
    if err != nil {
        log.Fatal("Error Opening Database Conection", err)
    }

    log.Println("Truncating Users Table")
    _, err = db.Exec("TRUNCATE users CASCADE;")
    if err != nil {
        log.Fatal("Couldn't Truncate Users Table")
    }

    log.Println("Opening Transaction")
    tx, err := db.Begin()
    if err != nil {
        log.Fatal("Error Opening Tranasction", err)
    }

    insertUser(tx)
    insertUser(tx)

    tx.Commit()

}
leehambley commented 11 years ago

So, the log extended now:

2013/10/28 20:52:47 Opening Connection
2013/10/28 20:52:47 Truncating Users Table
2013/10/28 20:52:47 Opening Transaction
2013/10/28 20:52:47 Inserting User
2013/10/28 20:52:47 lib/pq (7876026/#409): t( 49 ) r( &[] )
2013/10/28 20:52:47 lib/pq (7876026/#409): t( 116 ) r( &[0 3 0 0 0 25 0 0 4 19 0 0 4 19] )
2013/10/28 20:52:47 lib/pq (7876026/#414): nparams( 3 )paramTyps( [0 0 0] )
2013/10/28 20:52:47 lib/pq (7876026/#409): t( 84 ) r( &[0 1 117 117 105 100 0 0 1 206 131 0 1 0 0 11 134 0 16 255 255 255 255 0 0] )
2013/10/28 20:52:47 lib/pq (7876026/#409): t( 90 ) r( &[84] )
2013/10/28 20:52:47 Inserting User
2013/10/28 20:52:47 lib/pq (7876026/#409): t( 67 ) r( &[73 78 83 69 82 84 32 48 32 49 0] )
panic: runtime error: index out of range [recovered]
    panic: runtime error: index out of range

goroutine 1 [running]:
github.com/lib/pq.errRecover(0x3ccaf0)
    /Users/leehambley/.go/src/github.com/lib/pq/error.go:448 +0xda
github.com/lib/pq.(*stmt).exec(0xc20009e1b0, 0xc200089a80, 0x3, 0x3)
    /Users/leehambley/.go/src/github.com/lib/pq/conn.go:739 +0x7b4
github.com/lib/pq.(*conn).Query(0xc20009f240, 0x2163f0, 0x52, 0xc200089a80, 0x3, ...)
    /Users/leehambley/.go/src/github.com/lib/pq/conn.go:469 +0x1ae
database/sql.(*DB).queryConn(0xc2000c7060, 0xc2000c71e0, 0x217308, 0x2163f0, 0x52, ...)
    /Users/leehambley/Projects/go/src/pkg/database/sql/sql.go:728 +0x196
database/sql.(*Tx).Query(0xc2000897b0, 0x2163f0, 0x52, 0x3ccc68, 0x3, ...)
    /Users/leehambley/Projects/go/src/pkg/database/sql/sql.go:1005 +0xbc
main.insertUser(0xc2000897b0, 0xc200089a50, 0x24)
    /Users/leehambley/Projects/harrow/api/reproduce_panic.go:70 +0x21d
main.main()
    /Users/leehambley/Projects/harrow/api/reproduce_panic.go:109 +0x56e

goroutine 2 [syscall]:
exit status 2

Given the following diff in the conn.go at SHA:7876026:

diff --git a/conn.go b/conn.go
index 7af8188..8232ba4 100644
--- a/conn.go
+++ b/conn.go
@@ -59,6 +59,7 @@ import (
    "fmt"
    "github.com/lib/pq/oid"
    "io"
+   "log"
    "net"
    "os"
    "path"
@@ -407,11 +408,13 @@ func (cn *conn) prepareToSimpleStmt(q, stmtName string) (_ *stmt, err error) {

    for {
        t, r := cn.recv1()
+       log.Println("lib/pq (7876026/#409): t(", t, ") r(", r, ")")
        switch t {
        case '1', '2', 'N':
        case 't':
            nparams := int(r.int16())
            st.paramTyps = make([]oid.Oid, nparams)
+           log.Println("lib/pq (7876026/#414): nparams(", nparams, ")paramTyps(", st.paramTyps, ")")

            for i := range st.paramTyps {
                st.paramTyps[i] = r.oid()

I don't know how much this helps @johto but I appreciate the discourse.

johto commented 11 years ago

This reproduces the issue, but only because the Rows object isn't closed in insertUser().

johto commented 11 years ago

Which, unless I'm mistaken, would also explain the issue. The next prepareToSimpleStmt() stops processing if it sees a CommandComplete -- not sure why exactly, but it does. That's why the statement's paramTyps is empty: we never processed the 't' message from the postgres server.

Does that make sense to your real-world application?

leehambley commented 11 years ago

Thanks for the clarification @johto - perhaps that's what is causing the error in my app?

I assumed that rows would be safely cleaned up, and that given that rows is two separate instances of the same variable?

I'm not sure of the implications of the t message being processed, or not. But given that maybe my 2nd query is happening before the first set of rows are closed, it's a possibility.

I'm slightly unclear on where, and how it is safe to close rows, I had been deferring it in my app code.

leehambley commented 11 years ago

This version of the insertUser() function doesn't panic, as you alluded to:

func insertUser(tx *sql.Tx) string {

    var uuid string = ""

    log.Println("Inserting User")
    rows, err := tx.Query(insert, name, email, password_hash)
    defer rows.Close()

    if err != nil {
        log.Fatal("Error Inserting User", err)
    }

    if !rows.Next() {
        log.Fatal("No Rows Returned From User Insert", rows.Err())
    }

    if err = rows.Scan(&uuid); err != nil {
        log.Fatal("Couldn't Read Returned UUID", err)
    }

    return uuid

}

The log:

go install github.com/lib/pq && go run reproduce_panic.go
2013/10/28 21:07:49 Opening Connection
2013/10/28 21:07:49 Truncating Users Table
2013/10/28 21:07:49 Opening Transaction
2013/10/28 21:07:49 Inserting User
2013/10/28 21:07:49 lib/pq (7876026/#409): t( 49 ) r( &[] )
2013/10/28 21:07:49 lib/pq (7876026/#409): t( 116 ) r( &[0 3 0 0 0 25 0 0 4 19 0 0 4 19] )
2013/10/28 21:07:49 lib/pq (7876026/#414): nparams( 3 )paramTyps( [0 0 0] )
2013/10/28 21:07:49 lib/pq (7876026/#409): t( 84 ) r( &[0 1 117 117 105 100 0 0 1 206 131 0 1 0 0 11 134 0 16 255 255 255 255 0 0] )
2013/10/28 21:07:49 lib/pq (7876026/#409): t( 90 ) r( &[84] )
2013/10/28 21:07:49 Inserted User, Database Granted UUID: 17612cae-2e07-4494-95a6-faa1ec0fb6fe
2013/10/28 21:07:49 Inserting User
2013/10/28 21:07:49 lib/pq (7876026/#409): t( 49 ) r( &[] )
2013/10/28 21:07:49 lib/pq (7876026/#409): t( 116 ) r( &[0 3 0 0 0 25 0 0 4 19 0 0 4 19] )
2013/10/28 21:07:49 lib/pq (7876026/#414): nparams( 3 )paramTyps( [0 0 0] )
2013/10/28 21:07:49 lib/pq (7876026/#409): t( 84 ) r( &[0 1 117 117 105 100 0 0 1 206 131 0 1 0 0 11 134 0 16 255 255 255 255 0 0] )
2013/10/28 21:07:49 lib/pq (7876026/#409): t( 90 ) r( &[84] )
2013/10/28 21:07:49 No Rows Returned From User Insertpq: duplicate key value violates unique constraint "users_email_key"
exit status 1
make: *** [buildnrun] Error 1
johto commented 11 years ago

Thanks for the clarification @johto - perhaps that's what is causing the error in my app?

Seems plausible.

I assumed that rows would be safely cleaned up, and that given that rows is two separate instances of the same variable?

That's not how it currently works, unfortunately.

I'm not sure of the implications of the t message being processed, or not. But given that maybe my 2nd query is happening before the first set of rows are closed, it's a possibility.

I'm slightly unclear on where, and how it is safe to close rows, I had been deferring it.

Right, you really need to make sure that the previous rows handle is closed before starting a new Query().

leehambley commented 11 years ago

That's not how it currently works, unfortunately.

I presume this has something to do with Close() sending some commands to free some resources (row cursors, etc) on the server, and not doing this leaves the driver in an unstable state, so it's not just about Go-land GC, it's really a hard requirement to get things Close()d, in order not to break the driver?

Right, you really need to make sure that the previous rows handle is closed before starting a new Query().

Does that apply globally (i.e per *sql.DB), or per transaction (i.e per *sql.Tx).

johto commented 11 years ago

I presume this has something to do with Close() sending some commands to free some resources (row cursors, etc) on the server, and not doing this leaves the driver in an unstable state, so it's not just about Go-land GC, it's really a hard requirement to get things Close()d, in order not to break the driver?

More accurately, it leaves the protocol in an unstable state. Unfortunately the code in prepareToSimpleStmt seems a bit sloppy and does not realize that the response from the server isn't actually what was expected at all.

Does that apply globally (i.e per sql.DB), or per transaction (i.e per sql.Tx).

I might be talking out of my rear, but I think if a connection has an open rows object, it's not "free to use", so running another query on the sql.DB will just open a new connection, and should work (though if you never rows.Close(), you're leaking connections). However, with sql.Tx being already tied to a connection, you'll run into the problem we saw here.

leehambley commented 11 years ago

More accurately, it leaves the protocol in an unstable state. Unfortunately the code in prepareToSimpleStmt seems a bit sloppy and does not realize that the response from the server isn't actually what was expected at all.

Great, thanks for the clarification

I might be talking out of my rear, but I think if a connection has an open rows object, it's not "free to use", so running another query on the sql.DB will just open a new connection, and should work (though if you never rows.Close(), you're leaking connections). However, with sql.Tx being already tied to a connection, you'll run into the problem we saw here.

Actually, even with explicit row.Close()s I'm still running into locking, although as previously discussed, this may be due to Go lang's testing, or http frameworks using goroutines, and sharing the Tx, however the code is remarkably similar, and I can mostly only simulate a lock in my application code.

For what it's worth @johto if you have the time, I'd be very grateful to invite you to look at my app (trivial, ~800 LOC inc tests) to try and figure out what is going wrong, and help me give something back to the community. I'd be more than happy to pay you whatever your consulting rate is, if it means we can improve lib/pq, figure out if my use-case is sane and try and figure out what's going so wrong.

I saw your comment on #89 - I'm not familiar enough with the protocol internals to understand the effect of that small commit to which you are referring, perhaps I should read up on the postgresql wire protocol.

johto commented 11 years ago

Actually, even with explicit row.Close()s I'm still running into locking, although as previously discussed, this may be due to Go lang's testing, or http frameworks using goroutines, and sharing the Tx, however the code is remarkably similar, and I can mostly only simulate a lock in my application code.

For what it's worth @johto if you have the time, I'd be very grateful to invite you to look at my app (trivial, ~800 LOC inc tests) to try and figure out what is going wrong, and help me give something back to the community. I'd be more than happy to pay you whatever your consulting rate is, if it means we can improve lib/pq, figure out if my use-case is sane and try and figure out what's going so wrong.

I'd be happy to look at it for free, but I can't promise anything before last week -- I'll leave for pgconf.eu tomorrow. But if you share the code with me, I might be able to find some time between the talks.

leehambley commented 11 years ago

@johto thanks, I appreciate that - no time pressure from my side - is the best email to get you on m****.tii*****@cs.hel****.i* ? (sorry for weird obfuscation, I don't want to set you up to spam bots)

I'm starting a new client project on Wednesday, so I'm in no particular hurry, in the meantime I can work on the parts of my app that don't rely on exceptions being raised in the DB (i.e the validations)

johto commented 11 years ago

@leehambley No, I don't have access to that email anymore. You can use [my first name]@joh.to.

johto commented 11 years ago

@leehambley Appears to have located the issue, and it was not an issue with pq. There's still the problem I described in #89 which I'd like to follow up on, but this issue, I believe, can be closed.

Again publicly: thanks to @leehambley for helping us rule out a bug in pq as the cause of this issue.

msakrejda commented 11 years ago

Thanks @johto and @leehambley. #180 is merged now so closing this.

leehambley commented 11 years ago

Awesome, thanks! On 9 Nov 2013 18:17, "Maciek Sakrejda" notifications@github.com wrote:

Closed #177 https://github.com/lib/pq/issues/177.

— Reply to this email directly or view it on GitHubhttps://github.com/lib/pq/issues/177 .

stengaard commented 11 years ago

I am still seeing this.

Test case:


package main
import (
    "fmt"
    "database/sql"
    _ "github.com/lib/pq"
)

func main() {
    db, err := sql.Open("postgres", "database=radius")
    if err != nil {
        return
    }
    email := "somestring"

    rows, err := db.Query(`SELECT id FROM users WHERE email = '$1';`, email)
    fmt.Println(rows, err)
}

Result:

panic: runtime error: index out of range [recovered]
    panic: runtime error: index out of range

goroutine 1 [running]:
runtime.panic(0x1c1b20, 0x3fdc97)
    /usr/local/go/src/pkg/runtime/panic.c:266 +0xb6
github.com/lib/pq.errRecover(0x4d7cf8)
    /Users/brian/go/src/github.com/stengaard/pqfail/go/src/github.com/lib/pq/error.go:448 +0xd0
runtime.panic(0x1c1b20, 0x3fdc97)
    /usr/local/go/src/pkg/runtime/panic.c:248 +0x106
github.com/lib/pq.(*stmt).exec(0xc21004bab0, 0xc21000ab70, 0x1, 0x1)
    /Users/brian/go/src/github.com/stengaard/pqfail/go/src/github.com/lib/pq/conn.go:802 +0xdec
github.com/lib/pq.(*conn).Query(0xc21000b900, 0x239710, 0x28, 0xc21000ab70, 0x1, ...)
    /Users/brian/go/src/github.com/stengaard/pqfail/go/src/github.com/lib/pq/conn.go:482 +0x1b1
database/sql.(*DB).queryConn(0xc210068000, 0xc210038c00, 0xc21000ab60, 0x239710, 0x28, ...)
    /usr/local/go/src/pkg/database/sql/sql.go:926 +0x1b9
database/sql.(*DB).query(0xc210068000, 0x239710, 0x28, 0x4d7ef8, 0x1, ...)
    /usr/local/go/src/pkg/database/sql/sql.go:913 +0xdd
database/sql.(*DB).Query(0xc210068000, 0x239710, 0x28, 0x4d7ef8, 0x1, ...)
    /usr/local/go/src/pkg/database/sql/sql.go:899 +0x8b
main.main()
    /Users/brian/go/src/github.com/stengaard/pqfail/main.go:18 +0x1e2

goroutine 3 [chan receive]:
database/sql.(*DB).connectionOpener(0xc210068000)
    /usr/local/go/src/pkg/database/sql/sql.go:574 +0x3e
created by database/sql.Open
    /usr/local/go/src/pkg/database/sql/sql.go:436 +0x24d

goroutine 4 [syscall]:
runtime.goexit()
    /usr/local/go/src/pkg/runtime/proc.c:1394

Version info:

brian@stengaard:pq $ uname -a
Darwin stengaard.local 13.0.0 Darwin Kernel Version 13.0.0: Thu Sep 19 22:22:27 PDT 2013; root:xnu-2422.1.72~6/RELEASE_X86_64 x86_64
brian@stengaard:pq $ git log | head -n1 # latest pq master
commit 7454a9817b0f148ab5b18b1d5007b0c2b5fe8984
brian@stengaard:pq $ psql --version
psql (PostgreSQL) 9.1.9
contains support for command-line editing
brian@stengaard:pq $ go version
go version go1.2rc3 darwin/amd64
brian@stengaard:pq $

Test is run with:

#!/bin/bash -x
cleanup() {
    pg_ctl -D data stop
    rm -rf data server.crt go
    echo "Exit"
}
trap cleanup EXIT

mkdir -p $(pwd)/go/src $(pwd)/go/bin $(pwd)/go/pkg
export GOPATH=$(pwd)/go

# create a test cert
openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt
chmod 600 server.*

# create empty db
initdb data
cp server.* data
echo "ssl = true" >> data/postgresql.conf
#
pg_ctl -D data -l sqllog start
sleep 2

createdb radius

psql radius <<EOF
create table if not exists users (
    id          bigserial primary key,
    name        varchar(253),
    email       varchar(253) unique,
    admin       bool,
    first_login timestamp with time zone not null,
    last_login  timestamp with time zone not null
);
EOF

go get github.com/lib/pq

go run main.go

Any pointers?

johto commented 11 years ago
rows, err := db.Query(`SELECT id FROM users WHERE email = '$1';`, email

Your statement doesn't have any parameters because you're comparing the email column against the literal "$1".

It would be nice to not panic here, though. I'll see how this can be improved.

leehambley commented 11 years ago

I was just about to make the same observation. Also it's kinda poor form to post a code example complaining about unchecked errors, when you haven't checked for errors, it took me a couple of minutes to read through your posted code to determine which line was panicking!

stengaard commented 11 years ago

You are absolutely right. Sorry about that.