tursodatabase / libsql-client-go

Go client API for libSQL
MIT License
166 stars 23 forks source link

implement websocket auto reconnect #41

Closed sahidrahman404 closed 1 year ago

sahidrahman404 commented 1 year ago

Currently there is no support for reconnecting websocket after it's dropped by the server. This commit adds timeOpen property to track when the client is connected to the server, and will reconnect the connection if request's age is more than 60 seconds.

haaawk commented 1 year ago

Thank you for your contribution @sahidrahman404.

I'm not sure this is the right way of fixing the problem though. If the server drops the connection when a transaction is open, it is not save to just open new connection and continue the transaction.

Another thing is that reconnecting every 1 minute seems a bit wasteful. If we the connection is still good why not use it and open a new one?

sahidrahman404 commented 1 year ago

Hey thanks for your feedback. That's makes sense, what do you think is the best way to check if the server has dropped the connection?

haaawk commented 1 year ago

Hey thanks for your feedback. That's makes sense, what do you think is the best way to check if the server has dropped the connection?

I'm not 100% sure but maybe catching the WebSocket closed: failed to read frame header: EOF error and reconnecting then.

sahidrahman404 commented 1 year ago

@haaawk Hi I just create an implementation like you said above. But I also add something a bit hacky, I don't know if I can do this or not? You can see my explanation about that in the commit description.

sahidrahman404 commented 1 year ago

Just realized doing this halve context timeout shenanigans can be a problem

haaawk commented 1 year ago

Would you have a reliable reproducer for the problem @sahidrahman404 ?

sahidrahman404 commented 1 year ago

Sure. Give me some time. I just woke up

sahidrahman404 commented 1 year ago

https://github.com/sahidrahman404/gigachad-api/tree/can't-close-writer-reproduction

Hey you can clone the above repo on your machine.

Add DB_DSN env to your turso account or just edit the connection string on the main file line 71.

    db, err := database.New(cfg.db.dsn)
    if err != nil {
        return err
    }
    defer db.Db.Close()

Run the app with make run/live. Do the post request like this.

    BODY='{"username": "smith", "name": "Edith Smith", "email": "edith@example.com", "password": "pa55word"}'
    curl -d "$BODY" http://localhost:4444/v1/users

Wait for 8 minutes. And do the above request again. You will get the below error on your terminal log because they try to close the writer but can't due to the websocket connection that has been closed by the servers. So we will hit the timeout limit before we he hit the re-connection logic.

    error from wsjson write:  failed to write JSON message: failed to close writer: failed to write fin frame: failed to write frame: context deadline exceeded
    error from wsjson write:  failed to write JSON message: failed to marshal JSON: failed to write: failed to acquire lock: context deadline exceeded
sahidrahman404 commented 1 year ago

I forget you also have to have Atlas on your machine for your migration and change the url on atlas.hcl file to your turso account.

And run the migration with the below command

atlas schema apply --env local --to file://schema.hcl
haaawk commented 1 year ago

Ok. Let me take over this issue. I believe this shouldn't happen to new databases that now are suspended after 24h of inactivity not after 5minutes but it's still worth fixing the behaviour.

haaawk commented 1 year ago

I did some experimentation and was able to reproduce the problem with a db being suspended after 10s of inactivity and the following code:

        db, err := sql.Open("libsql", dbUrl)
    if err != nil {
        fmt.Fprintf(os.Stderr, "failed to open db %s: %s", dbUrl, err)
        os.Exit(1)
    }
    conn, err := db.Conn(context.Background())
    if err != nil {
        fmt.Fprintf(os.Stderr, "failed to get db connection %s: %s", dbUrl, err)
        os.Exit(1)
    }
    ctx := context.Background()
    _, err = conn.QueryContext(ctx, "SELECT 1")
    if err != nil {
        fmt.Fprintf(os.Stderr, "failed to execute query: %s", err)
        os.Exit(1)
    }
    time.Sleep(15 * time.Second)
    _, err = conn.QueryContext(ctx, "SELECT 1")
    if err != nil {
        fmt.Fprintf(os.Stderr, "failed to execute query: %s", err)
        os.Exit(1)
    }
failed to execute query: failed to read JSON message: failed to get reader: failed to read frame header: EOF

but with a bit different code everything works just fine:

        db, err := sql.Open("libsql", dbUrl)
    if err != nil {
        fmt.Fprintf(os.Stderr, "failed to open db %s: %s", dbUrl, err)
        os.Exit(1)
    }
    ctx := context.Background()
    _, err = db.QueryContext(ctx, "SELECT 1")
    if err != nil {
        fmt.Fprintf(os.Stderr, "failed to execute query: %s", err)
        os.Exit(1)
    }
    time.Sleep(15 * time.Second)
    _, err = db.QueryContext(ctx, "SELECT 1")
    if err != nil {
        fmt.Fprintf(os.Stderr, "failed to execute query: %s", err)
        os.Exit(1)
    }

I start to thing that we don't have any issue here. If the connection is terminated by the server then the client should just recreate the connection. High level Golang database API already does that automatically.

sahidrahman404 commented 1 year ago

That's great. But I still encounter this problem.

failed to write JSON message: failed to close writer: failed to write fin frame: failed to write frame: context deadline exceeded

But if I did this, everything works as exepected.

        db.SetConnMaxIdleTime(1 * time.Minute)

I think this should be in the documentation.

haaawk commented 1 year ago

This should fix the problem @sahidrahman404 -> https://github.com/libsql/libsql-client-go/pull/44 Thank you for reporting the issue and providing the reproducer. After updating libsql-client-go in gigachad-api, it stopped reporting errors in my local environment.