googleapis / google-cloud-go

Google Cloud Client Libraries for Go.
https://cloud.google.com/go/docs/reference
Apache License 2.0
3.77k stars 1.29k forks source link

spanner: original errors masked by `errInlineBeginTransactionFailed()` #7903

Closed bmhatfield closed 9 months ago

bmhatfield commented 1 year ago

Client

e.g. Spanner

Environment

e.g. Mac OSX + Cloud Spanner

Go Environment

$ go version: go version go1.20.4 darwin/arm6

Code

e.g.

_, err := client.client.ReadWriteTransaction(ctx, func(ctx context.Context, transaction *spanner.ReadWriteTransaction) error {
        statement := spanner.Statement{
            SQL:    sql.String(), // CONTAINS INVALID SQL, development error
            Params: sql.Params(),
        }

        n, txErr := transaction.Update(ctx, statement)
        logger.Error(txErr)

        return txErr
})

Expected behavior

An error message describing that the SQL is malformed

Actual behavior

Errors are masked with spanner: code = "Internal", desc = "failed inline begin transaction"

Additional context

e.g. Started after upgrading to cloud.google.com/go/spanner v1.44.0, v1.45.1 still broken.

This makes debugging mistakes very difficult. This used to provide excellent context about what was broken.

Manually logging the original error shows that it's simply masked:

[bhatfield spanner@v1.45.1]% diff transaction.go transaction.go.bak
21d20
<   "fmt"
1053d1051
<           fmt.Println("--------------- SPANNER ERROR", err)
--------------- SPANNER ERROR rpc error: code = AlreadyExists desc = Failed to insert row with primary key ({...removed...}) due to previously inserted row
bmhatfield commented 1 year ago

I'd like to bump that this is really painful for us. Sometimes we use this client to apply schema migrations, and they fail without telling us what went wrong, which makes debugging what the state of the database is very difficult.

This is a pretty easy fix, would a PR be accepted here?

ajorgensen commented 1 year ago

+1. I just ran into this as well and had to make the change @bmhatfield suggested to see what the underlying error was.

I got

failed inline begin transaction

but the actual underlying error ended up being

Value has type INT64 which cannot be inserted into column created_at, which has type TIMESTAMP [at 6:5]
akallu commented 10 months ago

Also ran into this issue. It for some reason only happens when calling from inside a goroutine. It's a dead-simple UPDATE I am trying to run. No idea what's going on :/

randalmoore commented 10 months ago

What I recently discovered may be useful.

I because aware of this issue investigating something that seems very similar. I eventually discovered that the problem was due to how I was propagating the Spanner error from within the Spanner transaction.

My Situation:

Changed an existing service. Existing integration test broke. It was expecting ALREADY_EXISTS but received INTERNAL error with message spanner: code = "Internal", desc = "failed inline begin transaction".

My Details

The original code was similar to this:

_, err := client.ReadWriteTransaction(ctx, func(ctx context.Context, tx *spanner.ReadWriteTransaction) error {
        n, err := transaction.Update(ctx, statement) // statement that violates index constraint
        return transformError(err) // transformed error must still contain spanner.Error
})

ReadWriteTransaction retries the "failed inline begin transaction" error if it can detect that it occurs. But the code transformed the error such that spanner.Error was no longer in the error chain, causing ReadWriteTransaction to return the Internal error. Removing transformError or including spanner.Error in the error chain produced the expected behavior.

ajorgensen commented 10 months ago

@randalmoore there’s actually an open set of PRs with a fix but there has been little traction on getting it merged but feel free to check out the changes here: https://github.com/harshachinta/google-cloud-go/pull/5

rahul2393 commented 10 months ago

Thanks @ajorgensen @randalmoore, I am taking a look to the changes right now, I tried the snippet attached in the issue here using v1.54.0

_, err = client.ReadWriteTransaction(ctx, func(ctx context.Context, transaction *spanner.ReadWriteTransaction) error {
        statement := spanner.Statement{
            SQL:    sql.String(), // CONTAINS INVALID SQL, development error
            Params: sql.Params(),
        }
        n, txErr := transaction.Update(ctx, statement)
        if txErr != nil {
            log.Printf("Failed to update: %v", txErr)
            return txErr
        }
        log.Printf("Number of rows updated: %v", n)
        return nil
})
if err != nil {
    log.Printf("Failed to update: %v", txErr)
}

When running above its prints this output

Failed to update: spanner: code = "Internal", desc = "failed inline begin transaction"
Failed to update: spanner: code = "AlreadyExists", desc = "Row [1] in table Singers already exists"
Failed with err: spanner: code = "AlreadyExists", desc = "Row [1] in table Singers already exists"

So first error can be ignored since its just telling us that it will retry the request with explicit begin transaction, since with inline it failed, expectation is client library will retry the request with explicit begin call and then it will return the original error.

Similarly when I tried inserting invalid value to a column printed

2024/01/09 19:28:52 Failed to update: spanner: code = "Internal", desc = "failed inline begin transaction"
2024/01/09 19:28:53 Failed to update: spanner: code = "InvalidArgument", desc = "Value has type INT64 which cannot be inserted into column....                                                  
2024/01/09 19:28:54 Failed with err: spanner: code = "InvalidArgument", desc = "Value has type INT64 which cannot be inserted into column ...  

@bmhatfield I have also tested above with v1.47.0 and the behaviour is same, eventually it returns the error which is human readable.

rahul2393 commented 10 months ago

@akallu Can you share the snippet to replicate the issue?

ajorgensen commented 10 months ago

@rahul2393 based on my experience we haven't seen the client exhibit this behavior: "eventually it returns the error which is human readable." In my experience the client has always swallowed the result of the inline transaction and simply returned the fact that it failed.

I will take a look at v1.540 today to see if i can reproduce what you are seeing. I do see what you are saying about the repeated errors in the logs though and agree that's not the best. If i can verify the behavior that I am seeing still exists in v1.54.0 I can look at the changeset and see if there's a way to only log the error once the retries have been exhausted which I believe will address your concern over the repeated log messages.

rahul2393 commented 10 months ago

I will create a PR to avoid failed inline begin transaction since internally we just do the retry for transaction when it fails with that message, and should be treated as warning only.

ajorgensen commented 10 months ago

@rahul2393 the following test case:

module github.com/ajorgensen/spanner-test

go 1.21.5

require (
    cloud.google.com/go/spanner v1.54.0
    google.golang.org/api v0.149.0
    google.golang.org/grpc v1.59.0
)
package main

import (
    "context"
    "fmt"
    "log"

    "cloud.google.com/go/spanner"
    "google.golang.org/api/option"
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
)

func main() {
    dbURI := fmt.Sprintf("projects/%s/instances/%s/databases/%s", "local-dev", "local-test", "db")

    opts := []option.ClientOption{
        option.WithEndpoint("localhost:9010"),
        option.WithGRPCDialOption(grpc.WithTransportCredentials(insecure.NewCredentials())), // Only used in a local development context
        option.WithoutAuthentication(),
    }

    client, err := spanner.NewClient(context.Background(), dbURI, opts...)
    if err != nil {
        log.Fatal(err)
    }

    ctx := context.Background()

    _, err = client.ReadWriteTransaction(ctx, func(ctx context.Context, transaction *spanner.ReadWriteTransaction) error {
        statement := spanner.Statement{
            SQL:    "insert foo into bar", // CONTAINS INVALID SQL, development error
            Params: map[string]interface{}{},
        }

        n, txErr := transaction.Update(ctx, statement)
        log.Fatal(n, txErr)

        return txErr
    })

    if err != nil {
        log.Fatal(err)
    }

    log.Printf("DONE")
}

will return:

2024/01/09 11:17:01 0 spanner: code = "Internal", desc = "failed inline begin transaction"
exit status 1

However replacing the package in go.mod:

module github.com/ajorgensen/spanner-test

go 1.21.5

replace cloud.google.com/go => github.com/ajorgensen/google-cloud-go v0.0.0-20231204193647-c4784e6a26a6

replace cloud.google.com/go/spanner => github.com/ajorgensen/google-cloud-go/spanner v0.0.0-20231204193647-c4784e6a26a6

Results in:

2024/01/09 11:27:39 0 spanner: code = "InvalidArgument", desc = "Syntax error: Unexpected INSERT target name [at 1:12]\ninsert foo into bar\n           ^, failed inline begin transaction"
exit status 1

Which shows the underlying issue with the query. Are you able to see the same behavior? I'm not sure I follow "and should be treated as warning only" since it seems to have some clear information about an unrecoverable failure with the underlying query that would be very useful in debugging.

I think critically for the example above, this commit is what allowed the underlying error to be surfaced https://github.com/googleapis/google-cloud-go/commit/c0dd2d130db6eef3982d62724f6d2847edf2be03

bmhatfield commented 10 months ago

I have an additional question based upon @ajorgensen 's reproduce case and @rahul2393 's comment:

internally we just do the retry for transaction when it fails with that message

I'm not sure why retrying would be appropriate when this message is returned, as the error is masking over an actual, non-retryable problem. Why would retrying help?

rahul2393 commented 10 months ago
  log.Fatal(n, txErr)

cc: @ajorgensen is the reason for this, using Fatal will cause the program to exit, causing it to halt and not to do any retries client library is expected to do, Can you please try adding log.Error or print and see what you see.

I'm not sure why retrying would be appropriate when this message is returned, as the error is masking over an actual, non-retryable problem. Why would retrying help?

@bmhatfield there are various types of errors Spanner backend can return, instead of adding the logic on when to retry and not to retry for the failsafe we added to default retry if request is inlineBegin and first operation to Spanner fails(Unavailable/Aborted etc/HTTP gone..), request upon failure will be retried with explicit BeginTransaction.

ajorgensen commented 10 months ago

@rahul2393 yes I can confirm the on v1.54.0 I am seeing the underlying error with the following test case. Thanks for your help here!

Your comments on the retry logic gave me a hint at what was going on. Consider the following example:

package main

import (
    "context"
    "fmt"
    "log"

    "cloud.google.com/go/spanner"
    "google.golang.org/api/option"
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
    "google.golang.org/grpc/status"
)

func main() {
    dbURI := fmt.Sprintf("projects/%s/instances/%s/databases/%s", "local-dev", "local-test", "db")

    opts := []option.ClientOption{
        option.WithEndpoint("localhost:9010"),
        option.WithGRPCDialOption(grpc.WithTransportCredentials(insecure.NewCredentials())), // Only used in a local development context
        option.WithoutAuthentication(),
    }

    client, err := spanner.NewClient(context.Background(), dbURI, opts...)
    if err != nil {
        log.Fatal(err)
    }

    ctx := context.Background()

    _, err = client.ReadWriteTransaction(ctx, func(ctx context.Context, transaction *spanner.ReadWriteTransaction) error {
        statement := spanner.Statement{
            SQL:    "insert foo into bar", // CONTAINS INVALID SQL, development error
            Params: map[string]interface{}{},
        }

        _, txErr := transaction.Update(ctx, statement)
        if txErr != nil {
            return convertToStatusError(txErr)
        }

        return nil
    })

    if err != nil {
        log.Fatal(convertToStatusError(err))
    }

    log.Printf("DONE")
}

func convertToStatusError(err error) error {
    if err == nil {
        return nil
    }

    return status.Error(spanner.ErrCode(err), spanner.ErrDesc(err))
}
2024/01/09 14:15:20 rpc error: code = Internal desc = rpc error: code = Internal desc = failed inline begin transaction
exit status 1

Specifically, this case is slightly modifying the error within the Transaction function by returning the error wrapped in a grpc error. When run against v.15.4 this will exhibit the same behavior we are seeing where the underlying error message is completely masked. I'm guessing any modification to the error returned by Update is causing some issue in the client where it cant be unwrapped correct, I'm not sure the exact mechanism here.

However, interestingly if we just return the error and then call convertToStatusError outside of the transaction, we get the full details:

package main

import (
    "context"
    "fmt"
    "log"

    "cloud.google.com/go/spanner"
    "google.golang.org/api/option"
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
    "google.golang.org/grpc/status"
)

func main() {
    dbURI := fmt.Sprintf("projects/%s/instances/%s/databases/%s", "local-dev", "local-test", "db")

    opts := []option.ClientOption{
        option.WithEndpoint("localhost:9010"),
        option.WithGRPCDialOption(grpc.WithTransportCredentials(insecure.NewCredentials())), // Only used in a local development context
        option.WithoutAuthentication(),
    }

    client, err := spanner.NewClient(context.Background(), dbURI, opts...)
    if err != nil {
        log.Fatal(err)
    }

    ctx := context.Background()

    _, err = client.ReadWriteTransaction(ctx, func(ctx context.Context, transaction *spanner.ReadWriteTransaction) error {
        statement := spanner.Statement{
            SQL:    "insert foo into bar", // CONTAINS INVALID SQL, development error
            Params: map[string]interface{}{},
        }

        _, txErr := transaction.Update(ctx, statement)
        if txErr != nil {
            return txErr
        }

        return nil
    })

    if err != nil {
        log.Fatal(convertToStatusError(err))
    }

    log.Printf("DONE")
}

func convertToStatusError(err error) error {
    if err == nil {
        return nil
    }

    return status.Error(spanner.ErrCode(err), spanner.ErrDesc(err))
}
2024/01/09 14:15:07 rpc error: code = InvalidArgument desc = Syntax error: Unexpected INSERT target name [at 1:12]
insert foo into bar
           ^
exit status 1

It is interesting to me that wrapping the error in this way either loses the details or makes it so they aren't carried forward but either way I think I can make a modification on our side to work around this.

rahul2393 commented 9 months ago

Closing this since its working as intended.