Closed bmhatfield closed 9 months ago
I'd like to bump that this is really painful for us. Sometimes we use this client to apply schema migrations, and they fail without telling us what went wrong, which makes debugging what the state of the database is very difficult.
This is a pretty easy fix, would a PR be accepted here?
+1. I just ran into this as well and had to make the change @bmhatfield suggested to see what the underlying error was.
I got
failed inline begin transaction
but the actual underlying error ended up being
Value has type INT64 which cannot be inserted into column created_at, which has type TIMESTAMP [at 6:5]
Also ran into this issue. It for some reason only happens when calling from inside a goroutine. It's a dead-simple UPDATE I am trying to run. No idea what's going on :/
What I recently discovered may be useful.
I because aware of this issue investigating something that seems very similar. I eventually discovered that the problem was due to how I was propagating the Spanner error from within the Spanner transaction.
Changed an existing service. Existing integration test broke. It was expecting ALREADY_EXISTS but received INTERNAL error with message spanner: code = "Internal", desc = "failed inline begin transaction"
.
tx.Update(ctx, statement)
inside of a ReadWriteTransaction
.UNIQUE violation on index XXX, duplicate key XXX
to receive spanner error with gRPC code 6 Already Exists
Internal
.spanner: code = "Internal", desc = "failed inline begin transaction"
The original code was similar to this:
_, err := client.ReadWriteTransaction(ctx, func(ctx context.Context, tx *spanner.ReadWriteTransaction) error {
n, err := transaction.Update(ctx, statement) // statement that violates index constraint
return transformError(err) // transformed error must still contain spanner.Error
})
ReadWriteTransaction
retries the "failed inline begin transaction" error if it can detect that it occurs. But the code transformed the error such that spanner.Error
was no longer in the error chain, causing ReadWriteTransaction
to return the Internal error. Removing transformError
or including spanner.Error
in the error chain produced the expected behavior.
@randalmoore there’s actually an open set of PRs with a fix but there has been little traction on getting it merged but feel free to check out the changes here: https://github.com/harshachinta/google-cloud-go/pull/5
Thanks @ajorgensen @randalmoore,
I am taking a look to the changes right now, I tried the snippet attached in the issue here using v1.54.0
_, err = client.ReadWriteTransaction(ctx, func(ctx context.Context, transaction *spanner.ReadWriteTransaction) error {
statement := spanner.Statement{
SQL: sql.String(), // CONTAINS INVALID SQL, development error
Params: sql.Params(),
}
n, txErr := transaction.Update(ctx, statement)
if txErr != nil {
log.Printf("Failed to update: %v", txErr)
return txErr
}
log.Printf("Number of rows updated: %v", n)
return nil
})
if err != nil {
log.Printf("Failed to update: %v", txErr)
}
When running above its prints this output
Failed to update: spanner: code = "Internal", desc = "failed inline begin transaction"
Failed to update: spanner: code = "AlreadyExists", desc = "Row [1] in table Singers already exists"
Failed with err: spanner: code = "AlreadyExists", desc = "Row [1] in table Singers already exists"
So first error can be ignored since its just telling us that it will retry the request with explicit begin transaction, since with inline it failed, expectation is client library will retry the request with explicit begin call and then it will return the original error.
Similarly when I tried inserting invalid value to a column printed
2024/01/09 19:28:52 Failed to update: spanner: code = "Internal", desc = "failed inline begin transaction"
2024/01/09 19:28:53 Failed to update: spanner: code = "InvalidArgument", desc = "Value has type INT64 which cannot be inserted into column....
2024/01/09 19:28:54 Failed with err: spanner: code = "InvalidArgument", desc = "Value has type INT64 which cannot be inserted into column ...
@bmhatfield I have also tested above with v1.47.0 and the behaviour is same, eventually it returns the error which is human readable.
@akallu Can you share the snippet to replicate the issue?
@rahul2393 based on my experience we haven't seen the client exhibit this behavior: "eventually it returns the error which is human readable." In my experience the client has always swallowed the result of the inline transaction and simply returned the fact that it failed.
I will take a look at v1.540 today to see if i can reproduce what you are seeing. I do see what you are saying about the repeated errors in the logs though and agree that's not the best. If i can verify the behavior that I am seeing still exists in v1.54.0 I can look at the changeset and see if there's a way to only log the error once the retries have been exhausted which I believe will address your concern over the repeated log messages.
I will create a PR to avoid failed inline begin transaction
since internally we just do the retry for transaction when it fails with that message, and should be treated as warning only.
@rahul2393 the following test case:
module github.com/ajorgensen/spanner-test
go 1.21.5
require (
cloud.google.com/go/spanner v1.54.0
google.golang.org/api v0.149.0
google.golang.org/grpc v1.59.0
)
package main
import (
"context"
"fmt"
"log"
"cloud.google.com/go/spanner"
"google.golang.org/api/option"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
)
func main() {
dbURI := fmt.Sprintf("projects/%s/instances/%s/databases/%s", "local-dev", "local-test", "db")
opts := []option.ClientOption{
option.WithEndpoint("localhost:9010"),
option.WithGRPCDialOption(grpc.WithTransportCredentials(insecure.NewCredentials())), // Only used in a local development context
option.WithoutAuthentication(),
}
client, err := spanner.NewClient(context.Background(), dbURI, opts...)
if err != nil {
log.Fatal(err)
}
ctx := context.Background()
_, err = client.ReadWriteTransaction(ctx, func(ctx context.Context, transaction *spanner.ReadWriteTransaction) error {
statement := spanner.Statement{
SQL: "insert foo into bar", // CONTAINS INVALID SQL, development error
Params: map[string]interface{}{},
}
n, txErr := transaction.Update(ctx, statement)
log.Fatal(n, txErr)
return txErr
})
if err != nil {
log.Fatal(err)
}
log.Printf("DONE")
}
will return:
2024/01/09 11:17:01 0 spanner: code = "Internal", desc = "failed inline begin transaction"
exit status 1
However replacing the package in go.mod
:
module github.com/ajorgensen/spanner-test
go 1.21.5
replace cloud.google.com/go => github.com/ajorgensen/google-cloud-go v0.0.0-20231204193647-c4784e6a26a6
replace cloud.google.com/go/spanner => github.com/ajorgensen/google-cloud-go/spanner v0.0.0-20231204193647-c4784e6a26a6
Results in:
2024/01/09 11:27:39 0 spanner: code = "InvalidArgument", desc = "Syntax error: Unexpected INSERT target name [at 1:12]\ninsert foo into bar\n ^, failed inline begin transaction"
exit status 1
Which shows the underlying issue with the query. Are you able to see the same behavior? I'm not sure I follow "and should be treated as warning only" since it seems to have some clear information about an unrecoverable failure with the underlying query that would be very useful in debugging.
I think critically for the example above, this commit is what allowed the underlying error to be surfaced https://github.com/googleapis/google-cloud-go/commit/c0dd2d130db6eef3982d62724f6d2847edf2be03
I have an additional question based upon @ajorgensen 's reproduce case and @rahul2393 's comment:
internally we just do the retry for transaction when it fails with that message
I'm not sure why retrying would be appropriate when this message is returned, as the error is masking over an actual, non-retryable problem. Why would retrying help?
log.Fatal(n, txErr)
cc: @ajorgensen is the reason for this, using Fatal will cause the program to exit, causing it to halt and not to do any retries client library is expected to do, Can you please try adding log.Error or print and see what you see.
I'm not sure why retrying would be appropriate when this message is returned, as the error is masking over an actual, non-retryable problem. Why would retrying help?
@bmhatfield there are various types of errors Spanner backend can return, instead of adding the logic on when to retry and not to retry for the failsafe we added to default retry if request is inlineBegin and first operation to Spanner fails(Unavailable/Aborted etc/HTTP gone..), request upon failure will be retried with explicit BeginTransaction.
@rahul2393 yes I can confirm the on v1.54.0 I am seeing the underlying error with the following test case. Thanks for your help here!
Your comments on the retry logic gave me a hint at what was going on. Consider the following example:
package main
import (
"context"
"fmt"
"log"
"cloud.google.com/go/spanner"
"google.golang.org/api/option"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
"google.golang.org/grpc/status"
)
func main() {
dbURI := fmt.Sprintf("projects/%s/instances/%s/databases/%s", "local-dev", "local-test", "db")
opts := []option.ClientOption{
option.WithEndpoint("localhost:9010"),
option.WithGRPCDialOption(grpc.WithTransportCredentials(insecure.NewCredentials())), // Only used in a local development context
option.WithoutAuthentication(),
}
client, err := spanner.NewClient(context.Background(), dbURI, opts...)
if err != nil {
log.Fatal(err)
}
ctx := context.Background()
_, err = client.ReadWriteTransaction(ctx, func(ctx context.Context, transaction *spanner.ReadWriteTransaction) error {
statement := spanner.Statement{
SQL: "insert foo into bar", // CONTAINS INVALID SQL, development error
Params: map[string]interface{}{},
}
_, txErr := transaction.Update(ctx, statement)
if txErr != nil {
return convertToStatusError(txErr)
}
return nil
})
if err != nil {
log.Fatal(convertToStatusError(err))
}
log.Printf("DONE")
}
func convertToStatusError(err error) error {
if err == nil {
return nil
}
return status.Error(spanner.ErrCode(err), spanner.ErrDesc(err))
}
2024/01/09 14:15:20 rpc error: code = Internal desc = rpc error: code = Internal desc = failed inline begin transaction
exit status 1
Specifically, this case is slightly modifying the error within the Transaction function by returning the error wrapped in a grpc error. When run against v.15.4 this will exhibit the same behavior we are seeing where the underlying error message is completely masked. I'm guessing any modification to the error returned by Update is causing some issue in the client where it cant be unwrapped correct, I'm not sure the exact mechanism here.
However, interestingly if we just return the error and then call convertToStatusError
outside of the transaction, we get the full details:
package main
import (
"context"
"fmt"
"log"
"cloud.google.com/go/spanner"
"google.golang.org/api/option"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
"google.golang.org/grpc/status"
)
func main() {
dbURI := fmt.Sprintf("projects/%s/instances/%s/databases/%s", "local-dev", "local-test", "db")
opts := []option.ClientOption{
option.WithEndpoint("localhost:9010"),
option.WithGRPCDialOption(grpc.WithTransportCredentials(insecure.NewCredentials())), // Only used in a local development context
option.WithoutAuthentication(),
}
client, err := spanner.NewClient(context.Background(), dbURI, opts...)
if err != nil {
log.Fatal(err)
}
ctx := context.Background()
_, err = client.ReadWriteTransaction(ctx, func(ctx context.Context, transaction *spanner.ReadWriteTransaction) error {
statement := spanner.Statement{
SQL: "insert foo into bar", // CONTAINS INVALID SQL, development error
Params: map[string]interface{}{},
}
_, txErr := transaction.Update(ctx, statement)
if txErr != nil {
return txErr
}
return nil
})
if err != nil {
log.Fatal(convertToStatusError(err))
}
log.Printf("DONE")
}
func convertToStatusError(err error) error {
if err == nil {
return nil
}
return status.Error(spanner.ErrCode(err), spanner.ErrDesc(err))
}
2024/01/09 14:15:07 rpc error: code = InvalidArgument desc = Syntax error: Unexpected INSERT target name [at 1:12]
insert foo into bar
^
exit status 1
It is interesting to me that wrapping the error in this way either loses the details or makes it so they aren't carried forward but either way I think I can make a modification on our side to work around this.
Closing this since its working as intended.
Client
e.g. Spanner
Environment
e.g. Mac OSX + Cloud Spanner
Go Environment
$ go version: go version go1.20.4 darwin/arm6
Code
e.g.
Expected behavior
An error message describing that the SQL is malformed
Actual behavior
Errors are masked with
spanner: code = "Internal", desc = "failed inline begin transaction"
Additional context
e.g. Started after upgrading to cloud.google.com/go/spanner v1.44.0, v1.45.1 still broken.
This makes debugging mistakes very difficult. This used to provide excellent context about what was broken.
Manually logging the original error shows that it's simply masked: