googleapis / google-cloud-go

Google Cloud Client Libraries for Go.
https://cloud.google.com/go/docs/reference
Apache License 2.0
3.71k stars 1.27k forks source link

BigQuery: 404 table not found even when the table exists #975

Closed zero-master closed 6 years ago

zero-master commented 6 years ago

Client

BigQuery Go Client

Describe Your Environment

Linux 4.15.11-1-ARCH SMP PREEMPT x86_64 GNU/Linux go version go1.10.1 linux/amd64

Expected Behavior

I am trying to stream into BigQuery.

It should just insert the data without 404 table not found error since the table already exists.

Actual Behavior

404 table not found

Before running go run the package was updated with go get -u cloud.google.com/go/bigquery

go run checkTable.go
2018/04/19 14:38:20 googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable14, notFound
2018/04/19 14:38:20 Creating table too
panic: googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable14, notFound

goroutine 1 [running]:
main.main()
        /home/core/go/src/gcf_hello_world/checkTable.go:53 +0x991
exit status 2

I am able to consitently reproduce it with the code below:

  1. First make sure the dataset exists

  2. In the below code, change the tableID to a table which does not exist yet.

It will first insert without creating table, which is expected to fail.

Then it will create new table

Finally, it will insert again but this time also we'll end up with 404 table not found error.

Here is my project ID: pureapp-199410 (API request logs might help here)

In checkTable.go I have:

package main

import (
    "context"
    "log"

    "cloud.google.com/go/bigquery"
    "google.golang.org/api/option"
)

const (
    projectID = "pureapp-199410"
    seperator = "."
)

type Item struct {
    Name  string
    Count int
}

func main() {
    ctx := context.Background()
    client, err := bigquery.NewClient(ctx, projectID, option.WithCredentialsFile("creds.json"))
    if err != nil {
        panic(err)
    }
    datasetID := "dream4"
    tableID := "sometable13"
    items := []*Item{
        // Item implements the ValueSaver interface.
        {Name: "n1", Count: 7},
        {Name: "n2", Count: 2},
        {Name: "n3", Count: 1},
    }

    u := client.Dataset(datasetID).Table(tableID).Uploader()
    if err := u.Put(ctx, items); err != nil {
        log.Println(err)
    } else {
        log.Println("Insert successful")
    }

    log.Println("Creating table too")
    table := client.Dataset(datasetID).Table(tableID)
    ctx1 := context.Background()
    if err := table.Create(ctx1, &bigquery.TableMetadata{Schema: schema}); err != nil {
        panic(err)
    }

    u = client.Dataset(datasetID).Table(tableID).Uploader()
    ctx2 := context.Background()
    if err := u.Put(ctx2, items); err != nil {
        panic(err)
    } else {
        log.Println("Insert successful")
    }

}

var schema bigquery.Schema = []*bigquery.FieldSchema{
    {Name: "Name", Type: bigquery.StringFieldType},
    {Name: "Count", Type: bigquery.IntegerFieldType},
}
zero-master commented 6 years ago

There is a delay between creating a table and its availability.

Test 1

 go run run.go
2018/04/19 20:33:44 Insert retry attempt 1 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable32, notFound
2018/04/19 20:33:44 Creating table only
2018/04/19 20:33:45 Table successfully created
2018/04/19 20:33:45 Waiting for 3ms before next attempt
2018/04/19 20:33:45 Insert retry attempt 2 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable32, notFound
2018/04/19 20:33:45 Creating table only
2018/04/19 20:33:45 Table already exists
2018/04/19 20:33:45 Waiting for 11ms before next attempt
2018/04/19 20:33:46 Insert retry attempt 3 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable32, notFound
2018/04/19 20:33:46 Creating table only
2018/04/19 20:33:46 Table already exists
2018/04/19 20:33:46 Waiting for 15ms before next attempt
2018/04/19 20:33:47 Insert retry attempt 4 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable32, notFound
2018/04/19 20:33:47 Creating table only
2018/04/19 20:33:47 Table already exists
2018/04/19 20:33:47 Waiting for 25ms before next attempt
2018/04/19 20:33:48 Insert retry attempt 5 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable32, notFound
2018/04/19 20:33:48 Creating table only
2018/04/19 20:33:48 Table already exists
2018/04/19 20:33:48 Waiting for 33ms before next attempt
2018/04/19 20:33:49 Insert retry attempt 6 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable32, notFound
2018/04/19 20:33:49 Creating table only
2018/04/19 20:33:49 Table already exists
2018/04/19 20:33:49 Waiting for 72ms before next attempt
2018/04/19 20:33:50 Insert retry attempt 7 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable32, notFound
2018/04/19 20:33:50 Creating table only
2018/04/19 20:33:50 Table already exists
2018/04/19 20:33:50 Waiting for 133ms before next attempt
2018/04/19 20:33:51 Insert retry attempt 8 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable32, notFound
2018/04/19 20:33:51 Creating table only
2018/04/19 20:33:51 Table already exists
2018/04/19 20:33:51 Waiting for 256ms before next attempt
2018/04/19 20:33:52 Insert retry attempt 9 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable32, notFound
2018/04/19 20:33:52 Creating table only
2018/04/19 20:33:53 Table already exists
2018/04/19 20:33:53 Waiting for 518ms before next attempt
2018/04/19 20:33:54 Insert retry attempt 10 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable32, notFound
2018/04/19 20:33:54 Creating table only
2018/04/19 20:33:54 Table already exists
2018/04/19 20:33:54 Waiting for 1.024s before next attempt
2018/04/19 20:33:56 Insert retry attempt 11 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable32, notFound
2018/04/19 20:33:56 Creating table only
2018/04/19 20:33:56 Table already exists
2018/04/19 20:33:56 Waiting for 2.052s before next attempt
2018/04/19 20:33:59 Insert retry attempt 12 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable32, notFound
2018/04/19 20:33:59 Creating table only
2018/04/19 20:33:59 Table already exists
2018/04/19 20:33:59 Waiting for 4.097s before next attempt
2018/04/19 20:34:03 Insert retry attempt 13 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable32, notFound
2018/04/19 20:34:03 Creating table only
2018/04/19 20:34:04 Table already exists
2018/04/19 20:34:04 Waiting for 8.194s before next attempt
2018/04/19 20:34:13 Insert retry attempt 14 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable32, notFound
2018/04/19 20:34:13 Creating table only
2018/04/19 20:34:13 Table already exists
2018/04/19 20:34:13 Waiting for 16.393s before next attempt
2018/04/19 20:34:30 Insert retry attempt 15 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable32, notFound
2018/04/19 20:34:30 Creating table only
2018/04/19 20:34:30 Table already exists
2018/04/19 20:34:30 Waiting for 32.776s before next attempt
2018/04/19 20:35:03 Insert retry attempt 16 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable32, notFound
2018/04/19 20:35:03 Creating table only
2018/04/19 20:35:04 Table already exists
2018/04/19 20:35:04 Waiting for 1m5.54s before next attempt
2018/04/19 20:36:10 Inserting data successful after 2m27.182891398s

Test2

 go run run.go
2018/04/19 20:37:16 Insert retry attempt 1 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable33, notFound
2018/04/19 20:37:16 Creating table only
2018/04/19 20:37:17 Table successfully created
2018/04/19 20:37:17 Waiting for 3ms before next attempt
2018/04/19 20:37:17 Insert retry attempt 2 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable33, notFound
2018/04/19 20:37:17 Creating table only
2018/04/19 20:37:17 Table already exists
2018/04/19 20:37:17 Waiting for 11ms before next attempt
2018/04/19 20:37:18 Insert retry attempt 3 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable33, notFound
2018/04/19 20:37:18 Creating table only
2018/04/19 20:37:19 Table already exists
2018/04/19 20:37:19 Waiting for 15ms before next attempt
2018/04/19 20:37:19 Insert retry attempt 4 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable33, notFound
2018/04/19 20:37:19 Creating table only
2018/04/19 20:37:20 Table already exists
2018/04/19 20:37:20 Waiting for 25ms before next attempt
2018/04/19 20:37:20 Insert retry attempt 5 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable33, notFound
2018/04/19 20:37:20 Creating table only
2018/04/19 20:37:21 Table already exists
2018/04/19 20:37:21 Waiting for 33ms before next attempt
2018/04/19 20:37:21 Insert retry attempt 6 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable33, notFound
2018/04/19 20:37:21 Creating table only
2018/04/19 20:37:21 Table already exists
2018/04/19 20:37:21 Waiting for 72ms before next attempt
2018/04/19 20:37:22 Insert retry attempt 7 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable33, notFound
2018/04/19 20:37:22 Creating table only
2018/04/19 20:37:22 Table already exists
2018/04/19 20:37:22 Waiting for 133ms before next attempt
2018/04/19 20:37:23 Insert retry attempt 8 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable33, notFound
2018/04/19 20:37:23 Creating table only
2018/04/19 20:37:23 Table already exists
2018/04/19 20:37:23 Waiting for 256ms before next attempt
2018/04/19 20:37:24 Insert retry attempt 9 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable33, notFound
2018/04/19 20:37:24 Creating table only
2018/04/19 20:37:24 Table already exists
2018/04/19 20:37:24 Waiting for 518ms before next attempt
2018/04/19 20:37:25 Insert retry attempt 10 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable33, notFound
2018/04/19 20:37:25 Creating table only
2018/04/19 20:37:26 Table already exists
2018/04/19 20:37:26 Waiting for 1.024s before next attempt
2018/04/19 20:37:27 Insert retry attempt 11 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable33, notFound
2018/04/19 20:37:27 Creating table only
2018/04/19 20:37:27 Table already exists
2018/04/19 20:37:27 Waiting for 2.052s before next attempt
2018/04/19 20:37:30 Insert retry attempt 12 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable33, notFound
2018/04/19 20:37:30 Creating table only
2018/04/19 20:37:30 Table already exists
2018/04/19 20:37:30 Waiting for 4.097s before next attempt
2018/04/19 20:37:35 Insert retry attempt 13 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable33, notFound
2018/04/19 20:37:35 Creating table only
2018/04/19 20:37:35 Table already exists
2018/04/19 20:37:35 Waiting for 8.194s before next attempt
2018/04/19 20:37:44 Insert retry attempt 14 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable33, notFound
2018/04/19 20:37:44 Creating table only
2018/04/19 20:37:45 Table already exists
2018/04/19 20:37:45 Waiting for 16.393s before next attempt
2018/04/19 20:38:01 Insert retry attempt 15 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable33, notFound
2018/04/19 20:38:01 Creating table only
2018/04/19 20:38:02 Table already exists
2018/04/19 20:38:02 Waiting for 32.776s before next attempt
2018/04/19 20:38:35 Insert retry attempt 16 error googleapi: Error 404: Not found: Table pureapp-199410:dream4.sometable33, notFound
2018/04/19 20:38:35 Creating table only
2018/04/19 20:38:35 Table already exists
2018/04/19 20:38:35 Waiting for 1m5.54s before next attempt
2018/04/19 20:39:41 Inserting data successful after 2m26.84197699s
zero-master commented 6 years ago

image

image

jba commented 6 years ago

I think you answered your own question. Creating a table is an eventually consistent operation: it may not appear to exist everywhere until after some time has passed. In the screenshots above, the second request, which happens 2 seconds after the first, may end up at a different location where the creation hasn't yet propagated.

Have you considered Uploader.TableTemplateSuffix as a way to create the table automatically?

zero-master commented 6 years ago

I started with TableTemplateSuffix but I get the impression that it can't use a table from a dataset as a template for a new table in a different dataset. So, if I create a new dataset, there is no way to base my new table on anything and the only option I am left with is to create it from a schema. Please correct me if I am wrong here.

I am creating datasets and tables dynamically based on the fields of the event. If the insertAll request fails, I create either both dataset and table or just table based on the error response. (I parse the error to know what does not exist)

I am reading from Kinesis steams and plan to use Lambda for streaming into BQ.

Do you know what strategy Apache Beam uses to get around these limitations of dynamically creating datasets/tables and streaming data into them without waiting for 10 minutes and retrying?

shollyman commented 6 years ago

Yes, BigQuery's streaming inserts definitely exhibit eventually consistent behaviors when you reactively create a table in response to a streaming notFound response. This is covered in the streaming section of the BigQuery troubleshooting docs.

However, its not clear to me from the comments whether you're dealing with constantly evolving schemas, or if there's a known set of messages/schemas and you're just inspecting and redirecting them based on message structure.

If its the latter case, having the tables created beforehand should likely suffice, as streaming to a table that already exists should encounter none of the eventual consistency issues you're observing with your current approach.

However, if the schemas are truly dynamic, one of the things you might consider is leveraging BigQuery's ability to use complex types, particularly arrays of structs.

Consider the following schema:

    _ = bigquery.Schema{
        {
            Name: "event_time",
            Type: bigquery.TimestampFieldType,
        },
        {
            Name: "sender_id",
            Type: bigquery.StringFieldType,
        },
        {
            Name: "destination_id",
            Type: bigquery.StringFieldType,
        },
        {
            Name: "message_type",
            Type: bigquery.StringFieldType,
        },
        {
            Name:     "metrics",
            Type:     bigquery.RecordFieldType,
            Repeated: true,
            Schema: bigquery.Schema{
                {
                    Name: "key",
                    Type: bigquery.StringFieldType,
                },
                {
                    Name: "value",
                    Type: bigquery.StringFieldType,
                },
                {
                    Name: "annotation",
                    Type: bigquery.StringFieldType,
                },
            },
        },
    }

Within it, you have a set of known top level fields (event_time, sender, destination, etc), but "metrics" is an array of key/value structs, which can receive many unrelated values and does not require schema evolution. You can access/manipulate filter the arrays as needed using Standard SQL, which has its own topic in the query documentation.

jba commented 6 years ago

Closing, but please reopen if you have more questions or concerns.

ppanyukov commented 4 years ago

@jba Main concern is that nowhere in documentation it mentions anything like "table will eventually be there" and no recommended solutions offered either. Everyone has to figure this out on their own which is wasteful.

I spent a few hours yesterday thinking I was going mad. It would be nice if user experience was better in this regard.

iamolegga commented 4 years ago

Hello, I've got the same issue. Due to some reasons I can't use TableTemplateSuffix and I need to create table dynamically and stream data to it. I have tried to request TableMetadata and it response successfully but Inserter still returns 404 error right after table.Metadata() success.

There would be really useful some kind of Available() method on Table

@jba

iamolegga commented 3 years ago

Facing this issue again, @jba can we reopen this issue again and maybe you can consider any technique that allows to understand if table and dataset are created succesfully and can be used without getting 404, getting metadata of that objects doesn't help, if it returned successfully I still getting 404 when inserting or reading data

iamolegga commented 3 years ago

@shollyman sorry for bothering, but maybe you can help with this?

RamonXavier commented 2 years ago

Guys, how are you?

This problems is factible for me too.

My solution is addition a delay time for insert in new table.

my code is in .net5 but the logic is aplicated for your problem.

        if (CheckTableExists(client, datasetId, tableId))
             await client.DeleteTableAsync(datasetId, "your_table_id");`

        await client.GetOrCreateTableAsync(datasetId, tableId: "your_table_id", schema: schema.Build());

        await Task.Delay(150000); // this is the magic line
        await TableInsertRows(client, datasetId, tableId, faturas);

        private static bool CheckTableExists(BigQueryClient client, string dataset, string table){

             var sql = $"SELECT size_bytes FROM {dataset}.__TABLES__ WHERE table_id='{table}'";
             var result = client.ExecuteQuery(sql, null);

             return result.TotalRows > 0;
         }
evil-shrike commented 2 years ago

Same issue for me, but with a different client (nodejs). I believe it's not specific to the go platform, it's a server-side issue. In my case I explicitly create a table and then insert into it. And sometimes the inserting fails. Client/API definitely should have a method to make sure a table has been created and safe to be used.

fzn0x commented 2 years ago

Same issue for me, but with a different client (nodejs). I believe it's not specific to the go platform, it's a server-side issue. In my case I explicitly create a table and then insert into it. And sometimes the inserting fails. Client/API definitely should have a method to make sure a table has been created and safe to be used.

Yup it's available everywhere, including the nodejs client,

fzn0x commented 2 years ago

For nodejs you can re run the script with child_process after setTimeout for 2 minutes.

flaviogragnolati commented 2 years ago

Hey, any update on this? or a more "consistent" workaround? I'm using the nodejs client and I'm experiencing the same issue. I've tried adding up to 5 minutes of timeout between table creation and data insert, but it's still very unreliable.

MajidJafari commented 2 years ago

Whenever I create the dataset and remove it due to some errors and then create it and add some table to it, I get this error.

But when I just create a new dataset with a different name, everything works without a problem.

s2t2 commented 2 years ago

experiencing this issue in python after creating a new table. client.get_table() says the table doesn't exist. but it does. please advise.

juandes commented 2 years ago

experiencing this issue in python after creating a new table. client.get_table() says the table doesn't exist. but it does. please advise.

I had to add a small delay, e.g., time.sleep(30) after creating the table.

erik-induro commented 1 year ago

When I call .exists() after .createTable() it returns true but the .insert() throws a 404. It seems unintuitive that exists != available. Similarly, awaiting .createTable() would seem to imply availability.

If an existence check passes then 404 with the message "Table X not found" is not the right message to return from insert since it clearly does exist. This kind of error for something I want to exist implies that the action I should take to fix it is to create it but in this case I shouldn't because it already exists. Instead I should be told to wait. I get "eventually consistent" but I'm going to need the API to be consistent on its definition of existence.

I'm disappointed that the best guidance appears to be gambling on a sufficiently large sleep. I'm probably going to put a retry on the insert with an exponential wait. It would be nice if this kind of logic was built into the SDK or if there was an availability check since existence appears to not provide that assurance.

shollyman commented 1 year ago

There's effectively nothing we can do here, as this is a classic eventual consistency issue. We document the behavior, but I'll try to explain it in a bit more detail the "why".

When you intermingle operations that change table metadata and stream data into a table, you're likely to observe the effects of this eventually consistent behavior. The streaming system, by nature of its vastly different scale, caches table metadata aggressively, in a combination of shared and individual caches.

Generally, the pattern that causes users the most problems is a stream->create table->stream pattern. The manifestation often looks something like the following:

It's the first streaming call that triggers the problem here. The call requires the streaming system to load the metadata state, and at that moment the table doesn't exist. This negative existence is cached by the streaming system for a time even if the table is created immediately afterwards.

Subsequent streaming calls leverage their cached metadata, and thus reject inserts for a time until such time as the cached table metadata expires and gets refreshed. Callers receive inconsistent responses because each streaming backend instance may have a slightly different cache state.

Generally, the best thing to do is to design your interactions so that you don't change table metadata while interacting with the streaming system. In the previous example, ensuring the table is created before the first streaming call is generally sufficient.

There are other interaction patterns, like deleting and recreating a table with the same name that will trigger similarly observed behaviors. In this case, rather than caching a negative existence, what users will observe is that not all writes appear to arrive to the table. In this case, it's because some writes may have been send to the old (now deleted) table, and some to the new. Similarly, schema evolution, where the schema of a table is extended, may take some time before all the streaming backends see the updated changes to a given table's schema.

Hopefully this provides some additional background into the nature of the issue.