runt18 / google-bigquery

Automatically exported from code.google.com/p/google-bigquery

Unexpected schema mismatch when appending results to existing table #326

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
I'm trying to migrate from my current fragmented table setup to daily tables, 
so I have some pretty large queries selecting from a lot of tables. I ran 100+ 
jobs and not a single one completed. I tried copying the query into the 
BigQuery UI and selecting into a new table (Allow Large Results = true, 
FlattenResults = false) and it worked just fine.

This is the error I saw on every job failure, followed by a list of failed 
jobs. Can I get any more guidance about why my queries are all failing?

"Errors encountered during job execution. An internal error occurred and the 
request could not be completed."

job_2MK4KJl5BHW-_TaCCYGA9b7Gdd8
job_5mJDKjVRfjfIN9f4q4TBJ7zd6is
job_z0larrEGeI-KrYMlGp9-JxSWKiY
job_MUxCW_GhESodAj0kuasCf_V0xzw
job_lpsRX2eQCaDq6216cY1w13P0zcE
job_paJ-aKKgT5qPClx5d48znj9AcNY
job_4YzJYNPAMblhzm34URhsDgrIQ7k
job_MOcbCvrUaVEhYfEaGdeptzntywQ
job_hcYkQttVDFHU82jO4KSnYwQZoBg
job_dfHkpoEcFfANlCyo4-HX5wG75fg
job_NF2myDsMtSGxz4u-ISC5OYB1iUA
job_h2R3gC_MgLn_7dn33L31YOjFvVw
job_83OCeBB3qc8UgiLcP0viA6fhcYM
job_Cr-bwW8F--YwRmPQjM29fLTAnyY
job_LdkYtgEh1nAHARARMmPWqd0ieII
job_UkuLJ8NexQ0-C6sbAlB7LWKKa3o
job_PCW_J-S5GOfAMa4tX33sg9N_tGM
job_arNXt4tFI-atCwQMS8caoBKjhbs
job_srDkFZgGYKTOWVZB7LmLj5EzfRA
job_PP82eV78Z7yeuiZuVkISlaWZYdE
job__QOdCLUQFUCe5cSABB0e_LmNmOk
job_OeqHoFSbPvtqAs9jSCxXXec_ZcQ
job_wvJHHI3QHrSzmmiBTkExUe2zECk
job_dcfRbbj2WZAchTgV-6K5GG_HdI8
job_2sJ4H-z7N-rmVxVYcuxvmpdVq0Q
job_Mjy3opdJjoDLAXM7AVFRr1F4Vwk
job_XogUp1RvT8KSZsWpugWjlJs921k
job_h8EVsMy2Rdo7Y6Hp5KCU8aERWv8
job_LaQrI58YN25wfSlP1wZL6-mSRtY
job_s32wxeDbIJNLkY04jztbkxJOdnE
job_JtEyrMc4dORnvTiRvlQbVhdAnj8
job_2v-Jlx_sxvp0TZlSufijiqrVu44
job_qhPJt9jsmtjqjO8qd3A21JJ0-qU
job_NxZAh_aklHcuq82D1QO1UAqoq-o
job_ql6DnvdHOE95qm9q2iaJvXJtY18
job_FaYP84oeC_z0Nk-OZ4EfY2RoB54
job_khsqvgNnyxhJvefFpfaRHpzS-j4
job_SLr7nHdV23GQUth4nvh0mR28vkw
job_90gUm4nXlzPqLPjtx2oLLOcNHLw

job_zPI8jU7b45HxMisvb_0RikQ8qpk
job_ESvOVel7cDuKs6M2UFHJwLGOwOo
job_il8X2vGS6I-xLFWYfnYRfACr-qw
job_ncrZlRX2Cf6ZTpkfs8M9rmz3h4c
job_pUmrzDb3CcTRKNvfjb-M-fMZDi0
job_x8iQogt2Wwb4sTv3jJ8ABm6BCbk
job_qab8nElBBiNETRDYLTY8iLx0T9s
job_QzyLXK7oGXHu2j-w793EK8DP3JE
job_zkom5RpfhCms5zSR_T7Q8YZm3mo
job_3DrSDsLeOz1WHm-9siHt1odlZeI
job_5fzNt_TkD9v0USCRHM9d83aCMjI
job_7T3ak7OCkbLHg2m2hRHZvtLe7AE
job_Poi8Bd3gY-iWKi4F9mG7xMAJ5Ss
job_RFdGu9aYcHZiNKnQ2l6cmlzPH-U
job_Im6Q81ZvH1i69jCXpuSM6ZBB80w
job_ZFMv_VC_u3xxuNfYdGk4WEjpETY
job_1wN0k85uTIATBmErnrox_HImppo
job_A9BXz68tYt0ZJ75PZSOZp4qRqIk
job_nJUMeyTslo1Y1uDqrf9AYdgQFcQ
job_tKbatERD4fPRWOeg3Diy0Xq1i8g
job_s1AEojbxq3Wqln8e0uOOlcxQm-c
job_tekY4i2ERcTj5ydkVuTwHpCNIII
job_dHWzgq_0MgSWBStZSTjkNsGDdDg
job_GqkQQpjq5On_JZajVTw3DBdA38A
job_O6B_XgTDaX52ODwdwskdWJXIr6g
job_YCF69FW-8fKPJhHWSsVxqAZE15Q
job_hOOekk17VAwLsCZ3kT_NH-hRGqM
job_hWY8QcLXLBMDue9uURCFTJ5GQBE
job_u8uHZo-q99artUxZEN9YrDb9H8I
job_UGYA-cwsM-lkRNBfz7WhXlmMYlU
job_8D3BYktpbZZAUiQDMGbl1-SKE9s
job_YaoH_IO6gvJ_KlatxYPx_5KBqkQ
job_2oXywXANIhsKbyzyIzKSFAT0fGc
job_61Foa5Dhbm0Kc8rzX4_CPOLCDdk
job_afHWIZW6KBxDgJcInuAl-Xlojn4
job_xWUAUD3TZ-ZPscp5UT_400qiYz4
job__wmj2kS3mYV0xSYffik-_PLX0eE
job_VURayazlfdTVx9u6sEDuLJeJxmc
job_Yna_oHKGIejloUHGaPeY513jU-w

Original issue reported on code.google.com by de...@derekperkins.com on 16 Sep 2015 at 3:20
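For context, jobs like the ones listed above are typically launched via the `jobs.insert` REST API with a query configuration. The sketch below shows a minimal request body with the settings described in the report (Allow Large Results = true, FlattenResults = false, appending to a destination table); the project, dataset, and table names are hypothetical placeholders, not the reporter's actual resources.

```python
import json

# Sketch of a jobs.insert request body for a large, unflattened legacy-SQL
# query appended to an existing table. All resource names are hypothetical.
job_config = {
    "configuration": {
        "query": {
            "query": "SELECT * FROM [mydataset.rankings_a], [mydataset.rankings_b]",
            "allowLargeResults": True,   # needed for large legacy-SQL result sets
            "flattenResults": False,     # keep nested/repeated fields intact
            "destinationTable": {
                "projectId": "my-project",
                "datasetId": "mydataset",
                "tableId": "rankings_20150916",
            },
            "writeDisposition": "WRITE_APPEND",  # append to the existing table
        }
    }
}

print(json.dumps(job_config, indent=2))
```

With `flattenResults: false`, the destination table's schema must be compatible with the query's nested output schema, which is where the mismatch discussed below comes in.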

GoogleCodeExporter commented 8 years ago
After further investigation, I was able to get the exact query from one of 
these failed jobs to work in the BigQuery UI if I overwrote the table, but got 
the same "Internal error" if I tried to append to an existing table...

Original comment by de...@derekperkins.com on 16 Sep 2015 at 4:42

GoogleCodeExporter commented 8 years ago
Apologies, it looks like our oncall missed this report.  I'd recommend using 
Stack Overflow for support in the future; please just tag your question with 
"google-bigquery".  Stack Overflow is monitored much more regularly by BigQuery 
staff.

I looked up a number of the job ids from your initial report and did not find 
any of them in our system; this implies that the jobs were not successfully 
committed.

Can you describe the method you were using to launch these jobs and get the job 
ids?

Original comment by thomasp...@google.com on 24 Nov 2015 at 6:25

GoogleCodeExporter commented 8 years ago
(Note, though, that this issue tracker is better than StackOverflow for bugs or 
service issues, since StackOverflow is more intended for "how-to" programming 
questions. Both are monitored by Google engineers.)

Original comment by jcon...@google.com on 1 Dec 2015 at 4:48

GoogleCodeExporter commented 8 years ago
Quick update - we changed the way some of our job metadata was stored; I have 
found your jobs. They were successfully committed.

Your queries were triggering an internal error involving query schema 
generation in our system.  I'll create a query to repro since we've fixed a 
couple of similar schema issues recently, and get back to you with an update.

Original comment by thomasp...@google.com on 30 Dec 2015 at 7:18

GoogleCodeExporter commented 8 years ago
I think we've fixed how the internal errors for these queries are mapped.

For example, a trimmed version of the query from job 
nozzle-app:job_NF2myDsMtSGxz4u-ISC5OYB1iUA now returns an actionable error 
message "Error: Cannot output multiple independently repeated fields at the 
same time. Found Results_Meta_Sitelinks_URL_DomainID and 
Results_Meta_ExpandedSitelinks_Rank".

Resolving issue, please re-open if you have any other issues with this.

Original comment by thomasp...@google.com on 30 Dec 2015 at 8:41

GoogleCodeExporter commented 8 years ago
So the error that you fixed is not the error that caused them to fail. IIRC, 
they were all queries being selected into new tables with flattenResults = 
false. My problem was that the schema I was selecting into didn't match the 
'required' attributes of the original. After running the query with the GROUP 
BY, and not specifying a schema, I ran a diff between my expected schema and 
the actual output schema and found that seemingly random fields were 
marked/unmarked as required.

Original comment by de...@derekperkins.com on 30 Dec 2015 at 9:29
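The schema diff described in the comment above can be sketched as a small helper that walks two BigQuery-style schemas (lists of field dicts) and reports fields whose REQUIRED/NULLABLE/REPEATED mode differs. This is an illustrative sketch, not part of any BigQuery client library, and the field names in the example are hypothetical.

```python
def diff_schema_modes(expected, actual, prefix=""):
    """Return (field_path, expected_mode, actual_mode) triples for fields
    whose mode differs between two schemas. Recurses into RECORD fields."""
    expected_by_name = {f["name"]: f for f in expected}
    mismatches = []
    for field in actual:
        exp = expected_by_name.get(field["name"])
        if exp is None:
            continue  # added/removed fields; only mode changes matter here
        path = prefix + field["name"]
        exp_mode = exp.get("mode", "NULLABLE")
        act_mode = field.get("mode", "NULLABLE")
        if exp_mode != act_mode:
            mismatches.append((path, exp_mode, act_mode))
        if field.get("type") == "RECORD":
            mismatches += diff_schema_modes(
                exp.get("fields", []), field.get("fields", []), path + ".")
    return mismatches

# Illustrative example: 'Keyword' unexpectedly comes back REQUIRED.
expected = [{"name": "Keyword", "type": "STRING", "mode": "NULLABLE"}]
actual = [{"name": "Keyword", "type": "STRING", "mode": "REQUIRED"}]
print(diff_schema_modes(expected, actual))
# → [('Keyword', 'NULLABLE', 'REQUIRED')]
```

Running a diff like this between the intended destination schema and the schema the query actually produced is what surfaced the seemingly random REQUIRED/NULLABLE flips.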

GoogleCodeExporter commented 8 years ago
Thanks for the clarification!  I tried the query again with a destination table 
set, and without "flatten results", and was able to generate a result 
successfully.

We did fix a few issues with schema generation in November, so I wouldn't be 
surprised if this issue was addressed by one of our bugfixes.  That said, I 
only unioned 4 tables and used about a dozen keyword + search ID filters, so 
the result set was very small; please do let me know if you run into the 
same issue when retrying one of the full queries.

Original comment by thomasp...@google.com on 30 Dec 2015 at 10:03

GoogleCodeExporter commented 8 years ago
Was there a schema associated with those job IDs? If so, did your test work 
using that schema? The jobs ended up working in Sept after I let the query 
decide its own schema. If it works with the schema, that's a way to verify if 
the changes you made in Nov fixed the original issue.

Original comment by de...@derekperkins.com on 30 Dec 2015 at 10:21

GoogleCodeExporter commented 8 years ago
I don't have write access to your tables, so my query was writing to a new 
table.  I can verify that there are a few fields that are marked REQUIRED in 
the generated schema.

How was the schema of your original destination table generated?  Did you 
manually create it with all of the fields set to NULLABLE (or REPEATED)?  I 
can create a similar schema if so and retry the load.

Original comment by thomasp...@google.com on 30 Dec 2015 at 10:40

GoogleCodeExporter commented 8 years ago
I verified that if I create a table with all NULLABLE / REPEATED fields and run 
the query with "allow large results" + "noflatten" + "append", then I get an 
unhelpfully opaque error.

So, I'll address two things here:

1. Plumb the internal error message about schema mismatch through to the client
2. See if we can allow writing REQUIRED source fields to NULLABLE destination 
columns

I think we're on the same page now?

Original comment by thomasp...@google.com on 30 Dec 2015 at 10:54

GoogleCodeExporter commented 8 years ago
Yes, I think we're on the same page. I could probably dig up the exact schema 
from my git history if need be. IIRC, there were some required fields that 
turned into nullable fields that threw an error, and some previously nullable 
fields that were marked as required. My expectation is that it would read from 
and match the source schema.

Original comment by de...@derekperkins.com on 31 Dec 2015 at 4:01

GoogleCodeExporter commented 8 years ago
I've filed internal tracking bugs for these two issues; we will work on 
resolving them.

Original comment by thomasp...@google.com on 4 Jan 2016 at 10:17

GoogleCodeExporter commented 8 years ago
Hi Derek,

I've made a change to address #1 - we will return an error message that says 

"Output schema specified in the query request is not compatible with the query 
results: Field 'Keyword' is incompatible with the table schema: Label mismatch: 
actual 'LABEL_REQUIRED' vs. expected 'LABEL_OPTIONAL'"

for this query; that will at least help to shed light on where the problem is.  
That change will go out with our next deployment.

Re: #2 - one of my colleagues dug into this a bit deeper: the problem is that 
the source tables [rankings.*] have several RECORD columns listed as REQUIRED 
(Keyword, SearchDetails, SearchDetails.Summary, etc.).  We do have the ability 
to write REQUIRED source *leaf* fields to NULLABLE destinations, but due to 
some implementation details, we can't currently relax a REQUIRED source 
*record* into a NULLABLE destination.  If you were to change the RECORDs in the 
source table from REQUIRED to NULLABLE, my colleague says it should work.

We do realize that this operation (writing a REQUIRED record to a NULLABLE 
destination) makes sense semantically, so we will investigate other solutions 
in the future.

Original comment by thomasp...@google.com on 7 Jan 2016 at 10:32
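The workaround suggested above (changing REQUIRED RECORD columns in the source schema to NULLABLE) amounts to a recursive transform over the schema. The sketch below shows one way to do it on a BigQuery-style schema represented as a list of field dicts; it is an illustrative helper, not an official API, and the field names are hypothetical stand-ins for the ones mentioned in the thread.

```python
def relax_required(schema):
    """Return a copy of a schema (list of field dicts) with every REQUIRED
    field, including nested RECORD fields, changed to NULLABLE.
    REPEATED fields are left untouched; the input schema is not modified."""
    relaxed = []
    for field in schema:
        new_field = dict(field)
        if new_field.get("mode") == "REQUIRED":
            new_field["mode"] = "NULLABLE"
        if new_field.get("type") == "RECORD":
            new_field["fields"] = relax_required(new_field.get("fields", []))
        relaxed.append(new_field)
    return relaxed

# Illustrative schema modeled on the thread's description.
schema = [
    {"name": "SearchDetails", "type": "RECORD", "mode": "REQUIRED",
     "fields": [{"name": "Summary", "type": "STRING", "mode": "REQUIRED"}]},
    {"name": "Sitelinks", "type": "STRING", "mode": "REPEATED"},
]
print(relax_required(schema))
```

A relaxed schema like this could then be used when recreating the source tables, so that appends with "allow large results" + "noflatten" no longer hit the REQUIRED-record restriction described above.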