runt18 / google-bigquery

Automatically exported from code.google.com/p/google-bigquery

BigQuery mistakenly flattens on nested field when it's not referenced #498

Closed: GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
This query:

SELECT 
    doctor.dr_id AS doctor_dr_id,
FROM (SELECT * FROM rsdw.doctor) doctor 
WHERE
  doctor.is_published
  AND doctor.specialties.specialty = 'Plastic Surgeon'
LIMIT 500

produces the error:

Query Failed
Error: Cannot query the cross product of repeated fields 
cover_photos.is_published and specialties.specialty.
Job ID: realself-main:bquijob_52cb1db8_153f481eb42

This doesn't make sense because I'm not referring to cover_photos.is_published.

specialties.specialty is a repeated field, and so is cover_photos.is_published,
but is_published is also the name of an outer field. I think BigQuery is
mistakenly flattening on cover_photos.is_published when the field actually
being referenced is the outer is_published.

Original issue reported on code.google.com by a...@realself.com on 8 Apr 2016 at 6:18

GoogleCodeExporter commented 8 years ago
Hi, you've stumbled into an oddity of the existing BigQuery SQL dialect. Try:

SELECT 
    doctor.dr_id AS doctor_dr_id,
FROM [rsdw.doctor] doctor 
WHERE
  doctor.is_published
  AND doctor.specialties.specialty = 'Plastic Surgeon'
LIMIT 500

The sub-select `(SELECT * FROM rsdw.doctor)` is evaluated in a way that makes 
every repeated field in the table visible at the top-level scope, even the ones 
not included in the query results. You can either remove the SELECT * sub-select 
as in my example, or explicitly sub-select only the fields you need from the 
underlying table.
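
The second option might look roughly like the sketch below. It assumes the outer query needs only dr_id, is_published, and specialties.specialty, and it has not been run against the reporter's table, so treat it as illustrative rather than verified:

```sql
SELECT
    doctor.dr_id AS doctor_dr_id
FROM (
    -- Name only the fields the outer query uses, so that
    -- cover_photos (the other repeated field) is never brought into scope.
    SELECT dr_id, is_published, specialties.specialty
    FROM rsdw.doctor
) doctor
WHERE
  doctor.is_published
  AND doctor.specialties.specialty = 'Plastic Surgeon'
LIMIT 500
```

Because cover_photos is not selected in the inner query, the outer query should have only one repeated field left to filter on.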

Also note that we're working on an improved SQL dialect that will better handle 
"pushing down" the outer-selected fields into the inner sub-select; see issue 
448. I'll close this as "WontFix", but we'll improve the behavior soon.

Original comment by wes...@google.com on 8 Apr 2016 at 3:42