Given a BigQuery table t with a nested (struct) columnoffers.offer_id we can write checks normally using dotted field names like duplicate_count(offers.offer_id) = 0.
The check itself runs successfully but if there are duplicates the query to collect sample records fails.
The query (below) is generated by sql_get_duplicates().
The problem is that the frequencies CTE ends up with a column called offer_id rather than offers.offer_id
so the join fails. Editing the join condition to main.offers.offer_id = frequencies.offer_id fixes it.
Dealing with dotted field names in general seems difficult.
I'll work around the issue using a view with aliases the columns I'm interested in, but it would be cool if soda could do this automatically. For example alias the dotted column to a 'safe' version,
like SELECT offers.offer_id as offers_offer_id ...?
WITH
frequencies AS (
SELECT offers.offer_id
FROM t
WHERE offers.offer_id IS NOT NULL
GROUP BY offers.offer_id
HAVING count(*) > 1
)
SELECT main.*
FROM t main
JOIN frequencies ON main.offers.offer_id = frequencies.offers.offer_id
Given a BigQuery table
t
with a nested (struct) columnoffers.offer_id
we can write checks normally using dotted field names likeduplicate_count(offers.offer_id) = 0
.The check itself runs successfully but if there are duplicates the query to collect sample records fails. The query (below) is generated by sql_get_duplicates(). The problem is that the
frequencies
CTE ends up with a column calledoffer_id
rather thanoffers.offer_id
so the join fails. Editing the join condition tomain.offers.offer_id = frequencies.offer_id
fixes it.Dealing with dotted field names in general seems difficult.
I'll work around the issue using a view with aliases the columns I'm interested in, but it would be cool if soda could do this automatically. For example alias the dotted column to a 'safe' version, like
SELECT offers.offer_id as offers_offer_id ...
?