Duplicate row samples fails with dotted field name for BQ nested column (struct)

Given a BigQuery table t with a nested (struct) column offers.offer_id we can write checks normally using dotted field names like duplicate_count(offers.offer_id) = 0.

The check itself runs successfully but if there are duplicates the query to collect sample records fails. The query (below) is generated by sql_get_duplicates(). The problem is that the frequencies CTE ends up with a column called offer_id rather than offers.offer_id so the join fails. Editing the join condition to main.offers.offer_id = frequencies.offer_id fixes it.

Dealing with dotted field names in general seems difficult.
I'll work around the issue using a view with aliases the columns I'm interested in, but it would be cool if soda could do this automatically. For example alias the dotted column to a 'safe' version, like SELECT offers.offer_id as offers_offer_id ...?


WITH 
frequencies AS (
    SELECT offers.offer_id
    FROM t
    WHERE offers.offer_id IS NOT NULL
    GROUP BY offers.offer_id
    HAVING count(*) > 1
)
SELECT main.*
FROM t main
JOIN frequencies ON main.offers.offer_id = frequencies.offers.offer_id

sodadata / soda-core

Duplicate row samples fails with dotted field name for BQ nested column (struct) #1910