maxbox51 closed this issue 10 years ago.
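Actually, according to this query:

```sql
SELECT
  contig,
  position,
  vt,
  end,
  COUNT(alt) AS n
FROM (
  SELECT
    contig,
    position,
    vt,
    end,
    # Collapse each record's repeated alternate_bases field into one string.
    GROUP_CONCAT(alternate_bases) WITHIN RECORD AS alt
  FROM
    [google.com:biggene:1000genomes.variants1kG]
    # For a quick check, swap in the tiny test table instead:
    # [google.com:biggene:test.variants1kG_tiny]
)
# Any key shared by more than one record would show up here.
GROUP EACH BY
  contig, position, vt, end
HAVING
  n > 1;
```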
This query returns no results, so the alternate bases aren't necessary for a unique key, either. That makes sense: you only need one list of alternate values relative to the single reference value.
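The uniqueness argument can be reproduced at toy scale. Below is a minimal sketch using Python's built-in `sqlite3` module rather than BigQuery; the table layout, the rows, and the helpers `make_demo_table` and `duplicate_keys` are hypothetical. Each row's `alternate_bases` is assumed to be already concatenated, mimicking `GROUP_CONCAT(...) WITHIN RECORD`. The helper counts records per candidate key and keeps keys seen more than once, so an empty result means the key is unique. It uses `COUNT(*)` instead of `COUNT(alt)`, so records without alternates are counted too.

```python
import sqlite3


def make_demo_table():
    """Build a tiny in-memory stand-in for the variants table (hypothetical rows).

    Each row is one record; alternate_bases holds that record's alternates as
    a single comma-separated string, like GROUP_CONCAT(...) WITHIN RECORD.
    """
    conn = sqlite3.connect(":memory:")
    # "end" is quoted because END is an SQL keyword.
    conn.execute(
        'CREATE TABLE variants (contig TEXT, position INTEGER, vt TEXT, '
        '"end" INTEGER, reference_bases TEXT, alternate_bases TEXT)'
    )
    conn.executemany(
        "INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("1", 100, "SNP", 100, "A", "G"),
            ("1", 100, "INDEL", 101, "A", "AT"),  # same position, different vt/end
            ("1", 200, "SNP", 200, "C", "T,G"),   # multi-allelic: one record, one alt list
        ],
    )
    return conn


def duplicate_keys(conn, key_cols):
    """Return (key..., n) for every key shared by more than one record.

    An empty list means key_cols uniquely identifies each record.
    """
    cols = ", ".join(f'"{c}"' for c in key_cols)
    sql = (
        f"SELECT {cols}, COUNT(*) AS n FROM variants "
        f"GROUP BY {cols} HAVING COUNT(*) > 1"
    )
    return conn.execute(sql).fetchall()


conn = make_demo_table()
print(duplicate_keys(conn, ["contig", "position", "vt", "end"]))  # []
print(duplicate_keys(conn, ["contig", "position"]))               # [('1', 100, 2)]
```

In this toy data, dropping `vt` and `end` from the key exposes the two records at position 100, while the multi-allelic site at position 200 stays a single record whatever its alternates are.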
According to https://github.com/googlegenomics/bigquery-examples/tree/master/1000genomes/data-stories/understanding-alternate-alleles, the 1000 Genomes data never has more than one `reference_bases` value per `<contig, position>` pair, so `reference_bases` need not be in the unique key definition you give.
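The `reference_bases` claim can be checked the same way: group by `(contig, position)` and look for sites with more than one distinct reference. This is a minimal sketch with `sqlite3` and hypothetical rows; the table layout and the helper `conflicting_reference_bases` are made up for illustration, and running the equivalent check against the real table would need BigQuery.

```python
import sqlite3


def conflicting_reference_bases(conn):
    """Return (contig, position, n) for sites with n > 1 distinct reference_bases.

    An empty list means reference_bases is determined by (contig, position),
    so adding it to a unique key would not make the key any more selective.
    """
    return conn.execute(
        "SELECT contig, position, COUNT(DISTINCT reference_bases) AS n "
        "FROM variants GROUP BY contig, position "
        "HAVING COUNT(DISTINCT reference_bases) > 1"
    ).fetchall()


# Hypothetical toy table: two records share position 100 but agree on the reference.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants (contig TEXT, position INTEGER, "
    "reference_bases TEXT, alternate_bases TEXT)"
)
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?, ?)",
    [
        ("1", 100, "A", "G"),
        ("1", 100, "A", "AT"),  # second record at the same site, same reference
        ("1", 200, "C", "T"),
    ],
)
print(conflicting_reference_bases(conn))  # []
```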