googlegenomics / bigquery-examples

Advanced BigQuery examples on genomic data.
Apache License 2.0
89 stars 31 forks source link

optimize all queries using FLATTEN #13

Open deflaux opened 10 years ago

deflaux commented 10 years ago

Restructure them so that the clause within the FLATTEN returns only the columns needed and also applies any applicable WHERE clauses.

deflaux commented 10 years ago

For example compare:

SELECT
  contig,
  position,
  reference_bases,
  GROUP_CONCAT(alternate_bases) WITHIN RECORD AS alt,
  vt,
  END,
  super_population,
  IF(af >= 0.05,
    TRUE,
    FALSE) AS is_common_variant,
  IF(genotype.first_allele > 0
    OR genotype.second_allele > 0,
    1,
    0) AS has_variant
FROM
  FLATTEN([google.com:biggene:1000genomes.variants1kG],
    genotype) AS samples
JOIN
  [google.com:biggene:1000genomes.sample_info] p
ON
  samples.genotype.sample_id = p.sample
WHERE
  contig = '17'
  AND position BETWEEN 41196312
  AND 41277500

to a better version of the same thing - notice that the inner most query is now returning a lot less data to the outer query:

SELECT
  contig,
  position,
  reference_bases,
  alt,
  vt,
  END,
  super_population,
  is_common_variant,
  has_variant
FROM
  FLATTEN(
    (
    SELECT
      contig,
      position,
      reference_bases,
      GROUP_CONCAT(alternate_bases) WITHIN RECORD AS alt,
      vt,
      END,
      IF(af >= 0.05,
        TRUE,
        FALSE) AS is_common_variant,
      genotype.sample_id,
      IF(genotype.first_allele > 0
        OR genotype.second_allele > 0,
        1,
        0) AS has_variant
    FROM
      [google.com:biggene:1000genomes.variants1kG]
    WHERE
      contig = '17'
      AND position BETWEEN 41196312
      AND 41277500),
    genotype) AS samples
JOIN
  [google.com:biggene:1000genomes.sample_info] p
ON
  samples.genotype.sample_id = p.sample
fhoffa commented 10 years ago

IF(af >= 0.05, TRUE, FALSE) AS is_common_variant

same as

af >= 0.05 AS is_common_variant

?

deflaux commented 10 years ago

agreed, can be simplified

I'm pretty sure that pattern is in a few other places too, filed #14

please point out other things like this! Thanks Felipe