projectglow / glow

An open-source toolkit for large-scale genomic analysis
https://projectglow.io
Apache License 2.0

Notebook continuous integration 01/24/22: simplify genotype simulation notebooks #477

Closed: williambrandler closed this 2 years ago

williambrandler commented 2 years ago

Signed-off-by: William Brandler <william.brandler@databricks.com>

What changes are proposed in this pull request?

Split the genotype simulation notebook into three:

  1. Download 1000G data (previously just two chromosomes, now all autosomes)
  2. Define the functions (Hardy-Weinberg etc.) used to simulate genotypes
  3. Perform the data simulation on the downloaded VCFs

This avoids downloading the 1000G data on every run, which is the bottleneck when testing the code. The functions are split out into their own notebook; eventually that notebook can be deleted once the simulation is done by a library such as sim1000G.
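The notebook's own functions are not reproduced in this description. As a rough illustration of what simulating genotypes under Hardy-Weinberg equilibrium can look like, here is a minimal sketch using only the Python standard library. The function names (`hardy_weinberg_probs`, `simulate_genotypes`) and the `seed` parameter are hypothetical and are not taken from the Glow notebooks.

```python
import random


def hardy_weinberg_probs(alt_freq):
    """Return (P(hom-ref), P(het), P(hom-alt)) for a biallelic site.

    Under Hardy-Weinberg equilibrium with alternate allele frequency q
    and reference allele frequency p = 1 - q, the genotype frequencies
    are p^2, 2pq and q^2.
    """
    if not 0.0 <= alt_freq <= 1.0:
        raise ValueError("alt_freq must be in [0, 1]")
    ref_freq = 1.0 - alt_freq
    return (ref_freq ** 2, 2 * ref_freq * alt_freq, alt_freq ** 2)


def simulate_genotypes(alt_freq, n_samples, seed=None):
    """Draw n_samples genotypes coded as alternate allele counts (0, 1 or 2).

    Hypothetical helper: the seed only makes the draw reproducible.
    """
    rng = random.Random(seed)
    weights = hardy_weinberg_probs(alt_freq)
    return rng.choices([0, 1, 2], weights=weights, k=n_samples)


if __name__ == "__main__":
    print(hardy_weinberg_probs(0.5))
    print(simulate_genotypes(alt_freq=0.2, n_samples=10, seed=1))
```

Coding genotypes as alternate allele counts keeps the sketch independent of any VCF or Spark representation; a real notebook would apply the same probabilities per variant across all samples.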

Added a tip that phenotypes and covariates should be sorted in the same order as the genotypes.
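To illustrate why the ordering matters, the sketch below reorders per-sample rows (phenotypes or covariates) to follow the sample order of the genotypes, and fails loudly if a sample is missing rather than silently misaligning the data. The helper name `align_to_genotype_order` and the sample IDs are illustrative assumptions, not part of Glow.

```python
def align_to_genotype_order(genotype_sample_ids, rows_by_sample):
    """Return the per-sample rows reordered to match the genotype sample order.

    genotype_sample_ids: sample IDs in the order they appear in the genotypes.
    rows_by_sample: dict mapping sample ID to its phenotype or covariate row.
    """
    missing = [s for s in genotype_sample_ids if s not in rows_by_sample]
    if missing:
        raise KeyError(f"samples missing from phenotype/covariate data: {missing}")
    return [rows_by_sample[s] for s in genotype_sample_ids]


if __name__ == "__main__":
    # Illustrative IDs and values only.
    genotype_order = ["sample_c", "sample_a", "sample_b"]
    phenotypes = {"sample_a": 1.2, "sample_b": 0.4, "sample_c": -0.7}
    print(align_to_genotype_order(genotype_order, phenotypes))
```

With tabular data the same idea is usually expressed by reindexing the phenotype and covariate tables on the genotype sample ID list before passing them to the analysis.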

How is this patch tested?

(Details)

codecov[bot] commented 2 years ago

Codecov Report

Merging #477 (4846dc0) into master (f9eda3e) will not change coverage. The diff coverage is n/a.


@@           Coverage Diff           @@
##           master     #477   +/-   ##
=======================================
  Coverage   93.66%   93.66%           
=======================================
  Files          95       95           
  Lines        4875     4875           
  Branches      457      457           
=======================================
  Hits         4566     4566           
  Misses        309      309           
