malariagen / ag1000g-phase3-data-paper

Population definitions suggestions #55

Open alimanfoo opened 3 years ago

alimanfoo commented 3 years ago

This doesn't need any discussion now, but at some point it might be worth talking about how the population definitions are stored. Currently they're stored as either a YAML or CSV file that maps population IDs like "ANG_1_coluzzii_2009" to sets of sample IDs. Some thoughts:

E.g., population_definitions.yml could be something like:

BF_bana_2012_coluzzii:
  label: Burkina Faso, Bana, 2012, An. coluzzii
  samples:
    - sample_set: AG1000G-BF-A
      query: location = "Bana" and species_aim = "coluzzii"
BF_pala_2012_coluzzii:
  label: Burkina Faso, Pala, 2012, An. coluzzii
  samples:
    - sample_set: AG1000G-BF-A
      query: location = "Pala" and species_aim = "coluzzii"
# etc.
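
For illustration, here is a minimal Python sketch of how definitions in this shape might be resolved to sample IDs. It is only a sketch under stated assumptions: the definitions are written as the dict a YAML loader would be expected to return, the query language is assumed to be nothing more than `field = "value"` clauses joined by `and`, and the metadata rows, the `sample_id` column and the helper names `parse_query` and `resolve_samples` are all hypothetical.

```python
import re

# Parsed form of one entry from the YAML above (what a YAML loader
# would be expected to return for it).
POPULATION_DEFINITIONS = {
    "BF_bana_2012_coluzzii": {
        "label": "Burkina Faso, Bana, 2012, An. coluzzii",
        "samples": [
            {
                "sample_set": "AG1000G-BF-A",
                "query": 'location = "Bana" and species_aim = "coluzzii"',
            },
        ],
    },
}

# Hypothetical sample metadata: IDs and rows are made up for illustration.
SAMPLE_METADATA = [
    {"sample_id": "S1", "sample_set": "AG1000G-BF-A", "location": "Bana", "species_aim": "coluzzii"},
    {"sample_id": "S2", "sample_set": "AG1000G-BF-A", "location": "Bana", "species_aim": "gambiae"},
    {"sample_id": "S3", "sample_set": "AG1000G-BF-A", "location": "Pala", "species_aim": "coluzzii"},
    {"sample_id": "S4", "sample_set": "AG1000G-BF-B", "location": "Bana", "species_aim": "coluzzii"},
]

# One clause of the assumed query grammar: field = "value"
_CLAUSE = re.compile(r'^\s*(\w+)\s*=\s*"([^"]*)"\s*$')


def parse_query(query):
    """Parse 'a = "x" and b = "y"' into {'a': 'x', 'b': 'y'}.

    Assumption: only equality clauses joined by 'and' are supported,
    and values do not themselves contain the word 'and'.
    """
    conditions = {}
    for clause in re.split(r"\s+and\s+", query):
        match = _CLAUSE.match(clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        conditions[match.group(1)] = match.group(2)
    return conditions


def resolve_samples(definition, metadata):
    """Return the sample IDs selected by one population definition."""
    sample_ids = []
    for selector in definition["samples"]:
        conditions = parse_query(selector["query"])
        # Restrict the query to the named sample set.
        conditions["sample_set"] = selector["sample_set"]
        for row in metadata:
            if all(row.get(key) == value for key, value in conditions.items()):
                sample_ids.append(row["sample_id"])
    return sample_ids


bana = resolve_samples(POPULATION_DEFINITIONS["BF_bana_2012_coluzzii"], SAMPLE_METADATA)
print(bana)  # ['S1']
```

With this layout the sample lists are derived from metadata rather than stored by hand, so a definition stays valid if samples are added to a sample set later. A real implementation would probably delegate query evaluation to an existing library rather than a hand-rolled parser.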