poseidon-framework / poseidon-analysis-hs

A tool to analyse genotype data (optionally in the poseidon data format)
MIT License

Better config file #2

Closed by stschiff 2 years ago

stschiff commented 2 years ago

Currently, stat files for fstats look like:

F4(<Chimp.REF>, <Altai_published.DG>, Yoruba, French)
F4(<Chimp.REF>, <Altai_snpAD.DG>, Spanish, French)
F4(Mbuti,Nganasan,Saami.DG,Finnish)
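The line format above can be read with a small parser. What follows is a minimal Python sketch, not the tool's actual parser. It assumes each line is a statistic name followed by a comma-separated argument list, and that an argument in angle brackets names a single individual while anything else names a population or group. The function name parse_stat_line is hypothetical.

```python
import re
from typing import List, Tuple

# Assumed syntax: NAME(arg, arg, ...); "<X>" marks an individual, anything else
# is a population/group label. Illustrative only, not the tool's own parser.
STAT_LINE = re.compile(r"^\s*(\w+)\s*\((.*)\)\s*$")

def parse_stat_line(line: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split 'F4(a, <b>, c, d)' into ('F4', [('group', 'a'), ('ind', 'b'), ...])."""
    m = STAT_LINE.match(line)
    if m is None:
        raise ValueError(f"not a statistic: {line!r}")
    name, arg_text = m.group(1), m.group(2)
    args = []
    for raw in arg_text.split(","):
        entry = raw.strip()  # tolerate both "a, b" and "a,b"
        if entry.startswith("<") and entry.endswith(">"):
            args.append(("ind", entry[1:-1]))
        else:
            args.append(("group", entry))
    return name, args

for line in ["F4(<Chimp.REF>, <Altai_snpAD.DG>, Spanish, French)",
             "F4(Mbuti,Nganasan,Saami.DG,Finnish)"]:
    print(parse_stat_line(line))
```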

I am planning to add a new YAML config file format, while keeping the option of giving these stat files as input for backwards compatibility. The new format seeks to improve on two fronts:

  1. It would allow for ad-hoc group definitions, similar to ras.
  2. It would allow defining statistics in bulk via population lists.
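An ad-hoc definition such as a,b,-c,-<d> (as used in the example below) has to be resolved to a concrete set of individuals. The Python sketch below assumes that entries are applied left to right, that a leading minus removes rather than adds, that <name> denotes one individual, and that an individual may carry several group labels (the hypothetical groups_of mapping). It illustrates the idea only; the tool's own resolution rules may differ, and resolve_group is a hypothetical name.

```python
from typing import Dict, List, Set

def resolve_group(spec: str, groups_of: Dict[str, List[str]]) -> Set[str]:
    """Resolve an ad-hoc spec like "a,b,-c,-<d>" into a set of individual names.

    groups_of maps each individual to the group labels it carries (hypothetical
    input). Entries are applied left to right; a leading "-" removes them.
    """
    members: Set[str] = set()
    for raw in spec.split(","):
        entry = raw.strip()
        exclude = entry.startswith("-")
        if exclude:
            entry = entry[1:]
        if entry.startswith("<") and entry.endswith(">"):
            selected = {entry[1:-1]}  # a single individual
        else:
            # every individual carrying this group label
            selected = {ind for ind, labels in groups_of.items() if entry in labels}
        if exclude:
            members -= selected
        else:
            members |= selected
    return members

# Hypothetical individuals: i2 belongs to both a and c, d belongs to b.
groups_of = {"i1": ["a"], "i2": ["a", "c"], "i3": ["b"], "d": ["b"]}
group1 = resolve_group("a,b,-c,-<d>", groups_of)  # {"i1", "i3"}
```

Processing left to right means a later exclusion always wins over an earlier inclusion, which keeps the result easy to predict.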

Here is an example:

groupDefs:
  group1: a,b,-c,-<d>
  group2: e,f,-<g>
stats:
- type: f4
  popA: [a, b, c, group1]
  popB: [<i1>, <i2>, group2]
  popC: [d]
  popD: [Chimp.REF]
- type: f3
  popA: [a, b]
  popB: [c]
  popC: [d]
- type: FST
  popA: [a]
  popB: [b]

which would then create 12 f4 statistics, 2 f3 statistics and one FST statistic, all of which would simply be listed in the output.
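These counts follow from taking the Cartesian product of the lists in each entry: 4 × 3 × 1 × 1 = 12 for the f4 entry, 2 × 1 × 1 = 2 for f3 and 1 × 1 = 1 for FST. A minimal Python sketch of that expansion, with the YAML example written out as plain dictionaries; the SLOTS table and the expand function are hypothetical, not part of the tool.

```python
from itertools import product
from typing import Dict, List, Tuple

# Slot names per statistic type, following the keys in the YAML example.
SLOTS = {"f4": ["popA", "popB", "popC", "popD"],
         "f3": ["popA", "popB", "popC"],
         "FST": ["popA", "popB"]}

def expand(stats: List[Dict]) -> List[Tuple[str, ...]]:
    """Turn each bulk entry into one statistic per combination of its lists."""
    out = []
    for entry in stats:
        kind = entry["type"]
        lists = [entry[slot] for slot in SLOTS[kind]]
        for combo in product(*lists):
            out.append((kind,) + combo)
    return out

stats = [
    {"type": "f4", "popA": ["a", "b", "c", "group1"],
     "popB": ["<i1>", "<i2>", "group2"], "popC": ["d"], "popD": ["Chimp.REF"]},
    {"type": "f3", "popA": ["a", "b"], "popB": ["c"], "popC": ["d"]},
    {"type": "FST", "popA": ["a"], "popB": ["b"]},
]
expanded = expand(stats)  # 12 f4 + 2 f3 + 1 FST = 15 tuples
```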

I think this gives a lot of flexibility and extra power to those who want it. What do you think, @TCLamnidis?

TCLamnidis commented 2 years ago

This is exactly the sort of thing I had in mind! This would be a great quality of life feature.

stschiff commented 2 years ago

Even better, including ascertainment:

groupDefs:
  group1: a,b,-c,-<d>
  group2: e,f,-<g>
  group3: [group1, group2]
  Right1: ["<Ind1>", "<Ind2>"]
  Right2: ["group1", "group2"]
  AllRights: [Right1, Right2]
stats:
- type: f4 # this would yield 15 statistics
  ascertainment:
    based_on: group3
    min_af: 0
    max_af: 0.02
    outgroup: Chimp.REF
  popA: [a, b, c, group1, <ind1>]
  popB: [<i1>, <i2>, group2]
  popC: [d]
  popD: [Chimp.REF]
- type: f3
  ascertainment:
    based_on: AllRights
    min_af: 0
    max_af: 0.02
    outgroup: Chimp.REF
  popA: [Left1, Left2, Left3,...]
  popB: [Right1, Right2, Right3,...]
  popC: [d]
- type: FST
  popA: [a]
  popB: [b]
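The thread does not spell out how the ascertainment fields are applied. One plausible reading, sketched here in Python purely as an assumption: for each SNP, count the allele not carried by the outgroup (here Chimp.REF) among the individuals of the resolved based_on group (e.g. group3, which nests group1 and group2), and keep the SNP only if that derived-allele frequency lies between min_af and max_af, inclusive. The sketch uses haploid 0/1 calls with None for missing data; derived_af and ascertain are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple

def derived_af(genotypes: Sequence[Optional[int]], outgroup_allele: int) -> Optional[float]:
    """Frequency of the allele NOT carried by the outgroup (None if no calls).

    genotypes: assumed haploid calls (0 or 1) in the based_on group; None = missing.
    outgroup_allele: the allele (0 or 1) observed in the outgroup.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    derived = sum(1 for g in called if g != outgroup_allele)
    return derived / len(called)

def ascertain(snps: Sequence[Tuple[Sequence[Optional[int]], int]],
              min_af: float, max_af: float) -> List[int]:
    """Indices of SNPs whose derived frequency lies in [min_af, max_af]."""
    kept = []
    for i, (genos, out_allele) in enumerate(snps):
        af = derived_af(genos, out_allele)
        if af is not None and min_af <= af <= max_af:
            kept.append(i)
    return kept

example_snps = [
    ([0] * 60, 0),           # no derived alleles: af = 0.0
    ([1] + [0] * 59, 0),     # af = 1/60, about 0.017
    ([1, 1] + [0] * 58, 0),  # af = 2/60, about 0.033, above max_af
    ([0] * 60, 1),           # outgroup carries 1, so every call is derived: af = 1.0
    ([None] * 5, 1),         # no calls, frequency undefined
]
kept = ascertain(example_snps, min_af=0.0, max_af=0.02)  # [0, 1]
```

Whether the bounds are inclusive, and whether monomorphic sites (af = 0) should survive a min_af of 0, are open design choices that the config above does not settle.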

stschiff commented 2 years ago

Done.