nf-core / eager

A fully reproducible and state-of-the-art ancient DNA analysis pipeline
https://nf-co.re/eager
MIT License

DSL2: Genotyping on multiple snp sets in one run? #1077

Open TCLamnidis opened 1 week ago

TCLamnidis commented 1 week ago

It might be nice to be able to genotype on multiple SNP sets in a single run. I'm specifically thinking of pileupcaller here, not sure how it would apply to other genotypers, but:

Currently, the reference sheet takes one pileupcaller_{bed,snp} pair per reference. That means that anyone who wants to genotype on two sets of positions has to either run the entire pipeline twice or duplicate a row in the reference sheet just for the additional genotyping. Since a duplicated row will not pass the ref-sheet validation, one would have to "fake" an entirely new reference, duplicating all the processing just to get the extra genotypes.

Solution: Maybe we can turn the pileupcaller_bed/snp columns into list columns, e.g. multiple files separated by ;, which would then get split into separate channel elements with the same meta, so that only the genotyping step is duplicated?
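A minimal sketch of that splitting logic in plain Groovy, under some assumptions: the helper name expandSnpSets, the meta map [id: 'ref1'] and the file names are all hypothetical, and this is not existing nf-core/eager code. It only illustrates how one ;-separated field could be expanded into one [meta, file] pair per SNP set, with every pair carrying the same meta.

```groovy
// Hypothetical helper (not part of nf-core/eager): expand one reference-sheet
// row into one [meta, file] pair per ';'-separated entry.
def expandSnpSets(Map meta, String bedField) {
    bedField
        .tokenize(';')                  // split on ';' (empty entries are dropped)
        .collect { it.trim() }          // tolerate spaces around the separator
        .findAll { it }                 // skip entries that were only whitespace
        .collect { bed -> [meta, bed] } // every pair carries the same meta map
}

// Assumed example row: a meta map plus the raw list column value.
def meta = [id: 'ref1']
def expanded = expandSnpSets(meta, 'potato.bed;banana.bed;tomato.bed')

// One element per SNP set, all sharing the same meta.
expanded.each { println it }
```

In a Nextflow workflow the same function could presumably be called from flatMap, e.g. .flatMap { meta, beds -> expandSnpSets(meta, beds) }, so that each pair becomes its own channel element. A real implementation would also need to keep each bed file paired with its matching snp file.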

TCLamnidis commented 1 week ago

Something like:

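// Example value of a ';'-separated list column (file names are placeholders)
bed_list = "potato.bed;banana.bed;tomato.bed"

ch_beds = Channel.of(bed_list)
  // Split the string on ';' and emit each file name as its own channel element
  .flatMap { beds -> beds.tokenize(';') }
  .view()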

potato.bed
banana.bed
tomato.bed

Each file can then be passed to genotyping separately to produce its own genotypes, or the files can be concatenated to produce one superset?
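For the superset option, plain concatenation would repeat any position that appears in more than one set, so some de-duplication is probably needed. A rough sketch in plain Groovy, assuming each set has already been read into a list of lines (the helper name mergePositionSets and the example positions are made up for illustration):

```groovy
// Hypothetical helper: merge several position lists into one superset,
// keeping the first occurrence of each line and dropping exact duplicates.
def mergePositionSets(List<List<String>> sets) {
    sets.flatten()
        .collect { it.trim() }   // normalise trailing whitespace
        .findAll { it }          // drop blank lines
        .unique()                // keep first occurrence, preserve order
}

// Made-up BED-like lines: chromosome, start, end (tab-separated).
def potato = ['1\t100\t101', '1\t200\t201']
def banana = ['1\t200\t201', '2\t50\t51']

def merged = mergePositionSets([potato, banana])
merged.each { println it }
```

This only removes exact duplicate lines and keeps the input order. Whether the merged file needs to be sorted, and how the matching snp file would be merged consistently, is left open here.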