iqbal-lab-org / gramtools

Genome inference from a population reference graph
MIT License

Release 0.6 criteria #128

Closed iqbal-lab closed 5 years ago

iqbal-lab commented 6 years ago

I think we just need one thing for the following release:

The right wrappers (nextflow?) to allow simple automation/running of gramtools on a set of N samples, including Cortex for novel discovery, and using Martin's existing infrastructure. This needs two things:

  1. (easy I expect) Nextflow script to combine it
  2. A script to project a VCF with respect to an inferred path/reference, back to a VCF with respect to the original reference (so we can combine VCFs)

If we have those, then I think we have everything. No changes to gramtools core, I suggest.

ffranr commented 6 years ago

We should replace this issue with a project. This will let us break it up into separate connected issues. What are the sub-issues in this case?

iqbal-lab commented 6 years ago

Sounds good. I think the two issues are itemized above. The first (nextflow) is very simple: chaining together of commands. It could almost be a perl script, but nextflow allows parallelisation. The second is tricky. I agree with your suggestion.
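
To illustrate the "chain commands, but parallelise over samples" point, here is a minimal Python sketch rather than a Nextflow script. The names `run_chain` and `run_all` are hypothetical, and the steps are plain callables standing in for the real commands; in practice each would presumably wrap a subprocess call. Each sample's chain runs in order, and different samples run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A step takes a sample name and returns some result (hypothetical signature).
Step = Callable[[str], str]


def run_chain(sample: str, steps: List[Step]) -> List[str]:
    """Run the steps for one sample strictly in order."""
    return [step(sample) for step in steps]


def run_all(samples: List[str], steps: List[Step], max_workers: int = 4) -> Dict[str, List[str]]:
    """Run one chain per sample, with the chains executing concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {sample: pool.submit(run_chain, sample, steps) for sample in samples}
        # result() blocks until that sample's chain has finished.
        return {sample: future.result() for sample, future in futures.items()}


if __name__ == "__main__":
    # Placeholder steps; real ones would launch the actual tools.
    steps = [lambda s: f"first step on {s}", lambda s: f"second step on {s}"]
    for sample, results in run_all(["sample_a", "sample_b"], steps).items():
        print(sample, results)
```

A workflow manager adds what this sketch lacks: resuming after failure, cluster submission, and tracking of intermediate files.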

ffranr commented 6 years ago

> Cortex for novel discovery

Are you proposing cortex as a means of diploid inference?


A nextflow script just describes a pipeline. What will the proposed pipeline do exactly? Something like this perhaps:

  1. gramtools build
  2. gramtools quasimap
  3. gramtools infer
  4. gramtools discover
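
The four steps above could be laid out as an ordered plan along these lines. This is only a sketch: it assumes `build` runs once and is shared by all samples while the other three stages run per sample, which the thread does not confirm. The helper names are hypothetical, and command-line options are deliberately left out rather than guessed.

```python
from typing import Dict, List

# Stage names come from the list above; the split into shared and
# per-sample stages is an assumption made for this sketch.
SHARED_STAGES = ["build"]
PER_SAMPLE_STAGES = ["quasimap", "infer", "discover"]


def plan_pipeline(samples: List[str]) -> List[Dict[str, str]]:
    """Return the ordered steps: shared stages first, then each sample's chain."""
    steps = [{"sample": "*", "stage": stage} for stage in SHARED_STAGES]
    for sample in samples:
        for stage in PER_SAMPLE_STAGES:
            steps.append({"sample": sample, "stage": stage})
    return steps


def to_command(step: Dict[str, str]) -> List[str]:
    """Render a step as an argv list; real options are intentionally omitted."""
    return ["gramtools", step["stage"]]


if __name__ == "__main__":
    for step in plan_pipeline(["sample_a", "sample_b"]):
        print(step["sample"], " ".join(to_command(step)))
```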

> A script to project a VCF with respect to an inferred path/reference, back to a VCF with respect to the original reference (so we can combine VCFs)

Is this part of gramtools discover?


Please remind me of the following: what is the goal of the pipeline? What does it produce?

iqbal-lab commented 6 years ago

I'm proposing that this release is about haploid inference, using Cortex for discovery against a reference that is the output of gramtools infer.

Pipeline goal is:

  1. Get a VCF for each sample via gramtools infer and then Cortex
  2. Convert those VCFs back into VCFs with respect to the standard ref
  3. Combine using minos, genotype all samples, output a single VCF

iqbal-lab commented 6 years ago

This might be easier on chat

bricoletc commented 5 years ago

I'm closing this as we have shown that the build/quasimap/infer/discover chain works on Plasmodium falciparum crosses. VCF rebasing in discover also works: newly discovered variation, called against the personalised reference genome from infer, is described in terms of the original reference genome from build.
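
For readers unfamiliar with rebasing, the coordinate part of the idea can be sketched as follows. This is not the gramtools implementation: `personalised_to_original` is a hypothetical helper, it assumes the variants applied by infer are non-overlapping and sorted by position, and it only translates positions. Real rebasing also has to merge a new record with the inferred record it overlaps.

```python
from typing import List, Tuple

# (pos, ref, alt): 1-based position on the original reference, as in VCF.
Variant = Tuple[int, str, str]


def personalised_to_original(pos: int, applied: List[Variant]) -> int:
    """Map a 1-based personalised-reference position to the original reference.

    `applied` holds the variants used to build the personalised reference.
    A position that falls inside an applied alt allele is mapped to the
    start of that variant's record on the original reference.
    """
    offset = 0  # personalised coordinate minus original coordinate so far
    for var_pos, ref, alt in applied:
        start = var_pos + offset      # alt allele start, personalised coords
        end = start + len(alt) - 1    # alt allele end, personalised coords
        if pos < start:
            break                     # later variants cannot affect pos
        if pos <= end:
            return var_pos            # inside the alt allele
        offset += len(alt) - len(ref)
    return pos - offset


if __name__ == "__main__":
    # Original ACGTACGT; an insertion at 3 and a deletion at 6 give ACGTTTACT.
    applied = [(3, "G", "GTT"), (6, "CG", "C")]
    for p in range(1, 10):
        print(p, personalised_to_original(p, applied))
```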

There is a snakemake pipeline for running this chain, and also running build/quasimap/infer again for cohort analysis of several samples. I will make a Nextflow one eventually too!