iqbal-lab-org / gramtools

Genome inference from a population reference graph
MIT License

Discover command #134

Closed: ffranr closed this issue 6 years ago

ffranr commented 6 years ago

This command finds new variation with respect to an inferred personal reference.

Inputs

These are the inputs to this command:

Procedure

The following is an outline of how this command works:

  1. Construct a FASTA equivalent of vcf_infer, called reference_infer, by selecting the alternative allele for each entry within vcf_infer.
  2. Pass reference_infer and the reads to cortex, which generates a VCF. This VCF will be referred to as vcf_cortex.
  3. Generate a new VCF file by transposing vcf_cortex so that its reference is reference_build instead of reference_infer. This VCF file will be referred to as vcf_discover. This step needs vcf_infer rather than reference_infer, since vcf_infer records where the inferred reference differs from reference_build.
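Step 1 can be illustrated with a minimal sketch. It assumes that vcf_infer is expressed against reference_build, that each entry carries a single alternative allele, and that entries do not overlap. The `Record` tuple and the function name `build_reference_infer` are hypothetical and are not taken from the gramtools code base.

```python
from typing import List, Tuple

# Hypothetical simplified VCF entry: (pos, ref, alt), with a 1-based
# position on reference_build, as in the VCF format.
Record = Tuple[int, str, str]


def build_reference_infer(reference_build: str, records: List[Record]) -> str:
    """Apply the ALT allele of every record to reference_build."""
    pieces = []
    cursor = 0  # 0-based index of the next unconsumed base of reference_build
    for pos, ref, alt in sorted(records):
        start = pos - 1
        if start < cursor:
            raise ValueError(f"record at position {pos} overlaps a previous record")
        if reference_build[start:start + len(ref)] != ref:
            raise ValueError(f"REF allele mismatch at position {pos}")
        pieces.append(reference_build[cursor:start])  # unchanged stretch
        pieces.append(alt)                            # substituted allele
        cursor = start + len(ref)
    pieces.append(reference_build[cursor:])           # tail after the last record
    return "".join(pieces)


reference_build = "ACGTACGT"
vcf_infer_records = [(2, "C", "T"), (5, "ACG", "A")]
reference_infer = build_reference_infer(reference_build, vcf_infer_records)
print(reference_infer)  # ATGTAT
```

A real implementation would read the records from the VCF file and handle multiple chromosomes; this sketch works on a single in-memory sequence only.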

Output

Discover produces a single VCF file as an output: vcf_discover. This VCF file uses reference_build as a reference.

Notes

Each entry within vcf_cortex can contain at most one alternative. Furthermore, these entries do not overlap. This should also hold true for vcf_discover.
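The two properties above could be checked with something like the following sketch. The record layout `(pos, ref, alts)` and the function name `check_records` are assumptions made for illustration; they are not part of gramtools or cortex.

```python
from typing import List, Tuple

# Hypothetical simplified VCF entry: (pos, ref, alts), 1-based position.
Record = Tuple[int, str, List[str]]


def check_records(records: List[Record]) -> List[str]:
    """Return a list of problems: entries with several alternatives or overlaps."""
    problems = []
    previous_end = 0  # last 1-based position covered by any earlier entry
    for pos, ref, alts in sorted(records, key=lambda r: r[0]):
        if len(alts) > 1:
            problems.append(f"position {pos}: {len(alts)} alternatives")
        if pos <= previous_end:
            problems.append(
                f"position {pos}: overlaps an entry ending at {previous_end}"
            )
        previous_end = max(previous_end, pos + len(ref) - 1)
    return problems


ok = check_records([(2, "C", ["T"]), (5, "ACG", ["A"])])
bad = check_records([(2, "CGT", ["C"]), (3, "G", ["A", "T"])])
print(ok)   # []
print(bad)  # two problems reported for position 3
```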

iqbal-lab commented 6 years ago

This command finds new variation with respect to an inferred personal reference.

Inputs

These are the inputs to this command:

Procedure

The following is an outline of how this command works:

The inferred personal reference and reads are passed to cortex which generates a VCF. This VCF will be referred to as vcf_cortex. vcf_cortex is then transformed/projected such that the reference is the same as that used by vcf_build (O-ref).

ffranr commented 6 years ago

The infer command can generate either a VCF (vcf_infer) or a FASTA (reference_infer) output. The discover command should take vcf_infer as an input and not reference_infer. This will make it easier to "transpose" the cortex output VCF (vcf_cortex) onto the reference found in the gram directory.
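To illustrate why vcf_infer helps here, the sketch below maps a position on reference_infer back to a position on the original reference by walking the vcf_infer entries and tracking the length difference each one introduces. This is only an assumed approach and not the code in gramtools; the function name `infer_to_build_position` and the `(pos, ref, alt)` record layout are hypothetical.

```python
from typing import List, Tuple

# Hypothetical simplified vcf_infer entry: (pos, ref, alt), where pos is a
# 1-based position on the original reference.
Record = Tuple[int, str, str]


def infer_to_build_position(infer_pos: int, records: List[Record]) -> Tuple[int, bool]:
    """Map a 1-based position on reference_infer to the original reference.

    Returns (position, inside_site). inside_site is True when infer_pos falls
    within the ALT allele of a vcf_infer entry, in which case the start of
    that entry on the original reference is returned.
    """
    offset = 0  # summed len(alt) - len(ref) of entries left of infer_pos
    for pos, ref, alt in sorted(records):
        site_start = pos + offset            # entry start on reference_infer
        site_end = site_start + len(alt) - 1
        if infer_pos < site_start:
            break
        if infer_pos <= site_end:
            return pos, True
        offset += len(alt) - len(ref)
    return infer_pos - offset, False


# Original reference "ACGTACGT" becomes "ATGTAT" once these entries are applied.
records = [(2, "C", "T"), (5, "ACG", "A")]
print(infer_to_build_position(6, records))  # (8, False)
print(infer_to_build_position(5, records))  # (5, True)
```

A cortex call that lands inside an inferred site (the `True` case) would additionally need its REF and ALT alleles rewritten against the original reference, which this sketch does not attempt.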

Will update top comment.

Example command currently looks like this:

gramtools discover --gram-directory ./gram --inferred-vcf ./infer.vcf --reads ./reads --output-vcf ./discover.vcf

iqbal-lab commented 6 years ago

Cortex (or other caller) will need the reference_infer also.

ffranr commented 6 years ago

Implemented here: https://github.com/iqbal-lab-org/gramtools/commit/a9fa87815f052ae0ec808149323d4d3e50748e45