luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License

Question: can we genotype a germline sample at sites with known alleles? #60

Closed by nh13 3 years ago

nh13 commented 5 years ago

For example, I want to genotype all of dbSNP SNP sites, and then use downstream tools to impute missing genotypes (requiring posterior likelihoods).

  1. Is there a way to feed in the sites and alleles at which I want octopus to call?
  2. For imputing missing genotypes, we would typically use genotype likelihoods as input; would you recommend another input (e.g. do you output posteriors in the GP FORMAT field)?

In this case I may be calling a single individual or a population's worth of germline samples.

dancooke commented 5 years ago

Octopus allows user-supplied candidate variants (e.g. from dbSNP) with the --source-candidates and --source-candidates-file options. The former accepts one or more VCF/BCF files, the latter a text file of paths to VCF/BCF files. This does not mean that these variants are regenotyped and will appear in the output; they are simply added to the list of candidate variants considered by the calling algorithm. The --regenotype option is intended to do regenotyping, but it is not currently functional.
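As a rough illustration of the two options above, here is a minimal Python sketch that writes a list file and assembles a command line. The helper names and all file names are hypothetical, and the one-path-per-line layout of the list file and the --reference, --reads and --output options are assumptions not stated in this thread, so check them against `octopus --help` before relying on them.

```python
from pathlib import Path


def write_candidates_list(vcf_paths, list_path):
    """Write the VCF/BCF paths to a text file for --source-candidates-file.

    Assumption: one path per line (the thread only says "a text file of paths").
    """
    list_path = Path(list_path)
    list_path.write_text("".join(f"{p}\n" for p in vcf_paths))
    return list_path


def build_octopus_command(reference, reads, output,
                          candidates=(), candidates_file=None):
    """Assemble an argument list; nothing is executed here.

    Assumption: --reference, --reads and --output are the usual Octopus
    input/output options. Only the two candidate options come from the thread.
    """
    cmd = ["octopus",
           "--reference", str(reference),
           "--reads", *map(str, reads),
           "--output", str(output)]
    if candidates:
        # One or more VCF/BCF files given directly on the command line.
        cmd += ["--source-candidates", *map(str, candidates)]
    if candidates_file is not None:
        # A text file that lists the VCF/BCF paths instead.
        cmd += ["--source-candidates-file", str(candidates_file)]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_octopus_command(
        "ref.fa", ["sample.bam"], "calls.vcf", candidates=["dbsnp.vcf.gz"])))
```

The resulting list can be passed to `subprocess.run` once the paths are real. The supplied sites only widen the candidate set, so a dbSNP site can still be absent from the output.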

With regard to genotype likelihoods, Octopus computes genotype likelihoods for haplotypes, so it is not possible to report allele-level genotype likelihoods. It should, however, be possible to report allele-resolution genotype posteriors. I'll try to add a GP VCF field when I have time.
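To make the haplotype versus allele distinction concrete, here is a minimal Python sketch of how allele-resolution posteriors at one site could be obtained by summing posteriors over haplotype-level genotypes. It is an illustration of marginalization only, not Octopus's implementation; the function name, the data layout, and the haplotype labels are all hypothetical.

```python
from collections import defaultdict


def allele_genotype_posteriors(haplotype_genotype_posteriors, allele_at_site):
    """Marginalize haplotype-genotype posteriors down to one site.

    haplotype_genotype_posteriors: dict mapping a tuple of haplotype labels
        (one per chromosome copy) to its posterior probability.
    allele_at_site: dict mapping each haplotype label to the allele that
        haplotype carries at the site of interest.
    Returns a dict mapping a sorted tuple of alleles to the summed posterior.
    """
    posteriors = defaultdict(float)
    for genotype, probability in haplotype_genotype_posteriors.items():
        # Different haplotype pairs can imply the same alleles at this site,
        # so their posterior mass is added together.
        alleles = tuple(sorted(allele_at_site[h] for h in genotype))
        posteriors[alleles] += probability
    return dict(posteriors)


if __name__ == "__main__":
    # Hypothetical diploid example: H1 and H2 both carry G at the site
    # but differ elsewhere on the haplotype.
    site_alleles = {"H0": "A", "H1": "G", "H2": "G"}
    genotype_posteriors = {
        ("H0", "H0"): 0.1,
        ("H0", "H1"): 0.5,
        ("H0", "H2"): 0.3,
        ("H1", "H2"): 0.1,
    }
    print(allele_genotype_posteriors(genotype_posteriors, site_alleles))
```

In this toy example the two heterozygous haplotype genotypes collapse into a single A/G genotype at the site. This works for posteriors because they are probabilities over the same set of genotypes, whereas haplotype-level likelihoods cannot be summed this way without bringing in a prior.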