luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License
305 stars 38 forks source link

Question: Calling non-variant sites #249

Closed ja-Reeve closed 2 years ago

ja-Reeve commented 2 years ago

How can I get Octopus to output a VCF including non-variant sites?

jelber2 commented 2 years ago

See https://luntergroup.github.io/octopus/docs/cli#--refcall

ja-Reeve commented 2 years ago

Thanks. I tried that before on a short 600kb region of the genome, but it only returns variant sites.

My call:

octopus \
                 -R $PATH/ref_genome.fasta \
                 -i $PATH/bam_list.txt \
                 -o $PATH/Octopus_trial1.6b.vcf.gz \
                 -T Contig38698:0-608273 \
                 --very-fast --refcall POSITIONAL

Output (only showing 1st 10 calls & 3 samples): Screen Shot 2022-11-04 at 16 51 41

jelber2 commented 2 years ago

Are you using the latest commit in the development branch?

ja-Reeve commented 2 years ago

Yes, version: 17a597d-20220708 is installed on the server I am using

jelber2 commented 2 years ago

What if you remove --very-fast ?

ja-Reeve commented 2 years ago

I tried a test run, but it timed out after 5days. I will give it another go with more time.

jelber2 commented 2 years ago

Maybe use samtools on a single sample to filter the bam file to just that region to test octopus without the --very-fast?

ja-Reeve commented 2 years ago

I tested this region with a single sample without `--very-fast``. I only returns indels and SNPs, no non-variant sites. Screen Shot 2022-11-04 at 19 17 56

jelber2 commented 2 years ago

Maybe I do not understand, but these are non-variant sites as well in the output you show. For example Contig38698:1 has phased genotype 0|0 . The remaining sites in the interval Contig38698:2-13763 are presumably not passing filter? So, you might need to keep the raw calls with --keep-unfiltered-calls and perhaps do some post processing to include the lines from the unfiltered file to the filtered file? I guess last time I used this, I realized I only wanted the confidently called sites, whether variant or non-variant.

ja-Reeve commented 2 years ago

Calling again with --keep-unfiltered-calls worked. You're right, many sites were being removed due to the filters. Thanks a lot for your help.

jelber2 commented 2 years ago

Great!