Closed ja-Reeve closed 2 years ago
Thanks. I tried that before on a short 600 kb region of the genome, but it only returned variant sites.
My call:
octopus \
-R $PATH/ref_genome.fasta \
-i $PATH/bam_list.txt \
-o $PATH/Octopus_trial1.6b.vcf.gz \
-T Contig38698:0-608273 \
--very-fast --refcall POSITIONAL
Output (showing only the first 10 calls and 3 samples):
Are you using the latest commit in the development branch?
Yes, version 17a597d-20220708 is installed on the server I am using.
What if you remove --very-fast?
I tried a test run, but it timed out after 5 days. I will give it another go with more time.
Maybe use samtools on a single sample to restrict the BAM file to just that region, then test Octopus without --very-fast?
I tested this region with a single sample without `--very-fast`. It only returns indels and SNPs, no non-variant sites.
Maybe I do not understand, but the output you show also contains non-variant sites. For example, Contig38698:1 has the phased genotype 0|0. The remaining sites in the interval Contig38698:2-13763 are presumably not passing the filters? So you might need to keep the raw calls with --keep-unfiltered-calls and perhaps do some post-processing to add the lines from the unfiltered file to the filtered file. I guess the last time I used this, I realized I only wanted the confidently called sites, whether variant or non-variant.
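A rough sketch of that post-processing step, in case it helps. It is not part of Octopus: the function names and the file names in the comments are hypothetical, and it assumes both VCFs share the same header and fit in memory. It keeps every record from the filtered file and adds any record from the unfiltered file whose (CHROM, POS, REF, ALT) is missing from it.

```python
import gzip


def open_text(path):
    """Open a plain or gzip/bgzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def split_vcf(lines):
    """Split an iterable of VCF lines into (header lines, record lines)."""
    header, records = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        (header if line.startswith("#") else records).append(line)
    return header, records


def record_key(record):
    """Identify a record by contig, position, REF and ALT."""
    chrom, pos, _id, ref, alt = record.split("\t")[:5]
    return chrom, int(pos), ref, alt


def merge_calls(filtered_lines, unfiltered_lines):
    """Return the filtered VCF plus records found only in the unfiltered VCF."""
    header, kept = split_vcf(filtered_lines)
    _, raw = split_vcf(unfiltered_lines)
    seen = {record_key(r) for r in kept}
    extra = [r for r in raw if record_key(r) not in seen]
    # Keep contigs in the order they first appear (unfiltered file first).
    contig_rank = {}
    for r in raw + kept:
        contig_rank.setdefault(r.split("\t", 1)[0], len(contig_rank))
    merged = sorted(
        kept + extra,
        key=lambda r: (contig_rank[record_key(r)[0]], record_key(r)[1]),
    )
    return header + merged


# Hypothetical usage (file names are placeholders, not Octopus defaults):
# with open_text("calls.vcf.gz") as f, open_text("calls.unfiltered.vcf.gz") as u:
#     merged = merge_calls(f, u)
# with open("calls.merged.vcf", "w") as out:
#     out.write("\n".join(merged) + "\n")
```

The added records keep whatever the unfiltered file has in the FILTER column, so downstream tools can still tell the confident calls from the rest.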
Calling again with --keep-unfiltered-calls worked. You're right, many sites were being removed due to the filters.
Thanks a lot for your help.
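For anyone who wants to check how many sites the filters account for, here is a minimal sketch that tallies the FILTER column of a VCF. It is plain Python, not an Octopus feature; the function name and the file name in the comment are hypothetical, and a record carrying several filters separated by `;` is counted once per filter.

```python
from collections import Counter


def tally_filters(lines):
    """Count how often each FILTER value appears among the VCF records."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        for name in fields[6].split(";"):  # FILTER is the 7th column
            counts[name] += 1
    return counts


# Hypothetical usage on an uncompressed file (name is a placeholder):
# with open("calls.unfiltered.vcf") as handle:
#     for name, n in tally_filters(handle).most_common():
#         print(name, n)
```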
Great!
How can I get Octopus to output a VCF including non-variant sites?