luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License
299 stars 37 forks source link

DP very low compared to other callers #183

Closed ptranvan closed 3 years ago

ptranvan commented 3 years ago

Hi, thanks for your software.

I have multiples samples and I did a population calling. Example of commands for 2 samples:

octopus --threads 1 -R ref.fasta -I sample1.sorted.bam -o sample1.bcf

octopus --threads 1 -R ref.fasta -I sample2.sorted.bam -o sample2.bcf

octopus -R ref.fasta -I sample1.sorted.bam sample2.sorted.bam --disable-denovo-variant-discovery -c sample1.bcf sample2.bcf -o octopus.joint.bcf

The issue is that all the DP for each samples are very low (between 0 to 2) which is not the case with freebayes (don't pay attention to the genotype and number of samples -> this is not the same order and these are subsample to illustrate)

Example Otopus result:

scf3 82264   .       G       A       1785.37 PASS    AC=31;AN=158;DP=75;MQ=60;NS=74  GT:GQ:DP:MQ:PS:PQ:FT    1|0:15:1:60:82264:100:PASS   0
|0:15:1:60:82264:100:PASS       1|0:7:0:0:82264:100:AD,AFB,DP,ADP       0|0:15:1:60:82264:100:PASS      1|0:15:1:60:82264:100:PASS

Example Freebayes result

scf3 82264 . G A 1492.88 . AB=0.486111;ABP=3.37221;AC=34;AF=0.22973;AN=148;AO=111;CIGAR=1X;DP=526;DPB=526;DPRA=1.07105;EPP=244.044;EPPR=904.171;GTI=1;LEN=1;MEANALT=1;MQM=60;MQMR=60;NS=74;NUMALT=1;ODDS=0.356749;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=3983;QR=15177;RO=415;RPL=0;RPP=244.044;RPPR=904.171;RPR=111;RUN=1;SAF=0;SAP=244.044;SAR=111;SRF=0;SRP=904.171;SRR=415;TYPE=snp GT:DP:AD:RO:QR:AO:QA:GL 0/0:17:17,0:17:601:0:0:0,-5.11751,-54.416 0/1:10:6,4:6:222:4:148:-10.6717,0,-17.3278 0/0:15:15,0:15:545:0:0:0,-4.51545,-49.3856

Am I doing something wrong ?

Thanks fro your help.

dancooke commented 3 years ago

Could be that a lot of reads are getting filtered - DP is the filtered read depth used for calling. You can check the fraction of filtered reads per variant site with the FRF annotation (add --annotations FRF to your command). You can also check the number of unfiltered reads assigned to a haplotype with the ADP annotation, or get per-allele depths with AD.

ptranvan commented 3 years ago

@dancooke thanks for your fast answer.

1) What are the criteria you used to filter reads ?

2) Can I combined the annotations in one command ? eg. --annotations FRF ADP AD

3) Do I have to add this parameter from the start and after the population calling ? ex:

octopus --threads 1 -R ref.fasta -I sample1.sorted.bam -o sample1.bcf --annotations FRF ADP AD

octopus --threads 1 -R ref.fasta -I sample2.sorted.bam -o sample2.bcf --annotations FRF ADP AD

octopus -R ref.fasta -I sample1.sorted.bam sample2.sorted.bam --disable-denovo-variant-discovery -c sample1.bcf sample2.bcf -o octopus.joint.bcf --annotations FRF ADP AD
dancooke commented 3 years ago

What are the criteria you used to filter reads ?

By default (for v0.7.4) reads are filtered if:

You can see the full list of read pre-processing options here.

Can I combined the annotations in one command ? eg. --annotations FRF ADP AD

Yes.

Do I have to add this parameter from the start and after the population calling ?

You will need it in the final command. You can also put it in the single-sample commands but it isn't required and will make the intermediate files bigger.

ptranvan commented 3 years ago

Oh now I understand

By default (for v0.7.4) reads are filtered if:

A duplicate (marked or as determined using Octopus' built-in algorithm).

In fact I an working with single RADseq, so by definition it's full of duplicates and I didn't used MarkDuplicates after BWA because it's useless. I guess I can stay with the octopus default parameters then or does it affect something for the variant calling ?

dancooke commented 3 years ago

I'm not too familiar with RADseq - generally duplicates introduce false positives due to correlated errors. For amplicon sequencing you allow duplicates otherwise you loose most of your coverage, but then you need more aggressive filtering to remove the amplification artefacts. If you'd like to retain duplicates then add the --allow-octopus-duplicates option.

ptranvan commented 3 years ago

Hi @dancooke

So I have tried the --allow-octopus-duplicatesoption but I don't understand why I have a genotype called for some individual even if the DP=0 ?

GT:GQ:DP:MQ:PS:PQ:AD:ADP:FRF:FT

1|1:24:0:0:1219012:75:.,0:0:0:AD,DP,ADP