hsinnan75 / MapCaller

MapCaller – An efficient and versatile approach for short-read alignment and variant detection in high-throughput sequenced genomes
MIT License
29 stars 5 forks source link

Large deletion not called #28

Closed carlosmag closed 4 years ago

carlosmag commented 4 years ago

Hi!

I was testing if MapCaller could detect large indels. I did a test run with genome ERR502929. Bam file was obtained after bwa mapping and duplicate removal with sambamba markdup. Follow is illustrated a deletion with >2000bp, referenced as previously validated by PCR.

long_indel_igv

Reference genome MapCaller -ploidy 1 -i MTB_anc.fasta -f ERR502929_1.fastq -f2 ERR502929_2.fastq -monomorphic

MapCaller appears to report nothing in the region comprising the deletion.

109 MTB_anc 3073228 . C . 0 . DP=4;NS=4;DUP=0;GT=0|0 110 MTB_anc 3073229 . C . 0 . DP=4;NS=3;DUP=0;GT=0|0 111 MTB_anc 3075394 . A . 0 . DP=1;NS=1;DUP=1;GT=0|0

Any ideas on how to fix this? Thanks.

hsinnan75 commented 4 years ago

Hi, Large deletions are not reported because they are considered as "CNVs" and I am still working on this type of variants. I'll update a new version that is able to report CNVs in the near future. Before I do that, I'll have MapCaller output large deletions first. I'll let you know if it is done. Thank you!

hsinnan75 commented 4 years ago

I've updated MapCaller (v.0.9.9.e). It outputs all unmapped regions (i.e., regions with read depth = 0). They are considered potential deletions. The variant type is denoted as "UnMapped" temporarily.

tseemann commented 4 years ago

This feature is very nice! Is the FILTER column meant to have UnMapped instead of . ? Not that important.

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
Chromosome      1       .       T       <*>     0       .       END=14;TYPE=UnMapped
Chromosome      107794  .       C       T       30      PASS    DP=24;AD=24;DUP=0;AF=1.000;GT=1|1;TYPE=SUBSTITUTE
Chromosome      400679  .       T       <*>     0       .       END=401193;TYPE=UnMapped
Chromosome      491591  .       A       T       30      PASS    DP=18;AD=18;DUP=0;AF=1.000;GT=1|1;TYPE=SUBSTITUTE
Chromosome      575679  .       A       G       30      PASS    DP=10;AD=10;DUP=0;AF=1.000;GT=1|1;TYPE=SUBSTITUTE
Chromosome      607066  .       C       <*>     0       .       END=607536;TYPE=UnMapped
Chromosome      648990  .       G       C       30      PASS    DP=26;AD=26;DUP=0;AF=1.000;GT=1|1;TYPE=SUBSTITUTE
hsinnan75 commented 4 years ago

Hi, I added a new FILTER("UnMapped") for those regions without any read mapped.

hsinnan75 commented 4 years ago

Renamed to "Gaps." It makes more sense.

tseemann commented 4 years ago

@carlosmag in current version:

         -min_cnv INT  the minimal cnv size to be reported [50].
         -min_gap INT  the minimal gap(unmapped) size to be reported [50].
tseemann commented 4 years ago

@carlosmag @hsinnan75 can this be closed?

carlosmag commented 4 years ago

Did not have the time to test more thoroughly, but I would say yes.

tseemann commented 4 years ago

@carlosmag you should be able to close it as you created it?