lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

Extract PE Reads (with their mates) supporting variants in vcf file (www.biostars.org/p/322664/) #104

Closed: Pitithat-pu closed this issue 6 years ago

Pitithat-pu commented 6 years ago

Verify

Subject of the issue

Regarding the Biostars question https://www.biostars.org/p/322664/, I find that your program still reports reads that don't support the variant allele.

Your environment

$ java -jar dist/biostar322664.jar --version
5f6b66bc05201d2d543e1b1214640dd5c84051f8
java version "1.8.0_40" (/cluster_name/13.1/x86_64/jdk/jdk1.8.0_40)
openSUSE 13.1 (Bottle) (x86_64)

Steps to reproduce

$ java -jar dist/biostar322664.jar -V minivcf.vcf minisam.sorted.bam
Attached: minisam.sorted.bam.zip, minivcf.vcf.zip

Expected behaviour

The program should report only the reads that support the variant allele.

Actual behaviour

The program reports all reads aligned at the variant position (screenshot: unsupport_reads).

lindenb commented 6 years ago

Thanks for the bug report.

I'm currently looking at this. To speed up debugging, can you please give me the name of ONE read that shouldn't appear here? I'm away from my lab and I don't have the reference genome with me. Thanks.

lindenb commented 6 years ago

Ah! Unless I misunderstood the original question https://www.biostars.org/p/322664/:

> I want to extract only reads with their mate that support variant allele in the vcf

What exactly is your query?

1) if the read AND its mate carry the variant, dump the pair;
2) if the read OR its mate carries the variant, dump the pair (but what happens if one read overlaps two SNPs while its mate overlaps only one?)

Pitithat-pu commented 6 years ago

I want both. If the read AND its mate have the variant, dump the pair: this is possible because some DNA fragments are short enough that both reads of the pair overlap the variant position.

If the read OR its mate has the variant: yes, dump the pair.

If one read overlaps two SNPs while the mate overlaps only one: also dump the pair.

Sorry, I'm also away from my lab, so I can't give you the name of a read that shouldn't appear. The reads that should not appear are those carrying the REF base at the variant position. Thanks.
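The criterion above (a read should not be reported if it carries the REF base at the variant position) comes down to finding which read base is aligned to a given reference coordinate. The following is a minimal, self-contained sketch of that lookup, not the code used by biostar322664: the class name `ReadBaseAtPosition` is hypothetical, it assumes a 1-based alignment start (the SAM POS column) and a plain CIGAR string, and it treats a position inside a deletion as "no base".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: which read base is aligned to a given reference position? */
final class ReadBaseAtPosition {
    private static final Pattern CIGAR_ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /**
     * Returns the read base aligned to refPos, or null if refPos is not covered by an
     * aligned base (outside the alignment, or inside a deletion/skipped region).
     * alignStart and refPos are 1-based, like the SAM POS column.
     */
    static Character baseAt(int alignStart, String cigar, String seq, int refPos) {
        int ref = alignStart; // current reference coordinate
        int read = 0;         // current 0-based offset in seq
        Matcher m = CIGAR_ELEMENT.matcher(cigar);
        while (m.find()) {
            int len = Integer.parseInt(m.group(1));
            char op = m.group(2).charAt(0);
            switch (op) {
                case 'M': case '=': case 'X': // consume read and reference
                    if (refPos >= ref && refPos < ref + len) {
                        return seq.charAt(read + (refPos - ref));
                    }
                    ref += len;
                    read += len;
                    break;
                case 'I': case 'S':           // consume read only
                    read += len;
                    break;
                case 'D': case 'N':           // consume reference only
                    if (refPos >= ref && refPos < ref + len) {
                        return null;
                    }
                    ref += len;
                    break;
                default:                      // 'H' and 'P' consume neither
                    break;
            }
        }
        return null;
    }

    /** A read supports the SNV only if its base at the site equals ALT (case-insensitive). */
    static boolean supportsAlt(int alignStart, String cigar, String seq, int refPos, char alt) {
        Character b = baseAt(alignStart, cigar, seq, refPos);
        return b != null && Character.toUpperCase(b) == Character.toUpperCase(alt);
    }
}
```

The case-insensitive comparison is a defensive choice: viewers such as tview print reverse-strand bases in lowercase, and it costs nothing to accept both.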

lindenb commented 6 years ago

I worked on this quickly but couldn't identify a read failing the criteria. I've added a --pair option that requires both the read and its mate to carry the mutation.

I'm still waiting for you to give me the name of a read. Thanks.
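The two selection rules discussed in this thread (dump the pair if the read OR its mate carries the variant, versus the --pair behaviour where both must carry it) can be sketched as below. This is an illustration only: `PairSelector` and `Segment` are hypothetical names, each record is reduced to its QNAME plus a flag meaning "carries at least one ALT allele" (so a read overlapping two SNPs counts once), and it assumes each mate appears exactly once (no secondary or supplementary records).

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: which read pairs to dump, with or without a "--pair"-like rule. */
final class PairSelector {
    /** Minimal view of one SAM record: its QNAME and whether it carries an ALT allele. */
    static final class Segment {
        final String readName;
        final boolean carriesAlt;

        Segment(String readName, boolean carriesAlt) {
            this.readName = readName;
            this.carriesAlt = carriesAlt;
        }
    }

    /**
     * requireBoth == false: keep the pair if the read OR its mate carries ALT.
     * requireBoth == true : keep the pair only if the read AND its mate carry ALT.
     */
    static Set<String> selectPairs(List<Segment> segments, boolean requireBoth) {
        Map<String, Integer> altCount = new LinkedHashMap<>(); // QNAME -> segments carrying ALT
        for (Segment s : segments) {
            altCount.merge(s.readName, s.carriesAlt ? 1 : 0, Integer::sum);
        }
        Set<String> keep = new LinkedHashSet<>();
        for (Map.Entry<String, Integer> e : altCount.entrySet()) {
            int n = e.getValue();
            if (requireBoth ? n >= 2 : n >= 1) {
                keep.add(e.getKey());
            }
        }
        return keep;
    }
}
```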

lindenb commented 6 years ago

I played with your data this morning using my latest version.

This is the output of 'samtools tview' before filtering:

  43354131  43354141  43354151  43354161  43354171  43354181  43354191  43354201
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
GCTGTGCCGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGTG
g  gtgccgggaggagcgcaccaagtctgcgagcaggggccggccaatggtgaggctgggaatgctggccaggacgcagagtg
gc  TGCCGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGTG
gct TGCCGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCCATGGAGAGGCTGGGAATGCTGGCCAGGACGCAGAGTG
GCTGTGCC GGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGTG
GCTGTGCCGGGAGGA cgcaccaagtctgcgagcaggggccggccaatggtgaggctgggaatgctggccaggacgcagagtg
GCTGTGCCGGGAGGA   CACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGTG
gctgtgccgggaggagc  accaagtctgcgagcaggggccggccaatggtgaggctgggaatgctggccaggacgcagagtg
gctgtgccgggaggagcgca    gtctgcgagcaggggccggccaatggtgaggctgggaatgctggccaggacgcagagtg
gctgtgccgggaggagcgcacc    ctgcgagcaggggccggccaatggtgaggctgggaatgctggccaggacgcagagtg
gctgtgccgggaggagcgcacc       CGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGTG
GCTGTGCCGGGAGGAGCGCACCAA     CGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGTG
GCTGTGCCGGGAGGAGCGCACCAAG    CGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGTG
GCTGTGCCGGGAGGAGCGCACCAAG    CGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGC  agtg
GCTGTGCCGGGAGGAGCGCACCAAGTCTGC     GGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGTG
GCTGTGCCGGGAGGAGCGCACCAAGTCTGC     GGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGTG
gctgtgccgggaggagcgcac          AGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGTG
gctgtgccgggaggagcgcaccaagtctgcg      ggccggccaatggtgaggctgggaatgctggccaggacgcagagtg
GCTGTGCCGGGAGGAGCGCACCAAGTCTGCGAG       CGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGTG
gctgtgccgggaggagcgcaccaagtctgcgagc      cggccaatggtgaggctgggaatgctggccaggacgcagagtg
gctgtgccgggaggagcgcaccaagtctgcgagc       ggccaatggtgaggctgggaatgctggccaggacgcagagtg
GCTGTGCCGGGAGGAGCGCACCAAGTCTGCGAGCAGG    ggccaatggtgaggctgggaatgctggccaggacgcagagtg
gctgtgccgggaggagcgcaccaagtctgcgagcaggg    GCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGTG
GCTGTGCCGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCA    tgaggctgggaatgctggccaggacgcagagtg
GCTGTGCCGGGAGGAGCGCACCAAG                 gccaatggtgaggctgggaatgctggccaggacgcagagtg
gctg GCTGGGAGAAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGTG
gctgtgccgggaggagcgcaccaagtctgcgagcaggggccggccaatgg       gggaatgctggccaggacgcagagtg
GCTGTGCCGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGT        GAATGCTGGCCAGGACGCAGAGTG
GCTGTGCC    ggagcgcaccaagtctgcgagcaggggccggccaatggtgaggctgggaatgctggccaggacgcagagtg
GCTGTGCCGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGT        GAATGCTGGCCAGGACGCAGAGTG
GCTGTGCCGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGT            GCTGGCCAGGACGCAGAGGG
GCTGTGCCGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGG                GGCCAGGACGCAGAGTG
gctgtgccgggaggagcgcaccaagtctgcgagcaggggccggccaatggtgagg           GGCCAGGACGCAGAGTG
gctgtgccgggaggagcgcaccaagtctgcgagcaggggccggccaatggtgaggc          GGCCAGGACGCAGAGTG
GCTGTGCCGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCT          GCCAGGACGCAGAGTG
GCTGTGCTGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCT          gccaggacgcagagtg
GCTGTGCCGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGG        CCAGGACGCAGAGTG
GCTGTGCCGGGA gagcgcaccaagtctgcgagcaggggccggccaatggtgaggctgggaatgctggccaggacgcagagtg
gctgtgccgggaggagcgcaccaagtctgcgagcaggggccggccaatggtgaggctggg          aggacgcagagtg
gctgtgccgggaggagcgcaccaagtctgcgagcaggggccggccaatggtg                  aggacgcagagtg
GCTGTGCCGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAA        aggacgcagagtg
GCTGTGCCGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGC  GGACGCAGAGTG
TCTGTGCTGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGAC CAGAGTG
GCTGTGCCGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGAC cagagtg
cctgtgccgggaggagcgcaccaagtctgcgagcaggggccggccaatggtgaggctgggaattctggccaggacg      g
gctgtgccgggaggagcgcaccaagtctgcgagcaggggccggccaatggtgaggctgggaatgctggccaggacgcagag g
GCTGTGCCGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAG   g
GCTGTGCCGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGAC cagagtg
GCTGTGCCGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGTG
GCTGTGCTGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGTG
GCTGTGCCGGGAGGAGCGCACCAAGTCTGCGAGCA                gaggctgggaatgctggccaggacgcagagtg
GCTGTGCCGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGTG
GCTGTGCCGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGTG

Then:

$ java -jar dist/biostar322664.jar -V minivcf.vcf.gz minisam.query.bam | samtools sort -T tmp -O bam -o minisam.biostar.bam - && samtools index minisam.biostar.bam

and the resulting tview:

   43354131  43354141  43354151  43354161  43354171  43354181  43354191  43354201
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
GKCTGTGCTGGGAGRAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGT
GGCTGTGCTGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCT
GTCTGTGCTGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGAC
GGCTGTGCTGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGT
ggctgtgctgggaggagcgcaccaagtctgcgagcaggggccggccaatggtgaggctgggaatgctggccaggacgcagagt
      GCTGGGAGAAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGT
       ctgggaggagcgcaccaagtctgcgagcaggggccggccaatggtgaggctgggaatgctggccaggacgcagagt

As the count below shows, I expected 6 rows of reads (plus the consensus line) for the allele 'T':

$ java -jar dist/sam2tsv.jar minisam.coord.bam | grep 43354136 | cut -f 5 | sort | uniq -c
     58 C
      6 T
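The pipeline above tallies the fifth tab-separated column of every sam2tsv line that mentions 43354136. A rough Java equivalent is sketched below (the class `ColumnTally` is hypothetical). Like `grep`, it matches the token anywhere in the line, so it is exactly as strict as the shell command and no stricter; sorting is plain lexicographic rather than locale-aware.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical helper mirroring: grep TOKEN | cut -f COLUMN | sort | uniq -c */
final class ColumnTally {
    static Map<String, Integer> tally(Iterable<String> lines, String token, int column) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys, like `sort | uniq -c`
        for (String line : lines) {
            if (!line.contains(token)) continue;       // grep TOKEN (substring match anywhere)
            String[] fields = line.split("\t", -1);    // cut splits on TAB by default
            // Lines with fewer fields are counted under an empty value.
            String value = column <= fields.length ? fields[column - 1] : "";
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    // Arguments: TOKEN COLUMN (1-based); reads lines from standard input.
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            List<String> lines = in.lines().collect(Collectors.toList());
            for (Map.Entry<String, Integer> e : tally(lines, args[0], Integer.parseInt(args[1])).entrySet()) {
                System.out.printf("%7d %s%n", e.getValue(), e.getKey()); // uniq -c style output
            }
        }
    }
}
```

Once compiled, it could be fed the same stream, e.g. `java -jar dist/sam2tsv.jar minisam.coord.bam | java ColumnTally 43354136 5`.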
Pitithat-pu commented 6 years ago

Thank you, your program now finds the reads that support the variant allele, but it still doesn't report their mates. For example, the forward read SRR5229653.1238179 supports the variant allele, so the reverse read SRR5229653.1238179 should be dumped as well. In total, there are 5 pairs of reads to report from this minisam: SRR5229653.1238236, SRR5229653.1238161, SRR5229653.1238179, SRR5229653.1238155 and SRR5229653.1238185.

PS: the actual objective is to find the DNA fragments that support the variant allele.

Thanks a lot
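Reporting whole fragments, rather than only the reads that overlap the variant, needs two passes: first collect the names of the templates in which some read carries the ALT allele, then emit every record with one of those names, so that a mate lying away from the variant (such as the reverse read of SRR5229653.1238179) is kept as well. The sketch below works on hypothetical in-memory records (`FragmentExtractor`, `Rec`); it is not how biostar322664 is implemented, and a real tool would have to read the BAM twice or buffer records by name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical two-pass extraction of every segment of the supporting fragments. */
final class FragmentExtractor {
    /** Minimal stand-in for a SAM record: QNAME plus a precomputed ALT flag. */
    static final class Rec {
        final String qname;
        final boolean carriesAlt;

        Rec(String qname, boolean carriesAlt) {
            this.qname = qname;
            this.carriesAlt = carriesAlt;
        }
    }

    static List<Rec> extractFragments(List<Rec> records, Predicate<Rec> supportsVariant) {
        // Pass 1: names of templates with at least one segment supporting the variant.
        Set<String> names = new HashSet<>();
        for (Rec r : records) {
            if (supportsVariant.test(r)) {
                names.add(r.qname);
            }
        }
        // Pass 2: keep every segment (read and mate) whose QNAME is exactly one of those names.
        List<Rec> out = new ArrayList<>();
        for (Rec r : records) {
            if (names.contains(r.qname)) {
                out.add(r);
            }
        }
        return out;
    }
}
```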

lindenb commented 6 years ago

Ah, I see: it was a problem with the way I was comparing the read names. I hope it's fixed in https://github.com/lindenb/jvarkit/commit/8e93bba47ac192eb96e3c11dd661ac82138e3147

A test:

$ make biostar322664 && java -jar dist/biostar322664.jar -V minivcf.vcf.gz minisam.query.bam -X c | samtools sort -T tmp -O bam -o minisam.biostar.bam - && samtools view minisam.biostar.bam | grep -E 'SRR5229653\.(1238236|1238161|1238179|1238155|1238185)' | sort -t $'\t' -k1,1

SRR5229653.1238155  163 1   43353943    60  100M    =   43354127    284 CTTCCCGCTCTGGGTTCGGCTCTTCTCTCGCAGGCCGCGTTTCTCAGCCAGGCTTAGGGGAATCCCTCGAAGCACGTGGTCCCGCTGCGCCACAGCCAGG    @B=BBB:CDACCBA>AA;ABBBB@CBBBA;AABBBA5C;>AACCC@DCAAC@CB@@DCCCA?@CBBCE=>BEEBC=@BD@DDD=EDEF>FECDCEEB??@    MC:Z:100M   BD:Z:JJKLQMMPPLLNMJMLLMLNOKKKLKKKKMNOONNNKNLNLCLKKMOPNNONNOKKNNJJLLKOMJMKMOLLPOJLNJMNPNKLOPOOMOPPLMRTNNON   MD:Z:100    PG:Z:MarkDuplicates RG:Z:run_CLL004-P6  BI:Z:NNNNPLNPRMOPNKPMMNOPQLNMMNLNLNOPPNPNMONPMEMNLOPQNOPNPQMNONKKOMNONKPLNOMNQPNOQNOQQOLNPRQRPQPRPPTUOPQO   NM:i:0  MQ:i:60 AS:i:100    XS:i:37
SRR5229653.1238155  83  1   43354127    60  100M    =   43353943    -284    AGGCTGTGCTGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGTGTGGCCGTGCTGTGGG    CBADADCFFCEEFDEFDF>CA@CDAECBEBE=EDEDDDDABB<ADBB@@B@ABBC@BCAAAB@@BCBB@BBBBAB?;ACBBBABABABB;CCDDCBA?>@    MC:Z:100M   BD:Z:MNPONNRSQPLOLNNLPOMOKNMLKMNLOONMKONNMJJNNLKNNMLKMNMJMKMNPONJMLKMOPONNNMNMMONLNNKKMJJJNNOMMKPRQMNNJJJ   MD:Z:9C90   PG:Z:MarkDuplicates RG:Z:run_CLL004-P6  BI:Z:QORQQQTTSRMPNROMRPOQNQOONOONPPONLQPPPKKNPOMNPNNLOONMOLPNQPOKNLLOOQPOMONOONPPNPPNLOMMMONPONMPQQNOPLNN   NM:i:1  MQ:i:60 AS:i:95 XS:i:20 Xc:Z:1|43354136|C|T
SRR5229653.1238161  163 1   43354104    60  100M    =   43354135    131 CCAGCTTCAGGTCGTACAGACGCAGTCTGTGCTGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGAC    >A=ABCACCAB>C;@@ABCAA;BBC?BBB@BBBBAAACA@CB;BBA@BAC?CCCB;ADBADDCDED=DEBBAADD?DBEDEDDDEBBBEFEDEFDACA@B    MC:Z:100M   BD:Z:JJNOTRMNNPOMOMNMMJOLNLNOONOKNJJNONMJLMNLMPLNOJMNKLNOKNNLOMPOONJJNNKLNNNKKNMMJNMOOPONKMMLOOQPOQQRONLN   MD:Z:25G6C67    PG:Z:MarkDuplicates RG:Z:run_CLL004-P6  BI:Z:NNPQSRNNPQOPPNPONMPNONOPPPPNPMMPQPNKONNONQNOPMNOLNPPNPPNONQPPNKKPNMOPNOLNPOQNQOOQRQOLPNPRRSSQSRSQOPP   NM:i:2  MQ:i:60 AS:i:90 XS:i:19 Xc:Z:1|43354136|C|T
SRR5229653.1238161  83  1   43354135    60  100M    =   43354104    -131    CTGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGTGTGGCCCTGCTGTGGGTGCCCTCG    D@ACDEEFEF>FEAEEBEDCFCF=EEEDDDDDED=BCCDBABC@CCDABCB@@B@@BBCB@BABBAC?;BBBBBABABAB@@*BBCBBBABCCDCBB>8@    MC:Z:100M   BD:Z:ONJMOQPNQPNPKNNMLNOMPPNMKONNMJJNNLKNNMLKMNMJMKMNPONJMLKMOPONNNMNMMONLNNKKMJJJNNNJNOOPOJKOKNKQPMRMOJJ   MD:Z:1C80G17    PG:Z:MarkDuplicates RG:Z:run_CLL004-P6  BI:Z:QPLOPTROTRPROROONPPOQQPOMQPPPKKNPOMNPNNLOONMOLPNQPOKNMLOPQPONONPONPPMOPMKOMMMONPKNPPQPMMOKNMPQLPOPNN   NM:i:2  MQ:i:60 AS:i:93 XS:i:20 Xc:Z:1|43354136|C|T
SRR5229653.1238179  163 1   43354119    60  100M    =   43354180    161 ACAGACGCAGGCTGTGCTGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGTGTGGCCGT    @@=BAB:CCABCCB?CBBBAA?CBACB;BBA@7@C?CCCB;@BCBCBBBDC<CDCA@@CC=AA@DEDBDD@A@CBDDDEDBDDBD=ECFCF@D?ACC?;=    MC:Z:100M   BD:Z:JJJOPQNPPPONONJJNONMJLMNLMPLNOJMNKLNOKNNLOMPOONJJNNKLNNNKKNMMJNMNNONMJLLKNNONMNOOPOMOMOPPMOPLMMQNNKN   MD:Z:17C82  PG:Z:MarkDuplicates RG:Z:run_CLL004-P6  BI:Z:NNNQPPOPQQOPQPMMPQPNKONNONQNOPMNOLNPPNPPNONQPPNKKPNMOPNOLNPNPMPNNPQPNKOMNPQRQOQOPQOPPOPRRPPSPPQRQONQ   NM:i:1  MQ:i:60 AS:i:95 XS:i:20 Xc:Z:1|43354136|C|T
SRR5229653.1238179  83  1   43354180    60  100M    =   43354119    -161    GAGGCTGGGAATGCTGGCCAGGACGCAGAGTGTGGCCGTGCTGTGGGTGCCCTCGGGCCCCAAGAGTGTCTGCTGGCCACTCGCTGTGGCCACCACCCCT    DBADDCEEDBBCFFCEFEEEEFA=EEDEDCBCBDEC5CADEAC?CCB@DCCC@:AACAAAB@BBBABAACBCA?@BBA?CA;BCBABABBC>BC=BA?B@    MC:Z:100M   BD:Z:KMNPSRMPNMOQQPOOONONNPNLNNKKMJJJNNNLLJOPOJJNJMJONJNMOKJNNJJMLKKKMJJNLOOPONNNMJNMOLPOJJNOONKNOLPNJNJJ   MD:Z:100    PG:Z:MarkDuplicates RG:Z:run_CLL004-P6  BI:Z:MQORTSOQPOQRSRPOQOQQOQQOQPNLOMMMONPONMPQPMMOKNMPPKNNOMKNPKKNNLNLNMMOMOPPOONPNMPNONQPMMONPNMPNNQMLONN   NM:i:0  MQ:i:60 AS:i:100    XS:i:0
SRR5229653.1238185  147 1   43354199    60  100M    =   43354134    -165    AGGACGCAGAGTGTGGCCGTGCTGTGGGTGCCCTCGGGCCCCAAGAGTGTCTGCTGGCCACTCGCTGTGGCCACCACCCCTCTCCCTCTGCTACAGCTCC    CBB>=EEEFEDCC@CBD=CBEEBCBDDBBDDDEA=DDDDCBC@CDCBBBACBCCBACBC>CA;BCBAB@BAB=BC>AABA@C@BACAABCC??CCCB=A@    MC:Z:100M   BD:Z:MMONPQQMMOKKKOOOMMKPQOJJNJMJONJNMOKJNNJJMLKKKMJJNLOOPONNNMJNMOLPOJJNNNMJMMJMJJNMLMLJNMLOOQONLPRTMLJJ   MD:Z:100    PG:Z:MarkDuplicates RG:Z:run_CLL004-P6  BI:Z:QOQQRTSQOQOOOPOQPONQRQNNPLNMPPKNNOMKNPKKNNMNLOMMONPPQPONPNMPNONQPMMONPNMPNMPKKNNNNOKNNNPPRPPNQRSOPNN   NM:i:0  MQ:i:60 AS:i:100    XS:i:0
SRR5229653.1238185  99  1   43354134    60  100M    =   43354199    165 GCTGGGAGAAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCTGGGAATGCTGGCCAGGACGCAGAGTGTGGCCGTGCTGTGGGTGCCCTC    @A@BBCBD3@CC;BBA@BAC@BBBB;ACBBCBABB@;BCABA@CB>CADCDCCCCA>ACEDDDEDBEDBC=EAECC>D@DCEE>@EFEEAEEE?B>A?BD    MC:Z:100M   BD:Z:JJONQMNOMMMQMNOJMNKLNOKNNLOMPOONJJNNKLNNNKKNMMJNMNNONMJLLKNNONMNNNONLNLNOOLMNJKKNOOLOKOPPLLOMPNRNJMK   MD:Z:2C5G91 PG:Z:MarkDuplicates RG:Z:run_CLL004-P6  BI:Z:NNRQPLPNNMNQNOPMNOLNPPNPPNOMPPOMKKPMMOONOLNPNPMPNNPQPNKOMNPPQPNPNOPNOONOPPNOQNNNOQONQNRSROPQNTQTOLQM   NM:i:2  MQ:i:60 AS:i:92 XS:i:20 Xc:Z:1|43354136|C|T
SRR5229653.1238236  145 1   43354308    60  100M    =   43354086    -322    TGGGCCTCCCAGCGGCTCTGCTCTTGGATGAGCAAGTGGAAGGAGTAGTGCATTTCAGTCTCATCGTAGGGGCTGGGCTCCTGGCTGGGAGGCGCCAAGG    ?BACDECEEEEF>EEFBEBEEBEABDEACDDDCBDCCCDADADBB@BABBB@AAABBAACAB@A;A@B@@@BCB@@BCAACB@BCB@ACCAA:CBC@?A@    MC:Z:100M   BD:Z:NJNNRPOLOPPOLOQNMPPQNLLKNMOMMKONLKMJNMLKMMKMMMMJONNKBLNNMNLMNNNOLMMMJJNPONJNPMLNONNPONJMKNOONPPPKMJJ   MD:Z:100    PG:Z:MarkDuplicates RG:Z:run_CLL004-P6  BI:Z:PLOQRRRNQRSQOOROOQQROOOMPOOOOLQPNMOMONMMPNLONOOMPPPNEMPPOONNPPPONNOPKKNQPOKNQNONPONQPOKNLQOPOQOPNQNN   NM:i:0  MQ:i:60 AS:i:100    XS:i:0
SRR5229653.1238236  97  1   43354086    60  100M    =   43354308    322 CCTCCCAGGACGTGGGGTCCAGCTTCAGGTCGTACAGACGCAGGCTGTGCTGGGAGGAGCGCACCAAGTCTGCGAGCAGGGGCCGGCCAATGGTGAGGCT    @A@BBCCDCBB;?CBAA>AABCBB@BBCA?C;?@ABCAA;BBCABBC@CCBCBBADABDE=EBCDBAE@EDDE=CEECFEEEFE>EFC@@AEC?D?DAAC    MC:Z:100M   BD:Z:JJMKQMPQOMOMOJMJJMOMNOPOKLMONMOMNMMJOLNLNOONNONJJNONMJLMNLMPLNOJMNKLNOKNNLOMPOPOKKOOLMOOPMMPPPNRMNNO   MD:Z:50C49  PG:Z:MarkDuplicates RG:Z:run_CLL004-P6  BI:Z:NNQMPLPPNOONPMNKKPPNOPQQMMOOMPOMPONLPNNNOPPNPQPMMPQPNKONNONQNOPMNOLNPPNPPNOORQQOLLQONPRPQNQSQTQTOOQR   NM:i:1  MQ:i:60 AS:i:95 XS:i:20 Xc:Z:1|43354136|C|T
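Each record tagged `Xc:Z:1|43354136|C|T` above can be cross-checked against its MD tag, which encodes the reference bases at mismatches relative to POS. For instance, SRR5229653.1238161 (POS 43354104, `MD:Z:25G6C67`) mismatches a reference G at 43354104 + 25 = 43354129 and a reference C at 43354129 + 1 + 6 = 43354136, the variant site. Below is a sketch that decodes an MD tag and compares it with an Xc value. It assumes the Xc layout contig|position|REF|ALT, which is inferred from the output above rather than from any documentation; `MdTagMismatches` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: decode an MD tag and cross-check it against an Xc value. */
final class MdTagMismatches {
    /**
     * Maps 1-based reference positions to the REF base that the MD tag reports as mismatched.
     * pos is the SAM POS (first aligned reference base). Deleted reference bases (^...)
     * advance the position but are not reported.
     */
    static Map<Integer, Character> mismatches(int pos, String md) {
        Map<Integer, Character> out = new LinkedHashMap<>();
        int ref = pos;
        int i = 0;
        while (i < md.length()) {
            char c = md.charAt(i);
            if (Character.isDigit(c)) {               // run of matching bases
                int j = i;
                while (j < md.length() && Character.isDigit(md.charAt(j))) j++;
                ref += Integer.parseInt(md.substring(i, j));
                i = j;
            } else if (c == '^') {                    // deletion from the reference
                i++;
                while (i < md.length() && Character.isLetter(md.charAt(i))) {
                    ref++;
                    i++;
                }
            } else {                                  // single mismatched reference base
                out.put(ref, c);
                ref++;
                i++;
            }
        }
        return out;
    }

    /** True if the MD tag reports a mismatch against REF at the Xc position (assumed contig|pos|ref|alt). */
    static boolean xcAgreesWithMd(String xc, int pos, String md) {
        String[] f = xc.split("\\|");
        int site = Integer.parseInt(f[1]);
        char refBase = f[2].charAt(0);
        Character mdRef = mismatches(pos, md).get(site);
        return mdRef != null && mdRef == refBase;
    }
}
```

For example, `xcAgreesWithMd("1|43354136|C|T", 43354086, "50C49")` is true for SRR5229653.1238236, while its mate (POS 43354308, `MD:Z:100`) reports no mismatch at all.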
Pitithat-pu commented 6 years ago

Yes, that is what I want. Now I can find the DNA fragments that support the variant allele. It would be great if the program also supported indel variants, but I am really happy now.

Thanks a lot

lindenb commented 6 years ago

Great, it was also a tool I needed. Please mark https://www.biostars.org/p/322664/ as answered.