lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

sam2tsv doesn't report chrX, Y, M reads #166

Closed kac2053 closed 3 years ago

kac2053 commented 3 years ago

### Subject of the issue

sam2tsv doesn't report chrX, Y, M reads

### Your environment

* the value of `${JAVA_HOME}`

${JAVA_HOME}
-bash: /opt/common/CentOS_7/java/jdk1.8.0_131: Is a directory


* which OS

$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"



### Steps to reproduce
To reproduce this, I would need to share our BAM file, and we do not want to release the data before publication.

### Expected behaviour
I expect sam2tsv to report reads for all chromosomes, but chrX, chrY, and chrM are missing from the output even though high-quality reads map there when I check in IGV.

### Actual behaviour
sam2tsv does not report reads on chrX, chrY, or chrM. Is there a reason why?
(It reports chr1 to chr19 only.)

My command if this helps:
`java -jar dist/sam2tsv.jar -R GRCm38_dna_primary_assembly/Mus_musculus.GRCm38.dna.primary_assembly_corrected_chr_annotation.fa Sample_A.bam > Sample_A_sam2tsv.txt`
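
A quick way to confirm which contigs actually appear in the output file is to count rows per contig. The sketch below is only an illustration: it assumes the output is tab-separated with a header line that contains a column named `CHROM`. The function name `count_rows_per_contig`, the `chrom_column` parameter, and the tiny demo table are hypothetical, and the column name may differ between sam2tsv versions, so check the first line of your own file.

```python
from collections import Counter


def count_rows_per_contig(lines, chrom_column="CHROM"):
    """Count data rows per contig in a tab-separated table with a header.

    `chrom_column` is an assumed header name; change it to match the
    header that your sam2tsv version actually prints.
    """
    lines = iter(lines)
    # The header may start with '#', so strip it before splitting.
    header = next(lines).rstrip("\n").lstrip("#").split("\t")
    idx = header.index(chrom_column)
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > idx:  # ignore short or blank lines
            counts[fields[idx]] += 1
    return counts


# Hypothetical three-column table, used only to demonstrate the function.
demo = [
    "#READ_NAME\tCHROM\tREF_POS\n",
    "r1\tchr1\t100\n",
    "r1\tchr1\t101\n",
    "r2\tchrX\t5\n",
]
demo_counts = count_rows_per_contig(demo)
print(sorted(demo_counts.items()))

# On a real file (path taken from the command above):
# with open("Sample_A_sam2tsv.txt") as fh:
#     print(count_rows_per_contig(fh))
```

Splitting on tabs by hand, rather than using a CSV parser, avoids problems with quote characters that can occur in base-quality strings.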

Again, sorry if my question seems stupid. I come from a wet-lab background. Thank you.

lindenb commented 3 years ago

Thanks for the report. What is the output of `samtools idxstats Sample_A.bam`, please?

lindenb commented 3 years ago

Ah, and can you please show me the output of:

samtools view Sample_A.bam chrX | head

kac2053 commented 3 years ago

Thank you for such a quick reply.

chr1    195471971   87985   0
chr10   130694993   55648   0
chr11   122082543   391779  0
chr12   120129022   48624   0
chr13   120421639   93805   0
chr14   124902244   49403   0
chr15   104043685   64831   0
chr16   98207768    70138   0
chr17   94987271    283215  0
chr18   90702639    56252   0
chr19   61431566    46529   0
chr2    182113224   89303   0
chr3    160039680   64756   0
chr4    156508116   71114   0
chr5    151834684   88958   0
chr6    149736546   75237   0
chr7    145441459   89827   0
chr8    129401213   64240   0
chr9    124595110   84755   0
chrM    16299   49865   0
chrX    171031299   39657   0
chrY    91744698    1288    0
JH584299.1  953012  2   0
GL456233.1  336933  103 0
JH584301.1  259875  0   0
GL456211.1  241735  5   0
GL456350.1  227966  33  0
JH584293.1  207968  13  0
GL456221.1  206961  2   0
JH584297.1  205776  0   0
JH584296.1  199368  0   0
GL456354.1  195993  0   0
JH584294.1  191905  2   0
JH584298.1  184189  0   0
JH584300.1  182347  0   0
GL456219.1  175968  0   0
GL456210.1  169725  4   0
JH584303.1  158099  0   0
JH584302.1  155838  0   0
GL456212.1  153618  1   0
JH584304.1  114452  810 0
GL456379.1  72385   1   0
GL456216.1  66673   28  0
GL456393.1  55711   0   0
GL456366.1  47073   1   0
GL456367.1  42057   0   0
GL456239.1  40056   0   0
GL456213.1  39340   0   0
GL456383.1  38659   1   0
GL456385.1  35240   0   0
GL456360.1  31704   0   0
GL456378.1  31602   1   0
GL456389.1  28772   1   0
GL456372.1  28664   0   0
GL456370.1  26764   11  0
GL456381.1  25871   0   0
GL456387.1  24685   0   0
GL456390.1  24668   0   0
GL456394.1  24323   0   0
GL456392.1  23629   5   0
GL456382.1  23158   1381    0
GL456359.1  22974   2   0
GL456396.1  21240   1   0
GL456368.1  20208   1   0
JH584292.1  14945   0   0
JH584295.1  1976    49  0
*   0   0   0
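
The table above is `samtools idxstats` output: contig name, contig length, mapped read count, and unmapped read count. As a minimal sketch (the helper name `parse_idxstats` and the choice of sample rows are mine, not part of the thread), it can be parsed to list the contigs that carry mapped reads:

```python
def parse_idxstats(text):
    """Parse `samtools idxstats` output into {contig: (length, mapped, unmapped)}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()  # whitespace split also copes with tabs turned into spaces
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        name, length, mapped, unmapped = fields
        stats[name] = (int(length), int(mapped), int(unmapped))
    return stats


# A few rows copied from the output above.
SAMPLE = """\
chr19   61431566    46529   0
chrM    16299   49865   0
chrX    171031299   39657   0
chrY    91744698    1288    0
JH584301.1  259875  0   0
*   0   0   0
"""

stats = parse_idxstats(SAMPLE)
# Keep only contigs with at least one mapped read.
with_reads = [name for name, (_, mapped, _) in stats.items() if mapped > 0]
print(with_reads)
```

For these rows, chrM, chrX, and chrY all have non-zero mapped counts, so the BAM index itself does list reads on those contigs.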
$ samtools view Sample_A.bam chrX | head
3cfe3b25-3608-4966-8b92-4dda119628ff    0   chrX    3823975 1   62S37M2I38M2D2M1D23M4D6M1D23M1D8M1D19M1D7M1I57M1D16M1D10M1D3M1D15M1I15M1I4M1I11M1D14M1I8M1I15M1D20M1I21M1I3M1I13M2I7M1D35M2D30M1D3M1I29M7D4M2I3M4I16M2I36M4D9M2D9M1I2M3I10M1I46M1D12M2D33M1I28M1D38M1D7M1D13M4I48M1D59M1D22M1I27M1D30M1D31M1D9M1D29M3D24M1I76M1I65M1I16M3D8M1I7M2D8M1D55M1D15M1D9M1D7M14S   *   0   0   GTTATGCTTCGTTCGGTACGTATTGCTAAGCAGTGGTATCAACACCAGTACATGGGAGCACACCACTTCGCCCAGGTCAGGAACCCCCAACACACCTGAGGGAGCCTAGACCTAGATTTAAAAGCATATCTCATGATGAGTGAGGACGCCAAGAAGGACTGTCATCACTTAAGAAATACAGGAGAACACTGCTAAGAGTTAAAGTCCTTAAAGAAAAACAGAAAACACAATCAAACAGGTAGAAGTCCTTACAGAAAAAGAGGAAAAAACATACAAACAGGTGATGAAATGAACAAAACCACCTAGACCTAAAAGGAAGTAGACACAATAAAAGAAAACTCAAAGCAGAGAATAACACTGGAATAGAAACCCTAGGAAAAGAAAATCTGGAACCATAGATTTGGCATCAACAACAGAATGCAAAGAGATGGAAGAGAGAATCTCCAGATTGCAGAAAATTCCACATAAAGACATCGGCACAACAATCAAAGAAAATGGAAAATGCAAAGATCCTAACTCAAAATATCCAGGAAATCAGAGACACAATAAGAAGACCAAACGTACGGATGTGGGTAAGACAAAGAATGAAGATTTTCAGCGCTCAAAGGTCCAGCAGGCATCTTCAACAAAGTGTTGAAAGCTTTAAATCTAAAGGAGCCAAGAGATGCCATATGAACATACAAGAAGCCTACAGAACTCCAAATAGACTGGACTAGAAAGAAATTCCTCGACACATAATAATCAGAACATCAAATGCACTAGACTAAAGATAGAATACTAAAAGCAGTAAGGAAAAAGGTCAAGTAACATATAAAGGCAAACCTATCAAATTACACAGATTTTTCACCAGAGAGAGACTATGAAAGCCAGAGAAGCCTGGACAGATGTTATACAGACACTAGAGAACACAAACTGCAGCCCAGGCTACTATGCCCAGCCAAACTCTCAATTATCATAGAGGAGAAACCAAAGTATTCCACGAACAAAACCAAATTCACGCATTATCTCTCACGAATCCAGCCCTTCAAAGGATAATAACGAAAAAAACCAATACAAGAACAGGAACAACGCCTAGAAAAACAAGAAGGTAATCCCTCAACAAACCTAAAAGACAGCCACAAGAACAGAATGCACACCTTTAACAACTAAAATAACAGGAAGCAACAATTACTTTTCCTTAATATCTCTTAACATCAATGGTCTCAATTCTACCAATAAAAAGACATGAACTAACTCATTGGCTACACAAACAAGACCCAACATTTTGCTGCTTACAGGGAAACTCATCTCAGAAAAGATAGGCCTCTATCAGAATGAAGGCTGGAAAACAATTTTCAAGCAAATGGTATGAAGAAACAAGCAGGAGTAGCCTCCTAATATCTGATAAATTGACTTCAACCCATGTACTCTGCGTTG    
'%%+--**-.9A<9'&/'/144::;@:B9/++-88562)=?+%%$%'&%%''%&**,)(++032(,-/.1222**(&/$)-&,'(''&2-555444313$.*,(23636<9:849)7@AJE0**0,>A==;9621+'''+**(756##$$89380:9/)3168--80444A=&-/5...3>8>58=9739<;?=:0(:(1:9<88:<==IE?055<8146671:507<<E?CGE1.-/-&&%./166:8708=>J=600.$*=.-67?=9,%%%%?4,;7?65674*,$()*08;</*$%%#$.337459CH9))/,2-222=;>@HHAB:6('B>HB><80+5,&&%&&&%%2//0&$'(,,///7=<;=@7355@8++%*/:0*8(-+',&&/'(,-:<88%444;4')5+02$$(&&),,+?57E:;.,A>1-1-000)-%2(*)--*00/8/+(*='/%*(&''((23;79'*.2:79=A3*?K=582/CD99632@C75296126&&/8;;21@?@F>::%%;/'$;?2-5(&$*(''&&4BDE2305@CDJHA./(*3-27$$%&(*$$%('##&&'((((+,**;<=::7&%&34DD>=,7%%3/###$%&%23%.*%$$%1$(%#'(+&+(&%%%$#.>:>;?><-)*(&%''&,./31()'%&,//877563,8'+/,78899?<6:>B%961.233B?C>7.%#%'51,120-:<13(-.3.2(:=ACEH@8C574:;>;F>GB00.150,)+(,0>4<B=>426>,#/H?.*.,,?ECG<>ED>546;5=B3222..1468:<2('-(9*?=887<84260(%%%%&&;=@CDA0,-54,$=.,4C?B>..-5:267?:5'*)-:<DHE80<;D00<;657899=44)*23447625=BB=?A:7746758;4834,,-$$++*,$%%7717A>5BEA@>:;92+,.'22514<733771.6899),$,(33:BA@200764?*2'$'+.//0'1&0(9+),/222=3-569=<632/9<BBCB/A8%%%/FB@>900E:443470<0((41<2;<?77<<73,2/7+279E97148;;A@AC3/>D;<@D9::?CK6(&;9@:8772*927>;E4492/%%%(()*+*--$$-9?KKP>55%$$'-')&&+)&**(1--9:;>-4..C9449=;>DBH92?;:B=;<;;52<02.//$"##%*65;G:485/-*)'-'*40,./,01'(**+)69;@99:A?/,..1&%%?AFE7--,(0/11-.*&,+)<@A@D;<B<6:2B+2.'')*##&%$&2;<@;A9:8><A@AB>D@=4,-80DC2=/1:GGHH=?8554733'(.3:BA*-,73'*//''%(&(59:5;7=:(7552/0-0*(%&.<8568@74+,114/00/+,$    NM:i:174    ms:i:847    AS:i:847    nn:i:0  ts:A:+  tp:A:P  cm:i:6  s1:i:47 s2:i:56 de:f:0.1033 MD:Z:7A4A0T0C4G0C0A5G6C2A1A8C26^TG2^A6A0T11T1A1^TAAC6^A23^A8^C19^G32A5C25^G16^T0A9^T3^G23A10G0C9^G37^C0A0A2C3G9A6G15A2G7G8G2^A4T30^AA30^C24C7^AATAGGA5T17A8A6A0A13A1T2^TGAA4A4^CC21C26C15C2^A12^CC21A12A13T12^G30G7^G7^C32A0G27^A12T1C16A20C6^T14G23A10^C30^A2T28^C9^T29^AAG26A10A3T2T14C22T14C2C15A0G6A0A1C37C15^AGA5G3A1A3^CC8^A45T2G0A5^A15^C0G8^C7  rl:i:692
534c624a-a491-4f0e-ba9c-9f688c0d52c9    2048    chrX    4899368 1   320H11M1718I6036N22M1D13M1D16M1D11M1I4M1D49M1D2M1D12M1I24M1D49M1D13M2D10M2I28M2I39M1D24M2D17M1I8M1I14M2I3M3I2M3I3M2I2M6I3M7I1M2I8M2I15M1I32M1I3M2D41M1I16M2D11M1I11M2I24M1964H  *   0   0   GAAGCCTAGGGCGCGGGCCGGGTGGAGCCGCCACCAGTGCAGATCTTGGTGGTAGTAAGCAAATATTCAAGCAGGAACTTTGAAGGCCGAAGTGGAGAGAGGGGTTCCATGTGAACAGCAGTTGAACATGGGTCAGTCGGTCCTGAGAGATGGGCGTGCCGTTCCGAAGGAACGGGCGATAACCTCCGTTGCCCTCGACCGATCAGCGAAAGAGTCAGACGAATCCCCAGATCCCGGGTAGCGGAGATGGGCGCCGCGAAGGCGTCGGATGCGGCAACGCGACCGATCCGGAGAAGCCGGCGGGAGCCGGGGAGGATTCTCTTTTCTTTGTGAAGACAGGCGCCCTGGAATGGGTTCGCCCGGAGAGGGCCCGTGCGGCAGGAAAGCGTCGCGGTTCCGGCGGCGTCGGTGAGCTCTCGCTGGCCCGGCGAAAATCGGGGGAGAGGGTGTAAATCTCGCGCCGGGCAAGCCCTTCTTATCCGCAGCAGGTCTCAAGGTGAACAGCCTCTGGCATGTTGGAACAATGTAGGTAAGGAAGTCGACAAGCCGGATCCGTAACCGGGAATGAGATTGGCTCCTAAGGGAGCTAAGGTCAGTCGGGCTGGGGCGCAGAAGCGAGCTGGGCGCGCGCGCCGCGGCTGGACGGGCGCCGCCGCCTCTCCCGTCCGGGAGACCCCGTCCTTTCCGCCCGGGCCCGCCGCTCCCTCTTCCCCGCAGGGCCCCGTCATCCCCGCGTCGTCGCCACACTCTCTTCCCCCCTCCTTCTTCCCGTCGAGGCGGGTCGGGGTCGGCGCGCGGCGCGGGCTCCGGAAGCGGCGGGTCCAACCCGCGGGGGTTCAGAAGCGGAGGAACCAGCGGGCCCCCGGTGGGGCAGGGGGCGGAGCCTCGGGAGGGCCGGCAGCGGCGAGCGGTGGCTCTGGACGCGAGCCGGGCCTTCCCGTGGATCGCCTCAGCTGCGGCAGGCGTCGCGGCCGCTCCCGGGAGCCCGGCGGGTACGGCGCGGGTCCCCCTCCCGCGGGCCTCGGCTCACCCCACATCGCCTCTCCCGGGTGCGTGGCGAGGGCGGGCAGGCGGCGTCGCCGCGCGTGAGGAACCTCCGCGTCGGTGTTCCCCGCCGGGTCCACCCCGGGCCGCGGTTTTCCAGCGCGGCCTCGCCTCGGCCGGCGCCCACAACATGCGACTTAGAACTGGTGCGGACCAGGGAATCGACTGTTTAATTAAAACAAAACATCGCGAAAGAGAGCCCGCGGCGGGTGTTGACGCGATGTGATTTCTTATGTACTCTGCGTTGATGCCGCAGCAAGCAGTGGTATCAACACAAATGCTCAGGGCCTCAGTAGAGGAGACGGCAGAGCGGGCGGCGTCCACTCGGCCCGAGCGAAGCAAACCACCGCTCCCCCCTGAACCCTCGGCTCGCGCGCTCGCGACGGCGGCGTTGGCGGACGGATACGCGAGCGGCGCCTGTGTCTGGCAGAGGCCGGTGTAGAGAAGAAGAGACAACCCTTCCCCGCCGCCGGGATAATCAGAATTTGGCCGGACCTTTAGTGCACCTCGGGATAGTGAGGAGCCCGGCGAAAGCACAGACTATGTAACTGAAACAGTTAATAAGGAGACTTAGTGATTTGGCCGTGACCCTCCAGCACAATGTTCTGCAGGTCCAGTTGGAGATGACATGTTTCATTGGCAAGCCACAATTATGGGACCTAATGACAGCCCACTTATCAAGGTGGTGTATTCTTTTTGA
CAATTCATTTCCTACAGACTACCCTTCAAACCACCTAGGTTGCATTTAACCAAAAAATTTATCATCCAAATATTAACAGTAATGGCAGCATTTGTCTTGATATTCAGATCACAGTGGTCTCCTGCTTTAACTATTTCTAAGTTCTTTTATCCATTTGTTCACTGCTATGTGATCCAAACCCAGACGACCCCTAGTGCCAGGGTGCACGGATCTATAAAACAGACAGAGATAAGTACAACAGAAGATATCTCAGGAATGGACTCAGAAGTATGCCATGTGATCTTCCTTACAGTCAGAATAACCTGCATTATGGCTGGAATAAACTTTTAAATTACCTGTTCCTTTTTTGAGGATTTTCTTATCAGCTGCTCCCCTATCCAGACCTCATCTTTTTTAATTTTATTTTTGTTTACCCTCCCTCCATTCATTCACATGCTCATCTGAGAAGATAAGTTCTTCCAGCTTTGGACAATAACTGCTTTTAGAAACTAGCAAAGTAGTTGCAGGTACAGTTACCCAAAGATTCAGAAATTTTTTTTTAAAAACTGGAGCATGT    25:8<=/+-/501*+-2.-*)<A;E0980/0/%%%%,/(&&*&3;<:56@74++222+.8?>D@AA5A>2))**&$'##$$'').900&%)400*&+&$$&/00%&/2777287/'8'67@DB;7===996<9<,.)7'/%%&<><59<A98-/+*60@ACQK&B--.)5#$%&'9//)''%('7790**'1234.%#41..1-)''/.21,$324%&&&''.((,((%%>$$>;(*3..7631(+,((*-''**-,0))1/1+%2&($/(&((*.-113>B=/::&&;%(/,67*<<E:$$%7+2)+*+/6*&5%BD?E;KKDJB;;G:07/./&$$&(&(2336286*'&*:CCB2222,*.$))5188:A?;$$'((+%,*(;84/1%$%&+42:2'#-0.*$1))0+/'218&8;?AC8'('%%##..,)&;25@;CB11'B=>=:;=3,*))2.6==1.($##$$$$##$#%,448-510002F>?+8+,=@1115--?C5*<'155665?;+,159:;3.''+/9;<14?92,3&&%/.2;@>>;//3-..%%&')<7<1116377'-;9&'*''106'4:5..8-%=%2&@+,/41346F>1&&&33732$;=864A88&3)17;765556?BED;--%(,,112*(,,-/3$05..$$;)-</8<=>A3789=@A?225344$*33556<:&8;8;@;7==1217,&&(57>@?ABC7%;<B@A838?A7<@554%0'>?:2ACD<=>;+*6/20--(*;<>60=%%,+&&)144/(+24>8636&3-,.--''(<::$8+)$$&+1.*),;11@.*%),101245BH2<$)'&##%(/50%:<>68?37E67689/5111,&&&3517*6&(%($$.9*.=#&1(766-,)6)*,(($.')),,((*()29;22))/.&'$'<98:<<=<>B?@B;<77/*4-7799446--0563.&$-/089<@@7=<(+4)'444974/05,*.%-..)*)<78<9546*0))$&&%./23014+8,7752#22.9++(0)*(302*A>B=@=;;)*$.,1&&'(&&,-)+)(%'''%%4,200%&%+/%,56B>A2'(18/397B@::-+7=>=<?236)6875%'%$%29>?MGHD*5$-((...'+?C;<=?@=*<:=433445/40/'&%%23/111??G:78:7:0*-/5??728@42-<878><C>400--++9;78:.&431:'%%((()+$#.880;564%%%)--99?5<;5:/:;7>8==2(($%..-/=?@F>-.A=87**./$%%%$&)&%%*0)5477:0&%%+%&###$'/.,'###*%$#"#($(,(8-,+-66-,)/$,5/67)-++,%'(2;8...%%;++$%+-13'56%236:<24.,403658551,.%.961'%'-%+=?0).8455$588845?D<
:988==2425323%477:89;<>?5:/)/1.,,1'(:;;88=4())3+437884++7--4?B<<=9=B7<EGFBB+/A:783+D:----5C7;77<=:$%#%'$''&('&'+5,0<.$$%0---93:976//&&'''+./-..832511((%696+8501@3:=<998,3$$*#''()30(4996,-.4655<>6F;><;88;79?@;88..((.:''<A>??87.-478;88:@HGFGA::94--0=CC?B7;67-//46D?ADCA>>+(1(444(#%%-&<::8@>?/>/9<FF;GFEB<6701;7=:;@AB*-27::=7667210292.=668A@@<<8<II74433/3/#$42%%<6>BH@<?=%%>?29AA=?F922EDJJEG7..;;>AB);7>@?@<9:8/'',#)/*//5<A@A==7:=CDLNC=::EFKJ?5:<?;DE>::3-;77;<>@G:@<A59=??:349555'':64300/((4(-3()((%$(:511)-/)4+129;48:=78AILRB:=<866700;57(,,221/'/,&%$4++:<6;%;DGK347?;1<5&61277))'''++%'$%%&)&#$%*-../BE>:?=@7++44<;;;8,,*:=89GE<GA?788./5/0006,&79?B(.'7=A;>))(6(ACFF;=:%-3'%'%')/:=>>=:((:978++(69@;FDBFBF>?<=6.-9:==<,7;4../'>2647,/7CA>;9=<4/3:;<@?>??B;=+:86(4%9=8?;BF3CE=<???353,06)6/384$$&3((')-))%$$$(<3+)(''0$%%)&#&#$&++$.//<93;9858('+-+-2BBF????@GC?;89513-3>??>    NM:i:1829   ms:i:254AS:i:192    nn:i:0  ts:A:+  tp:A:P  cm:i:10 s1:i:97 s2:i:124    de:f:0.1498 SA:Z:chr3,135438578,+,1661S1496M25987D1427S,58,127;chr16,50400958,-,38S1278M18I3250S,60,120;    MD:Z:12A0A0C3G10A3^T6T0T5^A1A7G4T1^A7G5A1^C14T6C19A7^A2^C36^T4A36T2T4^T11A1^AC5A13T2T5A15T0G29G1^A0A5A12T4^TT4A14C8A10T10T14C10T0T7T12T1T4A5^CT3A5T7A10A9T13A2A1^AA0A6G15C5A12T3    rl:i:83
6b135522-80c9-4ab9-bc80-cc473159f3e8    0   chrX    4911058 1   67S25M1I57M1D5M1D24M1D11M2D18M1I30M1D7M3D101M1D5M1D6M3I2M3D4M8D1M1D6M2D6M2D11M1I8M2D7M1I8M2D42M2D54M2I15M2D3M4D3M4D4M3I7M1I5M2D6M2D14M3D1M1I2M2I7M2I4M4D3M2D31M1D27M1D6M2D3M1D12M1I21M192S  *   0   0   TAAGCCGGTGTGCTTCGTTCGATTCCGCGTATTGCTAAAGCAGTGGTATCAACGCGAAGTACATGGGATGCTCATGGATTAGCAGGATCAACAATTGTAAAAATGGCTATCTTGCCAAAAGCAATCTACAGATTCATTGCAATCCCCATCAAATTCAACTCAATTCTTCAGCAGATTAGAGGAGCAATTTAAATTCATCTGGAATAACAAAAAAACCTAGGATAGCAAAAACTCTTCTCAGGATAAAACCTCTGGTGGAATCACCATGCCTGACCTAAAGCTTTACTACAGGGCAATTGTGATAAAAACTGCATGGTACTGGTATAGAGACAGACAAGTAGACCAATAAATAAATTAATTTAGCAGACCACACCTGGTCATGATCTTCGGCGAAGAGACCAAACCATCCCAGTGGAAAAGACAGCATTTTCAGCAATTGGTGCTGGCACAGCTGGTTGACCATCATGTAGAATACAGGTCGATCCGCCGCCATCTCGCCATACTAAGGTCCACCGTCTAAGTGGATCAAAACCCTCCCGCTCAGGTACTAAAAACATAAAGGAAAGTGGAAAAAATATGAAAAAATATCAGTGCGGGAAGTTCTGGAACGGAACAGCAATGACATATGTGTAAGATCGAGAATTGACAAATAGGACTCGTGACTCAAAGTTTCTGCAAAGACAAAAGACACCGTCAATGGGATAAGACCACCAGCAGGTTAGAGAGGAAAGGAGAGATAACTTATCATAATGATGTTCACGTCGTATAAAGTTGAGCTAGATTTATTATAGAATCAAACAATAAATTATTAATAAATAACTATTTTATTTAAAACTCAGATCAATTACTGGTTGTAGATCGGAAGGGCGTCGTGTAGATAATGCGTGGCG  
$()%%*&&)%($$*,'0)-6'&;<'*%(*5,876:BB>;+004:;7751>0('&#&'%(((%6569;@??:;:0111(%$&&&%;4;69+'&&$)-41@FJBB,,+-?>;942(''&(*&/..023-)'(--==9-94')39''/''1781-170=.+702665:8:972%))1-/./01*5)*.-.+4*878:<:2/9/>5CA;?79?IH?<12;;)$2;<42002A//*.--68;+30)-.6;6*-..,51&';*21825797..-++%((-.099'(89:2.//1/10)(+.(%%&//1>2B60'.1+-)*)+3/.51047331#%.'('9776&&&&2,-3*%%%%%%('./%&$*'%$"$%"###&#%%(++66((((34.14-&&&$$)"'&##%%%&/2&&7=-348:=+'$((*)&0159:0:96220,,0-:/@=@1((*/5?:679883,.,67))(&)%&($$%$&$*+51%%(+)*#$$%.0+*###$$$%%&&&&&')##&#$%$$&#%$$*4<++,,.221$,%(%(.$%####((.**%%%'''&.0,)0&-#&,/.(&&#$$$$%&&&%%%#$#&,(#()'&$'$(.(,-*0-.-2,('$$)'''%#&&&#$%&%)*)&.5,$+///4>:00:,%%-4,,+$#$$'--200*(&.4&%#%**')(''3)%%-.00006960$$)1*#+0/5/.0**6&*/504.'&$&&-/62(**)(()%+$,(*+0,1&$%$*,3%%11,%%$*$%(&',*&5('$#&')&(%:?>8::5..(&&&%'''$('%&($))$%&01113/**$%.)*,4,(&''$#$#&,'&*1/<'&#%'(%'(?2.,,.&46*+*)&'&/004%'#"$'$"#&##&$'#$$#  NM:i:139    ms:i:275    AS:i:275    nn:i:0  ts:A:+  tp:A:P  cm:i:5  s1:i:41 s2:i:41 de:f:0.1518 MD:Z:13G42A25^A5^C15A1G0A5^A11^GC34T5G7^A7^AAG11C33A43C11^G0G4^G4G3^ACC4^AATGAACC1^A6^TA6^CT9A4G2G1^TA15^AG34A7^TT3G1G0T0A1A5G1G0C0A7A0T0A0C0T0T5C0T0T0G10A1A14^GG3^TTCA1A1^AAAA2A3A0C3G4^TT3G2^GA8G0G4^GCC5G5G2^ACAG3^AA2A3C0T4A11G1T1G2^C23G3^C2A0A2^AA3^C9C5G17  rl:i:219
5c99b169-6020-48ca-af7c-1664fcbd0672    0   chrX    4940887 1   57S6M1I2M1I25M1D12M1I17M1D2M1D4M1D17M5D4M1I51M1I6M1I24M1D39M1D8M1D12M1I4M3D2M2D7M1I10M2D7M1D9M2D7M1D12M1I10M2D10M2I15M1D9M2I17M1D22M2D21M3D4M2D8M1D3M1D3M2I24M1I4M1D3M1D3M1D20M3D18M2D7M1D15M1I9M2I6M1D5M2D4M1I24M7D10M2D6M1D19M2D28M2D5M3D14M1I13M1I22M2D8M1D3M1D8M1D1M1D11M1I4M1D5M1D3M1D26M1I11M1D19M3I4M1I10M1D10M1D5M2I14M2D25M2I7M1D8M1I4M2I4M1I15M4D4M3D5M2D2M1D6M2D6M1I42M3D4M1D20M1D13M1D23M1I1M1I3M1I10M1I36M1D9M2D60M2I3M2D43M1D16M13S   CAATGTAAACCGTTCAATTACGTATTACTAAGCGAGTAAATATCAACATAGTCATGGACAATAGAGAAAGACCAAACCTGCAGATAGCAAGATTGATGAGAATGAAAGATTTTCAACTTAAAGACGGCCATATCTTCAACAAAATAATAGAAACTTCCCAAACATAAAAAAAAAAGTGTTCATGATCATATAGAAAGCATACCAGAACTCCCAAATAGACTGGACCAAGAAAGAATTCCTCCCGACACATAATAATCAGAACAACAAATGCATAAATAAAAGTGAATATTAACAATCAAGAAGAAGAGTCAAATAACATAAAGGAAACCTATCAATTACACAGACTTTTCACATAGAGACTATAAGCCAGAAGGAAGCCTGGACAGATGTATACAGACACCCTAAGAGAACACAAAGTAATCTGTGGCTACTATGCCATGAAACTCTCAATTACCATAGATGAAATAGGCATTCACACATTCAAGCACAGTTTCACAATATCTTTCCCACAGTCAGCCTTCAAAGGATAATAACAGAAAAGCAATACAAGGACAAATCACGCCTAGAACAACCAAGAAAAGTAATCATTCTTAAAAACCAAAGAAAGGCAGCCACAAGAACAGAATGCAACAACAAAAATAAAGAGCAACAATTACTTTTCATGTATCTCTTAATATCAATGGACTCAATTACAATGAGCCTTAGACTAACCAGACTGGCTACACCAAACAGGACCCAACATTCTGCCTTGCCAGAGCCATCTCAGGAAAAGGCAGACTGCCACTCAGGTGAAGGCTGGAAAACAATTTTCAAGCAAGATGGACTGAAGAACAAGCTGGAGTAGCCATTTGCCAATATATCGGATAAATCGACTTCAACCCAAAAATTTATCCAAAAACAAGGAAGGACACTTCCTGCTCATCCCAAAGGTAAATCCCCTCAAGAAGGGAGACTCTCAATTCTGAATATGCAAATGAGGCAGCCATTCAGTTAGAGACACTTTAGTAAGGCGCCAGGCATACATTGCACCTCCTGTGTGATGGGAGACTTCAACACCTGCTTTCTTCAAGGACAGATCGTGGAAACAGAAACATGAAAGCAAGGACACAGGTAAGAGCAACAGGGATTATGAAACAAATGGACCTGTAGATATCTAGAACATTTTATCCTAAAACAAAAGGATATACCTTCTTCTCAGCACCTCACAGGACCTTCTTTCCACGTGATAATATAATTGGTCACAAAACAAACCTCAATAGATACAAAATATTAGAATTGTGTTGTATAGATGT    
#%#$#$&$#$)&'-91'+-.398898$+%(&%#$$%&%%"$%22,*%%$($'*#'.:>;=3&&&)$967480.B@3455#(%(3**#$((')3311+++')%''',.,,8::64/($$%+7-/'$$%%%*..279638+$*+6300,&'%)*,//1'*)87523/18301***%&$%"%&'),++&7&*%#&%'%$#%%%%&&0327<<4:553-&&&6:;79:/.**(766+('(-0/:)/)&+--*,,-82874:>589634*)-('#%&%*,*//4(%()'%'%%&%+()-$$#&'&/.%1+&$*('%'$--,,-*+.-$$''%./5839++(+'-&)),&))6754**&$$'(-445)&&/366724-%)$&)56??>.&&&)'%)&%&(*+0/-/+(%21-$%$&&())*$&&&%,%($$%'//-654....&'%(,+058*0.&$&,--,&'*$+$#$#$'%$$$#'''')$))($+)#%#%',23..'#"'.028112*2/%%%##',++''+(+,,*4//0;:9(%%*97.(*12-*)$$**+$$$',/7'$#'*.3/7=''(%%77134(&39>>:;536)%%?<44167=4/'.++,,.2%&&(%#+0+112+(-$$&''$+0**<?B@=?=))%$(13#(+-.*#&/2/74/../87/-,+000/3<B>777.#%352-//648%%&())))))3635:()341*),).''%%#%&)*''-00,-,6+32:5'##%%$(%''./&%,-3560;)4=A43*'%$$%,-,+&&%/*7(+&0/,+(3:821.-(&'**)7;6942+**)%*-.//-(%&'&()0/.#&&*+).42454$)+%#$%52/.&147+/34A=:8613***(61222;8((#.210((8881)()*0'&().++&%.34*),%#(1*%%31'&$((&%&())3*)'(),*.(%-)%#,*)/(94538-)..30.'#'#++*(%)-01&((7982+&$$)+*+5620452(*+++#%#$%%*&)6779:;==9--:;?'$$$$#'&#-&)*''$"#$$%$$###$##%(3.8.;6.1//,*,3/0*2+)**$*+)*($%#&$%)'(..+5521/+))($(&('&.%--0%%8%;8&&''('%%,0335/(&'&$(*591*>+./069:<=<85114,)''+45733+636/'(/0030<>;-<,,//+,+4%&%8:@01192001)*$#$1'%%&&$/,.214*++*%&'1.)#&*)-..66*)..6332;<87,,--$*"""#$%$,&#$%"#&($$$    NM:i:301    ms:i:380    AS:i:380    nn:i:0  ts:A:+  tp:A:P  cm:i:4  s1:i:42 s2:i:42 de:f:0.1954 MD:Z:20A0T0G4A0T1G2^G1G21T1C3^G2^C1A2^A0A16^TTTTG2G13T0C4G3G1G0A2C0C5A4C1A0G4C26G0A4^T39^C8^G1T0A11G1^AGT2^GG3A8G4^AT7^C2G6^AG7^C4T7C9^GA25^C24T1^C0C1G1C0C0A9A2C0A1^CC21^GGA4^CC0A1A1T0G2^C3^G3A2C2A1T1C0A4T5C5^G1A1^C3^C20^ATA4C9A3^GG2C0A3^C7A1T0A6G6C2C2^C5^TA7A20^CAACTTT10^TT2C1G1^A9C7C1^TA0A19T5C1^CC0C4^AAA0A2A1A10A16A11T3^TG3A2G1^A1A1^T8^A1^A6T4A1T1^C5^A0A2^A30T0A5^A10G0A9T6T4^C0G2T6^T8G1C3A4^AG7G2G6A1T7T2T1^A6T5T6A11^TATC4^CTC5^AC0A1^G6^CA9A3A10A0T1T1A1A3C12^ACA1A0A1^A0A1A0G16^A1C0A5A0C3^T9A19G9G1A1C0T5A0A0G4T11T3^A0C8^AC50T0G1T2T6^AA1A0T3C2C3C1A12G0G6C6^A8G0A6   rl:i:397
87867318-159a-4bed-a8be-0cbae0386f96    2048    chrX    5150991 1   96H7M308I14508N8M2D2M1I4M1I7M1I8M1D14M1D10M1D23M2D14M1D9M2I1M1I20M2I23M1D20M2I7M5D52M1I7M1I3M1D11M1I27M1D9M1I7M35H  *   0   0   ATTTTATGTTTTTAAAAAATACAGTACTCTGATTCCATAGCCTCCATAGCCAGAAGAAGGCAGCTTATTTCCCAGCCTATTTCTTCCAGGCAAATCATGACTCCAGAGGCCCCCATTTACTAACAAAGAGGCAAAAGAAGCTGGGATTCGGGCAGGACTTTCCAAAAGCCAGCCCGGCCATTCCAGCCGGCTTTTGACGCATGCTAGCGCCACAGCTTCCATGTCGCCTCATCCTGCAAAACCTGTGCCATACCTCTAATTTATATTCCACTTCCCCTATCCACACCCATAGGTTGGACACTTTTTGTGTTGCGGGTTTTAGCGAGATGAGATTGGCAGAGCTCATTGGCTCCTTGGTAATTGGCGAATTTCTCATGGACTACACTTTCTACCTCCTCTAGGAGCCTGTTTGCTTTGGCAGGTCTGGTTACGGGTCTAGAAAGGAAACTATTGGTGGGCCGTGAGCGACTCTGCAAAATTAATTTCCTCTCATGTAGGCATTATTTCAAGAGGTGGGGCTGGGAGTACTACTTCCTCGGGTGGGGCAGAGCCCTTCGAGGTCTGAGGATTCCAAAATTCTCAGCTTCAGCATGGTCTTCCACACATCACACATCC )GGFD=?AEEIB6/8:25/++'(&%%%*50'&2539(%%%/0>:?/.-%%%,+;209(()'**89:6EDH9:8559<95FGI<KH/?9:27,93,)-)--79%5-+#(&&&%%&&--+#$&'.%,.-/$%*+3;1&'(&%#())7)?38'(&'&56>9>></52,$$$##$&463,488066+6'$"#$$"#$$%)%$%'%(##%$"$#%&(*,+(((%101120'%%%%%#$(%%(-+,(0)%%%#%)$##$%./)&&%2@<56==%*##*+#%$$&%,-.$&))*($*$0665-*$21%%&%%%#)(&(/)$2=:<8,$+1'31@--*+-590*)+'''43==<:&%'(-01288=85//'*+*-,-216+65122860)2469;<65/%&)()20.+,4'0$$/.::<())*+(+()-3=>662:@2002A=@7222-%%$32.,2++,32<87A22(-.-1-069;<220362/9:9615?*,29$-&'&#%*)/0::,551>?>7206=<<>98:9%($%-.79.''*))62%'4,+,60//(&)1/''+#+'&&'$#%#*6#11&3*)$$-11/+$.59-94'/-((&)&&&#,*))$)&/$$%''012 NM:i:354    ms:i:169    AS:i:109    nn:i:0  ts:A:+  tp:A:P  cm:i:6  s1:i:54 s2:i:75 de:f:0.1206 SA:Z:chr15,19058385,-,35S289M1D422S,0,41;   MD:Z:11C0T2^TT9A4A6^A0A7A3G1^G1A8^C23^CC14^T20A32^A27^GGGGA26A1G13A10A8^A0A27A9^C10C5   rl:i:46
b3983ac0-3873-4d4a-aab9-267161d46c0b    0   chrX    5699957 22  60S27M2D23M1D11M1I4M1D21M1I3M2D9M1D14M1D3M3I27M4I3M2I5M1I2M1D50M1I31M1I28M2I3M1D17M2I26M2D15M2I2M1D27M2I9M2D5M2I20M2I10M5D4M2I40M5I3M3D14M1D5M1D17M1D4M1I21M2I3M3D86M3D28M1D15M2I7M1I40M3D9M1D12M4I12M1I32M1D12M1D8M2I1M1D8M5I7M1D8M3D4M1I6M1I25M1D34M1I16M3D7M4I23M1I15M2I5M1D2M1D14M1D32M1I7M1I4M1D32M1I37M2D1M1I17M1D24M1D9M1D23M3D25M1I38M1D19M1D16M3I1M4D24M2D20M1I11M1D1M2I24M1D22M1D16M1I2M2D2M1D24M2D13M2I1M1I4M2I1M2I6M2D8M1D1M3I10M1D10M1I11M1D10M1I1M1I18M2D48M1D4M31S   *   0   0   CTGCGTACTTACCGTTCCGGTTACGTGTTGCTAAGCAGTGAGCCATCAACATGAAGTACATGGGGGATGCTGCATTGAACCGGAGTTCACAGCGGCTGTGCGAAGACTGGGGTGGCTCTTTAAGGAGCACCTGGCACACTTCAACATAAACAACTTTCGAGTTCCTGCAGAACCCCACAGCTCTCGGTGGGTTTATGTTACTGGTGCACGAAAGAATTAAATGCGTGTTTGCAGGAGATGGCTCCGTGGCGGTTGGAGAAGACACTGGCACAGCTTCCTGGGTTGAAAGTGTCTTTCGGAACAGCAGGAGTCCTCCATCTGATCCTGCTCTTGCTTCCAGTGTGCTTAACGCAGGAGAAGCTCAGCTGTACACGGAAGACTAGGATATGCTGGGCGGCGCGAAGAGTGCGGTCTGTCTGTCCGTCCTCTGCGGAGGCGCTGGAACTATTGCTCGCACGTCCTCGCTGTCCGCTTTTCTCTGAACTGGCTAACGACATCAGACATTCTGTTCTCCAAACCAGGTTCATCACAACTGCTCTCGATGAGCTTCAGGTAAGGAGGAAAATTTATATACGTGAAATCAAACTTCGGAGTTGGGCAGCAGCTTCAGTTGCTGCAGGTTGGGGTCTTGGGATCCTATCATCATCCGCTCGCCTTCGGCCTCCGAAAAGAGTTCACCTGCCTCTTCAGTCCCCATTCTGTTCCATCTGTAAAATGGAGCCTCCTTTGGAGGAGATAAGTTTTGTTCCTGCTGCCCTGTTACTGTTAGGTTATGCCGTTCAAGAAAGATCATGAACTAGATCACACTGCTCAGTGGTCCCATCCCCCAATGACACAGGGCCCTCTGTTGATGTTCTGAACCATGGATGACTACAAAATGAGGAGAAGCTTCACCGATTTTTGAATAGATCAGCCCCATATTCATAGCAGCAGCATTTGTTTGTAAGATCCTGGTGAACTAAAGCCATTCGCAGAAACTACCATTGGGAAGAGCCAGTAATTGTTGTTTCAAAGTAAGGTGCAGAGACTTGGAGACTTAGGAACTGAGCTAAGACTATACACTAGTACTGCTGAGGATTATACACTGGTTGCATGAAAGACTGAGTGAGGATATTGGGTAGGAGTCTGAATAGGTTGAGTAATACATACTAGAATCCATTAATCGATAACCATTCTTAATAAAAATATATTGATGGCTTTATCCTTGCCGTGATTGGCTTGCCTGAAATGTTTAACTTCTTCTCTTAAATGTTAAGCCACCTGTGCATCTGTGTCTGAGCTGGAAAGTGATGTACCAGTTGTATTTGTGCGCAAACTACAGCTTATCGCTTTGGTGAGTTTTCATTTTTAAATGGCCTTTCACTTCCTCTACAGGCGTTGTCAGGAATCTCTCTGAGAATCCACAATGCATCTCAACATGACTTGGTCTCACCTTTTCTCCTTCTCATCAACTTGT
ACCATGGGGAAGATCTCAATAGCTAAAATTAAATCTGTATTTGTAGTCAATTTTTAATTTCTTTGCGCATGGCTGGCGGCAAATCCATATATTGGGACGAGCAGTCACTATCCATTTTGAGATATCAGTAAAGGTCACAGCTGATAAATTTTCTCGTGAAACAGAATTCCAATTGTTTTTTCAGACATTCTTTTAAATCTTCTGTGAAAATTACTTCTGAACCCCATGTACTGCGTTGAT    #%$&#$$""####62;$?%',-)**%$//*&.-5)'&*''%&'(667.%..)))%+**78763<>14*)%%(2)66((-)*-/$;746%%%0(),02.+*''&%&'$*$&%%*(,616<=5//)*%$$$))*+//0476798((%$%%,(&&'$4555&(5.5(*)&&(.$$./.)'$&'55//<,2557@PP@98949321%*+'##"$++('$#$%%'&$#*';6?>>))$'*#)4&&-)2:>=<=9$$)04**,(&'('$%%+00))*(*##&%&'(.7>?6200).)&3%)-$44,5++,&&..$)004=49)&=&))...54330/-"#&(#'&*6'0&&&&+)&&*),4$(%$/43>/9::,$$-.013)(),.&$56/''3/.02-6))-*'&&%%%''-''')1;32)=-566<://34A@<012/+<115484-,.*&%%$*-2%&%.,--0)6##1&((+'$&')/*091&%(($(&%()028<<;8;%'$%$#'-,-+,,*14765774%>3<>5/9%$&&%&%*(2(6:8334$89-=&2366714/+-#)20396/..)),,&&/52-B975:;::.-):50$%$'%#%$$#&)-(108&02,---1/-(&8.01/..0C>%%)(%6;8=C7%%6&%*,%45;/21.-.5)2),,&&)$5'*--0276754;90.)'&','+<BAD<7/05784<5:567<:::4&#'(-*/*-;12,'(/2?562:,4>>-$&01213@98957-23&,)1-,&2*11'%$$$(%/2-(&,+)))%5#(')-7401..-$#$&*383:44,,.339>689><<=72<<996),()&:;?7?6+32%%55=<5++**&+#1:@B553/*$#()++$$%+3$%$%$#$%(%%&6.ACD<5*).&&'%'%#$**++265:=D675.16+46.22>CM8220(*'63.(()/..82@>77><*555./1("$(&,+'**)++,*+)%,&*4,5==;:<:955&&&0DE;>C4)&&&&)).(&''#$*%)&636/4(((+&&':433)-$#''+45>8/./+#%%)5007/-(+&(9&:>..035&)47:A61.**)*(+*30,','%'22/((&&,)(,+$2$512$'.202*9.75-000,11072230,-16((.../5423,'571-+1>?>:97)$$.-/*$$/3==64%%(:BBB8//134+++&-,.;9+,*--$$$$$/;3,,)*(.+0;=ABB:=8<;1,$$$,-$&%&&$$'44::;<9@6-*,$8%+,%0(().45(9)<1--,>?9>;<.*+36;.+''%%..*&&(244665+$<$$--0')&&($.4?9..;<?>4+/'$$#$$'/18---3%&790*'&))**::*'2+,,&05600*2%%&%%&$$&%$$$(#&*$##%%$#&441..456:96:00012035?46+<9=?-+8+($6&40.-..(''+63313332541-'##%$.>299=<;<::+$$(+,-/.1-=?4/95AE==9<+%(++%$%$###&#$$'###&$*,$$,+%%$'*)%)9:%%$)%%%&4($&1&&1)6754+)$$,$%"#&2*662/'9$#%$*(3;===?ROGJ>A44/.**(.0':89>1;71./5FCIJ;;=01:99HH:BBBC<>97<?A<:3)**13333)((,&#$%))*))510,,')&37682*&    NM:i:264    ms:i:867    AS:i:867    nn:i:0  ts:A:+  
tp:A:P  cm:i:58 s1:i:469    s2:i:449    de:f:0.1198 MD:Z:8A7A0G9^CC1G10A0G2G0A5^A1A13^G2T0T13G4G1^TT9^G10G0G2^A25A7G1G4^A8C2G7T0T5A24A9G6C2T37G2^T10A23G0A7^CT7A9^A5A19C5A4^GC5A12T16^CAGAC1C2T19A12A9^GCA14^C5^A7G6G2^G0C1A1C2G20^CTT0G35G9C0T14T3G7G0T10^CAA28^G48G13^AGA9^C1T27T11G1A12^G12^C9^T15^A0G7^TGG5G29^T25T5A8T9^AGG9G31C0A7^G2^A0C13^G43^G69^GC7G10^T12A11^C9^T18G1A2^TAA5T0G3C36C0T0G13^C10A6G1^C17^AAAG24^TC26A4^G1T0T7A14^C22^A18^TC2^G24^TT5T13T5^CC0G7^C4T2A1T1^A21^A7A3T0T16^GA48^A4 rl:i:0
85e8bb44-7520-4ba8-a96a-957db68e5832    0   chrX    5700525 50  60S29M1D24M2D3M3D9M2D11M1D4M1I5M1I8M1I8M1I4M2I6M1D32M1I50M1D3M1I7M1I3M2I9M1D10M1D21M1D17M1I31M1I5M1D5M1I8M1D1M2I4M2I6M2I4M1D4M1D23M1D10M1I15M1D7M1D12M2I17M1D43M1I18M1D16M1D9M1D20M1D1M2I18M1I4M4I13M1D9M1D24M2D41M3I15M1I45M1I72M1D4M1I21M2D4M1I10M2D12M2D3M1D24M2D35M1D72M1I11M2D36M72N25M2D66M2D18M1D3M1I12M1I19M1D12M1I18M1D22M3S   *   0   0   CAGTGTACTTCGTTCAGTTGCGTATTGCTAAGCAGTGGTATCAACGCAGAGGCGTAGTCACCTGTCATCGTCGCTACGCCTTCGGCCTCGAAGAGGATTCGCCTGCCTCTTCCAGTCATTTCGTTATCTGTAAAATGAAGACCTCCTTTTGGAAGAATAAATTTCATTCCACTGCTGCCTGTTACTGTTGGGTTATGCCGTTCAGAAAGAAAAGATCCTTGAACTAGATCTGCTCAGTGGTCCATCCCCCAATGACACGGGCCACTCTGTTGGAGAATGTTCTGAATACTGGATGACTAAAATGAGGCTTCACACGATTTTTGGGAATGGTCGGCCCCATATTCATAGCAGCAGCATTTGTTTTGTAAAGATCTGGTGGAACTAAATAACATTCCCACAAACTTCTACATTGGAAAGGGAGCCAGTGGTGTTTCAAGTAAGGTGCCAGGACTTGGAGACCAGACTGAGCACAAGACTATGCACACTGAGTGCTGCTGAGATTATACACTGGTTGCATGAAGACTGGTGAGGGATGTTAGATCAGGAGTCTGAATAGGTTGGTATACATGCTAGAATCGGTAATCGTGTCCATTCTTAGCAAAAAAGCCTGTTGATGGCTTTTATCCTTTACTTTACTTGATTGGCTTGCCAGAAATGTTTAACTTCTTCTCTTAAGTATTTAAGCCTGCTGTGCATCTGTGTCTGACTGGAAAGTGATGTACCCACAGTTGCTGTTGTGCACCAAACTCACAGCTTATCACTTTAGAACAAGTTTCATTTTTAAACTCCAGCTTTCACTTCCTCTACAGGCGTTGTCTCAGGAATCTCTCTGAGAATCACAATACATCGCTTATGACTTGTCTCCACCTTTTCTCCCTTCTCATCCTTGTTGCCATAGGGAGATCTCAATGCCCAAATTAAATCTGTATTTGTAGTCAATTTTTAATTTCTTCCATGTTGGCAACCGTCCATGTGGGACGAACTGATCACTATCCATTTGAGATATCAAGTAAAGGTCCTTTGATAAATTTTCTCGTGAGAAACAGAAATTCCAATTGTTTTCAGGTGTTCTTTTAAATCTTCTGTAGAAATTGCTGACTCTTCAACATCTGCAGTATTGTGTGGTTTCACAAGATGGAGAATACGACACTTCATATTCCTTACACATATCTTCTGAAAGTTCTGTGGCCCTCAGGGTGAGATCAGAATGTGGCTGCAGTTTTCATGCCAAGAGGGTTCGGTGCAGAAGTGCCCATCCGGATTTCCAGAAGAATTTCTTGCCATACTTTGGCGTC 
'%'$#'&((.49I;<4=58(209=<<(',070112A:@A4*7$#$%,-,,*,''##%*&###%$,,./.$)7&-,&.//00/(3679/8476-)$8$44+/2398//638;(/*+&0-57+-##$&&,++&,322..*(*&-,913675*0-<4$$),,'$$*-*+/6&+'1(==A;;=A5<?8>@DCA44;<21,'56;C1?>:<67%CB3'&<//426:7200.026@22&(8+1$$*));=4483673&)/--*,'(2..)?@<=5=:1-+)./2247;45'&$%',:<4=:2;(DD<;8818900561407CDG@3//11-6=55112/2=>=;/63::?*++&&*$)0+11;=F8=<;:+-6643.*;))*','-45562)'('%,(/'-'$%#&(&&'%%"+%((*2:225=B55885EAEA=@9:.46.75773@=:&',99=:A:25.&*--1,'3&(#$#$%+.00363$$#&-045-,2$$<B>:?@5669;/((35485+7A76:5550'367://.2$9=8''99$+$$$+-3.?87,-,-%%*5.353697859&$,&?85/%2'&4-642$($#%&//.*,&%%&&653+%'*+-.>>9:''+<AA@C:012./-*')*)%$%%5498,.450+%$&&*+1-,25(((&*,)(+'('%&10+32?>&1..78;:&)/;3537135056:68/3+/06/.,6-::;7;534&,).=B7:@:99);-.-'(69=<:./8:9AA=98>,))+,*.&%%%%$=>89:<<650*)$'''57.77711+48;//4844679,1599+08/942;<826.6*4/0401400*-,/((24000''0+).%$%%%&&&#'&..0::=<D9=<@A=>><$$*$$$($%567---.-*')23,&%$$$'''-*&&+8:>;><833>?C898CA?:/*@H?=;??HH<HH?D@??FD999>6553--5(((&)0;;120.++,-8<9<,1**:;:<6*,'3++56&+2=ABB3A**/*1(9;DD>BHKHI<B6111'(/-+/1+&'57:':,)%%7==?/-%&$&#,00-<99;EDA=72)&%%..;?;?>,65:.111<2;4%453@--+-0--,(%&&35-4:G?8;:<=@:;<5530**//0/,/111022@B4;7:648./&&45?=77827<=;???(%()))*.7,/19/029/%?,/'.7;::4226..;6=HC-74444884&(8;C18)+,**-&&&':(0..2((=;;86CD*=//22/404;B<87/33,((%%'*&)$$&# NM:i:152    ms:i:782    AS:i:741    nn:i:0  ts:A:+  tp:A:P  cm:i:67 s1:i:517    s2:i:467    de:f:0.1017 MD:Z:9A2C2T13^T6A0G4T11^AA3^CCC4C0T3^CC11^G1G19G3G9^C68T13^A22^C0C9^C18G2^T15A37^C13^G11A3^T4^G23^A25^A7^G21A7^G36A2G1G19^A8A7^C1A0T6^A1A0A17^T2A18G2A11^C2T6^T24^AA10C45G22G1G0G2G0T14A0A60T9^A25^AA5A8^GA10A1^TT1A1^G24^TT5T27A1^C0A3T36A41^TT7A0C0A51^CC0T4A20A0T0T37^TA1T16^C34^A1T1A3G4A17^G22 rl:i:0
af97cb34-659a-4d8d-b3d8-33b7262355c1    0   chrX    5700951 1   64S7M1D19M1I15M1D7M1D33M1D21M3D5M1D13M2I10M2I5M1D30M1D10M1I7M2I25M1I4M1D19M1D21M1D9M1D6M1D9M2D6M1D10M1I1M1I3M1D51M2D5M1I10M1D10M2D1M1I8M4D21M1I16M1D7M2D31M1D22M2I19M1D16M1I9M2D6M1D4M1D18M1I5M3I19M1I3M2I6M2D39M2I2M1D13M3D2M2D5M1I3M1D11M1I25M1D2M1D36M1D19M1I15M1I9M1I7M1D13M1D7M1I8M1D23M2D3M1D10M2D10M1D13M135S    *   0   0   CAGGTATGCTTCGTTCAGTTTACGTATTGCTAAGCAGTGAATTTATCAACGCAGAGTTACATAGGTACTGCGAAGATTATACACTGGTTGTGAAGAAGACTGGTGAGGATATTGGTGGGAATCTGAATAGGTTAATTATGCATACTGAATCCATTAATCGATAACCATTAGCAAAAATGTAGTGATGTGGCTTTTGTCGCCCTGCATGATTGGCTTACATACAAATATTTTAACTATTCTCTTACCGTATTTTAAAAACCCTGCTGTGCATCTGTGTCTAGGCTGAAAGTGATGTACCAGTTGTGTTGTGCTTTGACTCACAACTATCACTTTAGACAGTTTCATTTTAATAAACATCCACTTCACATCTCTGACGTTGTCTCAGGAATCTCTCTGAGAATCACAATACATCGCTTATGACGGTCTACACCTTTTCTCCTTCTCATCCTTTGTACCAGGAAGATCTCAATACTTCAAGAAATTAAATCTGTGTTTTAGTCAATTTTTAATTTCTTCCATGTTGGCAACCGTCCTGTTGGGACAGACTGATCACTATTTCCATTTGAGATATCAAGTAAGGTCCTTTGATAAATTTTTCTCGTGAAACAAATTCAATTGTTTTTCAAGACACTTCTTGCACCAAATCTTCTGTAGGGATGTGACTTTCACTTTGCTCCAGAAACATCATAGAGTTGGTAACTCAGGTCTTCCCAGTCCTAAGATCATGTGTTGAGCTCTCAACATCTGCAATTATTCCTTGTGGTTTCACAAGATGAAATACATTACTTCATATTCCTTACACATATCTTCTGAAGTTCTGTATGGCCCTCAGGGGTGAGATCCAGATATTGGCTGCACGTTTCATCTTAGAGGTTTCATTGCAGGAATTGCCATCAGATTTCCAGAAGGAATTTCTTTATCTTTGGCATTGATTTAAACCGTGCCTTTAGATGCTCGAAGAACGCCGTATTTCTGCTTCCGCTCGGCCCGGTCGTTACGCTGCCGCGACTCCAGCTGGCTCCCGGGGCCAGGCGGTCGGTGGATAAGGACACTCGCCTCCACCCGGCGGTCCCATGCTCTGCGTTGAT    
(&&%#$&$%&'/6@96('18=467:9HB53<=<,**1%%%#%%%&)*'$##$)4*:23(21++,14)23)$&&%'%&'.--/.')*.((&$336)'&&')0*-/3)(./&&'')0,,.20%78<;>8,,+7<9$%$$)&&//01::&&755:3;;@?=74??A:56/9*)'(.44785+$#&%$#+)%-101345%'+'((')'%'.//98*&$%$$$&'())-.))9;<8.)&(#%)%',)%$"#$%%))((54-(%'((10/0/9879;=:4,,%7$%&,550345321862-,--&-(*$%&+,,,)#%&&'%'%4(('&$/(16(''('$%#$$$(+-&)#)*''("%'%'&%$&$&&)$'*,..%$#%&()&24577;%07/80<-5'8$%%(%.,+&&($#$10*'*%%,-<=*)#&.8)&'%%%%..098/247<3?9**2(.2++*%&*(($()7:;/?:3++./.2+,+'&49;CD?<A<8''89:0)97<,-,BCC?;;:AE>A@<>>,,?<40281431&&(..)*$(0,#%%--*%%09/.))$'315<@@+.24<431<323//(*)%&*(*448=>=DC@>8<572&'())'()%$)-%'%$$389=70/.--&(%$'*&$3.**''9:>??G/1&/-*&%%$%"$$$#&%#$$#%'.&*2$001776870:/533:>>>:314&+(303;<@@A'1)$)4578;842(.$&$)$$%&&%*'(%/'%$$*0%$+&(''8;84868>;BC??A<239?>@==.21--,13609:<054578&)%*(%..-46;8EE<:83)-+4-,)))++*8:=88+00,**(-:=$$:104+++,+-/1($$$*).31/-#%$$'(&/&*,.-%&"$$+/.',10..,%3).-44:&197464$$$/14-.%$$(&"$%'(''&'&%%1;;:/$%'*)'('('#$*%#)**%'*$$&##$$#$$$%$%#$$$&&''$(2**(('%&.)1))(216<0--('$(*(&6/(./.000'4.%&''$$$-;3132:*)'%')&&&%'+43995'&&*%)-121''##%&&-+2++259>4,-36.($    NM:i:159    ms:i:428    AS:i:428    nn:i:0  ts:A:+  tp:A:P  cm:i:18 s1:i:166    s2:i:172de:f:0.1463 MD:Z:7^T2G16C1T12^G7^G3A3G12G1G3A6^A21^TTC5^A9T11A3T2^C12G1C0C0T0G4G7^T1C8A10G7C13A2^G19^C8G0C0A0A7G1^T9^G0G1G3^T9^TA2C3^G1T1T10^A1A1G39T7^TT0A14^C10^AA9^TAGG33A3^G7^TT5T25^A1A1C0A3T0G0A30^A3A21^GA6^G4^C11T0C10T0T13A0A6G0A3^GC2T3T34^G13^TCC0A1^CA1C6^T12G11A11^G2^G36^A11T22G15^G1C0A5G4^G2A5G2A3^C1G21^GC0C2^A10^AG10^A13 rl:i:0
83512cc9-8fd7-4b4e-95a3-9f5741211fed    2064    chrX    5984653 60  291H10M1D13M1I33M3D22M2D3M1I3M1I16M1I3M2D4M1D6M1I9M3D2M2D34M1D17M1D10M2D78M6I26M1I1M1I3M1D10M1D2M2D16M1I4M1D17M1I5M2I24M4I1M1D73M2D8M7I41M1I1M9I17M1I6M6D21M2D2M2I5M2D6M2I3M1D3M2D7M62285N2M2I1M3D7M2D1M1D8M5I2M3D2M1I7M2I1M2I3M1I5M2I2M1I9M1I8M4I7M3I9M1D3M4I1M2I5M1I2M3I3M2I1M3I3M1I12M2I25M5I5M5I13M1D23M949H    *   0   0   GGGGTCCTTCTGTAGGAGGTCCTGGGGTGACTTTGGACGTCCGCTCAGCCAGGTTGCAGCGGCTTAGCGTGTGTCCTAAATCTTCCTCTGCGCGTAGGCAGGCTCAGCCGCAACATGCTCCTGCCAGCTGGCAAGCCAAGAAGCATCCCAGCTTGATTCCTCTTTCGTATTTATTGGAGCAGGGGTACTGTTAGCACTGTATGTGATGCGCTTGGCACTGCTTAATCCAGATGTCAGCTGGGACAGAAAGAACAACCCAGAGCCATGGGAACGCAACAAACTGGGTCCCAATGAACAATATGCAAGTCTATTCTGTAATGGACTACAGCAAACTGGAAGAAAGAAGGCCTAGACTTCCTAAAGTCCATGAAGTTCACTGTAAAGTTGTAACATATAATGAAGGTCTTTCAGAAGCCATCCGCACAATTTTCCACTTTGCCAGGAAATATGTCTCTGAATGCATGAACATGTTGATTTTTTTTTTTTTTGGAGCTTATTACACTGATGAATAAATTTCTGAAATCAAAAAAAAAAAAAAAAAAAAAAAAATGTTAGCGGTGAGAATATTTCTCCTACAGACGAAGAGCGTCGTGTAGAAGCAGTGAAGATCAACGCAGAGCAGCATGGGCAGAAATCGCATCGCGTCAATACCCGCCGCGGCTCGCGATGCTTTGTTTTAGAATTAAACAGTCGGATTCTGGTCCGCACCAGTTCTGTCGGCTGCCATGCCGGCTGAGGCGAGGCGCCGCGCGAAAACCGCGGCCCGGTGGACCCGGCGGGGGAACACGCGGAGGCCTCCCACACGCGCGGGACGCCCGCCCGCCCCCGCC  
25;?:=<+$+'&$%'%=2193;+,,-,/:561,,8''328/./1659<8770/2+1'/'&..021673/('&&+5<'59958+,,(4-+++)**./*/(6448#$#"&&#(*;=<>/.$/.##,,40,$+,411%2A>I?HF:>CE?/-4:4/2-5:7310-34679=>DIJ:?G<*7-2B9E<;451()((&%*331356;:.@-5CB;/+/54//2/.)DGCF;><<@;,,40-,0;?>:6*&+)'&*..(??>2.ACHA=+4*1621-&"$((+;>=8.345&'/4?-.684%$#$$%$#$%%%(%$%)+4-0&%&/..98<7A>;JB=425(>?<?BA<A>/66/'+75)++$%06>9,+,++/7:98<7650//.,(''%*%&*+#$%$&+*,.4333:*)3.//5200888=31/237<'('9B86766:&+%$%-$=<<6557=46,,)+)**0432-',.,%%(-1243:898?=?HIBC<,/2$4;:<10-./7<027.+*0,'&)-.262HA**<;::@5A>GAD46466><IBRFL@?E@>97)'''(4(%*($$%)+%).%)(%254'(145;7)%'+((&%$,//.-'(%,*-$&%&$$)''%%((&%#%&,$$$#))343*)$&0$(%'&%*%#%%'&%%89--%%')+'(<+1::?BA<7=.,#'3-//3<=579;5=76//%+(';;;;9<:A>>GH>0//0//'%'$#$##)0-,&#&&%%)4-6688($$%#%7/<=:-**,/.7+)+)&/((3)*,,,-9/36/)+24-(.01.-))&20218=A+A99;665/6/'%%2.,,7:.,--99  NM:i:246    ms:i:259    AS:i:123    nn:i:0  ts:A:+  tp:A:P  cm:i:9  s1:i:94 s2:i:0  de:f:0.2085 SA:Z:chr6,11900377,+,1257S792M6591D21S,60,84;chr11,109011645,-,1638S374M62D58S,31,111;  MD:Z:10^A0G4A8A24T6^AGA10T0C10^TC9T1T5T5T1^TT1T2^A7T7^ATC2^TG17G16^C17^T0G9^GA0G0C8G7T8T2T1C23G17T33^T4C2A0C1^C2^TG20^A3A6C12T18C4^G14C12A15A0A0G16A10^AT8C1A8T8T12C8T0T0G6T0T0T3C3G1G1^GTGGAA7A0G0C0A4T2G0T1^TA5C1^TT3C5^C0T2^CT0T2T6^GGG2T1T2^GG1^G2G0C6^TGG4C2G5G8A2G0G0C1C0C1C0C6G4G2C0T3^G18C2T5C2C0C6A1T1A0C0T3G2C4T5C1G1C1C4^C1C1C0C18   rl:i:48
a783f466-53ca-4f7f-b3d0-a6fbd47cd33e    0   chrX    5984687 10  60S25M2I13M1I5M1D17M10I9M2D21M1D13M5I14M3I6M2D16M1D10M2D10M1D10M5D5M14D13M3I8M1D7M1D46M1I7M3I18M1I66M1D8M2I12M2I4M1I7M2D6M1D33M3D40M86S *   0   0   CATTCGCGCTTCCCGTTCCGGTTACGTATTGTAGCAGTGGTATCAACGCAGAGTGCATGGGGACGTCCGCTCAGCCGGGTTGCAGAAAAGCGGCTTTGGCGTGTGTCTAATCTTCTCTCTGCGATAATAAAAATGGAGGAAGCTGTTACGCAAACGTGCTCCGCAGATCCTCAGGCAGAAGAAGCCAAGAAGCATCGAGAAAGCGATTCCTCTCTTCGTATTATTGGAGCAGGGTACCGGGCAGCACTGTATGTGATCTAGATGTCAGCTTCTAGGACAGAAGAGCAGCCAGAGCCATGGAACAAACTGGGTCCCAATGAACAATATAAGTTCCTGCTCATGTGTGAATGTGGACTACAGCAAAACTGAAGAAAGAAAGGCCCAGACTTCTAAACTATGAAGTTCACTGTAAAGCTACTGATAATGAAGTCTTTCAGGGAAACCATCCACTGAAAATTTTTCCATAAGCAGAAATATGTCTCTAGATACATGAAATCATGTTGATTTTGGAGTTTATTACACTGTTGAATAAATCTCTGAAACGCAAAAAAAAAAAAAAAAAAAGTGTGCCCAACGTGACTCTACATCGTAGATCGGATGAAGCGTCGTCGCGAAACAATACGTAACTA   $$$$#%%$#&&143>CI&;%&**-1-'&''%&)$(')::6328=((%%0''%'$"$$#%%557678<<AA;E2277$%-13*%%&)++&212'#&&'')34''())8*++;<<=;>83)1,+%%&&&&''%$$%#'&%%%$(+)(%&%-&%%(($$$$%&$2%$#$---155&02--,.-*02-.244044664./'&$$$%###&8=)-7=:7(26,))--$%%,-)&&&.()(%$$&'%(&&*,,-++./,),)&$$(+5++66%2''+'&'-?7;<<,-(&%*.$**)&%(9:><><<94497::5'>89>85****//22**(('7'/*-:'.20%+*'&*(-/,+.78=::954,230%#&&')19B646.(*&///.00@347/32:/-5,.0/4255471,&114-,84.8:4511$-&%%)66::84,0/-(+&%&/%%('&(%%'('()))2+/2<2?1010)-4..()9:5;,&&($$*0'+6+:86110**--+42.465=AE@<?>3-,,+-''..6?DCCC?/;%*-20%%&.1.-,+++*))''&&&&%%&:.1/../1*+15<?=>&*3==97%$)/6;=%')##&/%8855*&-$#%&%#+%$%'++--$#%#   NM:i:110    ms:i:209    AS:i:209    nn:i:0  ts:A:+  tp:A:P  cm:i:10 s1:i:88 s2:i:101    de:f:0.1366 MD:Z:16T17A1T6^C6A4C3T3T3T2^GC5C0T6A2T2T1^C8T0G18C0C3^TT16^T10^TG0G6T2^A10^GTGTG5^CTTGGCATTGTTCA3C11G5^G5A1^A1C8T29G7A4C1C27A41G0T10^G5C5G9C9^CT6^G14A2G15^ACT18T2A18   rl:i:17

(There is more in the head of the BAM file, but I only pasted part of it here because it's too long.)

Edit: Ignore the message in parentheses above. I pasted the whole output of the head command because I saw that it appears as rows here.

Thank you again!

lindenb commented 3 years ago

Hmm, your first read works on my machine. Do you have enough disk space? What is the size of Sample_A_sam2tsv.txt?

What is the output of:

samtools view -h Sample_A.bam chrX | java -jar dist/sam2tsv.jar -R GRCm38_dna_primary_assembly/Mus_musculus.GRCm38.dna.primary_assembly_corrected_chr_annotation.fa | head

kac2053 commented 3 years ago

#Read-Name  Flag    MAPQ    CHROM   READ-POS0   READ-BASE   READ-QUAL   REF-POS1    REF-BASE    CIGAR-OP
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    0   G   '   3823913 T   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    1   T   %   3823914 A   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    2   T   %   3823915 A   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    3   A   +   3823916 A   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    4   T   -   3823917 C   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    5   G   -   3823918 G   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    6   C   *   3823919 T   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    7   T   *   3823920 A   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    8   T   -   3823921 G   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    9   C   .   3823922 G   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    10  G   9   3823923 A   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    11  T   A   3823924 A   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    12  T   <   3823925 T   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    13  C   9   3823926 C   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    14  G   '   3823927 T   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    15  G   &   3823928 T   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    16  T   /   3823929 A   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    17  A   '   3823930 C   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    18  C   /   3823931 T   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    19  G   1   3823932 A   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    20  T   4   3823933 A   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    21  A   4   3823934 C   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    22  T   :   3823935 A   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    23  T   :   3823936 G   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    24  G   ;   3823937 G   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    25  C   @   3823938 A   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    26  T   :   3823939 A   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    27  A   B   3823940 C   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    28  A   9   3823941 T   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    29  G   /   3823942 A   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    30  C   +   3823943 A   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    31  A   +   3823944 G   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    32  G   -   3823945 A   S
3cfe3b25-3608-4966-8b92-4dda119628ff    0   1   chrX    33  T   8   3823946 C   S

The output is longer with more read names than what is pasted here.

lindenb commented 3 years ago

OK, I don't think the problem comes from my software; there is no reason for it to skip any chromosome. Maybe something went wrong (was there any error message?), or you generated a file that is too big for your quota, or there was a malformed read in chr19?

lindenb commented 3 years ago

Another way to test whether it runs through to the last chromosome would be:

java -jar dist/sam2tsv.jar -R GRCm38_dna_primary_assembly/Mus_musculus.GRCm38.dna.primary_assembly_corrected_chr_annotation.fa Sample_A.bam | tail

kac2053 commented 3 years ago

There were no error messages in the standard error file after running sam2tsv. The input BAM file is 2.5 GB. I asked for 20 GB and 1 thread when submitting the job. The sam2tsv output is an 8.6 GB file. Given this information, do you think this was a memory issue? It doesn't seem to be the case, but I'm not sure.

How would I check for a malformed read in chr19?

I am running the command below right now. This might take a while, but I will paste the results when it's done.

java -jar dist/sam2tsv.jar -R GRCm38_dna_primary_assembly/Mus_musculus.GRCm38.dna.primary_assembly_corrected_chr_annotation.fa Sample_A.bam | tail

A side question: the "READ-QUAL" column contains symbols whose corresponding values I cannot find online.

The symbols in READ-QUAL in my output: - , ; : ? . ' " ( ) @ * / & # % + < = > $ 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K M N P

The table I found online to interpret the symbols: https://support.illumina.com/help/BaseSpace_OLH_009008/Content/Source/Informatics/BS/QualityScoreEncoding_swBS.htm

I don't see the J, K, M, N, and P symbols in that table. They are also not in the older Illumina table (https://drive5.com/usearch/manual/quality_score.html).

Our data is from Nanopore sequencing, so is that why there are more symbols than in the Illumina tables? What are your thoughts?

Thank you.

lindenb commented 3 years ago

> How would I check for a malformed read in chr19?

There would be an error message. Maybe try running Picard ValidateSamFile https://gatk.broadinstitute.org/hc/en-us/articles/360036854731-ValidateSamFile-Picard- on your BAM to see any errors or warnings (Picard uses the same API as sam2tsv).

> The symbols in READ-QUAL in my output

I see other symbols in a BAM from 1000 Genomes:

samtools view HG02260.transloc.chr9.14.bam | cut -f 11 | grep . -o | uniq | sort | uniq | tail
E
F
G
H
I
J
K
L
M
N
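For reference, the SAM format stores each base quality in the QUAL string as the ASCII character whose code is the Phred score plus 33, so any symbol can be decoded directly instead of being looked up in a table. A minimal Python sketch of that conversion follows; the function name `phred33_to_score` is only illustrative and is not part of samtools or jvarkit.

```python
def phred33_to_score(symbol: str) -> int:
    """Convert one Phred+33 quality character to its integer Phred score."""
    if len(symbol) != 1:
        raise ValueError("expected a single character")
    score = ord(symbol) - 33  # '!' (ASCII 33) encodes a score of 0
    if score < 0:
        raise ValueError("character is below the Phred+33 range")
    return score


# Symbols reported in this thread as missing from the Illumina tables.
for symbol in "JKMNP":
    print(symbol, phred33_to_score(symbol))
```

Under this encoding J, K, M, N and P decode to 41, 42, 44, 45 and 47. They are ordinary scores that happen to be higher than the ones a table stopping around Q40 would list.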

kac2053 commented 3 years ago

Hi. Here is the output:

$ cat stderr.sam2tsv 
[SEVERE][Sam2Tsv]Cannot find contig "chrM" in dictionary:[chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrMT,chrX,chrY,chrJH584299.1,chrGL456233.1,chrJH584301.1,chrGL456211.1,chrGL456350.1,chrJH584293.1,chrGL456221.1,chrJH584297.1,chrJH584296.1,chrGL456354.1,chrJH584294.1,chrJH584298.1,chrJH584300.1,chrGL456219.1,chrGL456210.1,chrJH584303.1,chrJH584302.1,chrGL456212.1,chrJH584304.1,chrGL456379.1,chrGL456216.1,chrGL456393.1,chrGL456366.1,chrGL456367.1,chrGL456239.1,chrGL456213.1,chrGL456383.1,chrGL456385.1,chrGL456360.1,chrGL456378.1,chrGL456389.1,chrGL456372.1,chrGL456370.1,chrGL456381.1,chrGL456387.1,chrGL456390.1,chrGL456394.1,chrGL456392.1,chrGL456382.1,chrGL456359.1,chrGL456396.1,chrGL456368.1,chrJH584292.1,chrJH584295.1]
com.github.lindenb.jvarkit.lang.JvarkitException$ContigNotFoundInDictionary: Cannot find contig "chrM" in dictionary:[chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrMT,chrX,chrY,chrJH584299.1,chrGL456233.1,chrJH584301.1,chrGL456211.1,chrGL456350.1,chrJH584293.1,chrGL456221.1,chrJH584297.1,chrJH584296.1,chrGL456354.1,chrJH584294.1,chrJH584298.1,chrJH584300.1,chrGL456219.1,chrGL456210.1,chrJH584303.1,chrJH584302.1,chrGL456212.1,chrJH584304.1,chrGL456379.1,chrGL456216.1,chrGL456393.1,chrGL456366.1,chrGL456367.1,chrGL456239.1,chrGL456213.1,chrGL456383.1,chrGL456385.1,chrGL456360.1,chrGL456378.1,chrGL456389.1,chrGL456372.1,chrGL456370.1,chrGL456381.1,chrGL456387.1,chrGL456390.1,chrGL456394.1,chrGL456392.1,chrGL456382.1,chrGL456359.1,chrGL456396.1,chrGL456368.1,chrJH584292.1,chrJH584295.1]
    at com.github.lindenb.jvarkit.util.picard.GenomicSequence.<init>(GenomicSequence.java:154)
    at com.github.lindenb.jvarkit.tools.sam2tsv.Sam2Tsv.printAln(Sam2Tsv.java:253)
    at com.github.lindenb.jvarkit.tools.sam2tsv.Sam2Tsv.scan(Sam2Tsv.java:382)
    at com.github.lindenb.jvarkit.tools.sam2tsv.Sam2Tsv.doWork(Sam2Tsv.java:443)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:763)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:926)
    at com.github.lindenb.jvarkit.tools.sam2tsv.Sam2Tsv.main(Sam2Tsv.java:465)
[INFO][Launcher]sam2tsv Exited with failure (-1)

It seems the reference FASTA has "chrMT" while my BAM has "chrM". Because sam2tsv cannot find chrM in the reference FASTA, it stopped prematurely after chr9, which comes right before chrMT in the FASTA file. Sorry that this was the cause and that I opened an issue for sam2tsv when it is not a sam2tsv problem. (I received these BAM files from someone else, so I didn't do the alignment myself.)

Would a possible solution be to replace every "chrMT" in the reference FASTA file with "chrM"? What do you think?
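One way to try the renaming idea is sketched below in Python. It is only a sketch under assumptions: the helper name `rename_fasta_headers` and the file names in the usage comment are hypothetical, and only the sequence name (the first word of each `>` header line) is changed. Any `.fai` index or `.dict` sequence dictionary built from the old FASTA would have to be regenerated afterwards.

```python
from typing import Dict, Iterable, Iterator


def rename_fasta_headers(lines: Iterable[str], mapping: Dict[str, str]) -> Iterator[str]:
    """Yield FASTA lines, renaming sequence names that appear in `mapping`.

    Only the first whitespace-delimited word after '>' is treated as the
    sequence name. Sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            newline = "\n" if line.endswith("\n") else ""
            parts = line[1:].rstrip("\n").split(None, 1)
            if parts and parts[0] in mapping:
                parts[0] = mapping[parts[0]]
                line = ">" + " ".join(parts) + newline
        yield line


# Small in-memory demonstration with made-up sequence lines.
demo = [">chrMT example description\n", "ACGT\n", ">chrX\n", "ACGT\n"]
print("".join(rename_fasta_headers(demo, {"chrMT": "chrM"})), end="")

# Hypothetical usage on real files (file names are placeholders):
# with open("reference.fa") as src, open("reference.renamed.fa", "w") as dst:
#     dst.writelines(rename_fasta_headers(src, {"chrMT": "chrM"}))
```

Streaming line by line keeps memory use low for a genome-sized FASTA. The mapping is a dictionary so that more than one name can be renamed in the same pass if other names turn out to differ as well.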

$ cat Sample_A_sam2tsv_tail.txt
e85f63d1-deb4-449c-b5c0-0113b321ccfe    2048    12  chr9    .   .   .   124478033   T   H
e85f63d1-deb4-449c-b5c0-0113b321ccfe    2048    12  chr9    .   .   .   124478034   C   H
e85f63d1-deb4-449c-b5c0-0113b321ccfe    2048    12  chr9    .   .   .   124478035   T   H
e85f63d1-deb4-449c-b5c0-0113b321ccfe    2048    12  chr9    .   .   .   124478036   A   H
e85f63d1-deb4-449c-b5c0-0113b321ccfe    2048    12  chr9    .   .   .   124478037   G   H
e85f63d1-deb4-449c-b5c0-0113b321ccfe    2048    12  chr9    .   .   .   124478038   G   H
e85f63d1-deb4-449c-b5c0-0113b321ccfe    2048    12  chr9    .   .   .   124478039   A   H
e85f63d1-deb4-449c-b5c0-0113b321ccfe    2048    12  chr9    .   .   .   124478040   A   H
e85f63d1-deb4-449c-b5c0-0113b321ccfe    2048    12  chr9    .   .   .   124478041   A   H
e85f63d1-deb4-449c-b5c0-0113b321ccfe    2048    12  chr9    .   .   .   124478042   A   H

For Picard ValidateSamFile, I got this standard error:

$ cat stderr.validate_sam 
INFO    2020-11-18 19:38:53 ValidateSamFile 

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    ValidateSamFile -I Sample_A.bam -MODE SUMMARY
**********

19:38:53.855 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:bin/picard_java1.8/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Nov 18 19:38:53 EST 2020] ValidateSamFile INPUT=Sample_A.bam MODE=SUMMARY    MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 SKIP_MATE_VALIDATION=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Nov 18 19:38:53 EST 2020] Executing as chuk@lt03 on Linux 3.10.0-957.5.1.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.23.8
WARNING 2020-11-18 19:38:53 ValidateSamFile NM validation cannot be performed without the reference. All other validations will still occur.
[Wed Nov 18 19:41:18 EST 2020] picard.sam.ValidateSamFile done. Elapsed time: 2.40 minutes.
Runtime.totalMemory()=2075918336
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp

$ cat Sample_A_ValidateSam.txt 

## HISTOGRAM    java.lang.String
Error Type  Count
ERROR:MISSING_READ_GROUP    1
WARNING:RECORD_MISSING_READ_GROUP   1969667

I read that this warning means the header is missing read group information.

$ samtools view -H Sample_A.bam
@HD VN:1.6  SO:coordinate
@SQ SN:chr1 LN:195471971
@SQ SN:chr10    LN:130694993
@SQ SN:chr11    LN:122082543
@SQ SN:chr12    LN:120129022
@SQ SN:chr13    LN:120421639
@SQ SN:chr14    LN:124902244
@SQ SN:chr15    LN:104043685
@SQ SN:chr16    LN:98207768
@SQ SN:chr17    LN:94987271
@SQ SN:chr18    LN:90702639
@SQ SN:chr19    LN:61431566
@SQ SN:chr2 LN:182113224
@SQ SN:chr3 LN:160039680
@SQ SN:chr4 LN:156508116
@SQ SN:chr5 LN:151834684
@SQ SN:chr6 LN:149736546
@SQ SN:chr7 LN:145441459
@SQ SN:chr8 LN:129401213
@SQ SN:chr9 LN:124595110
@SQ SN:chrM LN:16299
@SQ SN:chrX LN:171031299
@SQ SN:chrY LN:91744698
@SQ SN:JH584299.1   LN:953012
@SQ SN:GL456233.1   LN:336933
@SQ SN:JH584301.1   LN:259875
@SQ SN:GL456211.1   LN:241735
@SQ SN:GL456350.1   LN:227966
@SQ SN:JH584293.1   LN:207968
@SQ SN:GL456221.1   LN:206961
@SQ SN:JH584297.1   LN:205776
@SQ SN:JH584296.1   LN:199368
@SQ SN:GL456354.1   LN:195993
@SQ SN:JH584294.1   LN:191905
@SQ SN:JH584298.1   LN:184189
@SQ SN:JH584300.1   LN:182347
@SQ SN:GL456219.1   LN:175968
@SQ SN:GL456210.1   LN:169725
@SQ SN:JH584303.1   LN:158099
@SQ SN:JH584302.1   LN:155838
@SQ SN:GL456212.1   LN:153618
@SQ SN:JH584304.1   LN:114452
@SQ SN:GL456379.1   LN:72385
@SQ SN:GL456216.1   LN:66673
@SQ SN:GL456393.1   LN:55711
@SQ SN:GL456366.1   LN:47073
@SQ SN:GL456367.1   LN:42057
@SQ SN:GL456239.1   LN:40056
@SQ SN:GL456213.1   LN:39340
@SQ SN:GL456383.1   LN:38659
@SQ SN:GL456385.1   LN:35240
@SQ SN:GL456360.1   LN:31704
@SQ SN:GL456378.1   LN:31602
@SQ SN:GL456389.1   LN:28772
@SQ SN:GL456372.1   LN:28664
@SQ SN:GL456370.1   LN:26764
@SQ SN:GL456381.1   LN:25871
@SQ SN:GL456387.1   LN:24685
@SQ SN:GL456390.1   LN:24668
@SQ SN:GL456394.1   LN:24323
@SQ SN:GL456392.1   LN:23629
@SQ SN:GL456382.1   LN:23158
@SQ SN:GL456359.1   LN:22974
@SQ SN:GL456396.1   LN:21240
@SQ SN:GL456368.1   LN:20208
@SQ SN:JH584292.1   LN:14945
@SQ SN:JH584295.1   LN:1976
@PG ID:minimap2 PN:minimap2

Thank you.

lindenb commented 3 years ago

> It seems the reference fasta has "chrMT" while my bam has "chrM".

Yes, that is the error. It looks like you're not using the very same reference that was used to map your BAM (?)
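A quick way to check for this kind of mismatch is to compare the sequence names in the BAM header with the names in the reference. The Python sketch below assumes the header text comes from `samtools view -H` and that the reference names are already available as a set (for example, from the first column of a `.fai` index). The function names are illustrative only.

```python
from typing import Iterable, List, Set


def contigs_from_sam_header(header_lines: Iterable[str]) -> List[str]:
    """Return the SN: names of the @SQ lines, in header order."""
    names = []
    for line in header_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "@SQ":
            continue
        for field in fields[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def missing_from_reference(bam_contigs: Iterable[str], ref_contigs: Set[str]) -> List[str]:
    """Return the BAM contig names that the reference does not contain."""
    return [name for name in bam_contigs if name not in ref_contigs]


# Tiny example using a few names that appear in this thread.
header = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "@SQ\tSN:chr9\tLN:124595110\n",
    "@SQ\tSN:chrM\tLN:16299\n",
    "@SQ\tSN:JH584299.1\tLN:953012\n",
]
reference_names = {"chr9", "chrMT", "chrJH584299.1"}

missing = missing_from_reference(contigs_from_sam_header(header), reference_names)
print(missing)  # ['chrM', 'JH584299.1']
```

With the names shown in this thread, the check flags not only `chrM` but also the scaffold names without a `chr` prefix such as `JH584299.1`, because the reference dictionary in the error message lists `chrMT` and `chrJH584299.1` instead.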

kac2053 commented 3 years ago

I received the bam files from someone. They said they aligned with Mus_musculus.GRCm38.dna.primary_assembly.fa.

Edit: Maybe they gave me the wrong link/version.

Thank you very much!