lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

number of aligned reads inconsistency between sam2tsv and samtools mpileup #179

Closed Huanle closed 3 years ago

Huanle commented 3 years ago

Subject of the issue

Hi @lindenb,

Thanks very much for developing sam2tsv and the other Java bioinformatics toolkits.

I have a question about an inconsistency in the number of mapped reads that I observed in one of my samples.

My environment

Steps to reproduce

In IGV, the number of reads mapped to this position is also 14.

Expected behaviour

I expect no inconsistency in terms of mapped reads between the two.

Actual behaviour

I do not know the underlying cause of this. Can you help me figure it out? I attached the relevant files (data.tar.gz) used in the above commands. Thanks a lot.

lindenb commented 3 years ago
$ java -jar dist/sam2tsv.jar ~/tmp/data/test.filt.bam | awk '($4=="YAL005C" && $8==1)'  | cat -n
     1  09fa5020-9343-47d6-9f75-56b7c8be0ba8    16  60  YAL005C 146 A   @   1   N   M
     2  415554f1-d76b-4f6d-8ade-c01ef795e926    16  60  YAL005C 153 A   >   1   N   M
     3  db9f9880-117e-4239-9bb3-ce22ae59de2e    16  60  YAL005C 183 A   C   1   N   M
     4  aaed3e60-38a1-4e7a-a469-87aa0f735f95    16  60  YAL005C 18  A   /   1   N   M
     5  45955b97-24c0-45c3-85a2-f5d7661d4ba2    16  60  YAL005C 170 A   =   1   N   M
     6  45777d56-1a8e-4a3c-a28c-90dd17ef829c    16  51  YAL005C 188 A   B   1   N   M
     7  f51a1929-b5a4-443e-8623-4bad8f01bbf5    16  60  YAL005C 77  A   <   1   N   M
     8  3d57b5c4-5877-4061-b14f-cdaf64a32c11    16  60  YAL005C 18  A   -   1   N   M
     9  dd617aec-b243-4af4-b47b-cc1edf099568    16  60  YAL005C 191 A   C   1   N   M
    10  b2fc369a-57d7-40ec-8ee5-83e37b113f25    16  60  YAL005C 206 A   .   1   N   M
    11  aae4689b-fe64-4553-b139-59944034a676    16  45  YAL005C 140 A   5   1   N   M
    12  4c4676a5-4c2f-471c-a5d0-0939f25143b2    16  60  YAL005C 171 A   C   1   N   M
    13  c2ed1eb7-6d19-43d6-95a3-a2d4af3751f6    16  59  YAL005C 16  A   $   1   N   M
    14  dd821987-409f-432b-b43c-6f28ceb886f6    16  60  YAL005C 83  A   8   1   N   M
    15  0c366064-d758-41fe-990b-8f67a7b346b2    16  60  YAL005C 131 T   %   1   N   S
    16  39773790-1ffc-4461-8073-f882bf5bf6b3    16  60  YAL005C 172 A   ,   1   N   S
    17  a328222d-831f-44d1-a6e0-8cf7820b37e6    16  60  YAL005C 172 T   %   1   N   S
    18  9e1338a5-6aca-4fc4-9213-ea25ab4c38ff    16  60  YAL005C 20  G   -   1   N   S

I see 18 reads at this position. 4 of them are soft-clipped (operator == S), so 18 - 4 = 14.
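The same subtraction can be done by tallying rows per operator instead of counting by eye. This is a minimal sketch, not part of jvarkit: `count_ops` and `sample` are hypothetical helpers, the sample rows are made up for illustration, and the column layout ($4 = contig, $8 = reference position, $10 = operator) is assumed from the output shown above and may differ in other sam2tsv versions.

```shell
#!/bin/sh
# Tally sam2tsv rows at one reference position by CIGAR operator.
# Assumed columns (taken from the output in this thread):
#   $4 = contig, $8 = reference position, $10 = CIGAR operator.
count_ops() {
    # $1 = contig name, $2 = 1-based reference position; rows are read from stdin
    awk -v chrom="$1" -v pos="$2" '
        $4 == chrom && $8 == pos { n[$10]++; total++ }
        END {
            # M, = and X consume a reference base; S is a soft clip
            printf "total=%d aligned=%d soft_clipped=%d\n",
                   total, n["M"] + n["="] + n["X"], n["S"]
        }'
}

# Hypothetical rows in the same layout; r4 sits on another contig
# and must not be counted.
sample() {
    printf 'r1\t16\t60\tYAL005C\t146\tA\t@\t1\tN\tM\n'
    printf 'r2\t16\t60\tYAL005C\t131\tT\t%%\t1\tN\tS\n'
    printf 'r3\t16\t60\tYAL005C\t18\tA\t/\t1\tN\tM\n'
    printf 'r4\t16\t60\tYAL001C\t18\tA\t/\t1\tN\tM\n'
}

sample | count_ops YAL005C 1
```

On real data the sam2tsv output would be piped into `count_ops` in place of `sample`. Filtering on the contig name as well as the position matters when the BAM contains more than one reference sequence.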

Huanle commented 3 years ago

Hi @lindenb, thanks a lot for your quick reply. I assumed the alignment file had only one reference. How careless I was!