walaj / bxtools

Tools for analyzing 10X Genomics data
MIT License

Summarizing stats based on other tags (e.g. MI or PS) #25

Closed gnarzisi closed 6 years ago

gnarzisi commented 6 years ago

Thank you Jeremiah for working on this tool. I am experiencing some strange behaviors and I wanted to give you a heads up.

I would like to collect stats based on different tags (e.g., MI, PS, etc.) but, independently of what I specify in the --tag option, I always obtain the same output (stats based on BX).

Here is the simple command I used (just for chr22):

samtools view -h $BAM chr22 | bxtools stats - -t MI > stats.MI.tsv

The reads do contain the other tags. I used the BAM file provided by the GIAB team. Below is an illustrative example of a read containing all the tags.

Another strange behavior I noticed is that the values reported in output for the AS column are all 0s. This seems odd since the majority of the reads have AS values different from 0.

ST-E00273:177:HMTTCCCXX:1:2120:6806:23477       105     chr22   10510039        60      101M27S chr21   8532882 -347    ATGTTTGGAATATAAAATCAGCAACTAATATGTATTTTCAAAGCATTATCAATACAGAGTGCTAAGTGACTTCACTGGGAAAGGTAGTCATATAAAGAACAGACTAATAGTCCGGGATTATTGTGAGG        <<F,7AFKF,F,,F,FKFAFK7AAFKFFKKFF,,<F7,7,,,<AK,,<,,7,A,,F,,77AF,7FFK7,,,AKA<,,,7,,7,,,AFF,F,F<FAKFKA,,,,7,,,,,,,7,,(A<AK,,<7,,<,,        DM:Z:1.236364   QT:Z:A<,F<FFA   BC:Z:TCACATCA   QX:Z:,AAF,<FFFFKFKKA<   AM:A:1  XM:A:0  TR:Z:TAGTCGC    TQ:Z:FKA,FKK    AS:f:-93  RG:Z:27058:MissingLibrary:1:HMTTCCCXX:1 XS:f:-94        BX:Z:TGAATCGCAACTGGAG-1 XT:i:0  RX:Z:TGAATCGCAACTGGAG   OM:i:60 PS:i:10464994   HP:i:2  PC:i:26 MI:i:28314638
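One way to double-check that a record really carries the tags in question is to parse its optional fields directly. The Python sketch below is independent of bxtools: it assumes tab-separated SAM text, and the function name parse_sam_tags is made up for illustration. The sample record is a shortened copy of the read above, with SEQ and QUAL replaced by "*" and only a few of the tags kept.

```python
def parse_sam_tags(sam_line):
    """Return the optional fields of one SAM record as {tag: typed value}."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # the first 11 SAM columns are mandatory
        # Split on the first two colons only, since Z values may contain ':'
        tag, type_code, value = field.split(":", 2)
        if type_code == "i":
            tags[tag] = int(value)
        elif type_code == "f":
            tags[tag] = float(value)
        else:
            # A, Z, H and B values are kept as strings in this sketch
            tags[tag] = value
    return tags


# Shortened version of the read above (SEQ/QUAL replaced by "*", fewer tags)
record = "\t".join([
    "ST-E00273:177:HMTTCCCXX:1:2120:6806:23477", "105", "chr22", "10510039",
    "60", "101M27S", "chr21", "8532882", "-347", "*", "*",
    "AS:f:-93", "RG:Z:27058:MissingLibrary:1:HMTTCCCXX:1",
    "BX:Z:TGAATCGCAACTGGAG-1", "PS:i:10464994", "HP:i:2", "MI:i:28314638",
])

tags = parse_sam_tags(record)
print(tags["MI"], tags["PS"], tags["BX"], tags["AS"])
# prints: 28314638 10464994 TGAATCGCAACTGGAG-1 -93.0
```

Note that AS is stored here with type f, so it has to be read as a float rather than an integer.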
walaj commented 6 years ago

Hi Giuseppe, thanks for reporting. This does look like a bug, but hopefully one that won’t take too much to fix. I’ll post here again when I figure out what is going on.

walaj commented 6 years ago

The tag option wasn't being read in correctly by bxstats, and there was an issue with the float tags as well. A test run on the BAM you linked to now appears to work correctly, e.g.:
28358269 66 398 19 -5

Let me know if you run into other issues, but a recursive update of the git repos should fix these problems.
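For readers following along, the two symptoms (output always grouped by BX, and AS reported as 0) can be illustrated with a toy sketch in Python. This is not the bxtools implementation; the function stats_by_tag and the three reads are invented for illustration, and the only statistics computed are a read count and a mean AS per group.

```python
from collections import defaultdict


def stats_by_tag(reads, group_tag="BX"):
    """Group reads by the chosen tag and return {key: (read count, mean AS)}.

    reads is an iterable of dicts mapping tag names to already-typed values.
    """
    scores = defaultdict(list)
    for tags in reads:
        key = tags.get(group_tag)
        if key is None or "AS" not in tags:
            continue  # skip reads lacking the grouping tag or an AS value
        scores[key].append(float(tags["AS"]))  # AS may be stored as a float
    return {key: (len(v), sum(v) / len(v)) for key, v in scores.items()}


# Invented toy reads, not taken from any real BAM
reads = [
    {"BX": "AAAC-1", "MI": 1, "AS": -5.0},
    {"BX": "AAAC-1", "MI": 2, "AS": -15.0},
    {"BX": "GGGT-1", "MI": 2, "AS": -10.0},
]

print(stats_by_tag(reads, "BX"))
# prints: {'AAAC-1': (2, -10.0), 'GGGT-1': (1, -10.0)}
print(stats_by_tag(reads, "MI"))
# prints: {1: (1, -5.0), 2: (2, -12.5)}
```

If the grouping tag were ignored, both calls would return the BX result, which matches the behavior described in the original report.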

gnarzisi commented 6 years ago

Everything seems to be working fine now. Thank you for the quick fix!

I have another feature suggestion: similar to the "mol" subcommand, it would be very useful to have a "phase-set" subcommand that generates a BED file with the minimum footprint for each phase set. In this case, multiple barcodes will be associated with each phase set, and they could be reported as a comma-separated list.
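To make the request concrete, here is a rough Python sketch of the output I have in mind. It reads "minimum footprint" as the smallest interval covering all reads of a phase set, which is my own interpretation. The function phase_set_footprints, the input tuple layout, and the toy reads are all hypothetical, and coordinates are assumed to be 0-based, half-open as in BED.

```python
def phase_set_footprints(reads):
    """Collapse reads into one BED line per (chromosome, phase set).

    reads is an iterable of (chrom, start, end, ps, bx) tuples.
    Columns of each output line: chrom, start, end, PS, comma-separated BX.
    """
    spans = {}
    for chrom, start, end, ps, bx in reads:
        key = (chrom, ps)
        if key not in spans:
            spans[key] = [start, end, {bx}]
        else:
            span = spans[key]
            span[0] = min(span[0], start)  # leftmost start seen so far
            span[1] = max(span[1], end)    # rightmost end seen so far
            span[2].add(bx)                # barcodes supporting this phase set
    lines = []
    for (chrom, ps), (start, end, barcodes) in sorted(
        spans.items(), key=lambda item: item[0]
    ):
        lines.append(
            "\t".join([chrom, str(start), str(end), str(ps),
                       ",".join(sorted(barcodes))])
        )
    return lines


# Invented toy reads for illustration
toy_reads = [
    ("chr22", 100, 200, 10464994, "AAAC-1"),
    ("chr22", 150, 400, 10464994, "GGGT-1"),
    ("chr22", 900, 1000, 20000000, "AAAC-1"),
]

bed_lines = phase_set_footprints(toy_reads)
for line in bed_lines:
    print(line)
```

With these toy reads, the first phase set spans 100 to 400 and lists both barcodes, while the second spans 900 to 1000 with a single barcode.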

walaj commented 6 years ago

I just updated the repos to give mol the same tag-choice option (-t) as other commands. Would bxtools mol -t PS get you mostly what you need, aside from an extra BED field tracking which BX codes belong to which phase-sets?

gnarzisi commented 6 years ago

That would be good enough for now.

I tried the new code with the -t PS option, but it does not seem to be ready. The output still seems to be the same as for the MI tag.

walaj commented 6 years ago

I just updated with this functionality. It was already tracking the BX tags, so I only needed to print them. The -t option should now work in mol (see the example BED output below):

[Image: example BED output]

gnarzisi commented 6 years ago

Great! Thank you Jeremiah. Closing the ticket ;)