samtools / htslib

C library for high-throughput sequencing data formats

Inconsistent truncation of floating-point tag values during conversion with sam_parse1 function from htslib #1784

Closed: zhaobu closed this issue 1 month ago

zhaobu commented 1 month ago

Issue Summary: Inconsistency in formatting a floating-point tag value when using sam_parse1 function from htslib.

Description: I encountered an inconsistency while using the sam_parse1 function from htslib to convert a kstring_t obtained from minimap2 into a bam1_t structure. The original tag value "de:f:0.0530" was modified to "de:f:0.053" after the conversion.

Steps to Reproduce:

1. Run minimap2 to obtain alignment results.
2. Extract the alignment information into a kstring_t structure.
3. Use the sam_parse1 function from htslib to convert the kstring_t into a bam1_t structure.
4. Retrieve the value of the "de:f" tag from the resulting bam1_t structure.

Expected Behavior: The tag value "de:f:0.0530" should be preserved without any modification during the conversion process.

Actual Behavior: The tag value "de:f:0.0530" is modified to "de:f:0.053" after the conversion.

minimap2 result:

A00744:48:HV33LDSXX:3:1101:14335:1016   99  chr3    15507804    60  151M    =   15507900    247 NCAGTACTCCAACAGTGGAACAAGTGAAGCAGTGTAGCTCTTACCTGCAGGTGGGGGGCATTGGGGCCCCGGACGGCCAGGTTGACCAGAAGGCCCAGGCTTGCCTGGTGGGCCAATAACTCCACTATCCCCTATCTGTAACTGACAGAGA #,F,FF,::FFF:F:FFFF,,FFFF,F,F,,FFFFFFFFF,,F,:FFFFFFFFFFF,FF,,,FFF,FFF,F,F,,,FFF,,F,:F,FF:F:F,:FFFF,F:FFF:::,FF,F:F,F,FFFF::F::F,F,F,:,,FFFF,,FFFFFFFF:, RG:Z:NA12878.1  NM:i:8  ms:i:230    AS:i:230    nn:i:1  tp:A:P  cm:i:12 s1:i:207    s2:i:0  de:f:0.0530 rl:i:0

After using sam_parse1 to write the record into a bam1_t, the result is:

A00744:48:HV33LDSXX:3:1101:14335:1016   99  chr3    15507804    60  151M    =   15507900    247 NCAGTACTCCAACAGTGGAACAAGTGAAGCAGTGTAGCTCTTACCTGCAGGTGGGGGGCATTGGGGCCCCGGACGGCCAGGTTGACCAGAAGGCCCAGGCTTGCCTGGTGGGCCAATAACTCCACTATCCCCTATCTGTAACTGACAGAGA #,F,FF,::FFF:F:FFFF,,FFFF,F,F,,FFFFFFFFF,,F,:FFFFFFFFFFF,FF,,,FFF,FFF,F,F,,,FFF,,F,:F,FF:F:F,:FFFF,F:FFF:::,FF,F:F,F,FFFF::F::F,F,F,:,,FFFF,,FFFFFFFF:, RG:Z:NA12878.1  NM:i:8  ms:i:230    AS:i:230    nn:i:1  tp:A:P  cm:i:12 s1:i:207    s2:i:0  de:f:0.053  rl:i:0

whitwham commented 1 month ago

We store a floating-point number, not a fixed-point one. What is stored is the value, not the formatting of that value. The trailing zero is meaningless for the stored value: "0.0530" and "0.053" are the same number.
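
The point about storing the value rather than its text can be checked with a small standalone sketch in plain C. It does not call htslib. The helper names (parse_float_tag, same_stored_value, format_float_tag) are hypothetical, and the use of "%g" for output is an assumption chosen for illustration, not a statement of how htslib formats 'f' tags internally.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: turn the text of an 'f' tag into a 32-bit float.
   Only the numeric value survives; the original spelling is not kept. */
float parse_float_tag(const char *text)
{
    return strtof(text, NULL);
}

/* Hypothetical helper: returns 1 when two spellings parse to the same float. */
int same_stored_value(const char *a, const char *b)
{
    return parse_float_tag(a) == parse_float_tag(b);
}

/* Hypothetical helper: write a stored float back out as text.
   "%g" is an assumption for this sketch; it drops trailing zeros. */
void format_float_tag(float value, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    snprintf(out, out_size, "%g", (double)value);
}
```

With these helpers, same_stored_value("0.0530", "0.053") returns 1, and formatting the parsed value of "0.0530" yields the text "0.053". A caller that needs to compare tags before and after conversion would therefore compare the parsed numbers rather than the strings.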