vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

I have a question about a GAM file. #4233

Closed by pioneer-pi 4 months ago

pioneer-pi commented 4 months ago

The length of the sequence and the length of the quality string are different. Below is a read mapping result: the sequence is 100 characters long, but the quality string is 136 characters long. Shouldn't these two lengths be equal?

    {
      "identity": 1.0,
      "mapping_quality": 60,
      "name": "simulated.1",
      "path": {
        "mapping": .......................
        "name": "simulated.1"
      },
      "quality": "JycnJygoJyYnKCYoKCgoKCcmJygkKCglJSgnKCgoKCQhKCgoKCglJxwoKCIoIiMeKCgcKCgoJyckJSUiKCgkIiUoJiEaJiUoJyggKCgoKCMeKCcoJygoJCQkKCAoJyYbIRwoKA==",
      "refpos": [{"name": "x", "offset": "403"}],
      "score": 110,
      "sequence": "GTTATTTACTATGAATCCTCACCTTCCTTGACTTCTTGAAACATTTGGCTATTGACCTCTTTCCTCCTTGAGGCTCTTCTGGCTTTTCATTGTCAACACA",
      "time_used": 888.0
    }
glennhickey commented 4 months ago

The quality string is a base64 encoding of the quality values, so its length only matches the sequence length after decoding:

    >>> import base64
    >>> len(base64.b64decode('JycnJygoJyYnKCYoKCgoKCcmJygkKCglJSgnKCgoKCQhKCgoKCglJxwoKCIoIiMeKCgcKCgoJyckJSUiKCgkIiUoJiEaJiUoJyggKCgoKCMeKCcoJygoJCQkKCAoJyYbIRwoKA=='))
    100
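The arithmetic behind the two numbers: standard base64 turns every 3 raw bytes into 4 text characters and pads the final group with `=`, so 100 quality values become 4 × ⌈100/3⌉ = 136 characters, which is the length reported in the question. The sketch below decodes the field from the record above and checks it against the sequence. The helper names `base64_length`, `decode_gam_quality` and `check_record` are hypothetical, not part of vg. It also assumes the decoded bytes are raw Phred scores with no ASCII offset; the values (39, 40, ...) suggest this, but the thread does not state it.

```python
import base64
import math
from typing import List

# Values copied from the GAM JSON record in the question.
SEQUENCE = "GTTATTTACTATGAATCCTCACCTTCCTTGACTTCTTGAAACATTTGGCTATTGACCTCTTTCCTCCTTGAGGCTCTTCTGGCTTTTCATTGTCAACACA"
QUALITY = "JycnJygoJyYnKCYoKCgoKCcmJygkKCglJSgnKCgoKCQhKCgoKCglJxwoKCIoIiMeKCgcKCgoJyckJSUiKCgkIiUoJiEaJiUoJyggKCgoKCMeKCcoJygoJCQkKCAoJyYbIRwoKA=="


def base64_length(n_bytes: int) -> int:
    """Length of padded base64 text: every 3 raw bytes become 4 characters."""
    return 4 * math.ceil(n_bytes / 3)


def decode_gam_quality(quality_b64: str) -> List[int]:
    """Hypothetical helper: base64-decode the 'quality' field into one int per base."""
    return list(base64.b64decode(quality_b64))


def check_record(record: dict) -> bool:
    """Hypothetical helper: True if there is one decoded quality value per base."""
    quals = decode_gam_quality(record.get("quality", ""))
    return len(quals) == len(record["sequence"])


record = {"sequence": SEQUENCE, "quality": QUALITY}
quals = decode_gam_quality(QUALITY)

print(len(QUALITY), base64_length(len(SEQUENCE)))  # 136 136
print(len(SEQUENCE), len(quals), check_record(record))  # 100 100 True
print(quals[:5])  # [39, 39, 39, 39, 40]

# Assumption: the decoded values are raw Phred scores (no ASCII offset),
# so adding 33 yields a FASTQ-style (Sanger) quality string.
fastq_qual = "".join(chr(q + 33) for q in quals)
print(fastq_qual[:5])  # HHHHI
```

Re-encoding the decoded bytes with `base64.b64encode` gives back the original 136-character string, so nothing is lost in the round trip; only the text representation is longer.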