vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

Hi, I have a question about vg augment. #4388

Closed pioneer-pi closed 4 days ago

pioneer-pi commented 1 week ago

I used a GAM file to augment xx.vg, which produced a new graph (xx_aug.vg) and a new alignment file (aug.gam). What is the difference between the original GAM and the augmented GAM? I ran the command and compared the two files, and found that the alignments had changed, so I am confused about what vg augment does.

jeizenga commented 1 week ago

vg augment will add new variants that are observed in the reads into the augmented graph. Doing this changes the node IDs in the graph, so it's also necessary to change the alignments to make sure that they are still consistent with each other. In addition, some mismatches and indels from the original alignments will now be added as variants in the graph, so the updated alignments will now report them as matches.
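To see this effect in practice, one option is to tally the edit types in an alignment before and after augmentation. The following is a minimal sketch, assuming the GAM has been dumped to JSON (one alignment object per record, as in the example later in this thread). The helper names `classify_edit` and `edit_summary` are hypothetical and not part of vg.

```python
from collections import Counter


def classify_edit(edit):
    """Label one edit from a JSON-formatted GAM mapping.

    Fields omitted from the JSON are treated as 0 / empty.
    """
    from_len = int(edit.get("from_length", 0))
    to_len = int(edit.get("to_length", 0))
    has_seq = bool(edit.get("sequence"))
    if from_len == to_len and not has_seq:
        return "match"
    if from_len == to_len:
        return "substitution"
    if from_len == 0:
        return "insertion"  # also covers soft clips
    if to_len == 0:
        return "deletion"
    return "complex"


def edit_summary(alignment):
    """Count edit types over every mapping in one alignment."""
    counts = Counter()
    for mapping in alignment.get("path", {}).get("mapping", []):
        for edit in mapping.get("edit", []):
            counts[classify_edit(edit)] += 1
    return counts
```

If the read's variants were added to the graph, the summary for the augmented alignment should show fewer substitutions, insertions, and deletions than the summary for the original alignment of the same read.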

pioneer-pi commented 1 week ago

However, when I extract the sequence along the mapping path (that is, from the contents of the GAM file), it doesn't match the read sequence. For example:

{"annotation": {"proper_pair": true}, 
"fragment": [{"length": "-157", "name": "20"}], 
"fragment_length_distribution": "708:160.955:54.7822:0:1", 
"fragment_prev": {"name": "ST-E00144:1084:HCF3NCCX2:2:2115:18040:10873"}, 
"fragment_score": 52.517613696995731, 
"identity": 0.96666666666666667, 
"mapping_quality": 60, 
"name": "ST-E00144:1084:HCF3NCCX2:2:2115:18040:10873", 
"path": 
    {"mapping": [
        {"edit": [{"from_length": 11, "to_length": 11}], "position": {"node_id": "14397", "offset": "21"}, "rank": "1"}, 
        {"edit": [{"from_length": 32, "to_length": 32}], "position": {"node_id": "14398"}, "rank": "2"}, 
        {"edit": [{"from_length": 32, "to_length": 32}], "position": {"node_id": "14399"}, "rank": "3"}, 
        {"edit": [{"from_length": 32, "to_length": 32}], "position": {"node_id": "14400"}, "rank": "4"}, 
        {"edit": [{"from_length": 1}, {"from_length": 15, "to_length": 15}, {"from_length": 1, "sequence": "T", "to_length": 1}, {"from_length": 6, "to_length": 6}, {"from_length": 1, "sequence": "T", "to_length": 1}, {"from_length": 1, "sequence": "G", "to_length": 1}, {"from_length": 7, "to_length": 7}], "position": {"node_id": "14401"}, "rank": "5"}, 
        {"edit": [{"from_length": 4, "to_length": 4}, {"from_length": 1, "sequence": "T", "to_length": 1}, {"from_length": 3, "to_length": 3}, {"from_length": 1, "sequence": "T", "to_length": 1}, {"from_length": 3, "to_length": 3}], "position": {"node_id": "14402"}, "rank": "6"}
        ]},   
"quality": "ICAlJSUpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpJSkpKSkpKSkpKSUlKSkpKSkpKSklJSUgICUpKSkpKSkpKSkpKSkpKSkpKSkpKRYMDAwMDAwWGwwMFgwMCAwMGwwMFgwMDAwMDAwW", 
"refpos": [{"name": "20", "offset": "406166"}], 
"score": 129, 
"sequence": "AAAGATTATTA CAAATCTCAATAGCACATATACTGTTTATACC TCTTAGTTCTAGTTTCTCAGTTTGTAATACTC CTTCAAGGAATGTTTTGCATGGTGTATTCTTT TTTTTTTTTTTTTTG T GACGAA T G CTCACTC TGTT T CCT T AGC", 
"time_used": 479.0}

I reconstructed the sequence from the "path" field and it doesn't match the "sequence" field. There are some mismatches, where a node's bases don't match the read's bases (for example, node bases: ATGC, sequence bases: TTGC).
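For a comparison like this, the edits have to be applied while walking the path. An edit that carries a "sequence" (for example {"from_length": 1, "sequence": "T", "to_length": 1}) is a substitution, so the read base is expected to differ from the node base at that position. An edit with only "from_length", such as the first edit on node 14401, has an implicit "to_length" of 0 and is a deletion. The following is a minimal sketch of that reconstruction, assuming `node_seq` is a dict from node ID (as a string) to node sequence taken from the same graph the GAM refers to. The function names are hypothetical and not part of vg.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_from_mapping(mapping, node_seq):
    """Rebuild the read bases covered by one mapping."""
    pos = mapping.get("position", {})
    node = node_seq[str(pos["node_id"])]
    if pos.get("is_reverse", False):
        node = revcomp(node)  # read aligned to the node's reverse strand
    offset = int(pos.get("offset", 0))  # omitted offset means 0
    pieces = []
    for edit in mapping.get("edit", []):
        from_len = int(edit.get("from_length", 0))
        to_len = int(edit.get("to_length", 0))
        seq = edit.get("sequence", "")
        if seq:
            pieces.append(seq)  # substitution or insertion: read differs
        elif from_len == to_len:
            pieces.append(node[offset:offset + from_len])  # match
        # otherwise a deletion: node bases are skipped, read gets nothing
        offset += from_len
    return "".join(pieces)


def read_from_path(alignment, node_seq):
    """Rebuild the full read sequence from an alignment's path."""
    mappings = alignment.get("path", {}).get("mapping", [])
    return "".join(read_from_mapping(m, node_seq) for m in mappings)
```

If the result of `read_from_path` equals the alignment's "sequence" field (with any spaces removed), the path and graph are consistent, and the remaining node-versus-read differences are just the substitutions recorded in the edits.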

jeizenga commented 1 week ago

Can you give the command that you used to determine the node sequences?

pioneer-pi commented 1 week ago

> Can you give the command that you used to determine the node sequences?

@jeizenga I got the sequences from a GFA file: I converted the vg graph to GFA and checked the sequence information there.
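For reference, node sequences can be read from a GFA file by collecting its segment (S) lines, where the second tab-separated field is the segment name and the third is the sequence. The following is a minimal sketch. The function name `load_gfa_segments` is hypothetical, and it assumes the GFA was converted from the same graph that the GAM being checked refers to, since vg augment changes node IDs.

```python
def load_gfa_segments(lines):
    """Map segment name -> sequence from an iterable of GFA lines."""
    segments = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Segment records look like: S <name> <sequence> [optional tags]
        if fields[0] == "S" and len(fields) >= 3:
            segments[fields[1]] = fields[2]
    return segments
```

It can be called on an open file handle, for example `load_gfa_segments(open(path))` where `path` points to the converted GFA file.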

jeizenga commented 1 week ago

The actual commands would still be helpful.