Closed pioneer-pi closed 4 days ago
vg augment will add new variants that are observed in the reads into the augmented graph. Doing this changes the node IDs in the graph, so it's also necessary to update the alignments so that they stay consistent with the graph. In addition, some mismatches and indels from the original alignments will now be added as variants in the graph, so the updated alignments will report them as matches.
However, when I extract the sequence from the mapping path (that is, from the contents of the GAM file), it doesn't match the read sequence. For example:
{"annotation": {"proper_pair": true},
"fragment": [{"length": "-157", "name": "20"}],
"fragment_length_distribution": "708:160.955:54.7822:0:1",
"fragment_prev": {"name": "ST-E00144:1084:HCF3NCCX2:2:2115:18040:10873"},
"fragment_score": 52.517613696995731,
"identity": 0.96666666666666667,
"mapping_quality": 60,
"name": "ST-E00144:1084:HCF3NCCX2:2:2115:18040:10873",
"path":
{"mapping": [
{"edit": [{"from_length": 11, "to_length": 11}], "position": {"node_id": "14397", "offset": "21"}, "rank": "1"},
{"edit": [{"from_length": 32, "to_length": 32}], "position": {"node_id": "14398"}, "rank": "2"},
{"edit": [{"from_length": 32, "to_length": 32}], "position": {"node_id": "14399"}, "rank": "3"},
{"edit": [{"from_length": 32, "to_length": 32}], "position": {"node_id": "14400"}, "rank": "4"},
{"edit": [{"from_length": 1}, {"from_length": 15, "to_length": 15}, {"from_length": 1, "sequence": "T", "to_length": 1}, {"from_length": 6, "to_length": 6}, {"from_length": 1, "sequence": "T", "to_length": 1}, {"from_length": 1, "sequence": "G", "to_length": 1}, {"from_length": 7, "to_length": 7}], "position": {"node_id": "14401"}, "rank": "5"},
{"edit": [{"from_length": 4, "to_length": 4}, {"from_length": 1, "sequence": "T", "to_length": 1}, {"from_length": 3, "to_length": 3}, {"from_length": 1, "sequence": "T", "to_length": 1}, {"from_length": 3, "to_length": 3}], "position": {"node_id": "14402"}, "rank": "6"}
]},
"quality": "ICAlJSUpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpJSkpKSkpKSkpKSUlKSkpKSkpKSklJSUgICUpKSkpKSkpKSkpKSkpKSkpKSkpKRYMDAwMDAwWGwwMFgwMCAwMGwwMFgwMDAwMDAwW",
"refpos": [{"name": "20", "offset": "406166"}],
"score": 129,
"sequence": "AAAGATTATTA CAAATCTCAATAGCACATATACTGTTTATACC TCTTAGTTCTAGTTTCTCAGTTTGTAATACTC CTTCAAGGAATGTTTTGCATGGTGTATTCTTT TTTTTTTTTTTTTTG T GACGAA T G CTCACTC TGTT T CCT T AGC",
"time_used": 479.0}
I reconstructed the sequence from the "path" field, and it doesn't match the "sequence" field. There are some mismatches: at some positions the node's base differs from the read's base (for example, node bases: ATGC, sequence bases: TTGC).
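For reference, here is a minimal sketch of how a read sequence can be rebuilt from a path like the one above. It assumes the usual reading of the edit fields: an edit with equal from_length and to_length and no "sequence" is a match and takes its bases from the node, an edit carrying "sequence" takes its bases from the read, and an edit with no to_length is a deletion that contributes nothing. The function names and the node_seqs dictionary (node ID to node sequence, for example collected from the GFA) are hypothetical and only for illustration.

```python
def revcomp(seq):
    """Reverse-complement a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def read_from_path(path, node_seqs):
    """Rebuild the read sequence implied by a GAM path (JSON form).

    node_seqs is a hypothetical dict of node ID (as a string) to sequence.
    """
    pieces = []
    for mapping in path["mapping"]:
        pos = mapping["position"]
        node = node_seqs[str(pos["node_id"])]
        # A mapping on the reverse strand reads the node's reverse complement.
        if pos.get("is_reverse", False):
            node = revcomp(node)
        # 64-bit fields such as "offset" are strings in the JSON, so cast them.
        offset = int(pos.get("offset", 0))
        for edit in mapping["edit"]:
            from_len = int(edit.get("from_length", 0))
            to_len = int(edit.get("to_length", 0))
            seq = edit.get("sequence", "")
            if seq:
                # Substitution or insertion: bases come from the read.
                pieces.append(seq)
            elif from_len == to_len:
                # Match: bases come from the node.
                pieces.append(node[offset:offset + from_len])
            # Otherwise a pure deletion: nothing is added to the read.
            offset += from_len
    return "".join(pieces)


# Toy example (made-up nodes, not taken from the graph in this issue).
toy_nodes = {"1": "ACGTACGT", "2": "AACC"}
toy_path = {"mapping": [
    {"position": {"node_id": "1", "offset": "2"},
     "edit": [{"from_length": 3, "to_length": 3},
              {"from_length": 1, "to_length": 1, "sequence": "T"},
              {"from_length": 1},
              {"from_length": 1, "to_length": 1}]},
    {"position": {"node_id": "2", "is_reverse": True},
     "edit": [{"from_length": 2, "to_length": 2}]},
]}
rebuilt = read_from_path(toy_path, toy_nodes)
```

Under this reading, any position covered by an edit that has a "sequence" value is expected to differ from the node base, so only bases covered by plain match edits need to agree with the node sequences.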
Can you give the command that you used to determine the node sequences?
@jeizenga I got the node sequences from a GFA file: I converted the vg graph to GFA and checked the sequence information there.
The actual commands would still be helpful.
I used the GAM file to augment xx.vg, which produced a new graph, xx_aug.vg, and a new alignment file, aug.gam. What is the difference between the original GAM and the augmented GAM? When I compared the two files, I found that the alignments had changed, so I am confused about what vg augment does.
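One way to see how an alignment changes between the original and the augmented GAM is to tally its edits by type in both files. The sketch below is only an illustration and assumes each alignment is available as a JSON record like the one shown earlier in this issue; classify_edit and tally_edits are hypothetical helper names, not part of vg.

```python
from collections import Counter


def classify_edit(edit):
    """Label one edit as match, substitution, insertion, or deletion."""
    from_len = int(edit.get("from_length", 0))
    to_len = int(edit.get("to_length", 0))
    if from_len == to_len:
        # Same length on graph and read: a match unless read bases are given.
        return "substitution" if edit.get("sequence") else "match"
    return "deletion" if from_len > to_len else "insertion"


def tally_edits(alignment):
    """Count bases per edit type over every mapping of one alignment."""
    counts = Counter()
    for mapping in alignment.get("path", {}).get("mapping", []):
        for edit in mapping.get("edit", []):
            from_len = int(edit.get("from_length", 0))
            to_len = int(edit.get("to_length", 0))
            counts[classify_edit(edit)] += max(from_len, to_len)
    return counts


# The rank 5 mapping from the record in this issue, used as sample input.
sample = {"path": {"mapping": [
    {"edit": [{"from_length": 1},
              {"from_length": 15, "to_length": 15},
              {"from_length": 1, "sequence": "T", "to_length": 1},
              {"from_length": 6, "to_length": 6},
              {"from_length": 1, "sequence": "T", "to_length": 1},
              {"from_length": 1, "sequence": "G", "to_length": 1},
              {"from_length": 7, "to_length": 7}],
     "position": {"node_id": "14401"}, "rank": "5"},
]}}
sample_counts = tally_edits(sample)
```

If augmentation embedded this read's variants in the graph, the same read in the augmented GAM would be expected to show fewer substitution and deletion bases and more match bases, on different node IDs.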