Closed Robin-Rounthwaite closed 1 year ago
I think the problem here is that vg annotate is annotating all the _alt_ paths in the graph. Your truth set has "alt" path positions that are also in the original graph, so the mapping can be matched against those paths. Your normalized graph has no alt paths and therefore cannot make the same match.
You can verify this by dropping the alt paths from your original graph:
vg paths -v graph.vg -q _alt -d > graph.clean.vg
You will then get the same result as with your normalized graph.
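The effect described above can be illustrated with a toy model. This is only a sketch, not vg's actual comparison code: it assumes a read is marked correct when any path it shares with the truth set has positions within some distance threshold. The function name, the path names, the positions, and the threshold of 100 are all made up for illustration.

```python
from typing import Dict


def is_marked_correct(truth: Dict[str, int], mapped: Dict[str, int],
                      max_dist: int = 100) -> bool:
    """Toy check: correct if any path present in both annotations
    has positions no more than max_dist apart."""
    shared_paths = truth.keys() & mapped.keys()
    return any(abs(truth[p] - mapped[p]) <= max_dist for p in shared_paths)


# Hypothetical truth annotation: one reference path plus one alt path.
truth = {"ref": 5000, "_alt_abc_1": 12}

# Same mapping annotated against two graphs. Only the unnormalized graph
# still contains the alt path, so only that annotation can carry it.
unnormalized = {"ref": 5400, "_alt_abc_1": 12}
normalized = {"ref": 5400}

print(is_marked_correct(truth, unnormalized))  # True (matches via the alt path)
print(is_marked_correct(truth, normalized))    # False (only "ref" is shared)
```

In this toy setup the two mappings have the same reference position, yet only the one annotated with the alt path is counted as correct.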
I'm not sure of the context of your comparison, but you may want to make sure you're only looking at annotations of real paths.
It may be a good idea to make vg annotate ignore alt paths by default, but maybe there's a use case for them....
After discussion in the meeting, it looks like the consensus is:
If I want to use vg annotate, I need to make sure that both graphs have the same alt paths.
Good to know!
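One way to check that condition is to compare the alt path names of the two graphs. The sketch below assumes you already have each graph's path names as a list of strings (for example from a path listing produced by your vg version) and that alt paths are the ones starting with the _alt_ prefix. The function name and the sample names are hypothetical.

```python
from typing import Iterable, Set, Tuple


def alt_path_diff(names_a: Iterable[str], names_b: Iterable[str],
                  prefix: str = "_alt_") -> Tuple[Set[str], Set[str]]:
    """Return (alt paths only in A, alt paths only in B)."""
    alts_a = {n for n in names_a if n.startswith(prefix)}
    alts_b = {n for n in names_b if n.startswith(prefix)}
    return alts_a - alts_b, alts_b - alts_a


# Hypothetical path listings for the two graphs.
unnormalized_paths = ["ref", "_alt_abc_0", "_alt_abc_1"]
normalized_paths = ["ref"]

only_unnorm, only_norm = alt_path_diff(unnormalized_paths, normalized_paths)
print(sorted(only_unnorm))  # ['_alt_abc_0', '_alt_abc_1']
print(sorted(only_norm))    # []
```

If both returned sets are empty, the two graphs carry the same alt paths and their annotations should be comparable.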
1. What were you trying to do? Normalize Snarls is built to take each snarl, extract its haplotypes, and realign them using sPOA to generate a new and hopefully improved representation of variation.
Running this process has revealed multiple cases where two apparently identical mappings of the same read (one in the normalized graph, one in the unnormalized graph) are assigned different accuracy. I'll use read seed_12345_fragment_110250 as an example.
2. What did you want to happen? To map read seed_12345_fragment_110250_1 in the normalized and unnormalized graphs and get the same accuracy result for both (i.e. both True or both False for the mapping accuracy).
3. What actually happened? seed_12345_fragment_110250_1 is mapped as accurate in the unnormalized graph, but not the normalized graph. But both have the same reference mapping position, the same mapping score, and the same mapping quality of 60. This suggests that these two mappings are effectively identical. (seed_12345_fragment_110250_2 remained unaffected - normalized and unnormalized mappings are marked as accurate.)
Unnormalized mapping output (gam of a single read pair): s3://vg-k8s/users/rrounthw/normalize_snarls/report-giraffe-bug/seed_12345_fragment_110250.unnormalized.gam
Normalized mapping output (gam of a single read pair): s3://vg-k8s/users/rrounthw/normalize_snarls/report-giraffe-bug/seed_12345_fragment_110250.normalized.gam
truth-set gam (gam of a single read pair): s3://vg-k8s/users/rrounthw/normalize_snarls/report-giraffe-bug/seed_12345_fragment_110250.accuracy_drops_for_identical_mappings.true-pos.gam
unnormalized graph: s3://vg-k8s/users/rrounthw/normalize_snarls/report-giraffe-bug/graph.vg
normalized graph: s3://vg-k8s/users/rrounthw/normalize_snarls/report-giraffe-bug/graph.spoa.normalized.vg
You can also find the svgs here: s3://vg-k8s/users/rrounthw/normalize_snarls/report-giraffe-bug/seed_12345_fragment_110250-svgs/
Example commands to run mapping (where BASE is the name of the files without the extension):
6. What does running vg version say?