Open yeeus opened 1 month ago
I don't think we have a known good way to get annotations against all the different samples in the graph using vg. Your idea of injecting into the path you have annotations on and then surjecting that sequence to each other path you are interested in, as an alignment, might work OK.
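If you do try the surjection route, the resulting alignments still have to be turned into gene intervals on each target path. Here is a minimal sketch of that step, assuming the surjected alignments are written as SAM text and each read name is a gene name. The path name `sample1#1#chr6`, the gene names, and the example records are all made up for illustration, and this says nothing about whether the surjected placements are biologically correct.

```python
import re

# CIGAR operations as defined in the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference (here: target path) bases.
REF_CONSUMING = set("MDN=X")


def reference_span(cigar):
    """Number of target-path bases covered by an alignment."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)


def sam_to_bed(lines):
    """Turn SAM records into (path, start, end, name, strand) tuples.

    Coordinates are BED-style: 0-based start, end exclusive.
    Header lines and unmapped records are skipped.
    """
    intervals = []
    for line in lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        name, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 0x4 or rname == "*" or cigar == "*":
            continue  # unmapped
        start = pos - 1  # SAM POS is 1-based
        end = start + reference_span(cigar)
        strand = "-" if flag & 0x10 else "+"
        intervals.append((rname, start, end, name, strand))
    return intervals


# Hypothetical records, for illustration only.
example_sam = [
    "@SQ\tSN:sample1#1#chr6\tLN:5000",
    "geneA\t0\tsample1#1#chr6\t1001\t60\t100M5D50M10S\t*\t0\t0\t*\t*",
    "geneB\t16\tsample1#1#chr6\t201\t60\t20S80M2I100M\t*\t0\t0\t*\t*",
    "geneC\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

for path, start, end, name, strand in sam_to_bed(example_sam):
    print(path, start, end, name, strand, sep="\t")
```

One caveat: an alignment gives one placement per record, so a gene with several copies on a path only shows up more than once if the aligner reports secondary or supplementary placements.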
If you actually have assemblies you want annotated, I think we'd probably recommend using the Comparative Annotation Toolkit instead of vg. CAT is designed to annotate new assemblies using alignments and annotations on previous assemblies, and it actually thinks about things like paralogs and ortholog matching and pseudogenization. I'm not sure how well it works on e.g. MHC, but I also wouldn't lean on `vg inject` and `vg surject` and the HPRC graphs to get "reliable" annotations for the assemblies.
Maybe @ph09 or @glennhickey can speak to how well CAT's ortholog matchings are likely to agree with the HPRC graph's Minigraph-Cactus alignments?
PLEASE DO NOT MAKE SUPPORT REQUESTS HERE
Please use the Biostars forum instead:
https://www.biostars.org/new/post/?tag_val=vg
Ok I will post on Biostars later.
Hello dear friends! Thanks for developing vg, such a useful and magical tool for pangenome graphs. First, I should say that I'm new to manipulating graphs, partly because of the various formats (e.g. .vg, .xg, .gbz, .gam ...). As a beginner, I need some help: I have a human pangenome graph containing several genomes, with `genome_a` as the reference. I want to see the locations of some gene regions of interest in my graph, like Fig. 5d in the HPRC publication. Because regions like MHC are highly complex, the gene annotations there are not reliable enough to simply draw the gene locations from them. Therefore, I turned to the graph to get locally detailed and confident gene annotations. At first, I tried this method (which follows the odgi tutorial): `odgi untangle` the injected graph to see the locations of genes on each path.
However, I found that for genes with CNV, this method often fails to capture all gene copies (usually it finds just one), so I looked for another method. For now, I intend to:
For step 2, I initially used `vg annotate`, but it seems to work only for the reference path (#4158). So I used `vg surject` with the command:
which has not produced results as I write this. Also, in #4158 the developers suggested:
and I think I could also use this admittedly crude method to get the gene locations from the GAF file that GraphAligner generates.
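To make the GAF idea concrete, here is a rough sketch of pulling per-gene hits out of a GAF file, assuming each gene sequence was aligned as a query and the output has the 12 mandatory GAF columns. The thresholds `min_query_cov` and `min_identity`, the gene name, and the example lines are made up for illustration. Note that columns 7 to 9 are coordinates along the reported node walk, not along a haplotype path, so converting the hits to positions on each assembly would still need a separate step.

```python
from collections import defaultdict


def parse_gaf_line(line):
    """Parse the 12 mandatory GAF columns into a dict."""
    f = line.rstrip("\n").split("\t")
    return {
        "query": f[0],
        "query_len": int(f[1]),
        "query_start": int(f[2]),
        "query_end": int(f[3]),
        "strand": f[4],
        "path": f[5],            # node walk such as >12>13>14
        "path_len": int(f[6]),
        "path_start": int(f[7]),
        "path_end": int(f[8]),
        "matches": int(f[9]),
        "block_len": int(f[10]),
        "mapq": int(f[11]),
    }


def gene_hits(lines, min_query_cov=0.9, min_identity=0.9):
    """Group near-full-length, high-identity alignments by gene name.

    Both thresholds are illustrative guesses, not recommended values.
    """
    hits = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        rec = parse_gaf_line(line)
        cov = (rec["query_end"] - rec["query_start"]) / rec["query_len"]
        ident = rec["matches"] / rec["block_len"] if rec["block_len"] else 0.0
        if cov >= min_query_cov and ident >= min_identity:
            hits[rec["query"]].append(rec)
    return dict(hits)


# Hypothetical GAF lines, for illustration only.
example_gaf = [
    "geneA\t1000\t0\t1000\t+\t>12>13>14\t1500\t100\t1100\t990\t1000\t60",
    "geneA\t1000\t0\t950\t+\t>12>15>14\t1400\t80\t1030\t930\t950\t60",
    "geneA\t1000\t0\t300\t+\t>40\t500\t0\t300\t290\t300\t10",
]

for gene, recs in gene_hits(example_gaf).items():
    print(gene, len(recs), [r["path"] for r in recs])
```

Counting the hits that pass the filter per gene gives a rough view of how many distinct placements were found, which is where extra copies of CNV genes would show up if the aligner reports them.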
Hmm, I don't know whether the `vg surject` command I used above can generate a correct alignment file containing the gene locations on each path. So I would like to know whether anybody can give me some advice on my process and method, or suggest any other helpful method. Please!
Best wishes! Thanks!