vgteam / vg

tools for working with genome variation graphs

Align transcriptome data to an already-constructed pangenome graph #4322

Open CarlosAmadeo7 opened 2 days ago

CarlosAmadeo7 commented 2 days ago

Hello vg team. The work you are doing is amazing. I am relatively new to vg and would appreciate a little help with my question. I want to download an already-constructed human pangenome graph, built with Minigraph-Cactus, from the HPRC Pangenome Resources, and then align RNA-seq data to it. Is there a way to do that, or a workflow I can follow? Is the already-constructed pangenome enough on its own? I know these are very basic questions, and I would appreciate your help pointing me in the right direction. I hope for your prompt response. Have a great day!

adamnovak commented 1 day ago

I think what you are meant to do is to take a pre-built pangenome graph, and also a set of splicing annotations, and combine them with vg rna into a spliced pangenome graph. Then you map your RNA reads against the spliced graph, instead of the original pangenome graph.

If you don't add the splicing edges and just map straight to the plain unspliced pangenome graph, you won't be able to have your alignments follow known splices.
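As a rough illustration of those two steps (add splice edges with vg rna, then map with vg mpmap), here is a minimal Python sketch that only assembles the command lines. Everything specific in it is an assumption to check against `vg rna --help` and `vg mpmap --help` for your vg version: the `-n` (transcript annotation) and `-t` (threads) options of vg rna, the `-n rna`, `-x`, `-g`, `-d`, and `-f` options of vg mpmap, and all file names, which are hypothetical placeholders. Building the xg, GCSA, and distance indexes from the spliced graph is not shown, and the sequence names in the annotation would need to match reference path names in the graph.

```python
import shlex
import subprocess
from typing import List, Optional


def vg_rna_cmd(graph: str, annotation: str, threads: int = 4) -> List[str]:
    """Build the argv for adding splice edges from a GTF/GFF to a graph.

    Assumption: -n passes the transcript annotation, -t the thread count,
    and the spliced graph is written to stdout.
    """
    return ["vg", "rna", "-t", str(threads), "-n", annotation, graph]


def vg_mpmap_cmd(xg: str, gcsa: str, dist: str, fastq_1: str,
                 fastq_2: Optional[str] = None, threads: int = 4) -> List[str]:
    """Build the argv for mapping RNA-seq reads to the spliced graph.

    Assumption: -n rna selects RNA mode, -x/-g/-d name the xg, GCSA, and
    distance indexes, and each -f names one FASTQ file.
    """
    cmd = ["vg", "mpmap", "-n", "rna", "-t", str(threads),
           "-x", xg, "-g", gcsa, "-d", dist, "-f", fastq_1]
    if fastq_2 is not None:
        cmd += ["-f", fastq_2]  # second mate for paired-end reads
    return cmd


def run_to_file(cmd: List[str], out_path: str) -> None:
    """Run a command and redirect its stdout into out_path."""
    with open(out_path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)


if __name__ == "__main__":
    # Hypothetical file names; nothing is executed here, the commands are
    # only printed so they can be reviewed first.
    rna = vg_rna_cmd("pangenome.pg", "annotation.gtf")
    mpmap = vg_mpmap_cmd("spliced.xg", "spliced.gcsa", "spliced.dist",
                         "reads_1.fq", "reads_2.fq")
    print(shlex.join(rna), "> spliced.pg")
    print(shlex.join(mpmap), "> alignments.gamp")
```

Printing the commands rather than running them keeps the sketch safe to try without vg installed; `run_to_file` is there for when the options have been confirmed.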

CarlosAmadeo7 commented 1 day ago

Thank you for that information. Is there a workflow available that I can use, for example one that shows where to get the splicing annotations and how to combine them with the original pangenome? Best

adamnovak commented 1 day ago

According to https://www.nature.com/articles/s41592-022-01731-9#code-availability, it looks like we don't actually have a vg rna WDL workflow to go with the paper. There's a repo at https://github.com/jonassibbesen/vgrna-project-paper which can point you to where the transcripts used in the paper can be found, and it has the code you need to replicate the paper's work, but it's not packaged up nicely for re-running out of the box on your own data. There are also https://github.com/vgteam/vg/wiki/Transcriptomic-analyses and https://github.com/vgteam/vg/wiki/Multipath-alignments-and-vg-mpmap#additional-considerations-for-using-vg-mpmap-for-to-map-rna-seq-reads-to-splice-variation-graphs, which are meant to explain what you would need to know to analyze your own data, but which need you to fill in a bunch of blanks.

@cmarkello once wrote a WDL workflow at https://github.com/NCBI-Hackathons/TheHumanPangenome/tree/5055b4c01af69483709883b2f82fdf208e75d0ec/RNA/wdl_pipeline, apparently for a hackathon, but I am not sure whether it was ever finished.

CarlosAmadeo7 commented 4 hours ago

Thank you so much for all of this information. I will see what I can get from all of these sources. Best