vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

How to quantify non-core gene expression using vg rna #4306

Closed HanYu-me closed 2 weeks ago

HanYu-me commented 3 weeks ago

Thanks for your development of this useful tool!

I have a question about pantranscriptomes. I have several assembled and annotated genomes. In pangenome analysis, the annotated genes can be clustered into core and non-core genes, and some non-core genes are present only in non-backbone genomes. As I understand it, vg rna can only quantify the expression of spliced genes based on the backbone genome annotation. If a non-core gene is not present in this annotation file, how can I use vg rna to quantify its expression level?

Thanks, Yu Han

jeizenga commented 2 weeks ago

Hi Yu Han,

If you obtain annotations for the other strains/haplotypes in your pangenome, you can add them to the graph alongside the reference transcripts. If your pangenome expresses the strains/haplotypes as reference sequences (using the RS tag or P lines in the GFA), then you can provide the annotations just like the reference annotations. If your pangenome expresses the strains/haplotypes as haplotypes (W lines without an RS tag), then you should use --use-hap-ref in vg rna or --hap-tx-gff in vg autoindex. Once the spliced pangenome graph is constructed, all downstream analyses are the same as when projecting reference transcripts onto haplotypes.
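
To check which of the two cases applies to a given GFA, something like the following rough sketch may help. It is not part of vg. The function name `classify_gfa_samples` is hypothetical, and it assumes the RS tag sits on the `H` header line as `RS:Z:` followed by a space-separated list of sample names, and that the sample name is the second field of each `W` line. It only reports which names look reference-like (P lines, or W lines whose sample is listed in RS) and which look haplotype-like (W lines whose sample is not listed in RS).

```python
def classify_gfa_samples(gfa_text):
    """Split the path/sample names in a GFA into two groups.

    reference_like: P-line path names, plus W-line samples listed in the RS tag.
    haplotype_like: W-line samples NOT listed in the RS tag.

    Assumption: the RS tag is on the H line as 'RS:Z:name1 name2 ...'.
    """
    rs_samples = set()
    p_paths = set()
    w_samples = set()

    for line in gfa_text.splitlines():
        fields = line.split("\t")
        kind = fields[0]
        if kind == "H":
            # Collect sample names declared as references in the header.
            for tag in fields[1:]:
                if tag.startswith("RS:Z:"):
                    rs_samples.update(tag[len("RS:Z:"):].split())
        elif kind == "P" and len(fields) > 1:
            p_paths.add(fields[1])
        elif kind == "W" and len(fields) > 1:
            w_samples.add(fields[1])

    reference_like = p_paths | (w_samples & rs_samples)
    haplotype_like = w_samples - rs_samples
    return reference_like, haplotype_like


if __name__ == "__main__":
    # Tiny made-up GFA: 'backbone' is declared in RS, 'strainB' is not.
    example = "\n".join([
        "H\tVN:Z:1.1\tRS:Z:backbone",
        "S\t1\tACGT",
        "W\tbackbone\t0\tchr1\t0\t4\t>1",
        "W\tstrainB\t1\tchr1\t0\t4\t>1",
    ])
    ref, hap = classify_gfa_samples(example)
    print("reference-like:", sorted(ref))
    print("haplotype-like:", sorted(hap))
```

Names in the reference-like group would take their annotations the same way as the reference annotation. Names in the haplotype-like group are the ones that would need --use-hap-ref (vg rna) or --hap-tx-gff (vg autoindex), as described above.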