vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

Confused about how many genome and GFF files to provide to vg autoindex when mapping RNA-seq data #4393

Open QianghuiZhu opened 1 week ago

QianghuiZhu commented 1 week ago

Hi, vg is a great tool for pangenome analysis.

I have 12 genome assemblies, each with a GFF annotation, and I have already built a graph-based pangenome from an SV VCF file using vg construct and vg index.

I also have some RNA-seq data that I want to align to the graph pangenome. As I understand it, I should rebuild the graph with vg autoindex -w mpmap -v sv.vcf.gz rather than reuse the indexes above. The -r and --tx-gff options of vg autoindex can be repeated, so should I supply just one genome as the reference, or all 12 genomes?

I hope for your response. Thanks!

jeizenga commented 1 week ago

vg autoindex is designed to take common interchange formats like FASTA and VCF and produce internal vg formats like the ones you get from vg index. So, yes, you would not use your already-constructed indexes if you want to use vg autoindex.

Most users starting from a VCF+FASTA will only have GFFs for the reference sequence, so I'm not sure what your 12 GFFs look like. VCF doesn't always neatly preserve contig coordinates, so I think it would be very difficult to get sensible results using haplotype-specific GFFs. Certainly, the pipeline is better tested and hardened using one GFF. The reason we allow multiple GFF inputs is more to accommodate users who have GFFs that are split up by chromosome.
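For illustration, the single-reference setup described above might be assembled roughly as follows. This is only a dry-run sketch: every file name and the rna_index prefix are hypothetical placeholders, and the -p output-prefix flag is an assumption that should be checked against vg autoindex --help. The -w, -r, -v and --tx-gff options are the ones mentioned in this thread. The script prints the command instead of running it.

```shell
# Dry-run sketch: build a vg autoindex command for the mpmap workflow from
# one reference FASTA, the SV VCF, and GFFs that are split by chromosome.
# All file names and the "rna_index" prefix are hypothetical placeholders.
ref_fasta="ref.fa"
sv_vcf="sv.vcf.gz"

# -p (output prefix) is an assumption; confirm with `vg autoindex --help`.
set -- vg autoindex -w mpmap -p rna_index -r "$ref_fasta" -v "$sv_vcf"

# --tx-gff may be repeated, here once per chromosome-level GFF of the
# same reference genome.
for gff in ref.chr1.gff ref.chr2.gff; do
  set -- "$@" --tx-gff "$gff"
done

# Print the assembled command for review. To actually run it, call "$@"
# once the paths point at real files.
cmd_line="$*"
echo "$cmd_line"
```

With a single unsplit annotation, the loop would simply contain one GFF.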

QianghuiZhu commented 1 week ago

Thanks for this. We assembled 12 genomes and annotated them, so we have multiple FASTA and GFF files. I will use only one genome and its corresponding GFF file as input for vg autoindex. Best!

jeizenga commented 6 days ago

If you build a graph using the raw assemblies (e.g. using Minigraph-Cactus), you could also supply a GFA file containing the haplotypes and then also provide the individual haplotype annotations to vg autoindex using --hap-tx-gff.
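A rough sketch of what that GFA-based invocation could look like is below. It is a dry run that only prints the command. The file names and prefix are hypothetical, and the -g (GFA input) and -p (output prefix) flags are assumptions to verify with vg autoindex --help. It also assumes, without confirmation from this thread, that the sequence names in each haplotype GFF match the corresponding haplotype paths in the GFA.

```shell
# Dry-run sketch of the GFA route: a graph built from raw assemblies plus
# one annotation file per haplotype.
# File names and the "rna_hap_index" prefix are hypothetical placeholders.
gfa="pangenome.gfa"

# -g (GFA input) and -p (output prefix) are assumptions; confirm with
# `vg autoindex --help`.
set -- vg autoindex -w mpmap -p rna_hap_index -g "$gfa"

# One --hap-tx-gff per annotated haplotype (two shown here; the thread's
# use case would list all 12).
for gff in hap1.gff hap2.gff; do
  set -- "$@" --hap-tx-gff "$gff"
done

# Print the assembled command for review. To actually run it, call "$@"
# once the paths point at real files.
cmd_line="$*"
echo "$cmd_line"
```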