Using vg to analyze transcriptomic data from a mixed background

Jokendo-collab commented 9 months ago

We have two sets of F0 animals, [A x B] and [C x D], which were sequenced with short reads. The F0 pairs ([A x B] and [C x D]) were repeatedly mated to give F1 [AB] and [CD]. The F1 were then mated to give F2 [ABCD]. We now want to identify haplotype-specific transcripts for A, B, C, and D in the F2 using F2 RNA-seq data. How can I go about this? Any suggestions?
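To make the cross design concrete, here is a minimal Python sketch of which founder haplotypes an F2 [ABCD] animal can carry at a single locus. It assumes that A, B, C, and D are inbred lines, so the F1 [AB] and [CD] animals are A/B and C/D heterozygotes, and that alleles segregate in a plain Mendelian way. It ignores recombination, which in practice makes each F2 genome a mosaic of founder segments. The function name `f2_founder_genotypes` is illustrative only.

```python
from fractions import Fraction
from itertools import product


def f2_founder_genotypes(parent1=("A", "B"), parent2=("C", "D")):
    """Expected founder-origin genotypes at one locus for F1 [AB] x F1 [CD].

    Assumes inbred founders, heterozygous F1 parents, and Mendelian
    segregation: each F1 parent transmits either of its alleles with equal
    probability, independently of the other parent.
    """
    weight = Fraction(1, len(parent1) * len(parent2))
    probs = {}
    # Every pairing of one allele from each F1 parent is equally likely.
    for allele1, allele2 in product(parent1, parent2):
        key = (allele1, allele2)
        probs[key] = probs.get(key, Fraction(0)) + weight
    return probs


for genotype, p in f2_founder_genotypes().items():
    print("/".join(genotype), p)
```

Under these assumptions, any one locus in an F2 individual carries exactly two of the four founder haplotypes (one of A/B and one of C/D), so haplotype-specific expression at that locus can only be attributed to those two founders.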

jeizenga commented 9 months ago

To use the tools in the VG toolkit, you would need either genome assemblies or variant calls for the inbred lines. Do you have those available?

Jokendo-collab commented 9 months ago

I have the Illumina FASTQ files for the F0, so I can use them for variant calling. How can I use these variant files to do simultaneous haplotype-specific transcript quantification?

jeizenga commented 9 months ago

If you generate a phased VCF file, you can use vg rna to create a diploid transcriptome and a spliced variation graph. After that, you can use vg mpmap to map RNA-seq reads to the spliced variation graph. There's an external tool called rpvg that can then estimate haplotype-resolved expression. The process is described further in this publication.
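Since this workflow starts from a phased VCF, it may be worth confirming that the calls really are phased before building the graph. In VCF, a `|` separator in the GT field marks a phased genotype and `/` marks an unphased one. The sketch below is a hypothetical pre-flight check (not part of vg; the name `unphased_records` is illustrative) that reports unphased diploid calls from plain-text VCF lines. It only checks that phasing is present, not that it is correct.

```python
def unphased_records(vcf_lines):
    """Yield (chrom, pos, sample_index, gt) for genotype calls marked unphased ('/')."""
    for line in vcf_lines:
        # Skip meta-information lines, the header line, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        gt_index = format_keys.index("GT")
        # Sample columns start at index 9 in a VCF data line.
        for sample_index, sample in enumerate(fields[9:]):
            gt = sample.split(":")[gt_index]
            if "/" in gt:
                yield fields[0], int(fields[1]), sample_index, gt


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tF2_1",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0|1",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/1:12",
]
print(list(unphased_records(example)))
# → [('chr1', 200, 0, '0/1')]
```

For a real file, the same generator can be fed an open text file handle (after decompressing it if it is gzipped).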

It's not completely clear to me whether you are thinking about using RNA-seq or genomic DNA to call the variants. As a heads up, using RNA-seq for this process will have lower yield. The problem is that it is difficult to identify the variants when the expression of the associated transcripts is low, which means you have reduced recall on exactly the variants you are interested in (i.e. the ones with strongly haplotype-biased expression).
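The recall problem can be illustrated with a simple binomial model. If a heterozygous site is covered by `depth` RNA-seq reads and a fraction `alt_fraction` of them come from the haplotype carrying the alternate allele, how likely is it to see at least `min_alt_reads` reads supporting that allele? This is a simplified sketch, not how any particular variant caller works: it ignores sequencing error, mapping bias, and caller-specific filters, and the 3-read threshold is an arbitrary illustrative choice.

```python
from math import comb


def detection_probability(depth, alt_fraction=0.5, min_alt_reads=3):
    """P(at least min_alt_reads of depth reads carry the alternate allele).

    Models alternate-allele read counts as Binomial(depth, alt_fraction).
    """
    # Sum the probabilities of seeing fewer than min_alt_reads alternate reads.
    p_below = sum(
        comb(depth, k) * alt_fraction**k * (1 - alt_fraction) ** (depth - k)
        for k in range(min(min_alt_reads, depth + 1))
    )
    return 1.0 - p_below


for depth in (2, 5, 10, 30):
    for alt_fraction in (0.5, 0.1):
        p = detection_probability(depth, alt_fraction)
        print(f"depth={depth:>2} alt_fraction={alt_fraction}: {p:.3f}")
```

Lowering `depth` (weakly expressed transcripts) or `alt_fraction` (expression biased away from the alternate haplotype) both reduce the detection probability, which is why variants with strongly haplotype-biased expression are the ones most likely to be missed when calling from RNA-seq.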

Jokendo-collab commented 9 months ago

I want to use RNA-seq data from the F2 [ABCD] generation. From my reading, vg rna needs transcriptomic rather than genomic data. I will give your suggestion a try and see how it goes.

vgteam/vg issue #4216