hartleys / JunctionSeq

The JunctionSeq R package is a powerful tool for testing and visualizing differential usage of exons and splice junctions in next-generation, high-throughput RNA-Seq experiments.
28 stars 16 forks source link

Alterative junctionseq pipeline inquiry #27

Closed dejonggr closed 6 years ago

dejonggr commented 6 years ago

I'm trying to run junctionseq to look at the alternative splicing between two subgenomes in a tetraploid but I've run into a number of issues.

First, the number of annotated exons (or, at least the number of exons output by QoRTs) between the two subgenomes (let's call them A and C) differs, causing obvious problems with runJunctionSeqAnalyses().

After filtering the dataset for mutual exons, I received the error: "vst cannot be performed without parametric dispersion fit. Using un-transformed data instead." Which is repeated (what seems like) infinitely

Do you think there's a way I can get this to work or is my experiment fundamentally incompatible with JunctionSeq? I understand this question is a bit vague, so I understand if there isn't a straight answer (if any).

Thanks!

dejonggr commented 6 years ago

I suppose the crux of the issue is the fact that the gene names are different between samples (which I assume is a problem given the required annotation file). Do you think there's a way around this without venturing into statistically unsavoury territory?

hartleys commented 6 years ago

Eek. That's a really tricky one.

Practically all differential splicing analyses are very sensitive to mapping or alignment biases. If your different samples are actually aligned to different genomes I shudder to imagine how severe such issues would be.

I assume you are interested in the difference between the sample groups belonging to different genomes? Or are you looking at some other biological condition where the genome subtype is a confounded?

The second one might (maaaaybe) be possible. The first one? Probably not. Way too many assumptions are being broken.

I don't think there's any tool that will give trustworthy expression estimation and hypothesis testing when comparing cross genome build.

Are you intending to look for differences genome wide? Or are there specific genes of interest you want to study? If the latter, something less analytic and more descriptive might be a viable option.

On Nov 17, 2017 4:36 PM, "Grant de Jong" notifications@github.com wrote:

I suppose the crux of the issue is the fact that the gene names are different between samples (which I assume is a problem given the required annotation file). Do you think there's a way around this without venturing into statistically unsavoury territory?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/hartleys/JunctionSeq/issues/27#issuecomment-345372038, or mute the thread https://github.com/notifications/unsubscribe-auth/ACwu7Gxhqm55y5FEBkuqRgrjgJUeIrBKks5s3fxKgaJpZM4Qf0vH .

dejonggr commented 6 years ago

I think I should explain my situation in a bit more detail. All samples are mapped to the same genome (both A and C subgenomes are included in the same annotation). I'd essentially like to separate the samples by both treatment and by a list of homeologs (e.g. pairs originating from each of the subgenomes). So, overall, I'd like to assess the differences in AS between the different homeologous gene pairs under each treatment. However, the problem is that the gene names differ (e.g. A200021:J001 vs C200021:J001). While the genes are similar, they are not quite identical which could also lead to some issues - so I suppose they should be considered different genomes (although they are mapped to the same genome).

Do you think this could be solved by formatting or is using JunctionSeq/DEXSeq just not possible?

A descriptive approach would be fine, but it would ideally be preceded by a whole-transcriptome analysis.

hartleys commented 6 years ago

That's a really tough one. If the genes aren't identical in their exon spans and splicing, then the exon/junction labels won't match up either. You'll have to find a way to match them up and label them properly.

Are you interested in the splicing of specific candidate genes, or in the transcriptome as a whole? If it's candidate genes then maybe you could jerry-rig something semi-manually. But transcriptome wide conversion is going to be much, much harder (possibly impossible).

dejonggr commented 6 years ago

I'm interested in looking at differential splicing transcriptome-wide. I was planning to filter my dataset by genes that share high sequence similarity (~99%). I also compiled a list of paralogous junctions. Perhaps that will help?

How does QoRTs go about labelling exons/junctions (i.e. is it based solely on the annotation file, or are the E00N/J00N identifiers arbitrarily designated)? Assuming they're consistent with my list of paralogous junctions, I could try to organize the JunctionSeq input data accordingly.

hartleys commented 6 years ago

The exon and junction id's are assigned in order by position. The exon regions are all numbered first, then the known junctions, then the novel junctions.

You should have a flattened GFF file that contains all the genes, exonic regions, and splice junctions along with their ID's. You should be able to use that to link up paralogous genes and junctions. It's gonna require a bit of coding to link them up.