williamritchie / IRFinder

Detecting intron retention from RNA-Seq experiments
53 stars 25 forks source link

results of Generalized Linear Model with Replicates #185

Open adi8897 opened 6 months ago

adi8897 commented 6 months ago

hi i had bam files unsorted i created with STAR. then i used the Quantify IR bam mode (on unsorted files) with reference genome i buit with irfinder then i tried to do the Generalized Linear Model with Replicates in 2 different cases the AdjustedPValue is very high and it make me wonder if I'm doing something wrong this is the file i have IRFinder_GLM_Results.csv i had 5 control and 5 mutant i ran this script .libPaths("/sci/labs/maayan.salton/adi8897/R_libs")

Load the DESeq2 library

library(DESeq2)

Source the IRFinder DESeq2 constructor script

source("/sci/labs/maayan.salton/adi8897/IR_finder_venv/IRFinder-1.3.0/bin/DESeq2Constructor.R")

Read the file paths and experiment design

paths = read.table("filePaths.txt", header = FALSE, stringsAsFactors = FALSE)[,1] experiment = read.table("experiment.txt", header = TRUE, stringsAsFactors = FALSE) experiment$Condition = factor(experiment$Condition, levels = c("control", "mutant")) rownames(experiment) = NULL

Create a DESeqDataSet object from IRFinder results

metaList = DESeqDataSetFromIRFinder(filePaths = paths, designMatrix = experiment, designFormula = ~1)

Extract the DESeq2 object

dds = metaList$DESeq2Object

Set the design formula for the GLM

design(dds) = ~Condition + Condition:IRFinder

Run the DESeq2 analysis

dds = DESeq(dds)

Get results for the control condition

res.Control = results(dds, name = "Conditioncontrol.IRFinderIR") Control.IR_vs_Splice = 2^res.Control$log2FoldChange IRratio.Control = Control.IR_vs_Splice / (1 + Control.IR_vs_Splice)

Get results for the mutant condition

res.Mutant = results(dds, name = "Conditionmutant.IRFinderIR") Mutant.IR_vs_Splice = 2^res.Mutant$log2FoldChange IRratio.Mutant = Mutant.IR_vs_Splice / (1 + Mutant.IR_vs_Splice)

Test the difference of IR ratio between control and mutant

res.Diff = results(dds, contrast = list("Conditionmutant.IRFinderIR", "Conditioncontrol.IRFinderIR")) IR.change = IRratio.Mutant - IRratio.Control

Plot the changes in IR ratio with significance

plot(IR.change, col = ifelse(res.Diff$padj < 0.05 & abs(IR.change) >= 0.1, "red", "black"))

Create a data frame with the results

results_df <- data.frame( Gene = rownames(res.Diff), Log2FoldChange = res.Diff$log2FoldChange, PValue = res.Diff$pvalue, AdjustedPValue = res.Diff$padj, IRChange = IR.change )

Write the results to a CSV file

write.csv(results_df, "IRFinder_GLM_Results.csv", row.names = FALSE)

can you help me understand what am i doing wrong? when i checked the quantification files i sow that modt of the lines is LowCover i think this is the problem but i don't know how to fix it i ran it on different samples from different expiraments.

dg520 commented 5 months ago

@adi8897 Is there a reason you believe there must be differential IR (i.e. low adjusted p-values) in your experiment? This could be a totally expected biology. You said most of the lines are LowCover. How much do you mean by "most"? Another question is what the sequencing depth of your RNASeq libraries are? If a library has < 20 million reads, it is totally normal. If it has more than 100 million reads, that might be a concern. And I would try to avoid running the analyses from different experiments unless you have a way to control for batch effect and other hidden confounding factors.

dg520 commented 5 months ago

@adi8897 Specifically, the very first question to ask: Is the mutation you are studying known to dysregulate IR? If not known, it could be irrelevant to IR.

adi8897 commented 5 months ago

Hi regarding your response. im detecting mutations that known to effect splicing (mutations in the spliceosome component), I'm concern because i checked 4 different mutation, in 4 different runs and always all the padj colunm is equal to one, with not even one row different. when I ran the same analysis on rmats or different programs i found indeed significant events. that makes me think i am doing something wrong.

As my reference is ok this is the quantification stage bin/IRFinder -m BAM -r /sci/labs/maayan.salton/adi8897/IR_finder_venv/IRFinder-1.3.0/REF/Human-GRCh38-release110 -d /sci/labs/maayan.salton/adi8897/master_project/ASD/li_data/IRFinder/SRR1603662_irfinder /sci/labs/maayan.salton/adi8897/master_project/ASD/li_data/IRFinder/sorted_by_name/SRR1603662_Aligned.out.bam.sortedByName.bam

i have 6 control and 6 mutation in this example. this is the output file of one file https://docs.google.com/spreadsheets/d/1SNYLwC29P4bZDwyNqmgXyxF0LisESlPgWExkqnN-wwg/edit#gid=1568919350

this is after i did Generalized Linear Model https://docs.google.com/spreadsheets/d/11PLXkzHBAQCfS3acJNY-8dzTVvF9LEPbIRVVn3WPg50/edit#gid=1397568669

thank you for you're help please let me know if you sww the problem?

dg520 commented 5 months ago

@adi8897

I'd love to help users fix the technical issues in IRFinder and better understand the tool. While debugging users' analytical codes and interpreting the biology behind their experiments is out of my scope, I can try to provide some suggestions.

There are some major differences between rMATS and IRFinder, making there result not directly comparable: 1) rMATS and IRFinder calculates IR in quite different ways. The links provide a brief insight into those differences.Another difference is that the two tools build reference differently. 2) It is possible the IR event in rMATS is not recorded in IRFinder and vice versa. Particularly, IRFinder only considers unannotated IR. 3) IRFinder specifically blocks some blacklist regions, so the reads falling into those regions are not counted. 4) rMATS reports all splicing types of splicing including IR, while IRFinder focuses IR only and doesn't investigate other types of splicing.

There are many review articles comparing these tools' performance, not limited to rMATS and IRFinder, and I won't advertise the conclusion here. But with these major tools' differences in mind, let's talk about your experiment.

Mutations in the spliceosome might just result in non-IR splicing difference but not IR difference. Below are some suggestions: 1) In your rMATS results, are any of the significant results in the category of IR (or "RI" in rMATS term)? 2) If 1) is true, figure out the intron coordinates of those significant "RI" events. Choose any one of your IRFinder output file, can you find the same introns there? Note, the first six columns in IRFinder output are intron coordinates in BED format (i.e. 0-based half open).
3) If 2) is true, you might want to load BAM files (both WT and mutant) into a sequencing coverage visualization tool such as IGV, and check those intron regions plus their flanking exons, one by one. Is there a dramatic difference visually? Note, "Set Data Range" of each track proportionally to its library size, so that the visual comparison can be done in a fair way. For example, when comparing two libraries one (A) with 120M reads and the other (B) with 100M reads, the ratio between A and B is 1.2. Therefore, in IGV, I would "Set Data Range" 1.2 times higher for A than B. And both "Max" for A and B in "Set Data Range" should be larger than the maximal coverage of the tracks you're visualizing.
4) If you believe most of IR you visualize should be different between WT and mutant after performing 3), you can stick with rMATS results. Otherwise, IRFinder does its work.

Some specific situations:
If 1) is false, that means rMATS agrees with IRFinder, where the mutation is likely irrelevant to IR. If 1) is true but 2) is false, that means rMATS finds annotated IR to be different, which is out of the scope of the default IRFinder reference.