williamritchie / IRFinder

Detecting intron retention from RNA-Seq experiments
53 stars 25 forks source link

Extract IRratio from DESeq2Object #40

Closed mebbert closed 6 years ago

mebbert commented 6 years ago

Hi,

I'd like to calculate the area under the curve (AUC) for significant hits, so I'd like to extract the IRratio from the DESeq2Object. I haven't been able to find that, though. I tried using counts(dds) on the dds object resulting from:

metaList = DESeqDataSetFromIRFinder(filePaths=paths, designMatrix=experiment, designFormula=~1)
dds = metaList$DESeq2Object

From that, I get the intronDepth and totalSplice for every sample and junction, but calculating intronDepth/totalSplice does not equal the IRratio from the sample's IRFinder-IR-dir.txt file.

Is there an easy way to extract the IRratio from the DESEq2Object?

Really appreciate your help.

dg520 commented 6 years ago

Hi @mebbert,

To recall IRratio from metaList, try intronDepth/(totalSplice+intronDepth). totalSplice now represents # of reads for correct splicing only since v1.2.4, instead of representing the whole denominator.

Best, Dadi

mebbert commented 6 years ago

Thanks @dg520,

The numbers for IntronDepth and totalSplice don't seem to match those I see in the sample's IRFinder-IR-dir.txt file.

Sample file

chr1 2514089 2514353 PANK4/ENSG00000157881/clean 0 - 10 1 4.30392 3 4 5 5 7 5 3.65 82 82 82 0.0498694 NonUniformIntronCover

dds object

PANK4/ENSG00000157881/clean/chr1:2514089-2514353 3 66

The IRratio from the dds object would be 3/(66 + 3) = .043478261, rather than the 4.30392/(82+4.30392) = 0.0498694 in the file.

Am I missing something?

EDIT: To be clear, column names for the the 3 and 66 in the dds object are intronDepth.<sampleName> and totalSplice.<sampleName> respectively (columns 1 and 60) where there are 59 samples.

dg520 commented 6 years ago

Hi @mebbert,

It seems that the dds object column you've selected to calculate IR ratio doesn't match the sample file you've used as the reference. I cannot reproduce this error in my end. Was the sample input order shifted somehow, due to the bug you reported in the previous threads? Please check that first.

In addition, the IRratio calculated by R is a bit different from the original IRFinder output file. totalSplice in the R version equals the number of exact splicing (refer to column 19th of the original output file) while in the original output it takes the maximal of column 17th and 18th. In a short word, in R version: IRrato = column 9th / (column 9th + column 19th) while in the original output: IRrato = column 9th / (column 9th + max(column 17th, column 18th))

If you calculated them in the uniform way, they should match each other for each sample.

I should have made them uniform. Will do this in the next release together with other stuff.

Best, Dadi

mebbert commented 6 years ago

@dg520,

You were correct. I thought I had already sorted the input file list, but I hadn't.

I also noticed two minor discrepancies: (1) it looks like the dds intronDepth rounds column9 to the nearest whole number (see below); and (2) it looks like the dds object already incorporates column9 + column19 into the totalSplice value.

Sample file

chr1 44649769 44649882 RNF220/ENSG00000187147/clean 0 + 10 0.854369 1.93023 1 2 3 5 4 2.1 1.5 110 109 109 0.017245 NonUniformIntronCover

DDS object

RNF220/ENSG00000187147/clean/chr1:44649769-44649882: 2 111

The 1.93023 from the sample file is rounded up to 2 in the dds object, and the 109 from the sample's exact splicing column is 111 in the dds object.

So, I'm just dividing intronDepth by totalSplice for each sample and junction. Am I missing anything?

dg520 commented 6 years ago

Hi Mark,

1) You're right that intron depth is rounded to integer in order to fit GLM with negative binomial distribution.
2) Are you using IRFinder before v1.2.4? If so, you're right that dds object has already reported the sum of exact splice and intron depth. This behavior has been changed since v1.2.4 (refer to change log), where it only reports the exact splice instead of the sum. This is because I found that using the sum during the model fitting stage tended to change the scale of each fit, which made the dispersion estimation a bit off.

Please note, the function DESeqDataSetFromIRFinder returns a list of three objects. The first one is the dds object. The second one is the original values for intron depth (before rounding) while the last one corresponds to column 19th.

Best, Dadi

mebbert commented 6 years ago

Ah, yes, I'm using 1.2.3. I used the install commands found on the wiki, which are specific to 1.2.3. I'll upgrade to 1.2.4 so I don't confuse myself in the future.