kvittingseerup / IsoformSwitchAnalyzeR

An R package to Identify, Annotate and Visualize Isoform Switches with Functional Consequences (from RNA-seq data)

Question about sva correction: why does IsoformSwitchAnalyzeR tell me I have a lot of SVs but sva actually disagrees? #238

Open bozbezbozzel opened 2 months ago

bozbezbozzel commented 2 months ago

Hi,

I'm following the steps in your vignette (great work, by the way) for my data of about 100 NSCLC patients. With importRdata, though, I keep receiving the warning that sva finds too many sources of variation.

My data is fairly heterogeneous, as you'd expect from patient samples from a disease such as lung cancer, but I have a few things going for me, namely that

Out of interest, I ran sva separately on my transcript-level and gene-level counts, imported with tximport and scaled in the case of transcripts. Using num.sv with be as the method (which is the default as far as I'm aware) gives me 3 SVs for the counts and 1 for the transcripts. That doesn't seem like it would warrant the warnings I keep getting.
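For anyone wanting to reproduce this kind of standalone check, here is a minimal sketch of calling num.sv directly. The expression matrix, sample names, and condition labels are simulated placeholders (assumptions, not the data from this issue), and the sva call is guarded so the snippet still runs where sva is not installed.

```r
# Simulated stand-in for a transcripts x samples expression matrix (hypothetical data)
set.seed(1)
n_samples <- 20
expr <- matrix(
  rexp(500 * n_samples, rate = 0.1),
  nrow = 500,
  dimnames = list(paste0("tx", 1:500), paste0("s", 1:n_samples))
)

# Hypothetical two-group design, one row per sample
design <- data.frame(
  sampleID  = colnames(expr),
  condition = factor(rep(c("CD8_infiltrated", "CD8_excluded"), each = n_samples / 2))
)

# Full model matrix containing the variable of interest
mod <- model.matrix(~ condition, data = design)

if (requireNamespace("sva", quietly = TRUE)) {
  # method = "be" is the default for num.sv
  n_sv <- sva::num.sv(dat = expr, mod = mod, method = "be")
  message("Estimated number of SVs: ", n_sv)
}
```

Note that the estimate depends on exactly which matrix is passed as `dat` (raw versus transformed, filtered versus unfiltered), so the same samples can give quite different numbers.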

I resorted to disabling sva with detectUnwantedEffects = FALSE, but now I'm curious where this discrepancy could come from. I also noticed that the guesstimated DTU number is zero, which seems possible but not likely. I'm wondering if I'm doing something wrong and am just not noticing?

chunxubioinfor commented 1 month ago

Hi! I just checked the source code and yes, ISAR first log-transforms and filters the expression data, then applies num.sv to estimate the number of SVs. So I guess the difference between the results from ISAR and your own analysis might derive from the data processing done before sva. Also, an estimated DTU of zero is very weird. Could you share the conditions or comparisons of your analysis?
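As a rough illustration of that order of operations (log transform, filter, then num.sv), here is a sketch on simulated data. The log2(x + 1) transform, the expression cutoff, and the "expressed in at least as many samples as the smallest group" rule are assumptions chosen for illustration; the exact transform and thresholds ISAR uses internally may differ.

```r
# Simulated transcripts x samples matrix with many low-expression rows (hypothetical data)
set.seed(2)
n_samples <- 20
expr <- matrix(rnbinom(1000 * n_samples, mu = 5, size = 0.5), nrow = 1000)
colnames(expr) <- paste0("s", seq_len(n_samples))
condition <- factor(rep(c("A", "B"), each = n_samples / 2))

# Step 1: log transform with a pseudocount (assumed form of the transform)
expr_log <- log2(expr + 1)

# Step 2: keep features above an assumed cutoff in at least as many
# samples as there are in the smallest condition
min_log_expr   <- log2(1 + 1)
smallest_group <- min(table(condition))
keep           <- rowSums(expr_log > min_log_expr) >= smallest_group
expr_log_filt  <- expr_log[keep, , drop = FALSE]

# Step 3: estimate the number of SVs on the transformed, filtered matrix
mod <- model.matrix(~ condition)
if (requireNamespace("sva", quietly = TRUE)) {
  n_sv <- sva::num.sv(dat = expr_log_filt, mod = mod, method = "be")
  message("Estimated number of SVs after log + filter: ", n_sv)
}
```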

bozbezbozzel commented 1 month ago

Hi Chunxu, thanks for your reply. I'll try to replicate the data preprocessing to see if it makes a difference. My comparison is simple: samples that are infiltrated with CD8 cells versus samples that are CD8-excluded, about a 50/50 split in this dataset. Biologically, a strong DTU difference isn't necessarily expected; I just thought it would be interesting to check. But having nothing significant at all seemed a bit odd to me.

bozbezbozzel commented 1 month ago

Just to add that I reran num.sv on logged and filtered abundances and counts, and the number of SVs did increase (22 for the counts, 25 for abundances, which doesn't seem crazy to me for patient data). Running the downstream code manually (checking that the SVs are not too highly correlated/diagonal) allows me to add the SVs, so I still don't understand where the errors are coming from. Happy to make a more formal bug report with everything I did.
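For context, the kind of manual check meant here can be sketched as below. This is not ISAR's own code: the SV matrix is a simulated stand-in for the `sv` element that sva::sva() returns, the 0.9 correlation cutoff is an assumption, and one SV is deliberately made to track the condition so the filter has something to drop.

```r
# Hypothetical two-group design
set.seed(3)
n_samples <- 20
condition <- factor(rep(c("A", "B"), each = n_samples / 2))

# Stand-in for the samples x SVs matrix returned by sva::sva()$sv
sv <- matrix(rnorm(n_samples * 3), ncol = 3,
             dimnames = list(NULL, paste0("sv", 1:3)))
# Make sv3 nearly identical to the condition to illustrate the check
sv[, 3] <- as.numeric(condition) + rnorm(n_samples, sd = 0.01)

# Drop SVs that are almost collinear with the condition of interest
max_cor <- 0.9  # assumed cutoff
cor_with_condition <- abs(cor(sv, as.numeric(condition)))[, 1]
keep    <- cor_with_condition < max_cor
sv_kept <- sv[, keep, drop = FALSE]

# Add the remaining SVs to the design and confirm the model is full rank
design_with_sv <- data.frame(condition = condition, sv_kept)
mod_full <- model.matrix(~ ., data = design_with_sv)
stopifnot(qr(mod_full)$rank == ncol(mod_full))
```

If every SV passes a check like this and the resulting model matrix is full rank, the SVs can in principle be added as covariates, which is why the warning is puzzling.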