guigolab / sQTLseekeR

R package to detect splicing QTLs (sQTLs)
http://big.crg.cat/computational_biology_of_rna_processing/sqtlseeker

Taking into account covariates when searching for sQTLs #3

Open jmonlong opened 6 years ago

jmonlong commented 6 years ago

@vsvinti asked in the old repo:

Hi there, I am wondering whether it is possible to take covariates into account with sQTLseekeR. You mention that the raw counts shouldn't be transformed in any way. Many datasets, however, have underlying structure caused by batch effects, etc., which we may want to correct for so that it doesn't influence the results. If this functionality is not available, how do you suggest taking this into account? It is possible to generate a residuals matrix with PEER (which can take into account covariates and other hidden structure). Would that be suitable as input to sQTLseekeR? Can you please also comment on what impact other ways of transforming the data, such as between-sample normalisation and transcript length correction (not necessary for eQTLs), would have on the computations? I thought that raw counts shouldn't be compared directly between samples.

I answered what I thought:

I'm not involved in these developments, but I think a new version of sQTLseekeR that supports the inclusion of covariates in the model will be released soon. We recommend using raw counts as input because they are converted into transcript usage ratios. I believe these ratios are less affected by batch effects than the counts themselves, although some effect is always possible. If the transcript expression is normalized, the transcript ratios no longer represent relative usage. It might still be possible to detect differential usage, but we might need a different distance computation. I think including covariates directly in the model would be the ideal solution; it would be especially useful for controlling for ethnicity or admixture.

I'm sure @dgarrimar will be able to tell you more.
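
To make the point about ratios concrete, here is a minimal Python sketch. It is not sQTLseekeR code; the function name, the counts, and the centring step (a crude stand-in for a residuals matrix) are all assumptions chosen for illustration. It suggests that a pure per-sample depth difference cancels when counts are turned into within-gene proportions, whereas values centred per transcript can no longer be read as proportions.

```python
def usage_ratios(counts):
    """Convert one gene's per-transcript counts in one sample into proportions."""
    total = sum(counts)
    if total == 0:
        raise ValueError("values sum to zero, proportions are undefined")
    return [c / total for c in counts]

# Hypothetical gene with three transcripts, measured in two samples.
sample_a = [30, 60, 10]
sample_b = [50, 40, 10]

# Same sample sequenced three times deeper: every count is scaled by 3.
sample_a_deep = [3 * c for c in sample_a]

ratios_a = usage_ratios(sample_a)
ratios_a_deep = usage_ratios(sample_a_deep)  # the depth factor cancels out

# Illustrative stand-in for residualisation: subtract the per-transcript mean
# across samples. The result mixes signs and sums to zero for sample_a.
means = [(a + b) / 2 for a, b in zip(sample_a, sample_b)]
centred_a = [a - m for a, m in zip(sample_a, means)]

print(ratios_a, ratios_a_deep, centred_a)
```

In this toy case `ratios_a` and `ratios_a_deep` agree, while `centred_a` contains a negative value and sums to zero, so `usage_ratios(centred_a)` raises an error instead of returning usage proportions.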

vsvinti commented 6 years ago

Thanks @jmonlong. Do you know whether differences in transcript lengths are an issue in the computation of sQTLs? I know it's not a problem for eQTLs, as we are not comparing between different genes. What is the timeline for this new version of sQTLseekeR?

dgarrimar commented 6 years ago

Dear @vsvinti, covariates can be included in the new version of sQTLseekeR. A Nextflow implementation and other updates will also be available. This will probably be released by the end of the year. @jmonlong can confirm, but I'd rather use TPMs instead of raw counts, even if we eventually deal with proportions, because, if I remember correctly, the proportions change depending on whether transcript lengths are taken into account.
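
A small numerical sketch of the length effect mentioned above, in Python. The counts, lengths, and the `other_rate_total` value are hypothetical, and this is not how sQTLseekeR computes anything internally. It assumes the usual TPM definition: the length-normalised rate of a transcript divided by the sum of rates over all transcripts in the sample, times one million.

```python
def proportions(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

# Hypothetical gene with two transcripts in a single sample.
counts = [100, 100]      # reads assigned to each transcript
lengths_kb = [1.0, 2.0]  # assumed effective lengths in kilobases

# Proportions taken directly from counts ignore transcript length.
from_counts = proportions(counts)  # equal usage for both transcripts

# Length-normalised rates (reads per kilobase) shift the proportions.
rates = [c / l for c, l in zip(counts, lengths_kb)]
from_rates = proportions(rates)    # the shorter transcript dominates

# TPM rescales the rates by a per-sample constant; other_rate_total stands in
# for the summed rates of all other transcripts in the sample (assumed value).
other_rate_total = 9850.0
scale = 1e6 / (sum(rates) + other_rate_total)
tpm = [r * scale for r in rates]
from_tpm = proportions(tpm)        # same as from_rates, the constant cancels

print(from_counts, from_rates, from_tpm)
```

With these made-up numbers the count-based proportions are equal, while the length-corrected ones favour the shorter transcript roughly two to one. The per-sample TPM constant drops out of the within-gene proportions.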

vsvinti commented 6 years ago

Thanks @dgarrimar. I thought TPMs were not recommended for between-sample comparisons. Or does no such comparison happen during the internal calculations?