rcastelo / GSVA

Gene set variation analysis

How should ssGSEA be performed on GEP data? #187

Closed csu-lzj closed 1 month ago

csu-lzj commented 1 month ago

I ran into a problem when using GSVA to evaluate the pathway activity of samples from GEP data. I don't know whether the parameter kcdf should be Gaussian, Poisson, or none when calling gsva(). I tested whether the downloaded GEP data follows a Gaussian distribution, and half of the genes did not.

axelklenk commented 1 month ago

Hi,

if you want to use method ssGSEA as implemented in package GSVA, there is no problem: method ssGSEA does not use the kcdf parameter, and the outdated version of package GSVA that you are apparently using (unfortunately you didn't tell us which one it is) will silently ignore it, regardless of its value.

Parameter kcdf is only used by method GSVA. It specifies the type of kernel function that method GSVA uses to estimate the cumulative distribution function (CDF) of your data; it does not specify a particular distribution your data is expected, or required, to follow. As a rule of thumb, and only when using method GSVA, set kcdf to Gaussian when your data is continuous, and set it to Poisson for count data. For more details on method GSVA, please see the original publication: https://doi.org/10.1186/1471-2105-14-7
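To illustrate what "kernel function used to estimate the CDF" means, here is a minimal base-R sketch. It is not the package's internal code. It assumes the kernel estimators described in the publication linked above: a Gaussian kernel with bandwidth sd/4, and a Poisson kernel centred at each observation plus 0.5. The function names `kcdf_gaussian` and `kcdf_poisson` and the toy data are hypothetical.

```r
# Toy expression values for ONE gene across 8 samples (made-up data).
set.seed(1)
x_cont  <- rnorm(8, mean = 5, sd = 2)  # continuous values, e.g. log-scale
x_count <- rpois(8, lambda = 20)       # integer counts

# Gaussian-kernel CDF estimate evaluated at every sample.
# Assumption: bandwidth h = sd(x) / 4, as described in the GSVA paper.
kcdf_gaussian <- function(x) {
  h <- sd(x) / 4
  sapply(x, function(xj) mean(pnorm((xj - x) / h)))
}

# Poisson-kernel CDF estimate evaluated at every sample.
# Assumption: each kernel is a Poisson with rate x_i + r, r = 0.5.
kcdf_poisson <- function(x, r = 0.5) {
  sapply(x, function(xj) mean(ppois(xj, lambda = x + r)))
}

F_cont  <- kcdf_gaussian(x_cont)
F_count <- kcdf_poisson(x_count)
```

Both functions only smooth the empirical distribution of a gene across samples; neither assumes that the gene itself is Gaussian or Poisson distributed, which is why a failed normality test does not rule out kcdf = "Gaussian".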

In order to avoid exactly this kind of confusion, recent versions of package GSVA employ an object-oriented user interface in which parameter objects for method GSVA accept parameter kcdf and parameter objects for other methods, such as ssGSEA, don't. I'd highly recommend updating to the latest release version, GSVA 1.52.3, part of Bioconductor 3.19.
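A rough sketch of what this looks like with the parameter-object interface, assuming a recent GSVA release is installed. The expression matrix and gene sets are made-up toy data, and the object names are placeholders.

```r
library(GSVA)

# Toy data: 20 genes x 10 samples of continuous values (hypothetical).
set.seed(42)
expr <- matrix(rnorm(20 * 10), nrow = 20,
               dimnames = list(paste0("g", 1:20), paste0("s", 1:10)))

# Two made-up gene sets whose members match the row names of expr.
gene_sets <- list(setA = paste0("g", 1:5),
                  setB = paste0("g", 11:18))

# Method GSVA: the parameter object accepts kcdf.
gsva_par <- gsvaParam(expr, gene_sets, kcdf = "Gaussian")

# Method ssGSEA: the parameter object has no kcdf argument at all.
ssgsea_par <- ssgseaParam(expr, gene_sets)

# The same gsva() call runs whichever method the parameter object describes.
es_gsva   <- gsva(gsva_par)
es_ssgsea <- gsva(ssgsea_par)
```

With this interface, passing kcdf to ssgseaParam() should be rejected as an unused argument instead of being silently ignored.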

Please don't hesitate to get back to us if you should have further questions.

Cheers,

csu-lzj commented 1 month ago

Thank you for your quick response! It has been bothering me for a long time.