hbctraining / DGE_workshop_salmon_online

https://hbctraining.github.io/DGE_workshop_salmon_online/

gseKEGG no longer recommends nperm, and other issues #44

Closed hwick closed 2 months ago

hwick commented 9 months ago
gseaKEGG <- gseKEGG(geneList = foldchanges, # ordered named vector of fold changes (Entrez IDs are the associated names)
                    organism = "hsa", # supported organisms listed below
                    nPerm = 1000, # default number permutations
                    minGSSize = 20, # minimum gene set size (# genes in set) - change to test more sets or recover sets with fewer # genes
                    pvalueCutoff = 0.05, # padj cutoff value
                    verbose = FALSE)
no term enriched under specific pvalueCutoff...
Warning messages:
1: In .GSEA(geneList = geneList, exponent = exponent, minGSSize = minGSSize,  :
  We do not recommend using nPerm parameter incurrent and future releases
2: In fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize = minGSSize,  :
  You are trying to run fgseaSimple. It is recommended to use fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument in the fgsea function call.

The call also reports that no terms are enriched under my pvalueCutoff, but there are definitely terms with values below 0.05.

clusterProfiler_4.8.2

mistrm82 commented 8 months ago

Check the version of clusterProfiler: has the nPerm argument been renamed? Add a note if necessary.

hwick commented 2 months ago

Update: not only is nPerm deprecated, but there also appear to be two variants of the underlying fgsea routine, fgseaSimple and fgseaMultilevel:

> gseaKEGG <- gseKEGG(geneList = foldchanges, # ordered named vector of fold changes (Entrez IDs are the associated names)
+                     organism = "hsa", # supported organisms listed below
+                     nPerm = 1000, # default number permutations
+                     minGSSize = 20, # minimum gene set size (# genes in set) - change to test more sets or recover sets with fewer # genes
+                     pvalueCutoff = 0.05, # padj cutoff value
+                     verbose = FALSE)
Reading KEGG annotation online: "https://rest.kegg.jp/link/hsa/pathway"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/hsa"...
Warning messages:
1: In .GSEA(geneList = geneList, exponent = exponent, minGSSize = minGSSize,  :
  We do not recommend using nPerm parameter incurrent and future releases
2: In fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize = minGSSize,  :
  You are trying to run fgseaSimple. It is recommended to use fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument in the fgsea function call.

When I look up the difference between these two functions, it appears that fgseaMultilevel is meant to be used with pre-ranked genes. Given that we are ranking our genes and that nPerm is no longer recommended, it makes sense to simply remove nPerm here. I have implemented this change. Now the only warning we get is that some p-values will be below 1e-10.
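For reference, a minimal sketch of what the call looks like with nPerm dropped. The helper names make_gene_list and run_gsea_kegg are hypothetical and exist only for this illustration, and the toy fold changes and Entrez IDs are made up; the remaining gseKEGG arguments are the ones from the call above.

```r
# Hypothetical helper: build the ordered, named vector that gseKEGG expects
# (fold changes as values, Entrez IDs as names, sorted in decreasing order).
make_gene_list <- function(log2fc, entrez_ids) {
  stopifnot(length(log2fc) == length(entrez_ids))
  # Drop missing values and keep only the first occurrence of each Entrez ID
  keep <- !is.na(log2fc) & !is.na(entrez_ids) & !duplicated(entrez_ids)
  gene_list <- log2fc[keep]
  names(gene_list) <- as.character(entrez_ids[keep])
  sort(gene_list, decreasing = TRUE)
}

# Hypothetical wrapper: the same call as above, with nPerm removed so that
# fgsea can use its multilevel method instead of fgseaSimple.
run_gsea_kegg <- function(gene_list) {
  clusterProfiler::gseKEGG(geneList = gene_list,
                           organism = "hsa",
                           minGSSize = 20,
                           pvalueCutoff = 0.05,
                           verbose = FALSE)
}

# Toy input (made-up values) to show the shape of the ranked vector
foldchanges <- make_gene_list(
  log2fc = c(1.8, -0.4, 2.5, NA, -1.2),
  entrez_ids = c("7157", "1956", "672", "675", "7157")
)
print(foldchanges)

# With a real, full-length ranked vector (needs clusterProfiler and network access):
# gseaKEGG <- run_gsea_kegg(foldchanges)
```

The toy vector is far too short for a real enrichment run; it only shows the input format. The remaining warning about p-values below 1e-10 refers to the eps boundary used by fgsea, and the warning text itself suggests setting eps to zero if more precise small p-values are needed.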

fgseaMultilevel has a loosely analogous parameter called sampleSize, which is described in this thread, where one of the authors of fgsea states that there is rarely any need to change the default.
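To make the sampleSize parameter concrete, here is a small sketch that calls fgsea::fgseaMultilevel directly on simulated data. Everything in it is an assumption for illustration: the gene names, the two gene sets, and the random statistics are invented, and sampleSize is written out explicitly only to show where it goes (101 is the documented default as far as I know). The call is skipped if fgsea is not installed.

```r
set.seed(42)

# Simulated ranking statistic for 500 made-up genes, sorted in decreasing order
ranked_stats <- sort(setNames(rnorm(500), paste0("gene", 1:500)),
                     decreasing = TRUE)

# Two made-up gene sets: one drawn from the top of the ranking, one at random
pathways <- list(
  top_set    = names(ranked_stats)[1:25],
  random_set = sample(names(ranked_stats), 25)
)

res <- NULL
if (requireNamespace("fgsea", quietly = TRUE)) {
  res <- fgsea::fgseaMultilevel(pathways   = pathways,
                                stats      = ranked_stats,
                                sampleSize = 101,  # left at the assumed default
                                minSize    = 20,
                                maxSize    = 500)
  print(as.data.frame(res)[, c("pathway", "pval", "padj", "NES")])
}
```

This is not what the lesson does; the lesson calls gseKEGG and leaves sampleSize alone, in line with the advice in the thread.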

For now, I have removed the nPerm argument and also the comment about nPerm.