shengqh / Hurley2022scRNA

Code for prostate cancer paper
MIT License

GSEA database and running the runGSEA command #2

Open erthrall opened 1 year ago

erthrall commented 1 year ago

Hi there, I am learning to carry out scRNA-seq data analysis with your dataset and workflow as described in the paper. My question relates to the 20220209_edgeR_inCluster_byCell_GSEA.r file.

As per the download instructions on the GSEA website, I have downloaded the zipped folder called GSEA v4.3.2 for the command line (all platforms), containing the gsea-cli.bat file, and have assigned it to the gseaJar variable.

However, I am a little confused as to how to use GSEA from R, especially the variables called gseaDB and gseaCategories. Should I download the database (MSigDB) and assign its path to gseaDB? In addition, I noticed the items in gseaCategories are not paths. Are they just there to select individual gene set collections within the database (as in here)?

Thank you so much for reading!

shengqh commented 1 year ago

Yes. You need to download MSigDB and assign the path to gseaDB.

https://github.com/shengqh/Hurley2022scRNA/blob/main/seurat_sct_celltype_rename_edgeR_betweenCluster_byCell_GSEA/result/20220209_edgeR_betweenCluster_byCell_GSEA.r

erthrall commented 1 year ago

Thank you!

erthrall commented 1 year ago

Hi again, I ended up uploading the 31 RNK files for the 31 cell clusters to the GSEA GUI. The output was 31 directories with very helpful information on gene up/downregulation and the associated pathways. However, I am not sure how to produce diagrams such as those in Fig. 3f and 3l from these directories. Using Fig. 3f (shown below) as an example, I assume this figure would be produced by uploading data from the directories associated with clusters 6, 5, 11 (tumour clusters), and 12 (benign cluster) to GSEA. I'm not entirely sure which files to upload and how to do that. Any advice would be much appreciated!

[image: Fig. 3f]

hongyuenwong commented 1 year ago

Hi Erthrall,

Thank you for your inquiry, and congratulations on obtaining the 31 directories for your dataset with the GSEA GUI! In each directory (for your preferred curated gene sets and for each of your comparisons), there is an “index” file with links to the enrichment results for na_pos (upregulated gene sets) and na_neg (downregulated gene sets). Clicking on “GS DETAILS” reveals the specific enriched genes for the respective gene set in your data.
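If you would rather collect the report tables with a script than open each index file by hand, something like the following Python sketch may help. It is not part of the paper's workflow. It assumes that each output directory holds tabular reports named `gsea_report_for_na_pos_<timestamp>` and `gsea_report_for_na_neg_<timestamp>` with a `.tsv` extension (or `.xls` in older GSEA releases), so please check the names in your own directories first. The function name `find_reports` is made up for this example.

```python
import glob
import os


def find_reports(gsea_dir):
    """Return {'pos': [...], 'neg': [...]} with report paths in one GSEA output directory.

    Assumption: report tables are named gsea_report_for_na_<pos|neg>_<timestamp>
    and end in .tsv or .xls; adjust the pattern if your files differ.
    """
    found = {}
    for sign in ("pos", "neg"):
        pattern = os.path.join(gsea_dir, f"gsea_report_for_na_{sign}_*")
        # Keep only the tabular reports, not the matching .html pages.
        found[sign] = sorted(
            path for path in glob.glob(pattern) if path.endswith((".tsv", ".xls"))
        )
    return found


# Hypothetical usage; replace the path with one of your 31 output directories:
# reports = find_reports("path/to/one_gsea_output_directory")
# print(reports["pos"], reports["neg"])
```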

In our case, for Fig. 3f and 3l, we compared “tumor” vs “benign” for each specified cluster on the Hallmark gene sets. Simply uploading data from the directories will not produce Fig. 3f and 3l, because the data in the directories are not filtered. Following consensus cutoffs, we included only significant gene sets with “NOM p-val” below 0.05 and “FDR q-val” below 0.25, for both up- and downregulated gene sets. We then matched up the filtered gene sets across clusters 6, 5, and 11 for Fig. 3f, and across clusters 10, 12, 2, and 21 for Fig. 3l. Finally, we consolidated the “NES” of the enriched gene sets into a tab-delimited txt file, which is the input for plotting the figures. I simply used Excel for the filtering and matching, but feel free to use other approaches.
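As one possible alternative to doing this in Excel, the filtering and consolidation described above can be sketched in Python. This is only an illustration under stated assumptions, not the code used for the paper. It assumes each GSEA report is a tab-delimited table with columns `NAME`, `NES`, `NOM p-val`, and `FDR q-val`. The function names, the `GeneSet` header, and the choice to keep every gene set that is significant in at least one cluster (leaving blanks elsewhere) are assumptions made for the example; you may prefer to keep only gene sets shared by all clusters.

```python
import csv
import io

NOM_P_CUTOFF = 0.05  # "NOM p-val" must be below this
FDR_Q_CUTOFF = 0.25  # "FDR q-val" must be below this


def significant_nes(report, nom_cutoff=NOM_P_CUTOFF, fdr_cutoff=FDR_Q_CUTOFF):
    """Return {gene set name: NES} for rows that pass both cutoffs.

    `report` is an open text handle on one tab-delimited GSEA report.
    """
    kept = {}
    for row in csv.DictReader(report, delimiter="\t"):
        try:
            nes = float(row["NES"])
            nom_p = float(row["NOM p-val"])
            fdr_q = float(row["FDR q-val"])
        except (TypeError, ValueError):
            continue  # skip rows with missing or non-numeric values
        if nom_p < nom_cutoff and fdr_q < fdr_cutoff:
            kept[row["NAME"]] = nes
    return kept


def nes_table(per_cluster):
    """Turn {cluster label: {gene set: NES}} into tab-delimited text.

    Gene sets missing from a cluster get an empty cell (an assumption).
    """
    clusters = list(per_cluster)
    gene_sets = sorted({gs for nes in per_cluster.values() for gs in nes})
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["GeneSet"] + clusters)
    for gs in gene_sets:
        writer.writerow([gs] + [per_cluster[c].get(gs, "") for c in clusters])
    return out.getvalue()


# Hypothetical usage, merging the na_pos and na_neg reports of each cluster:
# per_cluster = {}
# for label, paths in {"cluster_6": ["pos_report.tsv", "neg_report.tsv"]}.items():
#     per_cluster[label] = {}
#     for path in paths:
#         with open(path, newline="") as handle:
#             per_cluster[label].update(significant_nes(handle))
# with open("nes_for_heatmap.txt", "w") as handle:
#     handle.write(nes_table(per_cluster))
```

The resulting txt file will probably need its column layout adjusted to match what the heatmap script you use expects.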

Our R code for plotting the heatmaps, with the input txt file for Fig. 3l as an example, is in this folder: https://github.com/shengqh/Hurley2022scRNA/tree/main/220322_Heatmaps

I hope the above helps? Hong Yuen :^)