How to set a threshold for Kmes in hub genes

yamihn commented 4 months ago

Hello, I would like to extract the top hub genes for a given module, say the top 100. I noticed that the list I get is not always sorted in decreasing order (sometimes it is, other times it is reversed), and the kME scores vary widely. For example, a gene with kME = 0.05 probably should not be considered specific to its module. Can I set a threshold to keep only the most representative hub genes?

yamihn commented 4 months ago

Sorry again, I noticed that when I use GetHubGenes(), the data frame produced contains the top (say, 25) genes with the highest kME. But they are in increasing order, so the first row has the smallest kME:

hub_df_5_c1

    gene_name       module       kME
101     Gata6 Cluster_uno5 0.2773182
102       Kdr Cluster_uno5 0.2840699
103    Schip1 Cluster_uno5 0.3035335
104      Bmp5 Cluster_uno5 0.3132763
105     Unc5b Cluster_uno5 0.3140855
106    Igsf11 Cluster_uno5 0.3187531
107     Meis1 Cluster_uno5 0.3240560
108   Cacna1d Cluster_uno5 0.3383718
109    Il17rd Cluster_uno5 0.3386139
110      Dpf3 Cluster_uno5 0.3496444
111     Grid2 Cluster_uno5 0.3608794
112      Nrg1 Cluster_uno5 0.3735618
113    Filip1 Cluster_uno5 0.3843238
114     Myocd Cluster_uno5 0.3956321
115     Nrxn3 Cluster_uno5 0.3978237
116     Tafa2 Cluster_uno5 0.3990121
117    Plxna4 Cluster_uno5 0.4069242
118      Ank3 Cluster_uno5 0.4348151
119   Ccdc141 Cluster_uno5 0.4366839
120   Adamts9 Cluster_uno5 0.4460642
121     Itpr1 Cluster_uno5 0.4597336
122     Rbm20 Cluster_uno5 0.4798275
123     Bmper Cluster_uno5 0.4959626
124     Kcnq5 Cluster_uno5 0.4968344
125     Mef2c Cluster_uno5 0.5821404

Shouldn't they be sorted in decreasing order?

smorabit commented 3 months ago

Hi,

To answer your questions:

> I would like to extract the top hub genes for a given module, say the top 100.

You can run this code:

# Top 100 hub genes (ranked by kME) for each module
hub_df <- GetHubGenes(seurat_obj, n_hubs = 100)

> Can I set a threshold to take only the most representative hub genes?

Yes, you can, but I do not have advice about what that threshold should be; you will have to decide based on your own data.
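As a rough illustration of applying such a cutoff, the sketch below filters a hub gene table by kME in base R. It assumes the table has the `gene_name`, `module`, and `kME` columns shown in the output earlier in this thread; the rows are copied from that output, and the cutoff of 0.3 is a hypothetical value chosen only for the example, not a recommendation.

```r
# Toy data frame mimicking the GetHubGenes() output posted above
# (columns gene_name, module, kME; values copied from that output).
hub_df <- data.frame(
  gene_name = c("Gata6", "Kdr", "Schip1", "Kcnq5", "Mef2c"),
  module    = "Cluster_uno5",
  kME       = c(0.2773182, 0.2840699, 0.3035335, 0.4968344, 0.5821404)
)

# Hypothetical cutoff; pick a value that makes sense for your own data.
kme_threshold <- 0.3

# Keep only the genes whose kME is at or above the cutoff.
hub_filtered <- hub_df[hub_df$kME >= kme_threshold, ]
print(hub_filtered)
```

With a real object, `hub_df` would be the data frame returned by `GetHubGenes()`. Requesting a generous `n_hubs` first and then filtering means the threshold, rather than the gene count, decides which genes are kept.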

> Sorry again, I noticed that when I use GetHubGenes(), the data frame produced contains the top (say, 25) genes with the highest kME. But they are in increasing order, so the first row has the smallest kME:

Thank you for pointing this out; I understand how this could be misleading. I have updated the function to output the results in decreasing order, from largest to smallest kME.
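If an installed version still returns the rows in increasing order, the table can be reordered by hand. This is a minimal base R sketch, again assuming the `gene_name`, `module`, and `kME` columns from the output posted above, with rows copied from that output.

```r
# Rows copied from the increasing-order output posted earlier in this thread.
hub_df <- data.frame(
  gene_name = c("Rbm20", "Bmper", "Kcnq5", "Mef2c"),
  module    = "Cluster_uno5",
  kME       = c(0.4798275, 0.4959626, 0.4968344, 0.5821404)
)

# Order by module, then by kME from largest to smallest within each module.
hub_sorted <- hub_df[order(hub_df$module, -hub_df$kME), ]
rownames(hub_sorted) <- NULL  # reset row names after reordering
print(hub_sorted)
```

Ordering on `module` first keeps each module's genes together when the table covers more than one module.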

smorabit / hdWGCNA

How to set a threshold for Kmes in hub genes #199