scverse / scanpy-tutorials

Scanpy Tutorials.
https://scanpy-tutorials.readthedocs.io/

Extract Differentially Expressed Genes for each group #51

Closed ViriatoII closed 2 years ago

ViriatoII commented 2 years ago

Quick question:

I can rank differentially expressed genes for each cell type in a dataset, like this:

sc.tl.rank_genes_groups(adata, 'celltype', method='wilcoxon', key_added = "wilcoxon", min_fold_change=3)

How can I access those genes for each celltype?

The data seems to be stored in adata.uns['wilcoxon'], but I don't know how to extract it. Its contents look like this:

{'params': {'groupby': 'preds',
  'reference': 'rest',
  'method': 'wilcoxon',
  'use_raw': False,
  'layer': None,
  'corr_method': 'benjamini-hochberg'},
 'names': rec.array([('CD14', 'HAL', 'KRT19', 'VWF', 'GLUL', 'CD4', 'CYP2E1', 'MARCO', 'CD3E'),
            ('FLT4', 'COL1A1', 'SOX9', 'LHX6', 'CLEC10A', 'VWF', 'CD14', 'CSF1R', 'NKG7'),
            ('C5AR1', 'ADAMTSL2', 'SPP1', 'HTRA3', 'CD276', 'COL1A1', 'GLUL', 'VSIG4', 'IL7R'),
            ('MYH11', 'COLEC11', 'COL1A1', 'RSPO3', 'SIRPA', 'COLEC11', 'C5AR1', 'CD68', 'PTPRC'),
            ('HAL', 'CD14', 'IGFBP3', 'HAL', 'C5AR1', 'C5AR1', 'CSF1R', 'HAL', 'HAL'),
            ('SIRPA', 'CYP2E1', 'CYP2E1', 'CYP2E1', 'CYP2E1', 'CYP2E1', 'COL1A1', 'CYP2E1', 'CYP2E1'),
(.......................)  ],     
           dtype=[('Unassigned', 'O'), ('b-cells', 'O'), ('cholangiocytes', 'O'), ('endothelial cells', 'O'), ('erythroid cells', 'O'), ('hepatic stellate cells', 'O'), ('hepatocytes', 'O'), ('kupffer cells', 'O'), ('t-cells', 'O')]),
 'scores': rec.array([( 1.1047919 ,  1.4223580e+01,  8.5403233e+00,  2.9836638e+00,  3.7443349e+00,  1.03904781e+01,  5.28960686e+01,  2.30597286e+01,  1.96362038e+01),
            ( 0.65801376,  8.8617716e+00,  5.2942681e+00,  1.9939163e+00,  2.1673870e+00,  1.02436619e+01,  4.85130644e+00,  1.95555763e+01,  9.89364815e+00),
            ( 0.6156045 ,  4.8951001e+00,  4.3029914e+00,  9.9042362e-01,  1.9944884e+00,  8.08185673e+00,  4.40766430e+00,  1.81397591e+01,  5.23387718e+00),
(.......................)  ],     
           dtype=[('Unassigned', '<f4'), ('b-cells', '<f4'), ('cholangiocytes', '<f4'), ('endothelial cells', '<f4'), ('erythroid cells', '<f4'), ('hepatic stellate cells', '<f4'), ('hepatocytes', '<f4'), ('kupffer cells', '<f4'), ('t-cells', '<f4')]),
 'pvals': rec.array([(0.26924979, 6.54181873e-46, 1.33846555e-17, 0.00284819, 1.80872154e-04, 2.73975710e-25, 0.00000000e+00, 1.17486974e-117, 7.58615165e-86),
            (0.51052929, 7.87526234e-19, 1.19494060e-07, 0.04616121, 3.02053529e-02, 1.26361078e-24, 1.22650871e-06, 3.69808420e-085, 4.43561476e-23),
            (0.53815556, 9.82557304e-07, 1.68507525e-05, 0.3219671 , 4.60987120e-02, 6.37880428e-16, 1.04491384e-05, 1.54705656e-073, 1.65990714e-07),
            (0.56701976, 3.78240049e-06, 2.15205603e-05, 0.32967762, 1.87330858e-01, 1.54989449e-11, 6.36140894e-05, 1.67436768e-069, 3.60759031e-06),
(.......................)  ],

ivirshup commented 2 years ago

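You can use sc.get.rank_genes_groups_df to get a tidy dataframe out of an AnnData that has had rank_genes_groups run on it. Something like sc.get.rank_genes_groups_df(adata, group="b-cells", key="wilcoxon"), where group is one of your cell type names and key matches the key_added you passed to rank_genes_groups.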
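
To make the layout of adata.uns['wilcoxon'] less mysterious, here is a minimal sketch of what that extraction amounts to. It does not use scanpy: `uns_wilcoxon` is a hypothetical stand-in built from the first three rows of the output quoted above, restricted to three of the nine groups, and `group_to_df` is an illustrative helper, not a scanpy function. The float64 dtype for 'pvals' is an assumption, since the quoted output is cut off before its dtype. Each entry ('names', 'scores', 'pvals') is a structured array with one field per group, so indexing by the group name gives that group's column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for adata.uns["wilcoxon"]: values copied from the
# first three rows quoted in the question, for three of the nine groups.
groups = ["b-cells", "hepatocytes", "t-cells"]
uns_wilcoxon = {
    "names": np.array(
        [("HAL", "CYP2E1", "CD3E"),
         ("COL1A1", "CD14", "NKG7"),
         ("ADAMTSL2", "GLUL", "IL7R")],
        dtype=[(g, "O") for g in groups],
    ),
    "scores": np.array(
        [(14.223580, 52.8960686, 19.6362038),
         (8.8617716, 4.85130644, 9.89364815),
         (4.8951001, 4.40766430, 5.23387718)],
        dtype=[(g, "<f4") for g in groups],
    ),
    # Assumption: float64, because the quoted output ends before the dtype.
    "pvals": np.array(
        [(6.54181873e-46, 0.0, 7.58615165e-86),
         (7.87526234e-19, 1.22650871e-06, 4.43561476e-23),
         (9.82557304e-07, 1.04491384e-05, 1.65990714e-07)],
        dtype=[(g, "<f8") for g in groups],
    ),
}


def group_to_df(result, group, fields=("names", "scores", "pvals")):
    """Illustrative helper: gather one group's column from each structured array."""
    return pd.DataFrame({field: result[field][group] for field in fields})


# One table per cell type, rows ordered by rank.
b_cells = group_to_df(uns_wilcoxon, "b-cells")
print(b_cells)

# Ranked gene names for every group at once; field names are the group names.
top_genes = {
    g: uns_wilcoxon["names"][g].tolist()
    for g in uns_wilcoxon["names"].dtype.names
}
print(top_genes["t-cells"])
```

On a real AnnData, sc.get.rank_genes_groups_df should hand you the equivalent table without building it by hand, so the manual indexing is mostly useful for understanding or debugging what is stored.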

In general, the scverse discourse is a better place to ask questions, since more people will see them there.

ViriatoII commented 2 years ago

Thanks!