sdsucomptox / danrerlib

Danio Rerio Library, danRerLib: transcriptomics analysis for zebrafish researchers
https://sdsucomptox.github.io/danrerlib/index.html

Functional Enrichment Analysis Tutorial gets Error #2

Closed cscho-csrc closed 3 months ago

cscho-csrc commented 4 months ago

I installed danrerlib with pip and imported it into my Jupyter notebook without problems. While following the Functional Enrichment Analysis tutorial, both enrich_fisher and enrich_logistic raise a KeyError.

I manually created a pandas DataFrame that matches the data printed in the tutorial (just the first five rows) using:

import pandas as pd

data = {
    'NCBI Gene ID': [100000006, 100000009, 100000026, 100000030, 100000044],
    'PValue': [0.792615, 0.607285, 0.021338, 0.007880, 0.015286],
    'logFC': [0.115009, -0.144714, 0.603871, -2.083141, 0.803879]
}

gene_universe_tpp = pd.DataFrame(data)

gene_universe_tpp

Then I copied and pasted the code from the tutorial:

# the gene id type you have in your gene universe. 
# In this case, we have NCBI Gene IDs. 
ncbi_id = 'NCBI Gene ID'

# the database you wish to test enrichment for 
database_choice = 'KEGG Pathway'
result = enrich_logistic(gene_universe = gene_universe_tpp, 
                database = database_choice, 
                gene_id_type = ncbi_id,
                )

and I get a KeyError:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[30], line 1
----> 1 result = enrich_logistic(gene_universe = gene_universe_tpp, 
      2                 database = database_choice, 
      3                 gene_id_type = ncbi_id,
      4                 )

File ~/miniconda3/envs/danrer_env/lib/python3.11/site-packages/danrerlib/enrichment.py:186, in enrich_logistic(gene_universe, database, gene_id_type, org, directional_test, sig_gene_cutoff_pvalue, log2FC_cutoff_value, concept_ids, background_gene_set, sig_conceptID_cutoff_pvalue, order_by_p_value, min_num_genes_in_concept, include_all, orthology_base)
    183     direction = 'non-directional'
    185 if org != 'variable':
--> 186     out = _enrich(gene_universe, database, gene_id_type, org, 'logistic', direction, 
    187                 sig_gene_cutoff_pvalue, log2FC_cutoff_value, concept_ids, 
    188                 background_gene_set, sig_conceptID_cutoff_pvalue, order_by_p_value, 
    189                 min_num_genes_in_concept, include_all)
    190 else:
    191     out = _enrich_variable_org(gene_universe, database, gene_id_type, org, 
    192                                         'logistic', direction, 
    193                 sig_gene_cutoff_pvalue, log2FC_cutoff_value, concept_ids, 
    194                 background_gene_set, sig_conceptID_cutoff_pvalue, order_by_p_value, 
    195                 min_num_genes_in_concept, include_all, orthology_base)

File ~/miniconda3/envs/danrer_env/lib/python3.11/site-packages/danrerlib/enrichment.py:465, in _enrich(gene_universe, database, gene_id_type, org, method, direction, sig_gene_cutoff_pvalue, log2FC_cutoff_value, concept_ids, background_gene_set, sig_conceptID_cutoff_pvalue, order_by_p_value, min_num_genes_in_concept, include_all)
    463 result = pd.DataFrame(resulting_dictionary_list)
    464 if sig_conceptID_cutoff_pvalue and not include_all:
--> 465     result = result[result["P-value"] <= sig_conceptID_cutoff_pvalue]
    466 if order_by_p_value:
    467     result = result.sort_values(by='P-value', ascending=True)

File ~/miniconda3/envs/danrer_env/lib/python3.11/site-packages/pandas/core/frame.py:3896, in DataFrame.__getitem__(self, key)
   3894 if self.columns.nlevels > 1:
   3895     return self._getitem_multilevel(key)
-> 3896 indexer = self.columns.get_loc(key)
   3897 if is_integer(indexer):
   3898     indexer = [indexer]

File ~/miniconda3/envs/danrer_env/lib/python3.11/site-packages/pandas/core/indexes/range.py:418, in RangeIndex.get_loc(self, key)
    416         raise KeyError(key) from err
    417 if isinstance(key, Hashable):
--> 418     raise KeyError(key)
    419 self._check_indexing_error(key)
    420 raise KeyError(key)

KeyError: 'P-value'

I wondered whether the column had to be named "P-value" rather than "PValue" as in the example, so I renamed the column in my DataFrame, but I got the same error. enrich_fisher fails with the same error as enrich_logistic.

sdsucomptox commented 3 months ago

The issue comes from the code attempting to filter and sort a result that doesn't exist: the result DataFrame is empty, so it has no "P-value" column. I have included a safeguard that checks that the result has length > 0 and, if not, gives an explanation: "No significant concepts found"
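
For readers who hit the same traceback, here is a minimal sketch of why renaming the input column did not help and what a guard of this kind might look like. It is not danrerlib's actual code: filter_and_sort, its cutoff default, and the "Concept ID" column are hypothetical names used only for illustration. The one thing taken from the traceback is the "P-value" column of the result.

```python
import pandas as pd


def filter_and_sort(rows, cutoff=0.05):
    """Hypothetical stand-in for the end of an enrichment routine.

    `rows` is a list of per-concept result dictionaries. This is an
    illustration only, not the library's implementation.
    """
    result = pd.DataFrame(rows)
    if len(result) == 0:
        # An empty list gives a DataFrame with no columns at all, so
        # result["P-value"] would raise KeyError. Bail out early instead.
        print("No significant concepts found")
        return result
    result = result[result["P-value"] <= cutoff]
    return result.sort_values(by="P-value", ascending=True)


# Reproduce the original failure: the missing column is in the (empty)
# result, not in the user's gene universe, so renaming "PValue" there
# cannot fix it.
try:
    pd.DataFrame([])["P-value"]
except KeyError as err:
    print("KeyError:", err)

# With the guard, an empty result returns cleanly.
empty = filter_and_sort([])

# A non-empty result is filtered by the cutoff and sorted as before.
ranked = filter_and_sort([
    {"Concept ID": "a", "P-value": 0.03},
    {"Concept ID": "b", "P-value": 0.20},
    {"Concept ID": "c", "P-value": 0.001},
])
print(ranked)
```

The guard is placed before the filter rather than before the sort because, in the traceback above, the filter on line 465 is the first statement to touch the "P-value" column.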