tanghaibao / goatools

Python library to handle Gene Ontology (GO) terms
BSD 2-Clause "Simplified" License

Incorrect data file for notebook: Run a Gene Ontology Enrichment Analysis (GOEA) #226

Closed by vwongjun 3 years ago

vwongjun commented 3 years ago

Hi,

I'm working through the notebook "Run a Gene Ontology Enrichment Analysis (GOEA)" to get familiar with the library. At step 4, data is read from the file nbt.3102-S4_GeneIDs.xlsx, which is supplemental table 4 of the Nature paper. I downloaded the table, but when I run the code I get the following error:

symbol, geneid, pval = [pg.cell_value(r, c) for c in range(pg.ncols)]
ValueError: not enough values to unpack (expected 3, got 2)

The table I downloaded from the paper has only 2 columns, but the script expects 3. Where can I find the version of the table with 3 columns?
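To illustrate the mismatch, here is a minimal sketch that reproduces the same kind of unpacking failure without the spreadsheet and replaces it with a clearer message. The helper name `unpack_row` and the sample rows are hypothetical stand-ins for illustration only; they are not part of the notebook or of goatools.

```python
def unpack_row(row, expected=3):
    """Unpack one row into (symbol, geneid, pval), checking the width first.

    Hypothetical helper: the notebook unpacks each row directly, which is
    what raises "not enough values to unpack" on a 2-column sheet.
    """
    if len(row) != expected:
        raise ValueError(
            f"expected {expected} columns, found {len(row)}: {row!r}"
        )
    symbol, geneid, pval = row
    return symbol, geneid, pval


# Made-up rows standing in for spreadsheet rows (not real study data)
three_col_row = ["GeneA", 12345, 0.01]
two_col_row = ["GeneA", 12345]

print(unpack_row(three_col_row))

try:
    unpack_row(two_col_row)
except ValueError as err:
    # The message now names the column count instead of failing mid-unpack
    print("Problem with input row:", err)
```

A check like this does not fix a wrong input file, but it makes it obvious that the file has a different layout than the code expects.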

Thanks for your help.

dvklopfenstein commented 3 years ago

@vwongjun,

Thank you for your interest in GOATOOLS and for taking the time to contact us.

  1. What version of goatools are you using?
  2. What is the name of the Excel spreadsheet in the Jupyter notebook?

In the Jupyter notebook goea_nbt3102, the code in the cell under "4. Read study genes" sets the spreadsheet name to nbt.3102-S4_GeneIDs.xlsx. Does this match the Excel spreadsheet filename that you are seeing?

import os

# Get xlsx filename where data is stored
ROOT = os.path.dirname(os.getcwd())  # go up 1 level from current working directory
din_xlsx = os.path.join(ROOT, "goatools/test_data/nbt_3102/nbt.3102-S4_GeneIDs.xlsx")

vwongjun commented 3 years ago

Thank you for taking the time to answer my question, @dvklopfenstein. I must not have been looking carefully yesterday: I missed the file, which I found this morning in that folder (goatools/test_data/nbt_3102/). Sorry for the unnecessary question. Thanks again.
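For anyone who runs into the same confusion, a quick check along these lines can confirm that the notebook is pointed at the copy of the spreadsheet shipped in the repository. This is only a sketch: the helper name `find_data_file` is hypothetical, and it assumes the notebook is run from a directory one level below the one containing the goatools checkout, as the path code quoted above does.

```python
import os


def find_data_file(root, relpath):
    """Return the full path to a data file, or raise if it is missing.

    Hypothetical helper, not part of goatools.
    """
    path = os.path.join(root, relpath)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"data file not found: {path}")
    return path


# Same path construction as in the notebook cell quoted above
ROOT = os.path.dirname(os.getcwd())  # go up 1 level from current working directory
REL = "goatools/test_data/nbt_3102/nbt.3102-S4_GeneIDs.xlsx"

try:
    din_xlsx = find_data_file(ROOT, REL)
    print("Using spreadsheet:", din_xlsx)
except FileNotFoundError as err:
    # Reported before any spreadsheet parsing is attempted
    print(err)
```

If the file is reported missing, the working directory is probably not where the notebook expects it to be.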