tanghaibao / goatools

Python library to handle Gene Ontology (GO) terms
BSD 2-Clause "Simplified" License
745 stars, 212 forks

Error: Unicode Decode Error #295

Closed: yb520abab closed this issue 1 month ago

yb520abab commented 1 month ago

Hi,

Thank you for providing such a great tool! I encountered the error "UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1103: ordinal not in range(128)". I converted all my gene symbols into gene IDs and saved the files in UTF-8 format, and the gene IDs look correctly formatted to me. The traceback points to the file "find_enrichment.py", but I don't have access to edit that script or the other scripts. Has anyone else encountered a similar problem? Many thanks!
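Saving the file as UTF-8 does not by itself avoid this error. In UTF-8 every non-ASCII character is stored as two to four bytes, and 0xe2 is the first byte of many common punctuation characters: the en dash '–' (U+2013) is stored as E2 80 93 and the right single quotation mark '’' (U+2019) as E2 80 99. Decoding such bytes with the 'ascii' codec raises exactly this kind of error. A minimal reproduction, using a made-up gene name (GENE1–AS1 is hypothetical, chosen only to contain an en dash):

```python
# Hypothetical gene-list line containing an en dash (U+2013) instead of a hyphen.
line = "GENE1\u2013AS1\n"

raw = line.encode("utf-8")  # UTF-8 stores the en dash as three bytes
print(raw)                  # b'GENE1\xe2\x80\x93AS1\n'

try:
    raw.decode("ascii")  # what reading the file with an ASCII default encoding does
except UnicodeDecodeError as err:
    print(err)  # 'ascii' codec can't decode byte 0xe2 in position 5: ordinal not in range(128)

print(raw.decode("utf-8") == line)  # True: decoding as UTF-8 recovers the original text
```

The bytes themselves are valid UTF-8; the failure comes from the codec used to read them.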

tanghaibao commented 1 month ago

@yb520abab

Would you mind pasting the traceback message here? The error says that your file contains a non-standard character (byte 0xe2, which displays as 'â'). It may be that find_enrichment.py isn't reading the file as UTF-8. The other solution is to identify the character 'â' and replace it with something else.
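One way to follow the second suggestion is to scan the file as raw bytes and report every non-ASCII character with its line and column. Note that 'â' is only how byte 0xe2 looks when shown as Latin-1; in a UTF-8 file that byte usually starts a character such as an en dash, a curly quote, or an invisible zero-width space (U+200B, stored as E2 80 8B), so searching the file for a literal 'â' may find nothing. The helper below is a hypothetical sketch, not part of goatools:

```python
import unicodedata


def find_non_ascii(path):
    """Yield (line_no, column, char, name) for each non-ASCII character in a UTF-8 text file.

    Hypothetical helper for illustration; not part of goatools.
    """
    with open(path, "rb") as handle:  # binary mode: the locale's default encoding is not used
        for line_no, raw in enumerate(handle, start=1):
            # Invalid UTF-8 bytes become U+FFFD instead of raising an error.
            text = raw.decode("utf-8", errors="replace")
            for column, char in enumerate(text, start=1):
                if ord(char) > 127:
                    yield line_no, column, char, unicodedata.name(char, "UNKNOWN")
```

For example, `list(find_non_ascii("study_genes.txt"))` (the file name is hypothetical) returns one tuple per offending character, with the Unicode name (such as `EN DASH` or `ZERO WIDTH SPACE`) making invisible characters easy to recognize.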

yb520abab commented 1 month ago

@tanghaibao Hi, I checked my files and there isn't any non-standard character 'â' in them. Here's the traceback message:

    Traceback (most recent call last):
      File "scripts/find_enrichment.py", line 44, in <module>
        main()
      File "scripts/find_enrichment.py", line 31, in main
        obj = GoeaCliFnc(GoeaCliArgs().args)
      File "/u/home/y/yebi24/.conda/envs/be298/lib/python3.6/site-packages/goatools/cli/find_enrichment.py", line 232, in __init__
        _study, _pop = self.rd_files(*self.args.filenames[:2])
      File "/u/home/y/yebi24/.conda/envs/be298/lib/python3.6/site-packages/goatools/cli/find_enrichment.py", line 416, in rd_files
        study, pop = self._read_geneset(study_fn, pop_fn)
      File "/u/home/y/yebi24/.conda/envs/be298/lib/python3.6/site-packages/goatools/cli/find_enrichment.py", line 424, in _read_geneset
        study = frozenset(_.strip() for _ in open(study_fn) if _.strip())
      File "/u/home/y/yebi24/.conda/envs/be298/lib/python3.6/site-packages/goatools/cli/find_enrichment.py", line 424, in <genexpr>
        study = frozenset(_.strip() for _ in open(study_fn) if _.strip())
      File "/u/home/y/yebi24/.conda/envs/be298/lib/python3.6/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]

Many thanks!
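The traceback shows where the decoding happens: line 424 calls `open(study_fn)` without an `encoding` argument, so Python 3 decodes the file with the locale's preferred encoding, and the last frame (`encodings/ascii.py`) shows that this was ASCII. The sketch below, which assumes the gene list is valid UTF-8, checks that default and shows two ways around it without editing goatools; `read_geneset` and `write_ascii_copy` are hypothetical helpers, not goatools functions:

```python
import locale

# Encoding that open() uses when no encoding= argument is given (as on line 424).
# On Python 3.6 under a plain C/POSIX locale this is typically 'ANSI_X3.4-1968', i.e. ASCII.
print(locale.getpreferredencoding(False))


def read_geneset(path, encoding="utf-8"):
    """Same expression as line 424, but with an explicit encoding (hypothetical helper)."""
    with open(path, encoding=encoding) as handle:
        return frozenset(line.strip() for line in handle if line.strip())


def write_ascii_copy(src, dst, replacement=""):
    """Copy a UTF-8 file to dst, replacing each non-ASCII character with `replacement`.

    The default deletes such characters; choose a replacement only after checking which
    characters they are (e.g. "-" for an en dash). Hypothetical helper, not part of goatools.
    """
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="ascii") as fout:
        for line in fin:
            fout.write("".join(ch if ord(ch) < 128 else replacement for ch in line))
```

For example, `write_ascii_copy("study_genes.txt", "study_genes.ascii.txt")` (hypothetical file names) produces a file that `scripts/find_enrichment.py` can read under an ASCII locale. Alternatively, running the script with a UTF-8 locale set (for example `LC_ALL=en_US.UTF-8`, assuming that locale is installed on the system) should change the default encoding `open()` uses, leaving both the data and goatools untouched.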