Closed tjiagoM closed 4 years ago
How many gene groups are you querying? The problem comes from this line of code:
res = pd.read_csv(StringIO(response.content.decode('utf-8')),sep="\t")
I don't know exactly what happens, but I suspect the cause is network latency: gseapy waits a long time to get results back from the Enrichr server. I'll take some time to look into this.
Yeah, for some groups I have a few hundred, but I ended up not saving any group because it constantly changes. I will try to run again and see for which groups it stops this time.
Now that you mention it, gseapy was sometimes failing because of a connection reset exception, and I solved this by adding a few milliseconds of sleep before calling enrichr() each time. Could it be that the response read by StringIO has some error/warning from the API request, and that's why pandas cannot read it properly?
@zqfang I was going to create a new issue, but I'm now receiving another error in an inconsistent way (a bit like the error in this issue). Do you think it might be related to this? Apologies for just throwing the exceptions here, but they just randomly appear, so maybe you might know better how to help me.
Traceback (most recent call last):
File "07_explain_communitites.py", line 84, in <module>
cutoff=0.05)
File "/home_location/.local/lib/python3.6/site-packages/gseapy/enrichr.py", line 391, in enrichr
enr.run()
File "/home_location/.local/lib/python3.6/site-packages/gseapy/enrichr.py", line 309, in run
gss = self.parse_genesets()
File "/home_location/.local/lib/python3.6/site-packages/gseapy/enrichr.py", line 68, in parse_genesets
enrichr_library = self.get_libraries()
File "home_location/.local/lib/python3.6/site-packages/gseapy/enrichr.py", line 183, in get_libraries
libs = [lib['libraryName'] for lib in libs_json['statistics']]
KeyError: 'statistics'
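The KeyError above is raised when the library listing held in libs_json has no 'statistics' key. A minimal sketch of a more defensive lookup, assuming the reply has already been parsed into a Python object; the function name library_names is hypothetical and not part of gseapy:

```python
def library_names(libs_json):
    """Return the library names, or raise a readable error if they are missing.

    Hypothetical helper for illustration; it is not gseapy's own code.
    """
    # A blocked or failed request may yield something other than the usual dict.
    stats = libs_json.get("statistics") if isinstance(libs_json, dict) else None
    if stats is None:
        raise RuntimeError(
            "Library listing has no 'statistics' entry; the server reply may be "
            "empty or an error message. Wait a bit and try again."
        )
    return [lib["libraryName"] for lib in stats]
```

With a check like this, the user would see a message pointing at the server reply instead of a bare KeyError.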
I think the problems you've had share the same cause: the Enrichr server could not handle gseapy's many concurrent requests from the same IP address in a short time. It seems that the user gets blocked to prevent API abuse, so when you try to get the data back, you get nothing. I have no other way to improve this except adding a sleep after each query. Do you have any ideas?
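One possible refinement of the "sleep after each query" idea is to retry a failed call with an increasing delay. The following is only a sketch under that assumption: call_with_backoff and its parameters are hypothetical names, not part of gseapy, and the delays would need tuning against whatever the server tolerates.

```python
import time


def call_with_backoff(func, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt seconds and retry.

    Hypothetical helper: func is any zero-argument callable, for example
    lambda: gp.enrichr(gene_list=genes, gene_sets='NCI-Nature_2016', outdir='out')
    """
    last_error = None
    for attempt in range(retries):
        try:
            return func()
        except Exception as err:  # broad on purpose: the failures seen vary widely
            last_error = err
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... by default
    raise RuntimeError(
        "Still failing after %d attempts; the server may be throttling requests"
        % retries
    ) from last_error
```

The sleep argument is injectable only so the helper can be exercised without real waiting; in normal use the default time.sleep applies.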
I see, thanks for the help anyway!
I'd say if you get a timeout from the Enrichr server, or some error in the answer returned by Enrichr, maybe just catch that and show the user that the problem is with the Enrichr server (and maybe suggest waiting a bit or reducing the number of requests). Otherwise all these errors will surely just cause confusion when the problem is actually simple, as you pointed out.
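The suggestion above could be sketched roughly as follows: check the reply before handing it to a table parser, and raise an error whose message points at the server. This assumes only that a good reply is tab-separated text (as the sep="\t" in the read_csv call implies); EnrichrResponseError and check_tsv_response are hypothetical names, not gseapy's API.

```python
class EnrichrResponseError(RuntimeError):
    """Hypothetical error type: the server reply was not a usable table."""


def check_tsv_response(ok, text):
    """Return the rows of a tab-separated reply, or raise a readable error.

    ok is the success flag of the HTTP response, text its decoded body.
    """
    hint = ("The problem is likely on the Enrichr server side; "
            "wait a bit or send fewer requests.")
    if not ok:
        raise EnrichrResponseError("Request failed. " + hint)
    # Split every non-empty line into tab-separated fields.
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    if not rows or len(rows[0]) < 2:
        raise EnrichrResponseError("Reply is empty or not tab-separated. " + hint)
    width = len(rows[0])
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            raise EnrichrResponseError(
                "Non-empty row %d has %d fields, expected %d. "
                % (number, len(row), width) + hint
            )
    return rows
```

A check of this kind would turn the pandas "Error tokenizing data" failure into a message that names the likely cause.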
Well, good idea. A warning should be printed to the console if nothing comes back. The Enrichr servers are being upgraded right now. If you still have the same problem, you will need to re-run.
I am also getting the same error that @tjiagoM posted above executing the following on a list of about 50 genes:
en_rnk_1=gp.enrichr(gene_list=rnk1_en,description='test',gene_sets='NCI-Nature_2016',outdir='./GSEA Files/Selected Gene Sets')
I updated to the latest release and am still getting this issue. Is there still a problem with the server that is causing this?
I have waited a week and I am still getting the same error?
`2019-09-26 14:28:42,305 Error fetching enrichment results: TRRUST_Transcription_Factors_2019
---------------------------------------------------------------------------
ParserError Traceback (most recent call last)
<ipython-input-59-902aeaec60e8> in <module>
----> 1 en_rnk_1=gp.enrichr(gene_list=rnk1_en,gene_sets='TRRUST_Transcription_Factors_2019',outdir='./GSEA Files/Selected Gene Sets')
~\Anaconda3\lib\site-packages\gseapy\enrichr.py in enrichr(gene_list, gene_sets, organism, description, outdir, background, cutoff, format, figsize, top_term, no_plot, verbose)
415 enr = Enrichr(gene_list, gene_sets, organism, description, outdir,
416 cutoff, background, format, figsize, top_term, no_plot, verbose)
--> 417 enr.run()
418
419 return enr
~\Anaconda3\lib\site-packages\gseapy\enrichr.py in run(self)
354 self._logger.debug("Start Enrichr using library: %s" % (self._gs))
355 self._logger.info('Analysis name: %s, Enrichr Library: %s' % (self.descriptions, self._gs))
--> 356 shortID, res = self.get_results(genes_list)
357 # Remember gene set library used
358 res.insert(0, "Gene_set", self._gs)
~\Anaconda3\lib\site-packages\gseapy\enrichr.py in get_results(self, gene_list)
182 if not response.ok:
183 self._logger.error('Error fetching enrichment results: %s'%self._gs)
--> 184 res = pd.read_csv(StringIO(response.content.decode('utf-8')), sep="\t")
185 return [job_id['shortId'], res]
186
~\Anaconda3\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
700 skip_blank_lines=skip_blank_lines)
701
--> 702 return _read(filepath_or_buffer, kwds)
703
704 parser_f.__name__ = name
~\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
433
434 try:
--> 435 data = parser.read(nrows)
436 finally:
437 parser.close()
~\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
1137 def read(self, nrows=None):
1138 nrows = _validate_integer('nrows', nrows)
-> 1139 ret = self._engine.read(nrows)
1140
1141 # May alter columns / col_dict
~\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
1993 def read(self, nrows=None):
1994 try:
-> 1995 data = self._reader.read(nrows)
1996 except StopIteration:
1997 if self._first_chunk:
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()
pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()
ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2`
Any insight into why this may be happening?
@tsnetterfield, sorry for the late reply. Could you please install the latest PR and try again? I've updated the data that pandas reads. I hope this fixes the problem you have.
@zqfang Thanks for getting back to me! I updated my Python to 3.7.4 and am still getting the same error I posted above.
@tsnetterfield, please install the latest gseapy using this line of code:
pip install git+git://github.com/zqfang/gseapy.git#egg=gseapy
Make sure that you are using v0.9.16.
@zqfang When I do this in Anaconda Prompt this is the first line that comes up:
Requirement already satisfied: gseapy from git+git://github.com/zqfang/gseapy.git#egg=gseapy in c:\users\tatiana\anaconda3\lib\site-packages (0.9.15)
Anaconda seems to only see the 0.9.15 development version for some reason.
You cannot install two different versions of the same package at once. Uninstall the old one first.
@armadillocommander thanks for the tip! I uninstalled and now have version 0.9.16. However, I am still getting the exact same parser error from above.
@tsnetterfield, do you mind sharing your gene list input with me? I can't reproduce your bug.
Hi @zqfang, attached is the list I was trying to run. I tried a different list just now and got the same error.
@tsnetterfield, sorry for the late reply; I was on vacation. However, I still could not reproduce the error you got using the same code:
en_rnk_1=gp.enrichr(gene_list="my_gene_list.txt" ,description='test',gene_sets='NCI-Nature_2016',outdir='./GSEA Files/Selected Gene Sets')
Even after running the code 50 times, it did not break.
Closing now. This issue should be gone.
Alternatively, you can save the file as CSV UTF-8 (Comma delimited).
I had the same error and fixed it by regularizing the data in the CSV file.
Hello,
I have to run multiple enrichments over different groups of genes, so I just have a big for loop which goes over all these groups of genes, and for each one just runs:
Once in a while I have this error:
I'm having great difficulty isolating the error because it doesn't always happen for the same group of genes. Could anyone give a hint about what the problem could be, as I've started using gseapy only very recently?
If I cannot find the source of the error, I guess it's fine, because I've been able to run all the groups by just repeating the code, which is quite annoying as I don't know whether some enrichment might be wrong. What could I be missing here?
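One way to see which groups fail in a given run, instead of repeating the whole loop, is to record failures per group and continue. This is a sketch only: run_all, groups and analyse are hypothetical names, and analyse stands in for whatever wraps the enrichment call.

```python
def run_all(groups, analyse):
    """Run analyse(genes) for every named group, keeping failures separate.

    groups maps a group name to its gene list; analyse is any callable that
    takes a gene list. Both are placeholders for illustration.
    """
    results, failures = {}, {}
    for name, genes in groups.items():
        try:
            results[name] = analyse(genes)
        except Exception as err:  # record the error and move on to the next group
            failures[name] = repr(err)
    return results, failures
```

After a run, the failures dict lists exactly which groups need to be repeated and with which error, so successful results do not have to be recomputed.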