sert23 opened this issue 1 year ago
I just ran the same script again (it was working previously), and this time a different error occurred:
Traceback (most recent call last):
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/http/client.py", line 566, in _get_chunk_left
chunk_left = self._read_next_chunk_size()
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/http/client.py", line 533, in _read_next_chunk_size
return int(line, 16)
ValueError: invalid literal for int() with base 16: b''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/http/client.py", line 583, in _read_chunked
chunk_left = self._get_chunk_left()
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/http/client.py", line 568, in _get_chunk_left
raise IncompleteRead(b'')
http.client.IncompleteRead: IncompleteRead(0 bytes read)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/urllib3/response.py", line 444, in _error_catcher
yield
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/urllib3/response.py", line 567, in read
data = self._fp_read(amt) if not fp_closed else b""
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/urllib3/response.py", line 533, in _fp_read
return self._fp.read(amt) if amt is not None else self._fp.read()
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/http/client.py", line 460, in read
return self._read_chunked(amt)
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/http/client.py", line 598, in _read_chunked
raise IncompleteRead(b''.join(value))
http.client.IncompleteRead: IncompleteRead(4336 bytes read)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/eap/miRexpress/updates/code/run_update.py", line 211, in
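The first traceback bottoms out in http.client's chunked-transfer reader. In a `Transfer-Encoding: chunked` response, each chunk is preceded by a line giving its size in hexadecimal, and the reader converts that line with int(line, 16). If the server closes the connection before the next size line arrives, the line is empty, int(b'', 16) raises ValueError, and http.client re-raises it as IncompleteRead. The sketch below is a simplified illustration of that step with a hypothetical read_chunk_size function; it is not http.client's actual code.

```python
import io
from http.client import IncompleteRead
from typing import BinaryIO


def read_chunk_size(fp: BinaryIO) -> int:
    """Read one chunk-size line of a chunked HTTP body (simplified sketch).

    Each chunk starts with its length in hexadecimal, optionally followed by
    ';'-separated extensions, and ends with CRLF.
    """
    line = fp.readline()
    # Ignore chunk extensions after ';' and the trailing CRLF.
    size_field = line.split(b";", 1)[0].strip()
    try:
        return int(size_field, 16)
    except ValueError:
        # An empty line means the server closed the connection before sending
        # the next chunk header. Raising inside the except block keeps the
        # ValueError as implicit context, as in the traceback above.
        raise IncompleteRead(b"")


print(read_chunk_size(io.BytesIO(b"1a\r\n")))  # 26
try:
    read_chunk_size(io.BytesIO(b""))  # connection dropped mid-body
except IncompleteRead as exc:
    print(repr(exc))  # IncompleteRead(0 bytes read)
```

In other words, the ValueError is a symptom of a truncated response, not of malformed data in the body.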
My recommendation is to use an external tool for downloading for now: https://github.com/saketkc/pysradb/issues/201#issuecomment-1843076201
Sorry, I think my explanation was not clear. I'm trying to download only metadata.
Is this what you are running? It seems to work on my end:
>>> instance = SraSearch(2, 1000000, strategy="miRNA-seq")
>>> df = instance.search()
 4%|█▍ | 5400/130053 [03:13<1:19:26, 26.15it/s]
Yep, it starts running, but after a few minutes it fails with this error:
Traceback (most recent call last):
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/http/client.py", line 566, in _get_chunk_left
chunk_left = self._read_next_chunk_size()
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/http/client.py", line 533, in _read_next_chunk_size
return int(line, 16)
ValueError: invalid literal for int() with base 16: b''
I suspect something is not formatted properly on the SRA side (this has happened to me before when parsing other SRA data in Python). Some description fields contain a stray '\b' character, and Python tries to parse it as some kind of binary string...
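The '\b' hypothesis can be checked in isolation. XML 1.0 does not allow most C0 control characters (0x08, the backspace written as '\b', is one of them), and Python's ElementTree rejects them with a ParseError. The sketch below uses hypothetical helper names (strip_invalid_xml_chars, parses_cleanly) and only the standard library. Note that a bad-character problem would be expected to surface as an ElementTree ParseError, whereas the tracebacks in this thread end in IncompleteRead and read-timeout errors raised while the response was still being received.

```python
import re
import xml.etree.ElementTree as ET

# XML 1.0 forbids C0 control characters other than tab, LF and CR, so a
# stray '\b' (0x08) in a field makes the document not well-formed.
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_invalid_xml_chars(text: str) -> str:
    """Remove control characters that XML 1.0 does not allow."""
    return _INVALID_XML_CHARS.sub("", text)


def parses_cleanly(xml_text: str) -> bool:
    """Return True if ElementTree accepts the document as-is."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical record with a backspace inside a description-like field.
raw = "<EXPERIMENT><TITLE>miRNA\bseq run</TITLE></EXPERIMENT>"
print(parses_cleanly(raw))                           # False
print(parses_cleanly(strip_invalid_xml_chars(raw)))  # True
```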
As a workaround, I'm running the same query against GEO to see whether it is parsed differently there. Alternatively, is there a way to run an SraSearch query that requests only the summary fields (SRX and SRP)? That would be enough for me.
Thanks for your help!
You could try setting verbosity=1.
Thank you, I will try that as a last resort. The problem is that I'm interested in all SRPs, and since verbosity=1 only returns experiment accessions, I would then have to query sample by sample to retrieve them.
Describe the bug
Not sure what's happening, but for the last few days I've been struggling to download data using pysradb. This worked without problems a couple of weeks ago. Here is the error I get:
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/urllib3/response.py", line 444, in _error_catcher
yield
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/urllib3/response.py", line 567, in read
data = self._fp_read(amt) if not fp_closed else b""
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/urllib3/response.py", line 533, in _fp_read
return self._fp.read(amt) if amt is not None else self._fp.read()
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/http/client.py", line 460, in read
return self._read_chunked(amt)
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/http/client.py", line 583, in _read_chunked
chunk_left = self._get_chunk_left()
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/http/client.py", line 566, in _get_chunk_left
chunk_left = self._read_next_chunk_size()
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/http/client.py", line 526, in _read_next_chunk_size
line = self.fp.readline(_MAXLINE + 1)
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/socket.py", line 705, in readinto
return self._sock.recv_into(b)
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/ssl.py", line 1274, in recv_into
return self.read(nbytes, buffer)
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/ssl.py", line 1130, in read
return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/eap/miRexpress/updates/code/run_update.py", line 200, in
generate_raw_tsv("miRNA-seq", os.path.join(raw_folder, "miRNA-seq.tsv"))
File "/home/eap/miRexpress/updates/code/run_update.py", line 36, in generate_raw_tsv
instance.search()
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/pysradb/search.py", line 793, in search
self._format_response(r.raw)
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/pysradb/search.py", line 861, in _format_response
for event, elem in Et.iterparse(content):
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/xml/etree/ElementTree.py", line 1255, in iterator
data = source.read(16 * 1024)
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/urllib3/response.py", line 566, in read
with self._error_catcher():
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/contextlib.py", line 153, in __exit__
self.gen.throw(typ, value, traceback)
File "/home/eap/anaconda/envs/pysradb/lib/python3.10/site-packages/urllib3/response.py", line 449, in _error_catcher
raise ReadTimeoutError(self._pool, None, "Read timed out.")
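This traceback shows why a network problem surfaces inside the XML parser: pysradb passes the streamed response body (r.raw) directly to Et.iterparse, which pulls 16 KiB at a time with source.read(16 * 1024), so a read timeout partway through the download is raised from inside the parsing loop. The self-contained sketch below reproduces that shape with a hypothetical StallingStream class standing in for the network stream; it uses only the standard library and is not pysradb code.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


class StallingStream:
    """Hypothetical stand-in for a streamed HTTP body such as r.raw.

    Serves payload bytes through read(); if fail_after is set, raises
    TimeoutError once that many bytes have been delivered, mimicking a
    server that stops responding mid-download.
    """

    def __init__(self, payload: bytes, fail_after: Optional[int] = None) -> None:
        self._payload = payload
        self._fail_after = fail_after
        self._pos = 0

    def read(self, size: int = -1) -> bytes:
        limit = len(self._payload) if self._fail_after is None else self._fail_after
        if self._pos >= limit:
            if self._fail_after is not None:
                raise TimeoutError("The read operation timed out")
            return b""  # normal end of stream
        end = limit if size < 0 else min(limit, self._pos + size)
        chunk = self._payload[self._pos:end]
        self._pos = end
        return chunk


def collect_tags(stream: StallingStream, seen: List[str]) -> None:
    """Stream-parse XML, recording each completed element's tag in seen."""
    # iterparse calls stream.read() between batches of events, so a read
    # error is raised from inside this loop, after some elements are parsed.
    for _event, elem in ET.iterparse(stream):
        seen.append(elem.tag)


PAYLOAD = b"<root>" + b"<item>x</item>" * 1000 + b"</root>"

seen: List[str] = []
try:
    collect_tags(StallingStream(PAYLOAD, fail_after=5000), seen)
except TimeoutError as exc:
    print(f"parsed {len(seen)} elements, then: {exc}")
```

Because parsing and downloading are interleaved, partial results are lost when the connection stalls, which matches the "runs for a few minutes, then fails" behavior described here.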
It seems the connection gets dropped after a few minutes. Is there a parameter I can change to make it retry, or something similar? Is SRA blocking my IP? Is this a widespread, recent issue?
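One generic option, independent of any pysradb setting, is to wrap the whole call in a retry loop with exponential backoff. The sketch below is a hypothetical helper (with_retries is not part of pysradb or requests); by default it retries on the built-in exception types that appear in the tracebacks in this thread. urllib3.exceptions.ReadTimeoutError, the final error above, does not subclass the built-in TimeoutError, so it would have to be added to retry_on explicitly.

```python
import time
from http.client import IncompleteRead
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

# Built-in/stdlib errors from the tracebacks above that look transient.
# urllib3.exceptions.ReadTimeoutError is NOT a subclass of these; add it
# to retry_on yourself if urllib3 is installed.
RETRYABLE: Tuple[Type[BaseException], ...] = (IncompleteRead, TimeoutError, ConnectionError)


def with_retries(
    func: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = RETRYABLE,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying with exponential backoff on transient errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)  # 2s, 4s, 8s, ...
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay:.0f}s")
            sleep(delay)
    raise AssertionError("unreachable")
```

A possible (untested) usage is with_retries(instance.search, retry_on=RETRYABLE + (ReadTimeoutError,)). Each retry re-runs the entire search from the beginning, which is expensive for a query of this size.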
To Reproduce
This now happens with every attempt, at a random point after a few minutes. In this example I'm trying to download information about all miRNA-seq samples in SRA:
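from pysradb.search import SraSearch

library_type = "miRNA-seq"
instance = SraSearch(2, 1000000, strategy=library_type)  # verbosity=2, return up to 1,000,000 records
print("Downloading samples for " + library_type)
instance.search()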
Thanks a lot for writing this software and the support!!