Maarten-vd-Sande closed this PR 4 years ago.
Merging #61 into master will decrease coverage by 0.01%. The diff coverage is 66.66%.
```diff
@@            Coverage Diff             @@
##           master      #61      +/-   ##
==========================================
- Coverage   42.23%   42.22%   -0.02%
==========================================
  Files           5        5
  Lines        1030     1035       +5
==========================================
+ Hits          435      437       +2
- Misses        595      598       +3
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| pysradb/sraweb.py | 84.12% <66.66%> (-0.48%) | :arrow_down: |
Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 5e54c4e...aeaa494. Read the comment docs.
Thanks @Maarten-vd-Sande, this looks great and is very helpful! I have not revisited the sleep time issues for a while, given that things looked mostly stable, but this PR should be a good opportunity to do so.
Might be helpful for @bscrow's PR #57 too. I would need some time to review this.
Glad that it is considered useful :smile: . Let me know if I can somehow help to move this forward.
Do you think we should propagate `self.sleep_time` into the retry functions? It is not being used anywhere. One idea would be to drop it and leave the retry block as is (it has a sleep of 0.5 seconds by default).
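To make the two options concrete, here is a minimal sketch of a retry helper with a configurable pause. This is a hypothetical illustration, not pysradb's actual retry block; the function name `retry` and its parameters are invented, and the 0.5-second default mirrors the value mentioned above:

```python
import time


def retry(func, n_tries=3, sleep_time=0.5):
    """Call func, retrying on failure with a configurable pause.

    Propagating a sleep_time parameter (instead of hard-coding the
    0.5 s default) is what "propagating self.sleep_time" would mean.
    """
    for attempt in range(n_tries):
        try:
            return func()
        except Exception:
            # Re-raise on the final attempt; otherwise pause and retry.
            if attempt == n_tries - 1:
                raise
            time.sleep(sleep_time)
```

A caller could then pass `self.sleep_time` through, so one setting controls all pauses.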
I think the API key is not really necessary, actually. I only used SRAweb with `detailed=True`, and I thought it took so "long" because pysradb looked up each sample separately (which it doesn't). I just discovered that the reason it takes so long is checking for the ENA download link.
I changed the `time.sleep` calls to use `self.sleep_time`, so at least that is cleaned up. This works, and the small tests I did do not run into API limits. However, the API key is not really necessary, so we can also close this PR and ignore this :)
Yep @saketkc, including an API key for the search feature should speed up large queries for SRA metadata, since the current implementation retrieves metadata in batches of 300 entries. I'll test it out for the search module.
Seems like I keep missing stuff in the code :eyes:. Where is this split (batches of 300) made, @bscrow?
@Maarten-vd-Sande this split is only implemented in the upcoming pysradb search feature (#57), which differs from the pysradb metadata feature found in `pysradb/sraweb.py` in that it fetches metadata based on text queries instead of accession numbers.
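The batches-of-300 split described above can be sketched with a small generic helper. This is an illustration of the general pattern, assuming a plain list of IDs, not the actual code from the #57 search branch:

```python
def batched(ids, batch_size=300):
    """Yield successive fixed-size batches of IDs.

    Splitting a long query into batches of (at most) 300 entries
    keeps each request to the metadata API small; the final batch
    holds whatever remains.
    """
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]
```

With 650 IDs this yields three batches of 300, 300, and 50 entries.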
I guess, this is still useful if someone is interested in trying out the API key and the current behavior remains unaffected. I might expose this to command line in the future.
What are your thoughts @Maarten-vd-Sande and @bscrow?
It doesn't hurt to have it, although it might be misleading in suggesting that it speeds up pysradb under "normal" usage. If it helps with the search function, then that's nice!
I have to say it requires more thorough testing to see if it hits the API limit, since I haven't really stress-tested it.
I decided to merge it. In my experience, stress-testing NCBI is hard (very unreliable behavior overall). Since it doesn't change the current behavior and might potentially be useful (hopefully), we can keep it.
Many thanks for your contribution @Maarten-vd-Sande!
Just an FYI, every once in a while I get:

```
Unable to parse xml: {"error":"API rate limit exceeded","api-key":"131.174.27.98","count":"4","limit":"3"}
```
Maybe the sleep time should be slightly increased to avoid this.
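Besides a longer fixed sleep, another option is to back off and retry only when this specific error comes back. Below is a minimal sketch, assuming a hypothetical `request` callable that returns the raw response text; the function name and parameters are invented, and only the JSON error shape is taken from the message above:

```python
import json
import time


def fetch_with_backoff(request, max_tries=5, base_sleep=0.5):
    """Retry an NCBI request with exponential backoff on rate-limit errors.

    NCBI returns a JSON error body when the per-second limit is hit;
    anything that is not JSON is assumed to be the expected XML and
    returned as-is.
    """
    for attempt in range(max_tries):
        text = request()
        try:
            payload = json.loads(text)
        except ValueError:
            return text  # not JSON, so treat it as a valid XML response
        if payload.get("error") == "API rate limit exceeded":
            # Double the pause on every retry: 0.5 s, 1 s, 2 s, ...
            time.sleep(base_sleep * 2 ** attempt)
            continue
        return text
    raise RuntimeError("rate limit still exceeded after retries")
```

This way the common case pays no extra sleep, and only rate-limited requests slow down.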
Thanks for catching this @Maarten-vd-Sande. Do you have an example I can test this against?
Let me see if I can cook something up :)
I decided to just implement the skeleton instead of leaving it up to you, see #60. This still needs the sleeps changed, but I am not sure which sleep is for what, since some are 1/10th, 1/3rd, 1/2, etc.
It works and does not crash (and takes the same amount of time as without a key).