saketkc / pysradb

Package for fetching metadata and downloading data from SRA/ENA/GEO
https://saketkc.github.io/pysradb
BSD 3-Clause "New" or "Revised" License
311 stars, 51 forks

multithreaded ena loading #63

Closed: Maarten-vd-Sande closed this 4 years ago

Maarten-vd-Sande commented 4 years ago

As mentioned in #61, I thought the "long" duration of my lookups came from unnecessary waits. However, most of the time in pysradb.SRAweb.sra_metadata is actually spent waiting for ENA to reply. By making the requests to ENA in a thread pool, we can speed this up significantly:

import time

import pysradb

db = pysradb.SRAweb()
start = time.time()

# Mixed GEO sample (GSM) and run (DRR/SRR) accessions
result = db.sra_metadata(
    [
        "GSM2837485",
        "GSM2385309",
        "GSM3141725",
        "GSM2385313",
        "GSM3756611",
        "GSM2219686",
        "GSM2811115",
        "DRR138954",
        "GSM2837501",
        "DRR138997",
        "SRR10680309",
    ],
    detailed=True,
)

print(result)
print(time.time() - start)  # elapsed wall-clock time in seconds

With the thread pool, the run time of this script goes from 36.8 to 15.5 seconds.

It can be sped up even more by increasing the number of threads in the pool, but I think ENA would not appreciate that.
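For readers who want the general idea without reading the diff, here is a minimal sketch of the pattern, not the actual change in pysradb/sraweb.py. `fetch_ena_record` and `fetch_all` are hypothetical names, the request is simulated with `time.sleep` instead of a real ENA call, and `max_workers=4` is an arbitrary, deliberately small pool size chosen to stay polite to the remote server.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_ena_record(accession):
    """Hypothetical stand-in for one blocking ENA request (not pysradb's code)."""
    time.sleep(0.1)  # simulate the time spent waiting for the server to reply
    return {"accession": accession, "status": "ok"}


def fetch_all(accessions, max_workers=4):
    """Run the blocking requests in a bounded thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input accessions
        return list(pool.map(fetch_ena_record, accessions))


accessions = [
    "DRR138954", "DRR138997", "SRR10680309", "GSM2837485",
    "GSM2385309", "GSM3141725", "GSM2385313", "GSM3756611",
]

start = time.time()
results = fetch_all(accessions)
elapsed = time.time() - start

print(len(results), "records fetched")
print(f"{elapsed:.2f} s with a pool, versus roughly {0.1 * len(accessions):.1f} s sequentially")
```

Because the work is I/O-bound (the threads mostly sit waiting on the network), threads help despite the GIL. Capping `max_workers` limits how many requests hit the server at once, which is the trade-off mentioned above.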

codecov[bot] commented 4 years ago

Codecov Report

Merging #63 into master will increase coverage by 1.08%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #63      +/-   ##
==========================================
+ Coverage   41.92%   43.00%   +1.08%     
==========================================
  Files           5        5              
  Lines        1028     1030       +2     
==========================================
+ Hits          431      443      +12     
+ Misses        597      587      -10     
Impacted Files      Coverage Δ
pysradb/sraweb.py   86.33% <100.00%> (+2.23%) ↑

saketkc commented 4 years ago

Tested and indeed saw a ~4x speedup!

Thanks for raising all the issues so far and self-addressing them! Very useful contribution!