saketkc / pysradb

Package for fetching metadata and downloading data from SRA/ENA/GEO
https://saketkc.github.io/pysradb
BSD 3-Clause "New" or "Revised" License

[BUG] JsonDecodeError #78

Open Maarten-vd-Sande opened 4 years ago

Maarten-vd-Sande commented 4 years ago

Describe the bug My colleague @Rebecza is trying to download a single-cell ATAC-seq dataset and uses pysradb to get some metadata (seq2science), and managed to find a JsonDecodeError :bug: . It's a list of approx. 750 ENA samples. The strange thing is that the JsonDecodeError appears with the full list, but when the list is split up into smaller lists it seems to work...

To Reproduce I put it on Colab; I'm not sure whether the link works: https://colab.research.google.com/drive/1bC2WiA63JJnWYZew0pk6iovk537vQzaU?usp=sharing

saketkc commented 4 years ago

Thanks for creating a reproducible example. My guess is that the long list of IDs is causing SRA to time out. I would suggest processing it in batches, just the way you have done, while I figure out whether it can be fixed.

kpj commented 3 years ago

I ran into the same problem and also solved it with the same approach (iterating over chunks of the accession list).
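
For anyone landing here, the chunking workaround can be sketched roughly as follows. This is a minimal sketch, not pysradb code: `chunked`, `fetch_in_batches`, the `fetch` callable and the default batch size of 50 are all hypothetical names and values chosen for illustration.

```python
from typing import Any, Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def fetch_in_batches(
    accessions: Sequence[str],
    fetch: Callable[[List[str]], Any],
    batch_size: int = 50,  # assumed value; tune it to whatever the server tolerates
) -> List[Any]:
    """Call `fetch` once per batch and return the per-batch results in order.

    `fetch` is a hypothetical callable supplied by the caller, for example
    a thin wrapper around whatever metadata lookup is being used.
    """
    results = []
    for batch in chunked(accessions, batch_size):
        results.append(fetch(batch))
    return results
```

If `fetch` wraps a metadata call that returns a pandas DataFrame per batch, the returned list could then be combined with `pandas.concat`. Whether the underlying call accepts a list of accessions is an assumption to check against the installed version.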

For querying, this seems to be implemented for SraSearch already: https://github.com/saketkc/pysradb/blob/c23d4a769543d05a0f002d1b28c985da5963573f/pysradb/search.py#L757-L760

Would it make sense to do the same for SRAweb as well? It seems like all terms are simply joined so far: https://github.com/saketkc/pysradb/blob/c23d4a769543d05a0f002d1b28c985da5963573f/pysradb/sraweb.py#L252-L253
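
To make the suggestion concrete, here is a rough sketch of building one query term per batch instead of a single term for the whole list. It is not taken from pysradb: `batched_terms` is a hypothetical helper, and both the `" OR "` separator and the batch size of 50 are assumptions that would need to match what the linked lines actually do.

```python
from typing import List, Sequence


def batched_terms(
    accessions: Sequence[str],
    batch_size: int = 50,  # assumed value, not taken from pysradb
    sep: str = " OR ",     # assumed separator; check it against the linked code
) -> List[str]:
    """Join accessions into several shorter query terms rather than one long one."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        sep.join(accessions[start:start + batch_size])
        for start in range(0, len(accessions), batch_size)
    ]
```

Each returned term would then be sent as its own request and the responses merged afterwards, so no single request carries all ~750 accessions.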