saketkc / pysradb

Package for fetching metadata and downloading data from SRA/ENA/GEO
https://saketkc.github.io/pysradb
BSD 3-Clause "New" or "Revised" License

The metadata file downloaded directly from the ncbi website is different from the one downloaded using pysradb. #67

Closed SchustekFlorian closed 4 years ago

SchustekFlorian commented 4 years ago

Describe the bug: The metadata file downloaded directly from the NCBI website is different from the one downloaded using pysradb.

To Reproduce: Steps to reproduce the behavior.

pysradb download: pysradb metadata SRP181607 --detailed --saveto file.csv

Manual download: When I go to the NCBI website (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE125497) I usually follow the link 'SRA Run Selector' at the bottom of the page, which brings me here: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA516634&o=acc_s%3Aa. I then click on the link 'Metadata', which downloads a file with 24 rows instead of the 12 I get from the pysradb package. It seems that each sample has two runs, of which only one is included in the pysradb metadata.
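The row-count mismatch described above can be narrowed down to specific runs by comparing the run accessions in the two files. The following is a minimal sketch, not part of pysradb. It assumes the Run Selector file is comma-separated with a `Run` column, and that the pysradb output is tab-separated with a `run_accession` column. Both the column names and the delimiters are assumptions and may need adjusting for a given file. The accessions in the example are made up for illustration.

```python
import csv
import io


def run_accessions(handle, column, delimiter=","):
    """Collect the non-empty values of `column` from a delimited text stream."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return {row[column].strip() for row in reader if row.get(column)}


def missing_runs(reference, candidate):
    """Return accessions present in `reference` but absent from `candidate`."""
    return sorted(reference - candidate)


# In-memory stand-ins for the two downloads (made-up accessions).
# Assumption: Run Selector uses a "Run" column, comma-separated.
run_selector_csv = io.StringIO(
    "Run,Experiment\nSRR0000001,SRX0000001\nSRR0000002,SRX0000001\n"
)
# Assumption: pysradb output uses a "run_accession" column, tab-separated.
pysradb_tsv = io.StringIO(
    "run_accession\texperiment_accession\nSRR0000001\tSRX0000001\n"
)

expected = run_accessions(run_selector_csv, "Run")
found = run_accessions(pysradb_tsv, "run_accession", delimiter="\t")
missing = missing_runs(expected, found)
print(missing)
```

With real files, the `io.StringIO` objects would be replaced by `open(path, newline="")` handles for the two downloads.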


Additional context: It's not really context, but thanks for the amazing package! (This issue makes it less reliable for full automation, though.)

saketkc commented 4 years ago

Thanks @SchustekFlorian for the bug report. It would be helpful to know what version of pysradb you are running.

I cannot replicate it with the latest version. See notebook here: https://colab.research.google.com/drive/1QlzKElSS1nz6D1_bUQswBrWwtF_9TQ7P?usp=sharing

SchustekFlorian commented 4 years ago

Fantastic, I was running 0.9.7, but you are right, the new version works perfectly. Sorry, I should have tried that before filing a bug report. Many thanks!
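Since the discrepancy turned out to be version-dependent, it may help to record the installed version before filing a report. Below is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_version` is hypothetical, and the distribution name `pysradb` is assumed to match the name used at install time.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the package was installed under the distribution name "pysradb".
version = installed_version("pysradb")
if version is None:
    print("pysradb is not installed in this environment")
else:
    print(f"pysradb {version}")
```

If the reported version is older than the latest release, upgrading first (for example with `pip install --upgrade pysradb`) and re-running the command is a cheap check.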

saketkc commented 4 years ago

No worries, thanks for updating here!