saketkc / pysradb

Package for fetching metadata and downloading data from SRA/ENA/GEO
https://saketkc.github.io/pysradb
BSD 3-Clause "New" or "Revised" License

[BUG] Metadata download not only for the provided accession number #203

cpavloud closed this issue 9 months ago

cpavloud commented 9 months ago

Hello,

I just installed pysradb (version: 2.2.0) using conda on a Mac (Apple M2 Pro chip, Sonoma 14.0 (23A344)).

I am running the command pysradb metadata PRJEB24595 –desc –expand > PRJEB24595_pysradb_metadata.tsv, but the output file has 14015 lines instead of the 8 it should have in this particular case.

I thought it was a bug related to the ENA project accession number, so I tried the command pysradb metadata SRA009436 –desc –expand > SRA009436_pysradb_metadata.tsv, but again the output has 14039 lines instead of the 32 it should have. The 32 lines I want are there, but I thought the output was supposed to include only those...

saketkc commented 9 months ago

Both of these commands work for me. Can you point me to the (incorrect) documentation where you found --desc --expand being used?

 $ pysradb metadata PRJEB24595 --detailed --saveto PRJEB24595.tsv
 $ cat PRJEB24595.tsv| wc -l
       9
 $ pysradb metadata SRA009436 --detailed --saveto SRA009436.tsv
 $ cat SRA009436.tsv  | wc -l
      33

cpavloud commented 9 months ago

Well, I found it in the original publication, which says:

We require detailed metadata associated with each sample to perform any downstream analysis. For example, the assays used for different samples and the corresponding treatment conditions. This can be done by supplying the ‘–desc’ flag:

 $ pysradb metadata SRP010679 –desc | head -5

This can be further expanded to reveal the data in ‘sample_attribute’ column into separate columns via ‘–expand’ flag. This is most useful for samples that have associated treatment or cell type metadata available.

 $ pysradb metadata SRP010679 –desc –expand

saketkc commented 9 months ago

Thanks! The publication is now out of date with version 2.0. Docs are available here: https://saket-choudhary.me/pysradb/
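
For anyone who wants to check a saved metadata file for rows from other studies, as in the report above, here is a minimal sketch. It is not part of pysradb. It assumes the file is tab-separated with a header row and has a `study_accession` column; that column name is an assumption and may differ between versions, so the function takes it as a parameter. The helper name `rows_per_study` is hypothetical.

```python
import csv
from collections import Counter


def rows_per_study(tsv_path, column="study_accession"):
    """Count data rows per value of `column` in a tab-separated file.

    `column` defaults to "study_accession", which is an assumption about
    the header of the saved metadata file; pass another name if needed.
    """
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # fieldnames is None for an empty file, otherwise the header row
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in {tsv_path}")
        return Counter(row[column] for row in reader)
```

Calling `rows_per_study("PRJEB24595.tsv")` on a file written with `--saveto` returns a count per study; more than one key in the result would suggest the file holds rows beyond the requested accession.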