markziemann / dee2

Digital Expression Explorer 2 (DEE2): a repository of uniformly processed RNA-seq data
http://dee2.io
GNU General Public License v3.0

Pysradb to fetch SRA metadata? #96

Open markziemann opened 2 years ago

markziemann commented 2 years ago

This looks like a more stable alternative: https://github.com/saketkc/pysradb

markziemann commented 2 years ago

library("tictoc")
library(XML)
library(reutils)

tic()
eres <- esearch("Escherichia coli[orgn] and transcriptomic[Source] and public[Access] ", db="sra", retmax=999000)
str(uid(eres))
esum <- esummary(eres)
econtent <- content(esum, "parsed")
runvec <- econtent$Runs
runvec <- gsub("><", ">><<", runvec)
runvec <- unlist(strsplit(runvec, "><"))
runs <- lapply(runvec, function(x) { as.vector(xmlToList(x)) })
runs <- do.call(rbind, runs)
toc()

markziemann commented 2 years ago

Simpler, using the pysradb command line:

pysradb search --organism="Escherichia coli" --source="transcriptomic" --max=999000 > ecoli.tsv
awk '{print $(NF-2)}' ecoli.tsv > ecoli_runs.tsv
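
The awk step above picks the run accession by its position from the end of each row, which depends on the column layout staying the same and on the trailing fields being non-empty and free of spaces. As a rough sketch of an alternative, the snippet below scans every tab-separated field and keeps anything that looks like a run accession (SRR, ERR or DRR followed by digits). The function name `extract_runs` and the sample rows are made up for illustration; nothing here is taken from pysradb itself, and the real column names in ecoli.tsv may differ.

```python
import re

# Run accessions from SRA, ENA and DDBJ start with SRR, ERR or DRR.
RUN_RE = re.compile(r"^[SED]RR\d+$")


def extract_runs(tsv_text):
    """Return unique run accessions found in any field, in first-seen order."""
    seen = {}
    for line in tsv_text.splitlines():
        for field in line.split("\t"):
            field = field.strip()
            if RUN_RE.match(field):
                seen.setdefault(field, None)  # dict keeps insertion order
    return list(seen)


if __name__ == "__main__":
    # Hypothetical rows standing in for the contents of ecoli.tsv.
    sample = (
        "col_a\tcol_b\tcol_c\n"
        "some title with spaces\tSRR000001\t100\n"
        "another title\tERR000002\t200\n"
    )
    for run in extract_runs(sample):
        print(run)
```

To use it on the real file, read ecoli.tsv into a string and pass it to `extract_runs`. Matching on the accession pattern rather than the column index means the result does not change if columns are added or reordered, at the cost of also picking up accessions that appear in other columns.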