bswhite opened 4 years ago
@jaeddy do you know how to translate a BioProject PRJNA ID into an SRA SRP ID, i.e., the reverse of above? I spent an hour screwing around with rentrez last night to no avail. I don't understand the various NCBI databases and their links -- is there a data model? I have seen `entrez_dbs()` and `entrez_db_links()`, but they're only so helpful.
@bswhite probably the best way for now is just to use this table. I combined the BioProject metadata with all of the matched study, run, experiment, and sample metadata from SRA; I think most of the relevant GEO/GSE information should be covered as well.
I can share the code I used, but it's pretty convoluted. Let me know if you run into any cases where you can't find a match. The BPs in the table are all those matched to our PubMed IDs, so it's possible we're missing some (for datasets that aren't linked to publications).
@mc2-center/data-team I forget, is BioProject still part of any of the Publication tables/manifests? If not, I will close this issue.
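```r
library(rentrez)

# Find the GEO DataSets (gds) UID(s) whose accession field matches an SRA study ID.
lookup.srp <- function(srp) {
  r_search <- entrez_search(db = "gds", term = paste0(srp, "[ACCN]"))
  r_search$ids
}

# Map an SRA study ID (SRP) to its BioProject via the linked GEO series (GSE).
# get.gse.bioproject() is defined elsewhere and not shown in this thread.
get.sra.bioproject <- function(srp) {
  ids <- lookup.srp(srp)
  if (length(ids) == 0) stop("No gds record found for ", srp)
  if (length(ids) > 1) stop("Got multiple ids for ", srp)
  gse.id <- entrez_summary(db = "gds", id = ids)$accession
  get.gse.bioproject(gse.id)
}
```

```
> get.sra.bioproject("SRP212810")
[1] "PRJNA552370"
```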
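The snippet above only goes from SRP to BioProject. For the reverse direction asked about at the top of the thread (PRJNA to SRP), a minimal sketch might look like the following. It assumes that searching the `sra` database for the bare BioProject accession returns that project's experiments, and that each SRA esummary's `expxml` field embeds the study accession (SRP/ERP/DRP). Both helper names (`extract_study_accs`, `bioproject_to_srp`) are hypothetical and not part of rentrez.

```r
# Hypothetical helper: pull SRA study accessions (SRP/ERP/DRP) out of one or
# more `expxml` strings, e.g. as returned by rentrez::entrez_summary(db = "sra").
# Experiment (SRX) and sample (SRS) accessions do not match this pattern.
extract_study_accs <- function(expxml) {
  hits <- regmatches(expxml, gregexpr("[SDE]RP[0-9]+", expxml))
  unique(unlist(hits, use.names = FALSE))
}

# Hypothetical helper: look up the SRA study accession(s) for a BioProject.
# Requires network access and the rentrez package. Assumes the bare PRJNA
# accession, searched in the "sra" database, returns that project's experiments.
bioproject_to_srp <- function(prjna, retmax = 20) {
  r_search <- rentrez::entrez_search(db = "sra", term = prjna, retmax = retmax)
  if (length(r_search$ids) == 0) stop("No SRA records found for ", prjna)
  summ <- rentrez::entrez_summary(db = "sra", id = r_search$ids)
  expxml <- rentrez::extract_from_esummary(summ, "expxml")
  extract_study_accs(as.character(expxml))
}
```

Usage would be `bioproject_to_srp("PRJNA552370")`. A regex over `expxml` is used instead of full XML parsing to keep the sketch dependency-free; since all experiments in a project usually share one study, a small `retmax` is likely enough, but that is also an assumption.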