mc2-center / pubmed-crawler

PubMed Crawler for CCKP publication manifest
Creative Commons Zero v1.0 Universal

finding BioProject associated with a SRA SRP id #5

Open bswhite opened 4 years ago

bswhite commented 4 years ago

    library(rentrez)

    # Look up GEO DataSets (gds) record ids for an SRA study accession (SRP).
    lookup.srp <- function(srp) {
      r_search <- entrez_search(db="gds", term=paste0(srp, "[ACCN]"))
      r_search$ids
    }

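    # Map an SRA study accession (SRP) to its BioProject accession.
    # get.gse.bioproject() is not defined in this comment.
    get.sra.bioproject <- function(srp) {
      ids <- lookup.srp(srp)
      # The lookup can return zero ids as well as several, so report the count.
      if(length(ids) != 1) {
        stop("Expected exactly one id, got ", length(ids), "\n")
      }
      gse.id <- entrez_summary(db="gds", id=ids)$accession
      get.gse.bioproject(gse.id)
    }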

> get.sra.bioproject("SRP212810")

[1] "PRJNA552370"

jaeddy commented 4 years ago

@bswhite I copied this to a wiki page to keep as a reference.

bswhite commented 4 years ago

@jaeddy do you know how to translate a BioProject PRJNA id into an SRA SRP id, i.e., the reverse of the above? I spent an hour screwing around with rentrez last night to no avail. I don't understand the various NCBI databases and their links. Is there a data model? I have seen entrez_dbs() and entrez_db_links(), but they're only so helpful.
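For reference, a minimal sketch of one way the reverse lookup might go with rentrez. It is untested against live NCBI data and is not a method anyone in this thread confirmed. The helper names `extract.study.acc` and `get.bioproject.srp` are hypothetical. The sketch assumes that the bioproject database accepts a `[Project Accession]` search field, that `entrez_link()` returns the linked ids under `bioproject_sra`, and that the SRA summary's `expxml` field contains a `<Study acc="...">` element. Calls are written as `rentrez::...` so the file loads even before the package is attached.

```r
# Pull SRA study accessions (SRP/ERP/DRP) out of an SRA esummary 'expxml' string.
# ASSUMPTION: the string contains a <Study acc="..."> element.
extract.study.acc <- function(expxml) {
  pattern <- '<Study acc="([A-Z]RP[0-9]+)"'
  m <- regmatches(expxml, regexpr(pattern, expxml))
  unique(sub(pattern, "\\1", m))
}

# Hypothetical helper: BioProject accession (e.g. "PRJNA552370") -> SRP id(s).
get.bioproject.srp <- function(prjna) {
  # 1. Resolve the accession to a BioProject UID.
  #    ASSUMPTION: "[Project Accession]" is a valid search field for this db;
  #    rentrez::entrez_db_searchable("bioproject") lists the real ones.
  bp <- rentrez::entrez_search(db = "bioproject",
                               term = paste0(prjna, "[Project Accession]"))
  if (length(bp$ids) != 1) {
    stop("Expected exactly one BioProject id, got ", length(bp$ids))
  }

  # 2. Follow the BioProject -> SRA link.
  #    rentrez::entrez_db_links("bioproject") shows which links are available.
  links <- rentrez::entrez_link(dbfrom = "bioproject", id = bp$ids, db = "sra")
  sra.ids <- links$links$bioproject_sra
  if (length(sra.ids) == 0) {
    stop("No SRA records linked to ", prjna)
  }

  # 3. Summarize one linked record and read the study accession from its XML.
  summ <- rentrez::entrez_summary(db = "sra", id = sra.ids[1])
  extract.study.acc(summ$expxml)
}
```

Only the first linked SRA record is summarized, on the assumption that every record under one BioProject belongs to the same study. If that does not hold for a given project, summarizing all of `sra.ids` and passing each `expxml` through `extract.study.acc()` would surface the extra accessions.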

jaeddy commented 4 years ago

@bswhite probably the best way for now is just to use this table. I combined the BioProject metadata with all of the matched study, run, experiment, and sample metadata from SRA; I think most of the relevant GEO/GSE information should be covered as well.

I can share the code I used, but it's pretty convoluted. Let me know if you run into any cases where you can't find a match. The BioProjects in the table are all those matched to our PubMed IDs, so it's possible we're missing some (for datasets that aren't linked to publications).

vpchung commented 2 years ago

@mc2-center/data-team I forget, is BioProject still part of any of the Publication tables/manifests? If not, I will close this issue.