ropensci / rentrez

talk with NCBI entrez using R
https://docs.ropensci.org/rentrez

Function to extract elements from a list of esummaries #43

Closed dwinter closed 9 years ago

dwinter commented 9 years ago

A common use-case for entrez_summary is to grab a bunch of records and extract some specific information about them, say the titles of a set of population summary records.

At the moment the workflow is to fetch a list of summaries and then use lapply(recs, "[[", ...):

pop_ids = c(307082412, 307075396, 307075338, 307075274)
pop_summ <- entrez_summary(db="popset", id=pop_ids)
sapply(pop_summ, "[[", "title")

That approach is probably not all that readable, and it's unlikely someone new to R will come up with it themselves.

So it would be nice to have a way to extract elements from the lists. Ideally you could always do this to create a data.frame:

do.call(rbind.data.frame, pop_summ)

But that doesn't work for records (like these) with nested lists. Instead, I'd like to have a function named something like extract_from_summary:

extract_from_summary(summary_record=pop_summ, target="title", simplify=TRUE)

Which probably just wraps sapply.
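A minimal sketch of what such a wrapper might look like, assuming it really is just a thin layer over sapply. This is not the rentrez implementation, and the mock records below (with made-up uid, title and stats fields) only stand in for what entrez_summary might return, so the example runs without a network connection:

```r
# Hypothetical sketch of the proposed helper; not the rentrez implementation.
extract_from_summary <- function(summary_record, target, simplify = TRUE) {
  # Pull the element named `target` out of every record in the list
  sapply(summary_record, "[[", target, simplify = simplify)
}

# Mock records (invented for illustration), including a nested list
# of the kind that trips up do.call(rbind.data.frame, ...)
mock_summ <- list(
  "1" = list(uid = "1", title = "First record",  stats = list(n = 3)),
  "2" = list(uid = "2", title = "Second record", stats = list(n = 5))
)

titles <- extract_from_summary(mock_summ, target = "title")
print(titles)
```

With simplify=TRUE the result collapses to a named character vector; with simplify=FALSE it stays a list, which would be the safer choice when the target element is itself a nested list.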

To finish this we need

dwinter commented 9 years ago

At the moment entrez_summary returns a single record when given one ID, and a list when given more. With the above design, a user running extract_from_summary on a single esummary record will get an error (because the single record itself is really just a list, and sapply will iterate over its fields!)

Thus, entrez_summary should have a new option to return a "list of one" so that it will play nicely with a summary-extracting function.
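To illustrate the problem and the "list of one" idea, here is a small sketch using a mock record (the uid and title fields are invented for the example, and the explicit list() wrapping is only an assumption about what such an option might do internally):

```r
# Mock single record, standing in for entrez_summary() called with one ID
single_rec <- list(uid = "1", title = "First record")

# sapply iterates over the record's fields rather than over records,
# so "[[" is applied to a character string and fails
direct <- tryCatch(
  sapply(single_rec, "[[", "title"),
  error = function(e) "error"
)

# Wrapping the record in a list of length one makes it behave like
# the multi-ID case
list_of_one <- list(single_rec)
wrapped <- sapply(list_of_one, "[[", "title")

print(direct)
print(wrapped)
```

The wrapped call returns the title as expected, while the direct call hits the error handler.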

dwinter commented 9 years ago

This will be ready to close once the new-vignette branch is merged in.