ropensci / rentrez

talk with NCBI entrez using R
https://docs.ropensci.org/rentrez

Query building in R #60

Open dwinter opened 8 years ago

dwinter commented 8 years ago

(tagging @Monty9 and @htc502 on this, as they've each brought it up recently)

The query syntax used by esearch (and wrapped by entrez_search) is very powerful, but somewhat difficult to type. The basic format includes keywords, fields (denoted by square brackets) and the Boolean operators AND, OR and NOT. So you might have

("neoplasms"[MeSH Major Topic] AND Mouse[Title/Abstract]) NOT review[Publication Type]

The NCBI has an "advanced query builder" for each database, but it would be nice to be able to generate these queries in an R session.

Right now, we have entrez_db_searchable to list the fields each database can be searched on. We could also add a query builder. Either a single function that takes 2- or 3-member arguments:

query_builder(c('neoplasm', 'MeSH'),
              c('AND', 'mouse', 'TIAB')
)
(neoplasm[MeSH] AND mouse[TIAB])
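A minimal sketch of the single-function version, assuming the first argument is a `c(term, field)` pair and each later argument is a `c(operator, term, field)` triple. `query_builder` is hypothetical and not part of rentrez. It only chains the pieces left to right, so it does not yet handle grouping.

```r
# Hypothetical query_builder(): a sketch, not part of rentrez.
# First argument: c(term, field); each later argument: c(operator, term, field).
query_builder <- function(first, ...) {
  rest <- list(...)
  stopifnot(length(first) == 2)
  # Start with the first term, its field wrapped in square brackets
  out <- paste0(first[1], "[", first[2], "]")
  for (triple in rest) {
    stopifnot(length(triple) == 3, triple[1] %in% c("AND", "OR", "NOT"))
    # Append " OPERATOR term[field]" for each extra triple
    out <- paste0(out, " ", triple[1], " ", triple[2], "[", triple[3], "]")
  }
  # Wrap the whole expression in parentheses so it can be nested later
  paste0("(", out, ")")
}

query_builder(c("neoplasm", "MeSH"), c("AND", "mouse", "TIAB"))
#> [1] "(neoplasm[MeSH] AND mouse[TIAB])"
```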

...or, taking a leaf out of the ggplot2 book, something like a domain-specific language:

q <- eq(query = "neoplasms", field = "MeSH") + eq(query = "Mouse", field = "TIAB", operator = "AND")
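A sketch of the ggplot2-style idea using S3 classes. It assumes `eq()` returns an object of a hypothetical class `entrez_query` and that the operator stored on the right-hand operand decides how the two sides are joined. None of these names exist in rentrez.

```r
# Hypothetical eq() constructor: a sketch, not an existing rentrez API.
eq <- function(query, field, operator = "AND") {
  stopifnot(operator %in% c("AND", "OR", "NOT"))
  structure(list(text = paste0(query, "[", field, "]"), operator = operator),
            class = "entrez_query")
}

# Adding two queries joins them with the right-hand operand's operator
# and parenthesises the result.
"+.entrez_query" <- function(e1, e2) {
  structure(
    list(text = paste0("(", e1$text, " ", e2$operator, " ", e2$text, ")"),
         operator = e1$operator),
    class = "entrez_query"
  )
}

# Print only the query string
print.entrez_query <- function(x, ...) {
  cat(x$text, "\n", sep = "")
  invisible(x)
}

# as.character() gives the plain string, e.g. for entrez_search(term = ...)
as.character.entrez_query <- function(x, ...) x$text

q <- eq(query = "neoplasms", field = "MeSH") +
  eq(query = "Mouse", field = "TIAB", operator = "AND")
as.character(q)
#> [1] "(neoplasms[MeSH] AND Mouse[TIAB])"
```

Because every `+` wraps its result in parentheses, a chain such as `a + b + c` nests from the left. This keeps the brackets balanced at the cost of some redundant parentheses.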

Doing this properly will definitely take more time than I have at present, but I'm happy to hear opinions about the best way to do it (and to help anyone who wants to try it for themselves).

gadepallivs commented 8 years ago

Hi David, I was asking the same question on Stack Overflow, and @mrdwab (Ananda Mahto) helped me out with this solution:

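x <- c("neoplasm", "Lung", "Clinical Trial", "human", "2000:2015")  # search terms
y <- c("MeSH", "TIAB", "PTYP", "Species", "PDAT")                    # matching fields
# attach each bracketed field to its term, then wrap the whole list in parentheses
noquote(sprintf("(%s)", paste(x, "[", y, "]", sep = "", collapse = ", ")))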

Output

(neoplasm[MeSH], Lung[TIAB], Clinical Trial[PTYP], human[Species], 2000:2015[PDAT])
dwinter commented 8 years ago

Interesting @Monty9 -- can you link to the SO question?

gadepallivs commented 8 years ago

https://stackoverflow.com/questions/32462726/r-how-to-combine-two-char-vectors-so-that-result-looks-like-char1-char2

My goal is to let the user enter their search terms and fields, then combine them into a single query that is passed as a parameter to entrez_search. That way, there is no need to pass a separate "field" parameter to entrez_search.
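A minimal sketch of that goal. It assumes the terms and fields arrive as two equal-length character vectors joined by a single operator. `make_term` is a hypothetical name, not a rentrez function. Unlike the comma-separated output above, it joins the pieces with a Boolean operator and quotes multi-word terms so they are searched as phrases.

```r
# Hypothetical helper (not part of rentrez): combine user-supplied
# terms and fields into a single Entrez query string.
make_term <- function(terms, fields, operator = "AND") {
  stopifnot(length(terms) == length(fields), operator %in% c("AND", "OR"))
  # Quote multi-word terms so they are searched as a phrase
  quoted <- ifelse(grepl(" ", terms), paste0('"', terms, '"'), terms)
  # Attach each bracketed field, then join everything with the operator
  paste(paste0(quoted, "[", fields, "]"), collapse = paste0(" ", operator, " "))
}

x <- c("neoplasm", "Lung", "Clinical Trial", "human", "2000:2015")
y <- c("MeSH", "TIAB", "PTYP", "Species", "PDAT")
term <- make_term(x, y)
term
#> [1] "neoplasm[MeSH] AND Lung[TIAB] AND \"Clinical Trial\"[PTYP] AND human[Species] AND 2000:2015[PDAT]"
```

The resulting string could then be passed straight to `rentrez::entrez_search(db = "pubmed", term = term)`, with no separate field argument.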

dwinter commented 8 years ago

So, one thing to do is to separate adding the square brackets from the concatenation:

boxify <- function(x) paste0("[", x, "]")  # wrap each element of x in square brackets

Then you could do something like this:

terms <- c("neoplasm", "mouse", "review")
fields <- c("Mesh", "Orgn", "PTYP")
paste0(terms, boxify(fields), collapse=" AND ")
"neoplasm[Mesh] AND mouse[Orgn] AND review[PTYP]"

Note that this won't work easily for nested uses of AND, OR and NOT.
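To illustrate what nesting needs beyond a single `paste0()` call, here is a minimal sketch in which each operator is its own function that parenthesises its result. Nesting the calls therefore keeps the brackets balanced. The helper names (`tagged`, `q_and`, `q_or`, `q_not`) are hypothetical and not part of rentrez. The example rebuilds the query from the top of this issue.

```r
# Hypothetical helpers (not part of rentrez) for nested Boolean queries.
# Each returns a parenthesised string, so calls can be nested freely.
tagged <- function(term, tag) {
  # Quote multi-word terms so Entrez treats them as a phrase
  if (grepl(" ", term)) term <- paste0('"', term, '"')
  paste0(term, "[", tag, "]")
}
q_and <- function(...) paste0("(", paste(c(...), collapse = " AND "), ")")
q_or  <- function(...) paste0("(", paste(c(...), collapse = " OR "), ")")
# NOT is a binary operator: keep x, exclude y
q_not <- function(x, y) paste0("(", x, " NOT ", y, ")")

q <- q_not(
  q_and(tagged("neoplasms", "MeSH Major Topic"), tagged("Mouse", "Title/Abstract")),
  tagged("review", "Publication Type")
)
q
#> [1] "((neoplasms[MeSH Major Topic] AND Mouse[Title/Abstract]) NOT review[Publication Type])"
```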

htc502 commented 8 years ago

Hi @dwinter, which way have you chosen to implement the builder function: the simpler way or the ggplot way? Why not open a new branch for this feature? I'd propose the former, as it is easier and we could use the feature right away without much effort ~_~ ... eager to try it!

dwinter commented 8 years ago

Hi @htc502 -- I don't think either way is very easy :)

The problem is balancing the ANDs, ORs and NOTs.

It definitely won't make it into the next release, but I'm keen to work on it for a future one.

htc502 commented 8 years ago

Hi @dwinter, I find a query builder like this comfortable to use:

[Screenshot, 2016-04-14: the query-builder box from Papers 3]

I took this from a reference manager, Papers 3. Whenever you type something, it pops up a box like this that lets you modify the attributes of your keyword. I don't know if we can find an alternative in an R terminal environment ~_~

sbalci commented 5 years ago

I think an advanced search query builder as an RStudio Addin would be very helpful.

dwinter commented 5 years ago

Hi Sbalci,

That would be awesome, but I really don't have the time or skill to make something like this.

I think it would be a cool addition and would love to work with someone who wanted to do it, but it's probably not on the horizon just now.

sbalci commented 4 years ago

Dear @dwinter

I have tried to make an RStudio Addin for this purpose. The code still needs some editing, which I am working on. If you find it useful, I may make a pull request when it is complete.

PubMedSearch