SchmidtPaul opened this issue 3 years ago (status: open).
I'm running into the same issue as @SchmidtPaul. My search term is 2980 characters long (below), so I'm guessing that is the problem. Splitting it into multiple sub-searches would not be hard at all, but since I have many searches like this to do, it would help to have guidance on whether that will actually work.
"Felodipine"[Title/Abstract] OR "Felo Biochemie"[Title/Abstract] OR "BC Brand of Felodipine"[Title/Abstract] OR "Felo-Puren"[Title/Abstract] OR "Felo Puren"[Title/Abstract] OR "Alpharma Brand of Felodipine"[Title/Abstract] OR "Felobeta"[Title/Abstract] OR "betapharm Brand of Felodipine"[Title/Abstract] OR "Felocor"[Title/Abstract] OR "Hexal Brand of Felodipine"[Title/Abstract] OR "Felodipin 1A Pharma"[Title/Abstract] OR "1A Brand of Felodipine"[Title/Abstract] OR "Felodipin AbZ"[Title/Abstract] OR "AbZ Brand of Felodipine"[Title/Abstract] OR "Felodipin AL"[Title/Abstract] OR "Aliud Brand of Felodipine"[Title/Abstract] OR "Felodipin AZU"[Title/Abstract] OR "Azupharma Brand of Felodipine"[Title/Abstract] OR "Felodipin dura"[Title/Abstract] OR "Merck dura Brand of Felodipine"[Title/Abstract] OR "Felodipin Heumann"[Title/Abstract] OR "Heumann, Felodipin"[Title/Abstract] OR "Heumann Brand of Felodipine"[Title/Abstract] OR "Felodipin Stada"[Title/Abstract] OR "Stadapharm Brand of Felodipine"[Title/Abstract] OR "felodipin von ct"[Title/Abstract] OR "ct-Arzneimittel Brand of Felodipine"[Title/Abstract] OR "ct Arzneimittel Brand of Felodipine"[Title/Abstract] OR "Felodipin-ratiopharm"[Title/Abstract] OR "Felodipin ratiopharm"[Title/Abstract] OR "ratiopharm Brand of Felodipine"[Title/Abstract] OR "Felodur"[Title/Abstract] OR "Alphapharm Brand of Felodipine"[Title/Abstract] OR "Felogamma"[Title/Abstract] OR "Worwag Brand of Felodipine"[Title/Abstract] OR "Fensel"[Title/Abstract] OR "Pharmacia Spain Brand of Felodipine"[Title/Abstract] OR "H 154-82"[Title/Abstract] OR "H 154 82"[Title/Abstract] OR "H 15482"[Title/Abstract] OR "Perfudal"[Title/Abstract] OR "Pharmaceutica Astra Brand of Felodipine"[Title/Abstract] OR "Plendil"[Title/Abstract] OR "Flodil"[Title/Abstract] OR "AstraZeneca Brand of Felodipine"[Title/Abstract] OR "Promed Brand of Felodipine"[Title/Abstract] OR "Modip"[Title/Abstract] OR "Astra Brand of Felodipine"[Title/Abstract] OR "Renedil"[Title/Abstract] OR "Hoechst Brand of Felodipine"[Title/Abstract] OR "Munobal"[Title/Abstract] OR "Aventis Brand of Felodipine"[Title/Abstract] OR "Agon"[Title/Abstract] OR "TheraPharm Brand of Felodipine"[Title/Abstract] OR "Agon SR"[Title/Abstract] OR "BRN 4331472"[Title/Abstract] OR "dl-Felodipine"[Title/Abstract] OR "Feloday"[Title/Abstract] OR "Felodipina"[Title/Abstract] OR "Felodipinum"[Title/Abstract] OR "Felodur ER"[Title/Abstract] OR "Felogard"[Title/Abstract] OR "Flodil"[Title/Abstract] OR "H154/82"[Title/Abstract] OR "Hydac"[Title/Abstract] OR "Munobal Retard"[Title/Abstract] OR "Penedil"[Title/Abstract] OR "Perfudal"[Title/Abstract] OR "Plendil Depottab"[Title/Abstract] OR "Plendil ER"[Title/Abstract] OR "Plendil Retard"[Title/Abstract] OR "Preslow"[Title/Abstract] OR "Prevex"[Title/Abstract] OR "Splendil"[Title/Abstract] OR "UNII-OL961R6O2C"[Title/Abstract] OR "Felogard"[Title/Abstract] OR "Felogard ER"[Title/Abstract] OR "Plendil"[Title/Abstract] OR "Renedil"[Title/Abstract]
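Because this query is a flat list of terms joined only by OR, it can be cut at any OR without changing its meaning. A minimal sketch of such a split in R; the helper name `chunk_or_terms()` and the `max_chars` budget are hypothetical, since the exact length at which NCBI rejects a request is not established in this thread.

```r
# Hypothetical helper: greedily pack OR-ed terms into chunks whose joined
# length stays within max_chars (the budget is an assumption, not an NCBI limit).
chunk_or_terms <- function(terms, max_chars = 1500) {
  chunks <- character(0)
  current <- character(0)
  for (term in terms) {
    candidate <- paste(c(current, term), collapse = " OR ")
    if (length(current) > 0 && nchar(candidate) > max_chars) {
      # Adding this term would overflow: close the current chunk first
      chunks <- c(chunks, paste(current, collapse = " OR "))
      current <- term
    } else {
      current <- c(current, term)
    }
  }
  if (length(current) > 0) {
    chunks <- c(chunks, paste(current, collapse = " OR "))
  }
  chunks
}

# Example with a few of the synonyms from the query above
terms <- sprintf('"%s"[Title/Abstract]',
                 c("Felodipine", "Plendil", "Renedil", "Munobal", "Splendil"))
parts <- chunk_or_terms(terms, max_chars = 60)
```

Each chunk can then be run as its own search in a shared history session and OR-ed back together by query key. A single term longer than `max_chars` still ends up alone in a chunk that exceeds the budget.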
@SchmidtPaul, I worked through the query issues a bit using your easier-to-separate example. What needs to happen is that we simplify our queries so that each one is short enough that the request isn't rejected for its length. In your example, query_part2 is too long to run, but I was able to combine query_part1 and query_part3 as follows:
query_part1 <- "(adolescent[Title/Abstract] OR adolescents[Title/Abstract] OR apprentice[Title/Abstract] OR apprentices[Title/Abstract] OR child[Title/Abstract] OR children[Title/Abstract] OR pupil[Title/Abstract] OR pupils[Title/Abstract] OR student[Title/Abstract] OR students[Title/Abstract] OR teenager[Title/Abstract] OR teenagers[Title/Abstract] OR trainee[Title/Abstract] OR trainees[Title/Abstract] OR young adult[Title/Abstract] OR young adults[Title/Abstract] OR young people[Title/Abstract] OR young worker[Title/Abstract] OR young workers[Title/Abstract] OR younger population[Title/Abstract] OR youth[Title/Abstract])"
query_part2 <- "((ankle[Title/Abstract] OR ankles[Title/Abstract] OR arm[Title/Abstract] OR arms[Title/Abstract] OR back[Title/Abstract] OR body[Title/Abstract] OR bone[Title/Abstract] OR bones[Title/Abstract] OR capsule[Title/Abstract] OR capsules[Title/Abstract] OR elbow[Title/Abstract] OR elbows[Title/Abstract] OR extremity[Title/Abstract] OR extremities[Title/Abstract] OR finger[Title/Abstract] OR fingers[Title/Abstract] OR foot[Title/Abstract] OR feet[Title/Abstract] OR forearm[Title/Abstract] OR forearms[Title/Abstract] OR hand[Title/Abstract] OR hands[Title/Abstract] OR hip[Title/Abstract] OR hips[Title/Abstract] OR invertebral disc[Title/Abstract] OR invertebral discs[Title/Abstract] OR joint[Title/Abstract] OR joints[Title/Abstract] OR knee[Title/Abstract] OR knees[Title/Abstract] OR leg[Title/Abstract] OR legs[Title/Abstract] OR ligament[Title/Abstract] OR ligaments[Title/Abstract] OR limb[Title/Abstract] OR limbs[Title/Abstract] OR muscle[Title/Abstract] OR muscles[Title/Abstract] OR neck[Title/Abstract] OR pelvis[Title/Abstract] OR pelvises[Title/Abstract] OR shoulder[Title/Abstract] OR shoulders[Title/Abstract] OR sinew[Title/Abstract] OR sinews[Title/Abstract] OR skull[Title/Abstract] OR skulls[Title/Abstract] OR spinal disc[Title/Abstract] OR spinal discs[Title/Abstract] OR spine[Title/Abstract] OR spines[Title/Abstract] OR tendon[Title/Abstract] OR tendons[Title/Abstract] OR thigh[Title/Abstract] OR thighs[Title/Abstract] OR toe[Title/Abstract] OR toes[Title/Abstract] OR trunk[Title/Abstract] OR trunks[Title/Abstract] OR vertebral disc[Title/Abstract] OR vertebral discs[Title/Abstract]) AND (ache[Title/Abstract] OR aches[Title/Abstract] OR contorsion[Title/Abstract] OR cramp[Title/Abstract] OR cramps[Title/Abstract] OR damage[Title/Abstract] OR damages[Title/Abstract] OR deformity[Title/Abstract] OR deformities[Title/Abstract] OR degeneration[Title/Abstract] OR degenerations[Title/Abstract] OR degenerative change[Title/Abstract] OR degenerative changes[Title/Abstract] OR dislocation[Title/Abstract] OR dislocations[Title/Abstract] OR disorder[Title/Abstract] OR disorders[Title/Abstract] OR distorsion[Title/Abstract] OR distorsions[Title/Abstract] OR fracture[Title/Abstract] OR fractures[Title/Abstract] OR inflammation[Title/Abstract] OR inflammations[Title/Abstract] OR injury[Title/Abstract] OR injuries[Title/Abstract] OR luxation[Title/Abstract] OR luxations[Title/Abstract] OR musculoskeletal[Title/Abstract] OR pain[Title/Abstract] OR pains[Title/Abstract] OR posture[Title/Abstract] OR postures[Title/Abstract] OR prolapse[Title/Abstract] OR prolapses[Title/Abstract] OR rupture[Title/Abstract] OR ruptures[Title/Abstract] OR sprain[Title/Abstract] OR sprains[Title/Abstract] OR spraining[Title/Abstract] OR symptom[Title/Abstract] OR symptoms[Title/Abstract] OR syndrome[Title/Abstract] OR syndromes[Title/Abstract] OR tension[Title/Abstract] OR tensions[Title/Abstract] OR torsion[Title/Abstract] OR torsions[Title/Abstract] OR twist[Title/Abstract] OR twists[Title/Abstract]))"
query_part3 <- "(frequency[Title/Abstract] OR frequencys[Title/Abstract] OR occurence[Title/Abstract] OR prevalence[Title/Abstract] OR prevalences[Title/Abstract] OR rate[Title/Abstract] OR rates[Title/Abstract] OR risk[Title/Abstract] OR risks[Title/Abstract])"
library(rentrez)
# Combining all three parts into one query is too long to run:
# long_query <- paste(query_part1,
#                     query_part2,
#                     query_part3,
#                     sep = " AND ")
#
# rentrez::entrez_search(db = "pubmed",
#                        term = long_query,
#                        use_history = TRUE)
# Instead, run each part in the same history session (shared WebEnv)...
q1 <- entrez_search(db="pubmed", term=query_part1, use_history=TRUE)
# query_part2 alone is still too long, so it is left out here:
#q2 <- entrez_search(db="pubmed", term=query_part2, use_history=TRUE, WebEnv=q1$web_history$WebEnv)
q3 <- entrez_search(db="pubmed", term=query_part3, use_history=TRUE, WebEnv=q1$web_history$WebEnv)
# ...then combine the stored results by query key (e.g. "#1 AND #2")
query_combine_1_3 <- sprintf("#%s AND #%s", q1$web_history$QueryKey, q3$web_history$QueryKey)
q1_3 <- entrez_search(db="pubmed", term=query_combine_1_3, use_history=TRUE, WebEnv=q1$web_history$WebEnv)
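The same pattern generalises to any number of parts. A sketch, assuming all parts are joined by a single operator (all AND or all OR); `history_term()` and `search_parts_in_history()` are hypothetical helper names, and `WebEnv` is passed through `entrez_search()`'s extra arguments to ESearch just as in the code above.

```r
# Hypothetical helper: build a history term such as "#1 AND #2 AND #3"
history_term <- function(query_keys, op = "AND") {
  paste0("#", query_keys, collapse = paste0(" ", op, " "))
}

# Hypothetical driver: run every part in one history session (shared WebEnv),
# then run a single search that combines the stored results by query key.
search_parts_in_history <- function(parts, op = "AND", db = "pubmed") {
  first <- rentrez::entrez_search(db = db, term = parts[[1]], use_history = TRUE)
  web_env <- first$web_history$WebEnv
  keys <- first$web_history$QueryKey
  for (part in parts[-1]) {
    res <- rentrez::entrez_search(db = db, term = part, use_history = TRUE,
                                  WebEnv = web_env)
    keys <- c(keys, res$web_history$QueryKey)
  }
  rentrez::entrez_search(db = db, term = history_term(keys, op),
                         use_history = TRUE, WebEnv = web_env)
}
```

With these definitions, `search_parts_in_history(c(query_part1, query_part3))` should issue the same three requests as `q1`, `q3` and `q1_3` above.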
The E-utilities documentation suggests that an HTTP POST request may work for these longer queries (https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.term).
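A minimal sketch of what a POST-based ESearch call could look like, calling the E-utilities endpoint directly with the httr package instead of going through rentrez (a swapped-in approach, not rentrez's API). The helper names are hypothetical, and the XML handling is a simple regular expression that assumes the first `<Count>` element in the reply is the overall hit count.

```r
# Sketch: send the term in the request body instead of the URL.
# Assumes the httr package is installed; helper names are hypothetical.
esearch_url <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Form fields for ESearch; usehistory = "y" asks NCBI to keep the result
# on the History server so WebEnv/QueryKey come back in the reply.
esearch_body <- function(term, db = "pubmed", use_history = TRUE) {
  list(db = db, term = term, usehistory = if (use_history) "y" else "n")
}

# Value of the first <tag>...</tag> in the XML, or NA if it is absent
xml_tag_value <- function(xml, tag) {
  pattern <- sprintf("<%s>([^<]*)</%s>", tag, tag)
  m <- regmatches(xml, regexec(pattern, xml))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

esearch_post <- function(term, db = "pubmed") {
  resp <- httr::POST(esearch_url, body = esearch_body(term, db),
                     encode = "form")
  httr::stop_for_status(resp)
  xml <- httr::content(resp, as = "text", encoding = "UTF-8")
  list(count = as.integer(xml_tag_value(xml, "Count")),
       WebEnv = xml_tag_value(xml, "WebEnv"),
       QueryKey = xml_tag_value(xml, "QueryKey"))
}
```

The returned `WebEnv` and `QueryKey` could then be used for history-based follow-up requests in the same way as a rentrez `web_history` object.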
Thanks @billdenney for the effort! Sorry for being repetitive, but just to make sure I understood everything, I am going to summarize it:
Some query strings are too long, just like q2:
library(rentrez)
# queries -----------------------------------------------------------------
q1 <- "(adolescent[Title/Abstract] OR adolescents[Title/Abstract] OR apprentice[Title/Abstract] OR apprentices[Title/Abstract] OR child[Title/Abstract] OR children[Title/Abstract] OR pupil[Title/Abstract] OR pupils[Title/Abstract] OR student[Title/Abstract] OR students[Title/Abstract] OR teenager[Title/Abstract] OR teenagers[Title/Abstract] OR trainee[Title/Abstract] OR trainees[Title/Abstract] OR young adult[Title/Abstract] OR young adults[Title/Abstract] OR young people[Title/Abstract] OR young worker[Title/Abstract] OR young workers[Title/Abstract] OR younger population[Title/Abstract] OR youth[Title/Abstract])"
q2a <- "ankle[Title/Abstract] OR ankles[Title/Abstract] OR arm[Title/Abstract] OR arms[Title/Abstract] OR back[Title/Abstract] OR body[Title/Abstract] OR bone[Title/Abstract] OR bones[Title/Abstract] OR capsule[Title/Abstract] OR capsules[Title/Abstract] OR elbow[Title/Abstract] OR elbows[Title/Abstract] OR extremity[Title/Abstract] OR extremities[Title/Abstract] OR finger[Title/Abstract] OR fingers[Title/Abstract] OR foot[Title/Abstract] OR feet[Title/Abstract] OR forearm[Title/Abstract] OR forearms[Title/Abstract] OR hand[Title/Abstract] OR hands[Title/Abstract] OR hip[Title/Abstract] OR hips[Title/Abstract] OR invertebral disc[Title/Abstract] OR invertebral discs[Title/Abstract] OR joint[Title/Abstract] OR joints[Title/Abstract] OR knee[Title/Abstract] OR knees[Title/Abstract] OR leg[Title/Abstract] OR legs[Title/Abstract] OR ligament[Title/Abstract] OR ligaments[Title/Abstract] OR limb[Title/Abstract] OR limbs[Title/Abstract] OR muscle[Title/Abstract] OR muscles[Title/Abstract] OR neck[Title/Abstract] OR pelvis[Title/Abstract] OR pelvises[Title/Abstract] OR shoulder[Title/Abstract] OR shoulders[Title/Abstract] OR sinew[Title/Abstract] OR sinews[Title/Abstract] OR skull[Title/Abstract] OR skulls[Title/Abstract] OR spinal disc[Title/Abstract] OR spinal discs[Title/Abstract] OR spine[Title/Abstract] OR spines[Title/Abstract] OR tendon[Title/Abstract] OR tendons[Title/Abstract] OR thigh[Title/Abstract] OR thighs[Title/Abstract] OR toe[Title/Abstract] OR toes[Title/Abstract] OR trunk[Title/Abstract] OR trunks[Title/Abstract] OR vertebral disc[Title/Abstract] OR vertebral discs[Title/Abstract]) AND (ache[Title/Abstract] OR aches[Title/Abstract] OR contorsion[Title/Abstract] OR cramp[Title/Abstract] OR cramps[Title/Abstract] OR damage[Title/Abstract] OR damages[Title/Abstract] OR deformity[Title/Abstract] OR deformities[Title/Abstract] OR degeneration[Title/Abstract] OR degenerations[Title/Abstract] OR degenerative change[Title/Abstract] OR degenerative changes[Title/Abstract] OR dislocation[Title/Abstract]"
q2b <- "dislocations[Title/Abstract] OR disorder[Title/Abstract] OR disorders[Title/Abstract] OR distorsion[Title/Abstract] OR distorsions[Title/Abstract] OR fracture[Title/Abstract] OR fractures[Title/Abstract] OR inflammation[Title/Abstract] OR inflammations[Title/Abstract] OR injury[Title/Abstract] OR injuries[Title/Abstract] OR luxation[Title/Abstract] OR luxations[Title/Abstract] OR musculoskeletal[Title/Abstract] OR pain[Title/Abstract] OR pains[Title/Abstract] OR posture[Title/Abstract] OR postures[Title/Abstract] OR prolapse[Title/Abstract] OR prolapses[Title/Abstract] OR rupture[Title/Abstract] OR ruptures[Title/Abstract] OR sprain[Title/Abstract] OR sprains[Title/Abstract] OR spraining[Title/Abstract] OR symptom[Title/Abstract] OR symptoms[Title/Abstract] OR syndrome[Title/Abstract] OR syndromes[Title/Abstract] OR tension[Title/Abstract] OR tensions[Title/Abstract] OR torsion[Title/Abstract] OR torsions[Title/Abstract] OR twist[Title/Abstract] OR twists[Title/Abstract]"
q3 <- "(frequency[Title/Abstract] OR frequencys[Title/Abstract] OR occurence[Title/Abstract] OR prevalence[Title/Abstract] OR prevalences[Title/Abstract] OR rate[Title/Abstract] OR rates[Title/Abstract] OR risk[Title/Abstract] OR risks[Title/Abstract])"
q2 <- paste0("(", q2a, " OR " , q2b, ")")
q13 <- paste0("(", q1, " AND " , q3, ")")
# basic search ------------------------------------------------------------
q2 <- entrez_search(db="pubmed", term=q2 , use_history=TRUE) # too long together
#> Error in entrez_check(response): HTTP failure 414, the request is too large. For large requests, try using web history as described in the rentrez tutorial
q2a <- entrez_search(db="pubmed", term=q2a, use_history=TRUE) # 1st half works
q2b <- entrez_search(db="pubmed", term=q2b, use_history=TRUE, # 2nd half works
WebEnv=q2a$web_history$WebEnv)
1. Simplifying/Separating the query should always (?) allow us to circumvent this issue
If we can simplify a too-long query by first separating it into multiple short-enough queries that can then be combined back together, it works via the approach you gave above. You did this for q1 and q3, but note that q13 isn't actually too long and works fine on its own. So here I did it by separating q2:
# combined search ---------------------------------------------------------
q_combine_2ab <- sprintf("#%s OR #%s", q2a$web_history$QueryKey, q2b$web_history$QueryKey)
q2ab <- entrez_search(db="pubmed", term=q_combine_2ab, use_history=TRUE, WebEnv=q2a$web_history$WebEnv)
# check outcome -----------------------------------------------------------
q2a
#> Entrez search result with 821036 hits (object contains 20 IDs and a web_history object)
#> Search term (as translated): ankle[Title/Abstract] OR ankles[Title/Abstract] OR ...
q2b
#> Entrez search result with 4683876 hits (object contains 20 IDs and a web_history object)
#> Search term (as translated): dislocations[Title/Abstract] OR disorder[Title/Abs ...
q2ab
#> Entrez search result with 5206125 hits (object contains 20 IDs and a web_history object)
#> Search term (as translated): #1 OR #2
Created on 2021-01-28 by the reprex package (v1.0.0)
2. Using POST may actually allow for longer query strings and make 1. irrelevant
If the change you suggested in #164 is accepted, we may not need to work around the issue as just described, since this may do the job:
entrez_search(db="pubmed", term=q2, use_history=TRUE, use_post=TRUE) # use_post is only proposed in #164; it is not implemented yet
3. The code revolving around web_history may change if your feature request issue 166 goes through
If this is all correct, then I'd be fine with closing this issue.
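One caveat on the q2a/q2b split above: query_part2 has the form (body parts) AND (symptoms), but q2a cuts through the symptom list, so its parentheses no longer balance and (q2a OR q2b) no longer means the same as query_part2. For example, q2b on its own matches any record mentioning pain, whatever the body part. A sketch of a split that keeps the Boolean structure: run the body-part list as one history search, the symptom list as two OR-ed halves, and combine them as `#a AND (#b OR #c)`. The helper names are hypothetical, and PubMed is assumed to accept parentheses around query keys in the combining term.

```r
# Hypothetical helper: history term for (body parts) AND (symptoms), where the
# symptom block was itself split into several OR-ed searches.
combine_and_of_or <- function(body_key, symptom_keys) {
  sprintf("#%s AND (%s)", body_key,
          paste0("#", symptom_keys, collapse = " OR "))
}

# Hypothetical driver: one search for the body-part block, one per symptom
# chunk, all in the same history session, then a single combining search.
search_and_of_or <- function(body_query, symptom_queries, db = "pubmed") {
  body <- rentrez::entrez_search(db = db, term = body_query, use_history = TRUE)
  web_env <- body$web_history$WebEnv
  symptom_keys <- vapply(symptom_queries, function(q) {
    res <- rentrez::entrez_search(db = db, term = q, use_history = TRUE,
                                  WebEnv = web_env)
    as.character(res$web_history$QueryKey)
  }, character(1), USE.NAMES = FALSE)
  combined <- combine_and_of_or(body$web_history$QueryKey, symptom_keys)
  rentrez::entrez_search(db = db, term = combined, use_history = TRUE,
                         WebEnv = web_env)
}
```

Here `body_query` would be the parenthesised body-part list from query_part2 and `symptom_queries` the symptom list cut into two halves at an OR.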
@SchmidtPaul: use_history would keep working for a long time even if #166 goes through; my proposal in #166 would just make it a lot simpler for the user to integrate history into the process.
I'm also having this problem with a long query, and it is not solved by using the history and post parameters.
However, what puzzles me is that this same query does not pose a problem when I use EDirect, the UNIX command-line interface for the E-utilities. This suggests to me that there is some way to make the API call without getting an error on query length. For example, the command
esearch -db pubmed -query "$(cat MTIAquery.txt)"
reads my file containing a long query with about 17,600 characters and returns 21,000+ citations without any problems.
Obviously, my personal workaround here is to use EDirect! But it feels clunky to switch between programs, and it would be nice to have it all in my R script, so I was trying to figure it out with rentrez.
Edit: Ah, I see use_post is not actually implemented yet. That would explain why it wasn't working for me! Carry on, then.
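Until rentrez can send long queries, one way to keep everything in the R script is to call EDirect from R. A sketch, assuming `esearch` and `efetch` are installed and on the `PATH`; the helper names are hypothetical. `system2()`'s `input` argument feeds the `esearch` output to `efetch` on standard input, standing in for the shell pipe `esearch ... | efetch -format uid`.

```r
# Hypothetical helper: argument vector for esearch, with the query quoted
# for a POSIX shell (system2 passes arguments through the shell unquoted).
esearch_args <- function(query, db = "pubmed") {
  c("-db", db, "-query", shQuote(query, type = "sh"))
}

# Hypothetical driver: read a long query from a file and return PubMed IDs.
edirect_uids <- function(query_file, db = "pubmed") {
  query <- paste(readLines(query_file, warn = FALSE), collapse = " ")
  # esearch prints an ENTREZ_DIRECT XML handle describing the stored result
  handle <- system2("esearch", esearch_args(query, db), stdout = TRUE)
  # efetch reads that handle from stdin and prints one UID per line
  system2("efetch", c("-format", "uid"), input = handle, stdout = TRUE)
}
```

For the file above, this would be called as `pmids <- edirect_uids("MTIAquery.txt")`.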
Hi all and thanks @AlastairKelly for the tip.
Sorry this issue has languished for a little while; I'm moving to a new position and life is a little hectic just now. Just leaving this comment as a note that this issue is a priority once I have a bit of clear space to think about it.
(And an invitation to anyone that would like to offer a PR, of course!)
@dwinter, I made a PR in #164 ! :)
Haha, so I can reiterate that it's been a hectic time lately. Will try to get some time to check it over soon :) Thanks a lot @billdenney
I submitted PR #174, which is similar to #164 but passes the tests.
Since it hasn't been reviewed yet and this repo doesn't seem to be actively maintained, I chose to implement the fix in a fork. If you're interested, install the fork with devtools::install_github("allenbaron/rentrez").
For larger requests it's a good idea to create and query a web_history object with:
wh <- entrez_search(..., use_history = TRUE)$web_history
or
wh <- entrez_post(db = <db of interest>, id = <vector of IDs>)
These approaches should work without problems in that fork.
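Once a web_history object exists, a large result set can be downloaded in pages rather than in one request. A sketch using rentrez's `entrez_fetch()`, whose `web_history` argument points at the stored results and whose extra arguments (here `retstart` and `retmax`) are passed on to EFetch; the helper names and the batch size of 500 are illustrative assumptions.

```r
# Hypothetical helper: 0-based retstart offsets covering `count` records.
# Integers avoid offsets like 1e+05 being formatted in scientific notation.
batch_starts <- function(count, batch_size = 500L) {
  if (count <= 0) return(integer(0))
  as.integer(seq.int(0L, as.integer(count) - 1L, by = as.integer(batch_size)))
}

# Hypothetical driver: `search` is an entrez_search() result created with
# use_history = TRUE; returns one text chunk per batch.
fetch_in_batches <- function(search, db = "pubmed", batch_size = 500L,
                             rettype = "medline") {
  starts <- batch_starts(search$count, batch_size)
  vapply(starts, function(start) {
    rentrez::entrez_fetch(db = db, web_history = search$web_history,
                          rettype = rettype, retstart = start,
                          retmax = batch_size)
  }, character(1))
}
```

Fetching in pages keeps each EFetch request small, even when the search that produced the history object matched hundreds of thousands of records.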
Hi there, great package! Maybe I am missing something here, but is my request really too large to run even with the use_history = TRUE argument in place, as suggested in the tutorial?
Created on 2021-01-19 by the reprex package (v0.3.0.9001)
Maybe I am just running into issue #46 here? If so, do you have a suggestion of how to circumvent this problem?