rkrug opened this issue 7 months ago
Wow, this one is really, really weird. The problem isn't even about the length of the query string. Minimal reprex:
query_substr <- "https://api.openalex.org/works?page=1&filter=title_and_abstract.search:%22Agriculture+reform%22+OR+%22ocean+reform%22"
oa_request(query_substr)
#> Warning in oa_request(query_substr): No records found!
#> list()
httr::GET(query_substr)
#> Response [https://api.openalex.org/works?page=1&filter=title_and_abstract.search:%22Agriculture+reform%22+OR+%22ocean+reform%22]
#> Date: 2024-03-08 18:53
#> Status: 200
#> Content-Type: application/json
#> Size: 332 kB
#> {"meta":{"count":1717,"db_response_time_ms":222,"page":1,"per_page":25,"groups_count":null},"results":[{"id...
This happens because httr::GET() for some reason mangles the URL when we pass query = list(...). So, with our per-page=1 default:
httr::GET(query_substr, query = list(`per-page` = 1))
#> Response [https://api.openalex.org/works?page=1&filter=title_and_abstract.search%3A%22Agriculture%2Breform%22%2BOR%2B%22ocean%2Breform%22&per-page=1]
#> Date: 2024-03-08 18:57
#> Status: 200
#> Content-Type: application/json
#> Size: 115 B
#> {"meta":{"count":0,"db_response_time_ms":68,"page":1,"per_page":1,"groups_count":null},"results":[],"group_...
Essentially, GET() sees the " already encoded as %22, so it does not escape it with a backslash.
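The mangling can be reproduced with base R alone, assuming (not checked against httr's source) that httr decodes the existing query string and then re-escapes every reserved character when it merges in the query list. utils::URLdecode() and utils::URLencode(reserved = TRUE) are only stand-ins for whatever httr does internally. One thing this makes visible: URLdecode() only decodes %XX sequences and leaves + alone, so URLencode() then turns each + (originally a space) into %2B, a literal plus sign.

```r
# Stand-in for httr's rebuild step (assumption: decode, then re-escape
# every reserved character). Base R only, no network access.
filter_value <- "title_and_abstract.search:%22Agriculture+reform%22+OR+%22ocean+reform%22"

# URLdecode() turns %22 back into a double quote but leaves + untouched
decoded <- utils::URLdecode(filter_value)

# reserved = TRUE escapes : and " and +, so every + becomes %2B
reencoded <- utils::URLencode(decoded, reserved = TRUE)

print(decoded)
print(reencoded)
```

The second value is character for character the filter= part of the mangled URL in the GET() output above. Whether the %2B or the unescaped quotes is what actually breaks the OpenAlex search is a hypothesis here, not something the thread established; note that the URLdecode() in the workaround further down also turns %2B back into +.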
So instead of this URL from above:
bad_url <- "https://api.openalex.org/works?page=1&filter=title_and_abstract.search%3A%22Agriculture%2Breform%22%2BOR%2B%22ocean%2Breform%22&per-page=1"
GET() should instead be sending something like this:
good_url <- "https://api.openalex.org/works?page=1&filter=title_and_abstract.search:%5C%22Agriculture+reform%5C%22+OR+%5C%22ocean+reform%5C%22&per-page=1"
httr::GET(good_url)
#> Response [https://api.openalex.org/works?page=1&filter=title_and_abstract.search:%5C%22Agriculture+reform%5C%22+OR+%5C%22ocean+reform%5C%22&per-page=1]
#> Date: 2024-03-08 19:33
#> Status: 200
#> Content-Type: application/json
#> Size: 9.69 kB
#> {"meta":{"count":35789,"db_response_time_ms":338,"page":1,"per_page":1,"groups_count":null},"results":[{"id...
One hacky way around that is to add the backslash and make sure it is decoded before GET() sees it:
httr::GET(
URLdecode(gsub("%22", "%5C%22", bad_url))
)
#> Response [https://api.openalex.org/works?page=1&filter=title_and_abstract.search:\"Agriculture+reform\"+OR+\"ocean+reform\"&per-page=1]
#> Date: 2024-03-08 19:30
#> Status: 200
#> Content-Type: application/json
#> Size: 9.69 kB
#> {"meta":{"count":35789,"db_response_time_ms":338,"page":1,"per_page":1,"groups_count":null},"results":[{"id...
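The workaround can be wrapped in a small helper so it is applied the same way every time. escape_quoted_phrases() is a hypothetical name, not part of openalexR or httr; it only performs the gsub() and URLdecode() steps shown above and makes no request. Note that the counts in the responses above differ (1717 for the original URL, 35789 for the escaped one), which suggests the escaped query may not be equivalent to the original search, so it is worth checking the results.

```r
# Hypothetical helper (not part of openalexR or httr): applies the
# gsub() + URLdecode() workaround from above to a full OpenAlex URL.
escape_quoted_phrases <- function(url) {
  # prefix every encoded double quote (%22) with an encoded backslash (%5C)
  escaped <- gsub("%22", "%5C%22", url, fixed = TRUE)
  # decode so GET() receives literal \" characters
  utils::URLdecode(escaped)
}

escaped_url <- escape_quoted_phrases(
  "https://api.openalex.org/works?filter=title_and_abstract.search:%22ocean+reform%22"
)
# escaped_url can then be passed to httr::GET() without a query argument
```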
So for your reprex, you can reformat your URL:
query_url <- "https://api.openalex.org/works?page=1&filter=title_and_abstract.search:%22Agriculture+reform%22+OR+%22ocean+reform%22+OR+%22energy+reform%22+OR+%22decarbonization%22+OR+%22Eco-friendly+Subsidies%22+OR+%22Green+Subsidies%22+OR+%22Polluter+Pays+Principle%22+OR+%22Environmental+Externalities%22+OR+%22Biodiversity+Offsetting%22+OR+%22Conservation+Finance%22+OR+%22Payment+for+Ecosystem+Services%22+OR+%22Agri-environmental+Schemes%22+OR+%22Cross-compliance%22+OR+%22Eco-taxes%22+OR+%22Sustainable+Agriculture+Incentives%22+OR+%22Carbon+Pricing%22+OR+%22Biodiversity+Credits%22+OR+%22Habitat+Banking%22+OR+%22Rewilding+Incentives%22+OR+%22Green+Bonds%22+OR+%22Ecological+Fiscal+Transfers%22+OR+%22Renewable+Energy+Subsidies%22+OR+%22Water+Quality+Trading%22+OR+%22Sustainable+Fisheries+Subsidies%22+OR+%22Green+Certification+Schemes%22+OR+%22Conservation+Easements%22+OR+%22Environmental+Impact+Bonds%22+OR+%22Climate+Smart+Agriculture%22+OR+%22Natural+Capital+Financing%22+OR+%22Bioenergy%22+OR+%22Forest+Carbon+Credits%22+OR+%22Blue+Carbon+Initiatives%22+OR+%22Green+Public+Procurement%22+OR+%22Integrated+Pest+Management+Incentives%22+%22Wildlife+Corridors+Funding%22+OR+%22Biodiversity+Banking%22+OR+%22Climate+Adaptation+Finance%22+OR+%22Deforestation+Reduction+Programs%22+OR+%22Environmental+Risk+Assessment%22+OR+%22Green+Infrastructure+Investments%22+OR+%22High+Conservation+Value+Incentives%22+OR+%22Landscape+Restoration+Funds%22+OR+%22Marine+Protected+Areas+Support%22+OR+%22Natural+Resource+Management%22+OR+%22Organic+Farming+Subsidies%22+OR+%22Permaculture+Design+Grants%22+OR+%22Pollination+Services+Payments%22+OR+%22Protected+Area+Financing%22+OR+%22Regenerative+Agriculture+Support%22+OR+%22Sustainability+Linked+Loans%22+OR+%22Urban+Greening+Grants%22+OR+%22Wetlands+Restoration+Funding%22+OR+%22Zero+Emission+Vehicle+Incentives%22+OR+%22Adaptive+Management+Practices%22+OR+%22Biodiversity+Informatics%22+OR+%22Climate+Bonds%22+OR+%22Debt-for-Nature+Swap%22+OR+%22Ecosystem-Based+Adaptation%22+OR+%22Forest+Stewardship+Council+Certification%22+OR+%22Greenhouse+Gas+Inventory%22+%22Habitat+Restoration+Grants%22+OR+%22Invasive+Species+Control+Funding%22+OR+%22Land+Degradation+Neutrality+Fund%22+OR+%22Mitigation+Banking%22+OR+%22Non-Timber+Forest+Product+Incentives%22+%22Ocean+Acidification+Research+Grants%22+OR+%22Pollinator+Habitat+Enhancement%22+OR+%22Renewable+Energy+Certificates%22+OR+%22Soil+Health+Improvement+Programs%22+OR+%22Tree+Planting+Campaigns%22+OR+%22Wildlife+Management+Areas%22+OR+%22Biodiversity+Strategy+and+Action+Plans%22+OR+%22Circular+Economy+Initiatives%22+OR+%22Disaster+Risk+Reduction+Funding%22+OR+%22DRR+Funding%22+OR+%22Ecosystem+Valuation%22+OR+%22Fisheries+Improvement+Projects%22+OR+%22Green+Job+Training+Programs%22+OR+%22Holistic+Management+Funding%22+OR+%22Indigenous+Peoples%27+Biodiversity+Conservation%22+OR+%22Landscape+Connectivity+Projects%22+OR+%22Mangrove+Restoration+Initiatives%22+OR+%22Nature-based+Solutions%22+OR+%22Organic+Certification+Cost+Share%22+OR+%22Peatland+Restoration+and+Management%22+OR+%22Quantitative+Easing+for+the+Planet%22+OR+%22Riparian+Buffer+Zones+Support%22+OR+%22Sustainable+Land+Management%22+OR+%22Threatened+Species+Recovery+Plans%22+OR+%22Urban+Biodiversity+Enhancement%22+OR+%22Vertical+Farming+Incentives%22+OR+%22Water+Efficiency+Programs%22+OR+%22Xeriscaping+Rebates%22+OR+%22Youth+Engagement+in+Conservation%22+OR+%22Zero-waste+Strategies%22+OR+%22Agrobiodiversity+Conservation+Subsidies%22+OR+%22Biochar+Production+Incentives%22+OR+%22Climate+Resilience+Building%22+OR+%22Drought+Management+Assistance%22+OR+%22Eco-labeling+Programs%22+OR+%22Functional+Biodiversity+Promotion%22+OR+%22Green+Supply+Chain+Financing%22+OR+%22Hedgerow+Restoration+Support%22+OR+%22Integrated+Water+Resources+Management+Funding%22+OR+%22Jungle+Restoration+Projects%22"
query_url2 <- gsub("%22", "%5C%22", query_url)
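Rather than hand-editing a URL of this size, the filter value could be generated from a character vector of phrases. This is only a sketch: build_phrase_filter() is a hypothetical helper, and it assumes the OpenAlex filter syntax used in this thread (title_and_abstract.search: followed by double-quoted phrases joined with OR, spaces written as +). Generating the string also avoids accidental gaps: in the URL above, a few neighbouring phrases are joined without OR (for example "Integrated Pest Management Incentives" and "Wildlife Corridors Funding").

```r
# Hypothetical helper: builds an OpenAlex-style phrase filter such as
#   title_and_abstract.search:%22a+b%22+OR+%22c%22
build_phrase_filter <- function(terms, field = "title_and_abstract.search") {
  # percent-encode each phrase (' becomes %27, spaces become %20) ...
  encoded <- vapply(terms, utils::URLencode, character(1),
                    reserved = TRUE, repeated = TRUE, USE.NAMES = FALSE)
  # ... then write spaces as + to match the URLs in this thread
  encoded <- gsub("%20", "+", encoded, fixed = TRUE)
  phrases <- paste0("%22", encoded, "%22")
  paste0(field, ":", paste(phrases, collapse = "+OR+"))
}

terms <- c("Agriculture reform", "ocean reform",
           "Indigenous Peoples' Biodiversity Conservation")
phrase_filter <- build_phrase_filter(terms)
example_url <- paste0("https://api.openalex.org/works?page=1&filter=", phrase_filter)
```

This does nothing about the overall length; it only makes the URL easier to generate and inspect.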
Sending query_url2 still fails, but now for a different reason: it's just genuinely too long:
cat(rawToChar(
httr::GET(query_url2)$content
))
#> <html>
#> <head>
#> <title>Bad Request</title>
#> </head>
#> <body>
#> <h1><p>Bad Request</p></h1>
#> Request Line is too large (4468 > 4094)
#> </body>
#> </html>
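Since the server reports both numbers (a 4468-byte request line against a limit of 4094), a client could estimate the length before sending and warn, as is also suggested later in the thread. A minimal sketch, assuming the limit applies to the whole request line (method, path plus query, HTTP version) and that 4094 is what this server uses; check_request_line() is hypothetical and not part of openalexR or httr, and the result is only an estimate because the HTTP client may encode the URL further before sending.

```r
# Hypothetical pre-flight check: estimate the HTTP request line
# ("GET <path?query> HTTP/1.1") and warn if it exceeds the server limit.
# Assumption: limit = 4094, taken from the error message above.
check_request_line <- function(url, limit = 4094L) {
  # drop scheme and host; the request line only carries path and query
  path_and_query <- sub("^https?://[^/]+", "", url)
  request_line <- paste("GET", path_and_query, "HTTP/1.1")
  n <- nchar(request_line, type = "bytes")
  if (n > limit) {
    warning(sprintf("Request line is %d bytes, above the limit of %d.", n, limit))
  }
  invisible(n)
}

# e.g. check_request_line(query_url2) before calling httr::GET(query_url2)
```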
Overall, though, I'm completely stumped. I have no idea why this is an issue, or whether it is on our end, OA's end, httr's end, etc.
Hm. What about using the opportunity to move to httr2? That would rule out one possible culprit.
Also, what if I try to get somebody from OA to look at it? Maybe they could check the log files.
Switching over to httr2 would indeed be nice, but it'll require more than just rewriting code, and I currently don't have the bandwidth for this. I'll keep the issue in mind, but for now the workaround above should do.
Sorry, just for completeness: what function call generated the long query URL you originally posted? Was it spit out by oa_query() (and if so, what were the inputs)?
I got the URL from the OpenAlex web interface. If I remember correctly, the original search term did not work via openalexR (same symptoms as too long, but probably something different; by the way, it would be nice to give a warning if the URL might be too long), so I tried the API to find out by how much. But there it worked. So I copied the API call back into the openalexR call, which is where it did not work.
> Switching over to httr2 would indeed be nice but it'll require more than just rewriting code
Could you elaborate? Why do you say that? I agree that a switch to httr2 opens the possibility of making some breaking changes (openalexR2), but why do you say that is necessary?
Oh - it's not necessary to switch over at all! I just meant that if we were to, it would require quite a bit of work.
I have an extremely long search query which works in the browser.
But when running
Created on 2024-03-08 with reprex v2.1.0