ropensci / openalexR

Getting bibliographic records from OpenAlex
https://docs.ropensci.org/openalexR/

Different results with and without brackets? #165

Closed rkrug closed 1 year ago

rkrug commented 1 year ago

I have the following two queries:

s_nat <- '("sustainab*" OR "environ*" OR "resilien*" OR "conserv*" OR "biodivers*" OR “ecosystem*” OR “nature*” OR “planet*” OR “Earth” OR “biosphere”)' 

r$> openalexR::oa_query(search = s_nat) |> oa_request(count_only = TRUE) |> unlist()
              count db_response_time_ms                page            per_page 
           23057686                 627                   1                   1 

r$> openalexR::oa_query(search = paste0("(", s_nat, ")" ) )  |> oa_request(count_only = TRUE)|> unlist()
              count db_response_time_ms                page            per_page 
           23057766                 757                   1                   1 

Why do the brackets around the search term make a difference?

rkrug commented 1 year ago

And why does filter not filter at all?

openalexR::oa_query(filter = paste0("(", s_nat, ")" ) )  |> oa_request(count_only = TRUE)|> unlist()
              count db_response_time_ms                page            per_page 
          243729490                  22                   1                   1 

r$> openalexR::oa_query(filter = s_nat)  |> oa_request(count_only = TRUE) |> unlist()
              count db_response_time_ms                page            per_page 
          243729490                  22                   1                   1 

yjunechoe commented 1 year ago

These are questions about the database internals that we can't really answer. Honestly I wouldn't worry about it as long as you're consistent about what you use in your queries.

FWIW the difference doesn't look systematic. For example, I re-ran your first code and this is what I get now. Note how the numbers are closer now and the non-parenthesis version returns more results this time. It feels pretty random but again, trivial.

oa_query(search = s_nat)  |> oa_request(count_only = TRUE) |> el("count")
#> [1] 23057948
oa_query(search = paste0("(", s_nat, ")" ) )  |> oa_request(count_only = TRUE) |> el("count")
#> [1] 23057943

If this raises concerns for you I'd suggest contacting the OpenAlex team directly about this.
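
To put the size of the gap in perspective, here is a quick sketch using only the counts posted in this thread. `relative_gap()` is a hypothetical helper written for illustration; it is not part of openalexR.

```r
# Hypothetical helper (not an openalexR function): the absolute difference
# between two counts, expressed as a fraction of the larger count.
relative_gap <- function(count_a, count_b) {
  abs(count_a - count_b) / max(count_a, count_b)
}

# Counts from the original post (without vs. with the extra brackets):
# 80 records apart.
gap_first <- relative_gap(23057686, 23057766)

# Counts from the re-run above: 5 records apart.
gap_rerun <- relative_gap(23057948, 23057943)

print(c(first = gap_first, rerun = gap_rerun))
```

Both gaps are a few parts per million or less of the roughly 23 million matching records.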

rkrug commented 1 year ago

Thanks June.

Additionally: it seems that the filter argument is not working (see example above). I would like to use filter instead, to make the search more transparent. What am I missing?

openalexR::oa_query(filter = paste0("(", s_nat, ")" ) )  |> oa_request(count_only = TRUE)|> unlist()
              count db_response_time_ms                page            per_page 
          243729490                  22                   1                   1 

yjunechoe commented 1 year ago

Sorry, I don't totally understand the filter issue - is it another issue with the parentheses, but for the filter argument? What did you expect vs. get with the query?

It might be helpful to open a new issue related to the filter argument and describe the problem from scratch. My head is kinda stuck on the previous problem on this thread 😅

rkrug commented 1 year ago

Ok. Will do so.

rkrug commented 1 year ago

The filter problem is solved - I used the wrong syntax.
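
The corrected call is not shown in this thread. As a rough sketch of what named-filter syntax looks like in openalexR, assuming the goal was to match on titles: filters are passed as named arguments such as `title.search`, rather than as one bare string in `filter`. The field and the single search term here are assumptions chosen for illustration, not the query that was actually used.

```r
library(openalexR)

# Assumption: a title search on one term stands in for the real query.
# oa_query() only builds the request URL, so this step needs no network access.
url <- oa_query(
  entity = "works",
  title.search = "biodiversity"  # named filter, not filter = "<string>"
)

print(url)
```

The resulting URL can then be piped to `oa_request(count_only = TRUE)` as in the earlier comments to check that the count is now smaller than the full 243,729,490 works.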