ropensci / openalexR

Getting bibliographic records from OpenAlex
https://docs.ropensci.org/openalexR/

Feature / warning requests: Usage of `"` versus `'` and wildcard #180

Open rkrug opened 11 months ago

rkrug commented 11 months ago

After some discussion with OpenAlex support, I have solved an issue I had with search: I used single quotes (') whereas OpenAlex expects double quotation marks (") to specify words adjacent to each other, i.e. a phrase.

Also (not a problem), the wildcard character * is stripped from the search (as is the single quotation mark ', which causes a problem).

This is done by Elasticsearch, and not under the control of OpenAlex.

I would therefore suggest two things:

  1. In the case of the wildcard character, give a warning in the function oa_query() that the wildcard character is stripped away and that OpenAlex does not do any wildcard expansion (but does stemming by default).
  2. In the case of a single quote in the search string (and possibly also in other arguments like search?), raise an error, as it changes the query and will give completely wrong results (the default operator is AND if there is no operator between two words).
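
The two suggestions above could be sketched roughly as follows. This is a language-neutral illustration written in Python, not openalexR code: the function name `check_search_string` and the message wording are hypothetical, and the sketch assumes that any single quote should be rejected (which would also catch apostrophes inside words).

```python
import warnings


def check_search_string(search: str) -> str:
    """Hypothetical helper: validate a search string before it is sent to OpenAlex."""
    # Single quotes are stripped by the search backend, so 'a b' is not
    # treated as a phrase; this silently changes the query, hence an error.
    if "'" in search:
        raise ValueError(
            "Single quote (') found in search string; "
            'use double quotes (") to search for adjacent words.'
        )
    # The wildcard is stripped as well, but that only removes a character,
    # so a warning is enough.
    if "*" in search:
        warnings.warn(
            "Wildcard (*) is stripped from the search; "
            "no wildcard expansion is performed.",
            UserWarning,
        )
    return search
```

A string such as `"natural environment" OR biodiversity` passes through unchanged, `environment*` triggers the warning, and `'natural environment'` raises the error.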

This would help to make openalexR easier to use and more reliable.

Thanks,

Rainer

trangdata commented 11 months ago

Thank you for this excellent suggestion @rkrug. 💯 Do you have any particular query examples you could share?

rkrug commented 11 months ago

Yes - here is the example which I used to solve the "issue" with OpenAlex support.

In a nutshell:

I do a search with the search term bidiversity OR ‘natural environment’ (the typo in "bidiversity" does not matter) and filter for the DOI https://doi.org/10.1111/conl.12377.

The result should be one, as it is with this call:

https://api.openalex.org/works?filter=doi%3Ahttps%3A%2F%2Fdoi.org%2F10.1111%2Fconl.12377&search=%27natural%20environment%2A%27%20OR%20bidiversity

{
"meta":{
"count":1,
"db_response_time_ms":54,
"page":1,
"per_page":25,
},
...

But when I change the order of the search terms, the result is zero:

https://api.openalex.org/works?filter=doi%3Ahttps%3A%2F%2Fdoi.org%2F10.1111%2Fconl.12377&search=bidiversity%20OR%20%27natural%20environment%2A%27

{
"meta":{
"count":0,
"db_response_time_ms":71,
"page":1,
"per_page":25,
},
"results":[
],
"group_by":[
]
}
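
For comparison, here is a minimal sketch of how the same request could be built with double quotes around the phrase and without the wildcard. It uses only Python's standard library; the helper name `build_works_url` is hypothetical, and no result count is claimed for the resulting URL.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.openalex.org/works"


def build_works_url(doi: str, search: str) -> str:
    """Hypothetical helper: build a /works URL with a DOI filter and a search term."""
    params = {"filter": f"doi:{doi}", "search": search}
    # quote_via=quote encodes spaces as %20 (not +), matching the URLs above.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


url = build_works_url(
    "https://doi.org/10.1111/conl.12377",
    'bidiversity OR "natural environment"',
)
print(url)
```

In the encoded URL the double quote appears as `%22`, whereas the single quote in the calls above appears as `%27`.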

Hope this helps.

Also: as the precedence rules are not that clear, the OpenAlex support contact highly recommended using brackets.
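
A small sketch of what "using brackets" could look like when a search string is assembled programmatically. The helper `or_group` is hypothetical and purely illustrative; it assumes multi-word terms should be wrapped in double quotes and the whole OR group in parentheses.

```python
def or_group(terms):
    """Hypothetical helper: join terms with OR inside explicit brackets."""
    # Multi-word terms are wrapped in double quotes so they are treated as phrases.
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return "(" + " OR ".join(quoted) + ")"


query = or_group(["bidiversity", "natural environment"])
print(query)  # (bidiversity OR "natural environment")
```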

yjunechoe commented 11 months ago

Just to be clear - does OpenAlex strip ' as well?

Also I don't know if it's just a formatting thing, but your example sometimes uses ‘ and ’ which are not the same as the single quote character ' - not sure if we should catch these for users as well
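
To make the second point concrete, here is a rough sketch of how the typographic variants could be detected. The function name and the set of characters checked are assumptions for illustration only, not something openalexR currently does.

```python
# Typographic quotes that are easily pasted in from word processors or web pages.
TYPOGRAPHIC_QUOTES = {
    "\u2018": "left single quotation mark",
    "\u2019": "right single quotation mark",
    "\u201c": "left double quotation mark",
    "\u201d": "right double quotation mark",
}


def find_typographic_quotes(search: str) -> list:
    """Hypothetical helper: return the names of typographic quotes found in a search string."""
    return [name for char, name in TYPOGRAPHIC_QUOTES.items() if char in search]


found = find_typographic_quotes("bidiversity OR \u2018natural environment\u2019")
print(found)  # ['left single quotation mark', 'right single quotation mark']
```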

rkrug commented 11 months ago

Yes - according to the information I got from OpenAlex, the single inverted comma / quote ' is stripped as well.

Yes - I copied the code, so everything should be the single inverted comma / single quote.