iodepo / OceanBestPractices

Repository to store the OpenSource version of the code made by E84 for OceanBestPractices.org
https://oceanbestpractices.org
GNU Affero General Public License v3.0

DOI search - not working #236

Open paulineobps opened 2 years ago

paulineobps commented 2 years ago

On the new interface I tried 2 DOI searches, both for DOIs that are in the database, and neither worked:

http://dx.doi.org/10.25607/OBP-561
10.26198/gfgr-fq47

The search help indicates using the format 10.26198/gfgr-fq47

paulineobps commented 2 years ago

sorry Paul I just cannot get this to work on any DOI search

paulpilone commented 2 years ago

@paulineobps I'm now able to find the document in your original comment using the search value 10.25607/OBP-561. When I use the value 10.26198/gfgr-fq47 I get 33 results ... so I'm not sure if that's correct or not. e.g. here is one of my searches:

image

paulineobps commented 2 years ago

The problem is that the actual DOI of the document you retrieved is http://dx.doi.org/10.25607/OBP-1724

sorry Paul

paulpilone commented 2 years ago

Ok I see it now. It actually looks like it's matching because of the 10.25607 and you can even find it because of OBP. I think Elasticsearch is parsing that URI on punctuation. This is tricky because it's a URI but we don't actually want to require the user to search for the entire URI - just a portion of it - but a portion we want to define. I'll try looking into this more when I can.
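To make the behaviour described above concrete, here is a rough illustration of how splitting on punctuation could let a search for one DOI match a different document. The `roughTokens` function is a crude stand-in written for this example, not Elasticsearch's actual tokenizer, and the analyzer the project really uses may differ.

```typescript
// Crude stand-in for a word-boundary analyzer. This is NOT Elasticsearch's
// real tokenizer; it only assumes tokens break on "/", ":" and "-" while a
// dot between letters or digits stays inside the token.
function roughTokens(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+(?:\.[a-z0-9]+)*/g) ?? [];
}

// Tokens that a query and an indexed value have in common.
function sharedTokens(query: string, indexed: string): string[] {
  const indexedTokens = new Set(roughTokens(indexed));
  return roughTokens(query).filter((token) => indexedTokens.has(token));
}

const doiQuery = "10.25607/OBP-561";

// The query breaks into three separate tokens.
console.log(roughTokens(doiQuery)); // ["10.25607", "obp", "561"]

// A different document (OBP-1724) still shares two of them.
console.log(sharedTokens(doiQuery, "http://dx.doi.org/10.25607/OBP-1724")); // ["10.25607", "obp"]
```

Under this assumption, any document whose DOI shares the 10.25607 prefix or the OBP label has tokens in common with the query, which would explain hits on documents other than the one searched for.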

paulpilone commented 2 years ago

@paulineobps for the DOI - do we only care about the last 2 parts of the path? e.g. the metadata field can be stored as the full URL http://dx.doi.org/10.25607/OBP-561 and then we can store 10.25607/OBP-561 separately so the user can find that portion.
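A minimal sketch of this first option, assuming the DOI portion can be recognised by a pattern of `10.` followed by 4 to 9 digits, a slash, and a suffix. The names `extractDoiPath`, `toDoiFields`, `doiUrl` and `doiPath` are hypothetical and are not taken from the repository.

```typescript
// Hypothetical helper: return the "10.<registrant>/<suffix>" part of a DOI
// identifier, or null when no such part is present.
// Assumption: the registrant code after "10." is 4 to 9 digits.
function extractDoiPath(identifier: string): string | null {
  const match = identifier.trim().match(/10\.\d{4,9}\/\S+/);
  return match ? match[0] : null;
}

// Hypothetical shape for what would be indexed: the full URL as stored today,
// plus the shorter path that users are told to search for.
interface DoiFields {
  doiUrl: string;
  doiPath: string | null;
}

function toDoiFields(doiUrl: string): DoiFields {
  return { doiUrl, doiPath: extractDoiPath(doiUrl) };
}

console.log(toDoiFields("http://dx.doi.org/10.25607/OBP-561"));
// { doiUrl: "http://dx.doi.org/10.25607/OBP-561", doiPath: "10.25607/OBP-561" }
```

Storing the short form as its own exact-match field would let a search for 10.25607/OBP-561 hit only that document, at the cost of reindexing existing records.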

The other option is I handle DOI searches and append a wildcard to them so the actual search becomes *10.25607/OBP-561 so the user can find a DOI using just the path of it.
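A sketch of what this wildcard option might look like as an Elasticsearch wildcard query body. The field name `doi_field_placeholder` and the function `buildDoiWildcardQuery` are hypothetical, and the sketch assumes the DOI is stored in a field that is matched as a single exact term, since wildcard queries operate on whole terms.

```typescript
// Shape of an Elasticsearch wildcard query clause for a single field.
interface WildcardQuery {
  wildcard: Record<string, { value: string }>;
}

// Hypothetical builder: prefix the user's DOI with "*" so that a stored value
// such as "http://dx.doi.org/10.25607/OBP-561" matches "10.25607/OBP-561".
// Assumption: `field` names a field indexed as one exact term.
function buildDoiWildcardQuery(
  userInput: string,
  field: string = "doi_field_placeholder"
): WildcardQuery {
  const doi = userInput.trim();
  return { wildcard: { [field]: { value: `*${doi}` } } };
}

console.log(JSON.stringify(buildDoiWildcardQuery("10.25607/OBP-561")));
// {"wildcard":{"doi_field_placeholder":{"value":"*10.25607/OBP-561"}}}
```

One trade-off worth noting: the Elasticsearch documentation advises against patterns that begin with `*` because they can slow searches, so this approach is simpler to add but may be less suitable as a long-term solution than storing the short DOI separately.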

I'm trying to understand if we want to support both options of searching or just require the user to follow the instructions in the search tips.

paulineobps commented 2 years ago

Everything before 10.25607/OBP-561 might not be unique, but every DOI will have a 10.xxxxx/nr, so go with that. It is in the Search Tips, and as future work we have also asked for a short tip in the search box when you choose a particular search parameter. For the DOI search, the box would show text saying the search should be in the format 10.xxxxx/nr (e.g. 10.25607/OBP-561 or 10.1021/acssensors.1c01685).

The 10.25607 is a unique ID in the DOI for IOC (our parent organization), and the number after the following / is a unique numbering sequence within that org, so OBP-561 is unique for OBPS. Likewise for 10.1021/acssensors.1c01685: 10.1021 is the American Chemical Society and acssensors.1c01685 is the unique DOI number sequence for articles in their journal ACS Sensors. Does that help?
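The prefix/suffix structure described here can be sketched as a small parser, which could also back the search-box format hint. The `parseDoi` function and `ParsedDoi` type are hypothetical, and the pattern (4 to 9 digits after `10.`) is an assumption that may not cover every DOI ever issued.

```typescript
// The two parts of a DOI as described above: the registrant prefix
// (e.g. "10.25607" for IOC) and the suffix unique within that registrant.
interface ParsedDoi {
  prefix: string;
  suffix: string;
}

// Hypothetical parser: accepts only the short "10.xxxxx/nr" form, so a full
// resolver URL is rejected and the user can be shown the format hint instead.
// Assumption: the registrant code after "10." is 4 to 9 digits.
function parseDoi(input: string): ParsedDoi | null {
  const match = input.trim().match(/^(10\.\d{4,9})\/(\S+)$/);
  return match ? { prefix: match[1], suffix: match[2] } : null;
}

console.log(parseDoi("10.25607/OBP-561"));
// { prefix: "10.25607", suffix: "OBP-561" }
console.log(parseDoi("10.1021/acssensors.1c01685"));
// { prefix: "10.1021", suffix: "acssensors.1c01685" }
console.log(parseDoi("http://dx.doi.org/10.25607/OBP-561"));
// null
```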

paulpilone commented 2 years ago

This has been temporarily fixed and I'm going to remove myself as the assignee. This should still be looked at and fixed in a more permanent way. See the linked PR for more information or look at this line: https://github.com/iodepo/OceanBestPractices/blob/6eeef70dd8adeda64fb9884522eca755ed02e6f9/api/lib/search-query-builder.ts#L43

paulineobps commented 2 years ago

format 10.1021/acssensors.1c01685 is working now.

_This should still be looked at and fixed in a more permanent way._ Is this robust enough now?