relaton / relaton-bipm


Support retrieval of Metrologia entries from IOP #2

Closed ronaldtse closed 2 years ago

ronaldtse commented 3 years ago

e.g. DOI https://doi.org/10.1088/1681-7575/aa7b3f

[9] Flowers-Jacobs N-E, Pollarolo A, Coakley J J, Fox A E, Rogalla H, Tew W L and Benz S P 2017 A Boltzmann constant determination based on Johnson noise thermometry Metrologia 54, 730-737 (8 pp)

Gets forwarded to https://iopscience.iop.org/article/10.1088/1681-7575/aa7b3f

ronaldtse commented 3 years ago

We have many citations to Metrologia in Metanorma due to handling of BIPM documents. We need to support citation of Metrologia articles. If necessary we will need to update the Relaton BIPM bibdata model (or establish a new one for academic articles).

andrew2net commented 3 years ago

@ronaldtse I'm unable to find a Metrologia article index or any search form. Do you have an idea of how to search for Metrologia articles?

ronaldtse commented 3 years ago

I think we should have a syntax that fetches per series, volume, issue, or article:

The main Metrologia page is this: https://iopscience.iop.org/journal/0026-1394


Volumes:

[Screenshot of the volume list]

Issues in that Volume: https://iopscience.iop.org/volume/0026-1394/29


The first issue in that list: https://iopscience.iop.org/issue/0026-1394/29/6


The first paper/article in that issue: https://iopscience.iop.org/article/10.1088/0026-1394/29/6/001


Notice the citation writes this: "E C Morris 1993 Metrologia 29 373"

I believe we can have two types of searches for this article:

  1. Citation locator string: "Metrologia 29 6 373" or "Metrologia 29 373" => this article.
  2. Cite by DOI: either one should lead to this article

For citing the Issue, we can do:

  1. Citation locator string: "Metrologia 29 6" => this Issue.
  2. Cite by DOI: either one should lead to this Issue

For citing the Volume, we can do:

  1. Citation locator string: "Metrologia 29" => this Volume.
  2. Cite by DOI: either one should lead to this Volume

For citing the Series, we can do:

  1. Citation locator string: "Metrologia" => the full series.
  2. Cite by DOI: either one should lead to this series
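
A minimal sketch of how the citation locator strings above could be split into parts. The method name and the `Locator` fields are hypothetical and not part of relaton-bipm. Note that a two-number locator is ambiguous on its own ("Metrologia 29 6" is an issue, "Metrologia 29 373" is an article), so the sketch only flags that case and leaves the decision to a later lookup.

```ruby
# Hypothetical structure for the parts of a locator such as "Metrologia 29 6 373".
Locator = Struct.new(:series, :volume, :issue, :page, :issue_or_page, keyword_init: true)

# Splits a locator string into series, volume, issue and page.
# With exactly two numbers the second one may be an issue or a start page,
# so it is stored in :issue_or_page rather than guessed.
def parse_metrologia_locator(ref)
  series, *numbers = ref.strip.split(/\s+/)
  raise ArgumentError, "not a Metrologia reference: #{ref}" unless series.to_s.casecmp?("Metrologia")
  raise ArgumentError, "too many parts: #{ref}" if numbers.size > 3

  case numbers.size
  when 0 then Locator.new(series: series)
  when 1 then Locator.new(series: series, volume: numbers[0])
  when 2 then Locator.new(series: series, volume: numbers[0], issue_or_page: numbers[1])
  else Locator.new(series: series, volume: numbers[0], issue: numbers[1], page: numbers[2])
  end
end
```

For example, `parse_metrologia_locator("Metrologia 29 6 373")` yields volume "29", issue "6" and page "373", while `parse_metrologia_locator("Metrologia")` yields only the series.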
ronaldtse commented 3 years ago

Instead of scraping, we can also directly take the BibTeX export of that page: https://iopscience.iop.org/export?articleId=0026-1394/29/6/373&doi=10.1088/0026-1394/29/6/001&exportFormat=iopexport_bib&exportType=abs&navsubmit=Export+abstract

andrew2net commented 3 years ago

@ronaldtse I can guess how to map an article to BibModel but I have no idea how to map Issue, Volume, and Series. Do you have a suggestion?

opoudjis commented 3 years ago

Volume = series/number; Issue = series/partnumber; Series = series/title

andrew2net commented 3 years ago

@opoudjis as I understand it, Ronald means Volume, Issue, and Series to be separate documents. I'm asking what data we can map from the Volume, Issue, and Series pages to the BibliographicItem model.

ronaldtse commented 3 years ago

@andrew2net reports he has encountered rate-limiting via Captcha after several fetches. This is not appropriate for users who compile documents. I don't know whether it is surmountable using User-Agent (please try).

Will also seek advice from BIPM.

EDIT: have sought advice. Pending reply.

andrew2net commented 3 years ago

@ronaldtse I've tried using a random User-Agent, but it seems iopscience.iop.org allows only 6 requests per minute. After 2 minutes it starts redirecting to a captcha.
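
A minimal sketch of a client-side throttle that spaces requests evenly, assuming the limit really is 6 requests per minute. The `Throttle` class is hypothetical, and staying under the limit may not be enough to avoid the captcha. The clock and sleeper are injectable only so the behaviour can be checked without waiting.

```ruby
# Hypothetical throttle: allows at most `limit` calls per `period` seconds
# by enforcing a minimum interval between consecutive calls.
class Throttle
  def initialize(limit: 6, period: 60.0, clock: nil, sleeper: nil)
    @interval = period.to_f / limit
    @clock = clock || -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
    @sleeper = sleeper || ->(seconds) { sleep(seconds) }
    @last_call = nil
  end

  # Waits until the minimum interval has passed, then runs the block
  # and returns its value.
  def call
    if @last_call
      wait = @interval - (@clock.call - @last_call)
      @sleeper.call(wait) if wait.positive?
    end
    @last_call = @clock.call
    yield
  end
end
```

Usage would be along the lines of `throttle.call { Net::HTTP.get(URI(url)) }` with a single shared `Throttle` instance.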

ronaldtse commented 3 years ago

Got it. Let’s wait for BIPM’s response.

ronaldtse commented 3 years ago

The BIPM team has inquired with IOPP (the publisher) and they recommended the following:

Our first recommendation is that they use the CrossRef API. It contains all the article metadata (including the references), and if they just need metadata about our articles then that API should cover everything they need.

It’s a very well documented API. The starting point to the documentation is https://www.crossref.org/education/retrieve-metadata/.

Can you help implement the connection to CrossRef? Thanks.

andrew2net commented 3 years ago

@ronaldtse yes I can. They ask for an email address in HTTP requests so that they can contact us in case our script causes problems. Requests without an email address won't be directed to the more reliable servers. Do you have an email address for this purpose?

ronaldtse commented 3 years ago

Let me ask them. It would be strange to use our email address when users (not us) are doing the requests.

ronaldtse commented 3 years ago

@andrew2net it seems that the email address is optional?

[Screenshot]

Let's implement without the email first. Later on we can make a config option with Relaton CLI so users can set their own email address for CrossRef.

andrew2net commented 3 years ago

@ronaldtse yes, it's optional but without an email, it will work slower https://github.com/CrossRef/rest-api-doc#good-manners--more-reliable-service
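
A minimal sketch of identifying the caller to Crossref by putting a `mailto:` address in the User-Agent header, which is one of the ways the linked documentation describes. The method name and the `relaton-bipm` product token are assumptions for illustration. The request is only built here, not sent.

```ruby
require "net/http"
require "uri"

# Builds (but does not send) a GET request to the Crossref API whose
# User-Agent carries a contact address. `tool` is a hypothetical token.
def polite_crossref_request(path_and_query, email, tool: "relaton-bipm")
  uri = URI.join("https://api.crossref.org", path_and_query)
  request = Net::HTTP::Get.new(uri)
  request["User-Agent"] = "#{tool} (mailto:#{email})"
  request
end

# Sending it would involve a network call, for example:
# Net::HTTP.start(request.uri.host, 443, use_ssl: true) { |http| http.request(request) }
```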

andrew2net commented 3 years ago

@ronaldtse here is the API status page: https://status.crossref.org/#system-metrics. You can see that the "Polite API" average response time is about 1s, while the "Public API" average response time is about 7s.

ronaldtse commented 3 years ago

7s!??!?!?!? Why don't we just use a random email address based on the IP address.

https://www.ipify.org :

require "net/http"
ip = Net::HTTP.get(URI("https://api.ipify.org"))
puts "My public IP Address is: " + ip

Then sha256 it and truncate to 16 for the name. We can use relaton.org for the domain to indicate it is a Relaton request.

i.e. "fa9514ae...@relaton.org".
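
A minimal sketch of the hashing step described above, assuming the public IP has already been fetched as a string. The method name is hypothetical. The IP in the usage note is a documentation address, not a real client.

```ruby
require "digest"

# Derives a stable pseudonymous address from an IP string:
# the first 16 hex characters of its SHA-256 digest, at the given domain.
def pseudonymous_email(ip, domain: "relaton.org")
  "#{Digest::SHA256.hexdigest(ip)[0, 16]}@#{domain}"
end
```

Calling `pseudonymous_email("203.0.113.7")` returns the same 16-character name every time for that IP, and a different name for a different IP.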

andrew2net commented 3 years ago

Anyway, the API is too slow. Only the "OpenURL" and paid "Plus" services have an acceptable response time. I'll investigate OpenURL. Also, I haven't been able to find a way to search volumes, issues, and articles. It seems they have only journals and articles in the DB. If we aren't successful with Crossref, we can create a relaton-data-bipm-iop repository on GitHub and slowly fetch documents from iopscience.iop.org. What do you think?

ronaldtse commented 3 years ago

I've sent this to BIPM, let's see what their response is.


We’re now experimenting with the CrossRef API, but it’s not ideal:

  1. The CrossRef API only accepts a “fuzzy” search with limited filtering options.

There is no mechanism to obtain exactly the Metrologia article unless the author provides the full title and authorship information. It is nearly impossible to locate a particular article with confidence.

  2. It’s very slow. For normal requests, it takes up to 7 seconds (or more with filters). Even if we use the “polite API”, where an email is provided, it takes nearly 2 seconds per request. They have a “plus API” that is on average 0.5 seconds, but it requires payment from the user.

Here’s a real example from the Candela definition MEP: https://www.bipm.org/utils/en/pdf/si-mep/SI-App2-candela.pdf

[Screenshot of the reference as printed in the document]

NOTE: this reference actually has the wrong title — the correct title is "Predictable Quantum Efficient Detector II: Characterization and confirmed responsivity”, this has an effect on the resulting search. This is why auto-fetch is important — to mitigate authoring errors.

The metadata attributes available here are: author, title, year, issue and page numbers. The intention with auto-fetching is to allow the author to enter minimal identifiable input (i.e. enough information to find this unique reference).

e.g. journal name: Metrologia; issue number: 50; page number start: 395

However, the CrossRef API does not provide enough parameters to locate this information. In particular, CrossRef does not support search/filtering by volumes, issues, or page numbers.

In order to use the CrossRef API, the author will be forced to provide the full title and some authorship information. Journal name: Metrologia; author name: at least one author; full title: Predictable Quantum Efficient Detector II: Characterization results

Here are two attempts to find out if it works.

Attempt 1 with author given title

The best effort in finding this article in the CrossRef API is the following command:

curl "https://api.crossref.org/works?query.bibliographic=Predictable%20Quantum%20Efficient%20Detector%20II%3A%20Characterization%20results&query.author=M%C3%BCller&query.container-title=Metrologia&filter=issn:0026-1394,prefix:10.1088"

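This means, “find items that match the following criteria”: bibliographic text matching the given title, an author matching “Müller”, a container title matching “Metrologia”, ISSN 0026-1394, and DOI prefix 10.1088.

It returns 20 results, of which the desired article is the 3rd. This query took 7 seconds.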

=> Not possible to find article

Attempt 2 with corrected title

Since the first attempt failed I did a search online and found the correct title, which is "Predictable quantum efficient detector: II. Characterization and confirmed responsivity”.

Now we refine the command to:

curl "https://api.crossref.org/works?query.bibliographic=Predictable%20Quantum%20Efficient%20Detector%20II%3A%20Characterization%20and%20confirmed%20responsivity&query.author=M%C3%BCller&query.container-title=Metrologia&filter=issn:0026-1394,prefix:10.1088"

Now it returns 7 results, where the desired article is the 1st. This query took 10 seconds.

=> Works when author and title information are fully accurate.
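
A minimal sketch of building the same query in Ruby instead of hand-encoding the curl URL. The method name and its defaults are assumptions. The parameter names are those used in the commands above. `URI.encode_www_form` encodes spaces as `+` rather than `%20`, which is equivalent in a query string.

```ruby
require "uri"

# Builds the Crossref /works query used in the attempts above.
# Only the URI is constructed; no request is sent.
def crossref_works_uri(title:, author:, journal: "Metrologia", issn: "0026-1394", prefix: "10.1088")
  query = URI.encode_www_form(
    "query.bibliographic" => title,
    "query.author" => author,
    "query.container-title" => journal,
    "filter" => "issn:#{issn},prefix:#{prefix}"
  )
  URI::HTTPS.build(host: "api.crossref.org", path: "/works", query: query)
end
```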

Conclusion

The CrossRef API is unable to facilitate location of a unique article with certainty because it only supports fuzzy search, and does not support searching by volume, issue or page numbers.

It can locate an article only if the title and authorship information given are fully accurate, and it returns ambiguous results when the title contains words that are also used in another article’s title. For example, these two citations return ambiguous results, even though their volumes, issues and years are very different:

M G Cox, The evaluation of key comparison data, Metrologia, 39, 6, 589-595, 2002.

M G Cox, The evaluation of key comparison data: determining the largest consistent subset, Metrologia, 44, 3, 2007.

(both from the Kilogram definition MEP)
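
A minimal sketch of one possible workaround: narrowing fuzzy search results on the client side by volume, issue and start page. It assumes each result carries `volume`, `issue` and `page` string fields as in Crossref work records. The method name is hypothetical, and the sample records contain only the values given in the two citations above. This does not remove the need for a fuzzy search first; it only filters what comes back.

```ruby
# Sample candidates built from the two citations above (no other data assumed).
CANDIDATES = [
  { "title" => ["The evaluation of key comparison data"],
    "volume" => "39", "issue" => "6", "page" => "589-595" },
  { "title" => ["The evaluation of key comparison data: determining the largest consistent subset"],
    "volume" => "44", "issue" => "3" }
].freeze

# Keeps only the items whose volume (and, if given, issue and first page) match.
def select_by_locator(items, volume:, issue: nil, first_page: nil)
  items.select do |item|
    next false unless item["volume"] == volume.to_s
    next false if issue && item["issue"] != issue.to_s
    next false if first_page && item["page"].to_s.split("-").first != first_page.to_s

    true
  end
end
```

With these samples, `select_by_locator(CANDIDATES, volume: 39, first_page: 589)` keeps only the 2002 article, and `select_by_locator(CANDIDATES, volume: 44)` keeps only the 2007 one.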

ronaldtse commented 3 years ago

In any case, I do think that we should support CrossRef separately in say relaton-crossref. There is also a Ruby client gem for CrossRef: https://github.com/sckott/serrano

What do you think?

andrew2net commented 3 years ago

@ronaldtse yes, I used the serrano gem. Which documents do you propose to fetch from CrossRef? I think we can use CrossRef, but it is sometimes too slow.

andrew2net commented 2 years ago

@ronaldtse since we have the relaton-doi gem, which fetches documents from crossref.org, can we close this issue?

ronaldtse commented 2 years ago

@andrew2net we now have the full data set of Metrologia from BIPM. I will create a new issue and will close this one.

ronaldtse commented 2 years ago

Closing in favour of #28.