Closed ronaldtse closed 2 years ago
We have many citations to Metrologia in Metanorma due to handling of BIPM documents. We need to support citation of Metrologia articles. If necessary we will need to update the Relaton BIPM bibdata model (or establish a new one for academic articles).
@ronaldtse I'm unable to find an index of Metrologia articles or any search form. Do you have an idea of how to search for Metrologia articles?
I think we should have a syntax that fetches per:
The main Metrologia page is this: https://iopscience.iop.org/journal/0026-1394
Volumes:
Issues in that Volume: https://iopscience.iop.org/volume/0026-1394/29
The first issue in that list: https://iopscience.iop.org/issue/0026-1394/29/6
The first paper/article in that issue: https://iopscience.iop.org/article/10.1088/0026-1394/29/6/001
Notice the citation writes this: "E C Morris 1993 Metrologia 29 373"
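The article URL above embeds a DOI whose path appears to encode the ISSN, volume, issue and article number. A minimal sketch of splitting such a DOI follows; `parse_legacy_doi` is a hypothetical helper, and the pattern is an assumption inferred from this one example. Newer Metrologia DOIs (e.g. 10.1088/1681-7575/aa7b3f) do not follow it, so the helper returns nil for them.

```ruby
# Hypothetical helper, not part of any Relaton gem.
# Assumed DOI layout: 10.1088/<ISSN>/<volume>/<issue>/<article number>
LEGACY_DOI = %r{\A10\.1088/(?<issn>\d{4}-\d{3}[\dX])/(?<volume>\d+)/(?<issue>\d+)/(?<article>\w+)\z}

def parse_legacy_doi(doi)
  m = LEGACY_DOI.match(doi)
  return nil unless m # DOI does not follow the assumed layout

  {
    issn: m[:issn],
    volume: m[:volume].to_i,
    issue: m[:issue].to_i,
    article: m[:article]
  }
end

p parse_legacy_doi("10.1088/0026-1394/29/6/001")
```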
I believe we can have two types of searches for this article:
For citing the Issue, we can do:
For citing the Volume, we can do:
For citing the Series, we can do:
Instead of scraping, we can also directly take the BibTeX export of that page: https://iopscience.iop.org/export?articleId=0026-1394/29/6/373&doi=10.1088/0026-1394/29/6/001&exportFormat=iopexport_bib&exportType=abs&navsubmit=Export+abstract
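As a rough sketch, the export URL could be assembled from the article ID and DOI as below. `iop_bibtex_export_url` is a hypothetical name; the parameter names are copied from the URL above, and whether IOPscience accepts them in percent-encoded form is an assumption.

```ruby
require "uri"

# Hypothetical helper: builds the IOPscience BibTeX export URL.
# Parameter names and fixed values are taken from the example URL in this thread.
def iop_bibtex_export_url(article_id:, doi:)
  query = URI.encode_www_form(
    articleId: article_id,
    doi: doi,
    exportFormat: "iopexport_bib",
    exportType: "abs",
    navsubmit: "Export abstract"
  )
  "https://iopscience.iop.org/export?#{query}"
end

puts iop_bibtex_export_url(
  article_id: "0026-1394/29/6/373",
  doi: "10.1088/0026-1394/29/6/001"
)
```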
@ronaldtse I can guess how to map an article to BibModel but I have no idea how to map Issue, Volume, and Series. Do you have a suggestion?
Volume = series/number; Issue = series/partnumber; Series = series/title
@opoudjis as I understand it, Ronald means for Volume, Issue, and Series to be separate documents. I'm asking what data we can map from the Volume, Issue, and Series pages to the BibliographicItem model.
@andrew2net reports he has encountered rate-limiting via Captcha after several fetches. This is not appropriate for users who compile documents. I don't know whether it is surmountable using User-Agent (please try).
Will also seek advice from BIPM.
EDIT: have sought advice. Pending reply.
@ronaldtse I've tried using a random User-Agent, but it seems that iopscience.iop.org allows only 6 requests per minute. After 2 minutes it starts redirecting to a captcha.
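For reference, spacing requests to stay under a limit like 6 per minute could look like the sketch below. `Throttle` is a hypothetical class, not something in Relaton, and it is not established in this thread that staying under the limit actually avoids the captcha. The clock and sleeper are injectable only so the behaviour can be checked without waiting.

```ruby
# Hypothetical throttle: ensures successive calls start at least
# 60 / per_minute seconds apart.
class Throttle
  def initialize(per_minute:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @interval = 60.0 / per_minute
    @clock = clock
    @sleeper = sleeper
    @last = nil # start time of the previous call, if any
  end

  # Waits if needed, then runs the block and returns its value.
  def call
    now = @clock.call
    if @last
      wait = @last + @interval - now
      if wait > 0
        @sleeper.call(wait)
        now += wait
      end
    end
    @last = now
    yield
  end
end

# Usage (the block would perform the actual HTTP fetch):
#   throttle = Throttle.new(per_minute: 6)
#   throttle.call { Net::HTTP.get(URI(url)) }
```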
Got it. Let’s wait for BIPM’s response.
The BIPM team has inquired with IOPP (the publisher) and they recommended the following:
Our first recommendation is that they use the CrossRef API. It contains all the article metadata (including the references), and if they just need metadata about our articles then that API should cover everything they need.
It’s a very well documented API. The starting point to the documentation is https://www.crossref.org/education/retrieve-metadata/.
Can you help implement the connection to CrossRef? Thanks.
@ronaldtse yes I can. They ask for an email address in HTTP requests so that they can contact us in case our script causes problems. Requests without an email address won't be directed to the more reliable servers. Do you have an email address for this purpose?
Let me ask them. It would be strange to use our email address when users (not us) are doing the requests.
@andrew2net it seems that the email address is optional?
Let's implement without the email first. Later on we can make a config option with Relaton CLI so users can set their own email address for CrossRef.
@ronaldtse yes, it's optional but without an email, it will work slower https://github.com/CrossRef/rest-api-doc#good-manners--more-reliable-service
@ronaldtse here is the API status page: https://status.crossref.org/#system-metrics. You can see that the "Polite API" average response time is about 1s, while the "Public API" average response time is about 7s.
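To make the optional email concrete: CrossRef's "polite" pool is selected by passing a `mailto` query parameter (or a `mailto:` contact in the User-Agent). A minimal sketch of building such a request URL follows; `crossref_works_url` is a hypothetical helper and `user@example.org` is a placeholder address.

```ruby
require "uri"

CROSSREF_WORKS = "https://api.crossref.org/works"

# Hypothetical helper: builds a CrossRef /works URL, adding `mailto`
# only when the user has configured a contact address.
def crossref_works_url(params, mailto: nil)
  query = params.dup
  query["mailto"] = mailto if mailto
  "#{CROSSREF_WORKS}?#{URI.encode_www_form(query)}"
end

# Without an address the request goes to the public pool.
puts crossref_works_url({ "query.container-title" => "Metrologia", "rows" => 5 })
# With an address the request is eligible for the polite pool.
puts crossref_works_url({ "query.container-title" => "Metrologia", "rows" => 5 },
                        mailto: "user@example.org")
```

Keeping the address as an optional argument matches the plan of shipping without it first and adding a Relaton CLI config option later.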
7s!??!?!?!? Why don't we just use a random email address based on the IP address.
# Look up this machine's public IP address via the ipify service.
require "net/http"
ip = Net::HTTP.get(URI("https://api.ipify.org"))
puts "My public IP Address is: #{ip}"
Then sha256 it and truncate to 16 for the name. We can use relaton.org for the domain to indicate it is a Relaton request.
i.e. "fa9514ae...@relaton.org".
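The hashing step suggested above could be sketched as follows. `relaton_contact_email` is a hypothetical name, and the IP address in the usage line is a documentation-range placeholder rather than a real lookup.

```ruby
require "digest"

# Hypothetical helper: derives a stable pseudo-address from an IP address.
# The local part is the first 16 hex characters of the SHA-256 digest.
def relaton_contact_email(ip, domain: "relaton.org")
  local_part = Digest::SHA256.hexdigest(ip)[0, 16]
  "#{local_part}@#{domain}"
end

puts relaton_contact_email("203.0.113.7")
```

The same IP always yields the same address, so CrossRef would see a consistent identity per user without Relaton storing anything.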
Anyway, the API is too slow. Only the "OpenURL" and paid "Plus" services have an acceptable response time. I'll investigate OpenURL. Also, I haven't been able to find a way to search volumes, issues, and articles; it seems they have only journals and articles in the DB. If we aren't successful with Crossref, we can create a relaton-data-bipm-iop repository on GitHub and slowly fetch documents from iopscience.iop.org. What do you think?
I've sent this to BIPM, let's see what their response is.
We’re now experimenting with the CrossRef API, but it’s not ideal:
There is no mechanism to obtain exactly the Metrologia article unless the author provides the full title and authorship information. It is nearly impossible to locate a particular article with confidence.
Here’s a real example from the Candela definition MEP: https://www.bipm.org/utils/en/pdf/si-mep/SI-App2-candela.pdf
NOTE: this reference actually has the wrong title. The correct title is "Predictable Quantum Efficient Detector II: Characterization and confirmed responsivity", and the discrepancy affects the resulting search. This is why auto-fetch is important: it mitigates authoring errors.
The metadata attributes available here are: author, title, year, issue and page numbers. The intention with auto-fetching is to allow the author to enter minimal identifiable input (i.e. enough information to find this unique reference).
e.g. journal name: Metrologia; issue number: 50; page number start: 395
However, the CrossRef API does not provide enough parameters to locate this information. In particular, CrossRef does not support search/filtering by volumes, issues, or page numbers.
In order to use the CrossRef API, the author will be forced to provide the full title and some authorship information: journal name: Metrologia; author name: at least one author; full title: Predictable Quantum Efficient Detector II: Characterization results
Here are two attempts to find out if it works.
The best effort in finding this article in the CrossRef API is the following command:
This means, "find items that match the following criteria":
And it returns 20 results, where the desired article is the 3rd. This query took 7 seconds.
=> Not possible to find article
Since the first attempt failed, I did a search online and found the correct title, which is "Predictable quantum efficient detector: II. Characterization and confirmed responsivity".
Now we refine the command to:
Now it returns 7 results, where the desired article is the 1st. This query took 10 seconds.
=> Works when author and title information are fully accurate.
The CrossRef API is unable to facilitate location of a unique article with certainty because it only supports fuzzy search, and does not support searching by volume, issue or page numbers.
It can locate an article only if the article title and authorship information given are fully accurate, and it returns ambiguous results when the title contains words that are also used in another article’s title. For example, these two citations will return ambiguous results, even though the volumes, issues and years are vastly different:
M G Cox, The evaluation of key comparison data, Metrologia, 39, 6, 589-595, 2002.
M G Cox, The evaluation of key comparison data: determining the largest consistent subset, Metrologia, 44, 3, 2007.
(both from the Kilogram definition MEP)
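One possible mitigation, sketched under assumptions: CrossRef work records generally carry `volume`, `issue` and `page` fields, so a fuzzy result list could be narrowed on the client side even though the API cannot filter on them. `select_by_locator` is a hypothetical helper, and the records below are hand-written from the two citations above, not fetched from the API.

```ruby
# Hypothetical records shaped like items in a CrossRef /works response.
# Only the fields used here are included; values come from the citations above.
ITEMS = [
  { "title" => ["The evaluation of key comparison data"],
    "volume" => "39", "issue" => "6", "page" => "589-595" },
  { "title" => ["The evaluation of key comparison data: determining the largest consistent subset"],
    "volume" => "44", "issue" => "3" }
].freeze

# Keeps only the items whose volume (and optionally issue and first page) match.
def select_by_locator(items, volume:, issue: nil, first_page: nil)
  items.select do |item|
    next false unless item["volume"] == volume.to_s
    next false if issue && item["issue"] != issue.to_s
    next false if first_page && item["page"].to_s.split("-").first != first_page.to_s

    true
  end
end

match = select_by_locator(ITEMS, volume: 39, first_page: 589)
puts match.map { |item| item["title"].first }
```

This only helps when the desired article appears somewhere in the fuzzy results, so it narrows the ambiguity rather than removing the need for a title or author query.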
In any case, I do think that we should support CrossRef separately in say relaton-crossref. There is also a Ruby client gem for CrossRef: https://github.com/sckott/serrano
What do you think?
@ronaldtse yes, I used the serrano gem. Which documents do you intend to fetch from CrossRef? I think we can use CrossRef, but it is sometimes too slow.
@ronaldtse since we have the relaton-doi gem, which fetches documents from crossref.org, can we close this issue?
@andrew2net we now have the full data set of Metrologia from BIPM. I will create a new issue and will close this one.
Closing in favour of #28.
e.g. DOI https://doi.org/10.1088/1681-7575/aa7b3f
[9] Flowers-Jacobs N-E, Pollarolo A, Coakley J J, Fox A E, Rogalla H, Tew W L and Benz S P 2017 A Boltzmann constant determination based on Johnson noise thermometry Metrologia 54, 730-737 (8 pp)
Gets forwarded to https://iopscience.iop.org/article/10.1088/1681-7575/aa7b3f