ronaldtse opened this issue 2 years ago.
I’m currently addressing this in the following way:
This means:
Questions to @ronaldtse:
As we use Relaton as internal citation data model, do we have a way of converting Datatracker output to Relaton?
Yes, I believe @CAMOBAP has already done this in Python:
Adding Datatracker as an external source (like DOI)
I don't object to this, but in the long run, treating Datatracker like DOI/Crossref is problematic.
I wonder if one way to facilitate the check for the "latest draft version" is to have a new Datatracker API that just returns the "latest draft version". For example, by querying datatracker.ietf.org/drafts/draft-xxx, I get the latest information on what the draft version of xxx is (e.g. xxx-NN). Then, if the BibXML Service has already cached xxx-NN, we don't need a subsequent fetch, and caching becomes more effective. This will ease the load on Datatracker.
What if a published draft version changes and our index has stale data? I can’t recall whether this can happen.
Datatracker uses the BibXML format
Does it use the BibXML format? I checked these two, and the results look different from what we have in our BibXML data repositories:
I believe it might be easier to adapt Datatracker’s JSON responses to the Relaton format.
@strogonoff you're right! Don't know why I thought Datatracker used the BibXML format.
So... this is to be done using relaton-bib-py then (the Datatracker format => Relaton format conversion)?
IMO the least friction for me is to implement a quick Datatracker JSON to Relaton converter as part of the BibXML service itself, and later split it out into its own package or merge it into relaton-bib-py.
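A rough sketch of what such a quick converter might look like. Both sides are assumptions: the input keys (`name`, `rev`, `title`, `abstract`, `time`, `authors`) are illustrative and must be checked against real Datatracker JSON, and the output only loosely follows the Relaton hash layout; it is not the relaton-bib-py schema or API.

```python
from typing import Any, Dict, List


def datatracker_to_relaton(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Map a Datatracker-style draft record to a Relaton-like dict.

    Assumed (hypothetical) input keys: "name", "rev", "title", and optionally
    "abstract", "time" (ISO-like timestamp) and "authors" (list of {"name": ...}).
    """
    versioned_id = f"{doc['name']}-{doc['rev']}"  # e.g. "draft-example-foo-03"

    # Every listed author becomes a person contributor with the "author" role.
    contributors: List[Dict[str, Any]] = [
        {
            "person": {"name": {"completename": {"content": author["name"]}}},
            "role": [{"type": "author"}],
        }
        for author in doc.get("authors", [])
    ]

    record: Dict[str, Any] = {
        "docid": [{"id": versioned_id, "type": "Internet-Draft"}],
        "title": [{"content": doc["title"], "type": "main"}],
        "contributor": contributors,
    }
    if doc.get("abstract"):
        record["abstract"] = [{"content": doc["abstract"]}]
    if doc.get("time"):
        # Keep only the YYYY-MM-DD part of a timestamp like "2021-05-04T10:00:00".
        record["date"] = [{"type": "published", "value": doc["time"][:10]}]
    return record
```

Keeping the mapping in one pure function would make it straightforward to move into its own package or into relaton-bib-py later.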
This already works for the xml2rfc-style API (see the `xml2rfc_compat.fetchers.internet_drafts()` logic).
For the main API, Datatracker should be queried by default only if the requested document is not found (and the request specifies the correct `doctype` of “Internet-Draft”).
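The main-API rule can be expressed as a small fallback function. This is a sketch only: `find_indexed` and `query_datatracker` are hypothetical callables standing in for the service's index lookup and its Datatracker client.

```python
from typing import Callable, Optional

INTERNET_DRAFT = "Internet-Draft"


def lookup(
    docid: str,
    doctype: Optional[str],
    find_indexed: Callable[[str], Optional[dict]],
    query_datatracker: Callable[[str], Optional[dict]],
) -> Optional[dict]:
    """Serve from indexed data first; fall back to Datatracker only for
    Internet-Drafts that were not found locally."""
    found = find_indexed(docid)
    if found is not None:
        return found  # indexed data wins, no external request
    if doctype == INTERNET_DRAFT:
        return query_datatracker(docid)  # only drafts may trigger a live query
    return None  # any other doctype: plain "not found"
```

Requiring the explicit doctype keeps unrelated misses (e.g. an unknown RFC identifier) from generating traffic to Datatracker.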
For the GUI, this is not implemented yet. The original idea was to augment the client-side part of the item search and item details pages: query the service API client-side and augment the displayed data with new results, if any.
The Internet-Drafts dataset is a special dataset that combines bulk-loaded data with the latest data coming straight from datatracker.ietf.org.
datatracker.ietf.org is the authoritative source of the data. The bulk data the BibXML service loads comes from a periodic export from Datatracker.

There are two types of references that are served by the service:
- Unversioned Reference: `draft-{example-name}`. This is in fact a redirect reference to the draft with the highest `{draft-number}`.
- Versioned Reference: `draft-{example-name}-{draft-number}`. The `{draft-number}` is a two-digit sequential integer, starting from `00` or `01` and incrementing.

For the Versioned Reference pattern (given `draft-{example-name}-{draft-number}`), the operating mode is:

- If we have `draft-{example-name}-{draft-number}`, return it and done.
- If we don't have `draft-{example-name}-{draft-number}`, then datatracker.ietf.org may have a new draft number, or the draft did not exist in the bulk dataset. The BibXML service should then contact datatracker.ietf.org to load `draft-{example-name}-{draft-number}`.

For the Unversioned Reference pattern (given `draft-{example-name}`), the operating mode is:

- … `draft-{example-name}-{draft-number}` items we have are actually the newest.
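The two reference patterns and the versioned lookup can be sketched as follows. This assumes a plain dict as the bulk dataset and a hypothetical `fetch_from_datatracker` callable; `latest_in_bulk` covers only the local half of the unversioned case, since confirming that the local version is really the newest still requires asking datatracker.ietf.org.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# A versioned reference ends in "-" plus exactly two digits ({draft-number}).
VERSIONED = re.compile(r"^(draft-.+)-(\d{2})$")


@dataclass(frozen=True)
class DraftRef:
    name: str              # draft-{example-name}
    number: Optional[str]  # {draft-number} such as "07", or None if unversioned


def parse_ref(ref: str) -> DraftRef:
    """Split a reference into name and (optional) two-digit draft number."""
    m = VERSIONED.match(ref)
    if m:
        return DraftRef(m.group(1), m.group(2))
    return DraftRef(ref, None)


def resolve_versioned(
    ref: str,
    bulk: Dict[str, dict],
    fetch_from_datatracker: Callable[[str], Optional[dict]],
) -> Optional[dict]:
    """Versioned pattern: serve from bulk data if present, else ask Datatracker."""
    item = bulk.get(ref)
    if item is not None:
        return item
    return fetch_from_datatracker(ref)


def latest_in_bulk(name: str, bulk: Dict[str, dict]) -> Optional[str]:
    """Unversioned pattern, local side only: highest-numbered version in bulk data."""
    numbers = [
        r.number
        for r in map(parse_ref, bulk)
        if r.name == name and r.number is not None
    ]
    if not numbers:
        return None
    return f"{name}-{max(numbers, key=int)}"  # keeps the zero padding, e.g. "-07"
```

Note that a name whose final segment is itself two digits is inherently ambiguous under this naming scheme; the regex always treats such a suffix as the draft number.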