ietf-tools / bibxml-service

Django-based Web service implementing IETF BibXML APIs
https://bib.ietf.org
BSD 3-Clause "New" or "Revised" License

Implement new version/dynamic fetching for Internet-Drafts #63

Open ronaldtse opened 2 years ago

ronaldtse commented 2 years ago

The Internet-Drafts dataset is a special dataset: it combines bulk-loaded data with the latest data coming straight from datatracker.ietf.org.

datatracker.ietf.org is the authoritative source of the data. The bulk data that the BibXML service loads comes from a periodic export from Datatracker.

There are two types of references that are served by the service:

  1. Unversioned Reference pattern: draft-{example-name}. This is in fact a redirect to the draft with the highest {draft-number}.
  2. Versioned Reference pattern: draft-{example-name}-{draft-number}. The {draft-number} is a two-digit sequential integer that starts at 00 or 01 and increments with each revision.
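
To make the two patterns concrete, here is a minimal Python sketch of how a reference could be split into its name and optional version, and how the "highest draft-number" could be picked for an unversioned reference. The names `DraftRef`, `parse_draft_ref` and `latest_version` are hypothetical and are not taken from the bibxml-service codebase. Note the inherent ambiguity: a draft name that itself ends in a hyphen plus two digits would be read as a versioned reference.

```python
import re
from typing import Iterable, NamedTuple, Optional

# "draft-", then the name (which may itself contain hyphens),
# then an optional trailing "-NN" two-digit version.
DRAFT_REF = re.compile(r"^draft-(?P<name>.+?)(?:-(?P<version>\d{2}))?$")


class DraftRef(NamedTuple):
    """Hypothetical container for a parsed Internet-Draft reference."""
    name: str
    version: Optional[str]  # None for the unversioned pattern


def parse_draft_ref(ref: str) -> DraftRef:
    """Split a reference into draft name and optional two-digit version."""
    match = DRAFT_REF.match(ref)
    if match is None:
        raise ValueError(f"not an Internet-Draft reference: {ref!r}")
    return DraftRef(match["name"], match["version"])


def latest_version(versions: Iterable[str]) -> str:
    """Pick the highest draft-number, comparing numerically.

    Raises ValueError if no versions are given.
    """
    return max(versions, key=int)
```

An unversioned reference such as `draft-ietf-example-name` parses with `version=None`, at which point the caller would resolve it to whatever `latest_version` returns for the versions it knows about.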

For the Versioned Reference pattern (given draft-{example-name}-{draft-number}), the operating mode is:

For the Unversioned Reference pattern (given draft-{example-name}), the operating mode is:

strogonoff commented 2 years ago

I’m currently addressing this the following way:

This means:

Questions to @ronaldtse:

ronaldtse commented 2 years ago

As we use Relaton as our internal citation data model, do we have a way of converting Datatracker output to Relaton?

Yes, I believe @CAMOBAP has already done this in Python:

Adding Datatracker as an external source (like DOI)

I don't object to this, but eventually treating Datatracker as DOI/Crossref is problematic.

I wonder if one way to facilitate the check for the "latest draft version" is to have a new Datatracker API that just returns the "latest draft version".

strogonoff commented 2 years ago

I wonder if one way to facilitate the check for the "latest draft version" is to have a new Datatracker API that just returns the "latest draft version".

What if a published draft version is changed and our index has stale data? I can’t recall if this can happen.

strogonoff commented 2 years ago

Datatracker uses the BibXML format

Does it use BibXML format? I checked these two, and results look different from what we have in our BibXML data repositories:

I believe it might be easier to adapt Datatracker’s JSON responses to Relaton format.

ronaldtse commented 2 years ago

@strogonoff you're right! Don't know why I thought Datatracker used the BibXML format.

So... this is to be done using relaton-bib-py then (the Datatracker format => Relaton format conversion)?

strogonoff commented 2 years ago

IMO the path of least friction for me is to just implement a quick converter from Datatracker JSON to Relaton as part of the BibXML service itself, and later split it out into its own package or include it in relaton-bib-py.
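
As a rough illustration of what such a quick converter might look like, here is a Python sketch. Everything in it is an assumption: the input keys (`name`, `rev`, `title`, `abstract`, `time`) are what a Datatracker document record is assumed to contain, the output is a simplified Relaton-like dict rather than the full Relaton schema, and the function name `datatracker_to_relaton` is hypothetical.

```python
from typing import Any, Dict


def datatracker_to_relaton(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Map a Datatracker-style JSON record to a simplified Relaton-like dict.

    Assumption: `doc` carries "name" and optionally "rev", "title",
    "abstract" and "time" (an ISO 8601 timestamp string).
    """
    name = doc["name"]
    rev = doc.get("rev")
    # Versioned identifier when a revision is known, unversioned otherwise.
    docid = f"{name}-{rev}" if rev else name

    item: Dict[str, Any] = {
        "docid": [{"type": "Internet-Draft", "id": docid}],
        "title": [{"content": doc.get("title", "")}],
    }
    if doc.get("abstract"):
        item["abstract"] = [{"content": doc["abstract"]}]
    if doc.get("time"):
        # Keep only the date part (YYYY-MM-DD) of the timestamp.
        item["date"] = [{"type": "published", "value": doc["time"][:10]}]
    return item
```

Keeping the converter a pure function of the input dict would make it straightforward to move into its own package later.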

strogonoff commented 2 years ago

This works for xml2rfc-style API already (see xml2rfc_compat.fetchers.internet_drafts() logic).

For the main API, Datatracker should be queried by default only if the requested document is not found (and the request specifies the correct doctype of “Internet-Draft”).

For the GUI, this is not implemented yet. The original idea was to augment the client-side part of the item search and item details pages: query the service API client-side and augment the displayed data with new results, if any.
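
The fallback rule for the main API could be sketched roughly as follows. This is not the service's actual code: `resolve_document`, `find_in_index` and `fetch_from_datatracker` are hypothetical names, and the two lookups are passed in as callables so the sketch stays self-contained.

```python
from typing import Any, Callable, Dict, Optional

Item = Dict[str, Any]
Lookup = Callable[[str], Optional[Item]]


def resolve_document(
    docid: str,
    doctype: str,
    find_in_index: Lookup,           # hypothetical: lookup in bulk-loaded data
    fetch_from_datatracker: Lookup,  # hypothetical: live Datatracker query
) -> Optional[Item]:
    """Return the indexed item, falling back to Datatracker for Internet-Drafts."""
    found = find_in_index(docid)
    if found is not None:
        # Indexed data wins; Datatracker is not queried at all.
        return found
    if doctype == "Internet-Draft":
        # Only a miss with the right doctype triggers the external query.
        return fetch_from_datatracker(docid)
    return None
```

With this shape, a request for any other doctype never reaches Datatracker, and a hit in the index short-circuits the external call.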