Open ronaldtse opened 3 years ago
@ronaldtse where is legacy/XML data repository for bibxml3 in https://github.com/ietf-ribose?
@strogonoff the data source is not available yet for bibxml3/bibxml-ids.
Legacy path specification described here: https://github.com/ietf-ribose/bibxml-data-ids/issues/1
The bibxml3 endpoint matches URLs this way:
url(r'^bibxml3/%(name)s(?:-%(rev)s)?.xml$' % settings.URL_REGEXPS, views_doc.document_bibxml),
(though an upcoming release will likely escape the . before xml.)
so, yes, you want to be looking at names that look like https://datatracker.ietf.org/doc/bibxml3/draft-ietf-stir-passport-rcd-09.xml
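To make the matching concrete, here is a minimal sketch of how that route could split a path into name and rev. The `URL_REGEXPS` values below are assumed stand-ins, not copied from the Datatracker settings, and `parse_bibxml3_path` is a hypothetical helper. The sketch also escapes the dot before `xml`, as the upcoming release is said to do.

```python
import re

# Assumed stand-ins for settings.URL_REGEXPS; the real values live in the
# Datatracker settings and may differ.
URL_REGEXPS = {
    "name": r"(?P<name>[A-Za-z0-9._+-]+?)",
    "rev": r"(?P<rev>[0-9]{2})",
}

# Same shape as the Datatracker route, with the dot before "xml" escaped.
BIBXML3_PATH = re.compile(
    r"^bibxml3/%(name)s(?:-%(rev)s)?\.xml$" % URL_REGEXPS
)


def parse_bibxml3_path(path):
    """Return (name, rev) for a bibxml3 path, or None if it does not match.

    rev is None for version-less requests.
    """
    match = BIBXML3_PATH.match(path)
    if match is None:
        return None
    return match.group("name"), match.group("rev")
```

Because the name group is non-greedy, a trailing two-digit suffix is captured as rev rather than folded into the name.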
Once you've generated something with a version, it won't change, but you will also need to be able to respond to version-less requests, such as:
https://datatracker.ietf.org/doc/bibxml3/draft-ietf-stir-passport-rcd.xml
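One way to handle both kinds of request is to treat a version-less path as "latest known revision". The sketch below assumes revisions are two-digit strings and that the list of available revisions for a draft is already known; `resolve_revision` is a hypothetical helper, not part of the Datatracker or the BibXML service.

```python
def resolve_revision(available_revs, requested_rev=None):
    """Pick the revision to serve for a draft.

    available_revs: revision strings known for the draft, e.g. ["00", "09"].
    requested_rev: the revision from the path, or None for version-less paths.
    Returns the revision to serve, or None if nothing suitable exists.
    """
    if not available_revs:
        return None
    if requested_rev is None:
        # Version-less request: serve the highest revision we know about.
        return max(available_revs, key=int)
    # Versioned request: serve exactly that revision or nothing.
    return requested_rev if requested_rev in available_revs else None
```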
Note: we don’t parse name or rev from data. The only formatting variable currently available in the legacy path pattern is {ref}, which represents our canonical reference obtained from the filename.
Support for more formatting variables will be filed separately.
@ronaldtse
If we use the second pattern,
I need to know which Relaton fields correspond to “rev” and “name” in this pattern.
We don’t have Relaton data for bibxml-id, but we can use NIST for example: http://34.229.41.119:8000/api/v1/ref/nist/NISTIR_4790/
What is “rev” there?
Note: if “rev” can be missing for some citations, those citations may be inaccessible by their legacy paths.
The "reference." prefix is shared for all legacy paths. If it shouldn’t be shared for bibxml-id, let me know.
- The first pattern in ticket description does not match the second pattern in your last comment.
Let me clarify:
These are the legacy paths for the BibXML service, currently defined here: https://svn.ietf.org/svn/tools/xml2rfc/website/rfcs/bibxml/bibxml-ids/gen-bibxml-ids
This is code from the Datatracker service given by @rjsparks:
url(r'^bibxml3/%(name)s(?:-%(rev)s)?.xml$' % settings.URL_REGEXPS, views_doc.document_bibxml),
The source is: https://github.com/ietf-svn-conversion/ietfdb-final/blob/c6fc13a38ef66d2c2b6d4931627ffd1cbdb4aa98/ietf/doc/urls.py#L89-L90
The Datatracker service is the "authoritative" endpoint for I-D documents.
"reference." prefix.
Ah, great… I think that means #28 would be unnecessary so far.
Although, if filenames in our future bibxml-data-ids dataset don’t contain the “draft” prefix or “draft-number” suffix, the extra flexibility might still be required to support specified path patterns.
2. If we use the second pattern, I need to know which Relaton fields correspond to “rev” and “name” in this pattern. We don’t have Relaton data for bibxml-id, but we can use NIST for example: http://34.229.41.119:8000/api/v1/ref/nist/NISTIR_4790/
The Relaton models for IETF ID and NIST differ a lot. So let's not make that comparison.
Here is a report for a random subset of 128 paths (out of 90k+ total) under bibxml3: bibxml3-random-subset.zip
Most paths seem to fall back to the original xml2rfc data; others resolve automatically to the correct new bibitems in relaton-data-ids, in which case the XML is different and diffs are available in the report. The diffs seem to be manageable.
Testing all paths would take a while and incur many requests to Datatracker (part of path resolution logic) and xml2rfc tools (for reference comparison), but could be done.
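For illustration, a per-path comparison of this kind can be sketched with the standard library. This is not the code in test_paths.py; `xml_diff` and the `xml2rfc/` and `bibxml/` labels are hypothetical names chosen for the example.

```python
import difflib


def xml_diff(reference_xml, served_xml, path):
    """Return a unified diff between reference XML and served XML.

    An empty string means the two documents are line-for-line identical
    (ignoring leading and trailing whitespace of the whole document).
    """
    ref_lines = reference_xml.strip().splitlines()
    new_lines = served_xml.strip().splitlines()
    diff = difflib.unified_diff(
        ref_lines,
        new_lines,
        fromfile="xml2rfc/" + path,  # hypothetical label for the reference side
        tofile="bibxml/" + path,     # hypothetical label for the new service
        lineterm="",
    )
    return "\n".join(diff)
```

A line-based diff is sensitive to formatting differences such as attribute order or indentation, so a real comparison might normalize the XML first.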
If needed, we could build a self-contained test instance with all the needed components (dev instance of the datatracker, etc.) and do a walk of the entire dataset without affecting the production datatracker, and (I assume) without needing significant other external I/O.
Absolutely, this could help.
Right now, using a URL other than “https://datatracker.ietf.org” as the Datatracker API root requires a change in the code (datatracker.request.BASE_DOMAIN), but it’s straightforward to edit the file before running docker-compose. (I could move this value to configuration or environment if warranted.)
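If the value were moved to the environment, it could look roughly like the sketch below. The variable name `DATATRACKER_BASE_DOMAIN` and the helper `get_base_domain` are assumptions made up for this example; the current code only has the hard-coded BASE_DOMAIN.

```python
import os

# Production Datatracker root, used when no override is provided.
DEFAULT_BASE_DOMAIN = "https://datatracker.ietf.org"


def get_base_domain(environ=None):
    """Return the Datatracker API root, honoring an environment override.

    DATATRACKER_BASE_DOMAIN is a hypothetical variable name.
    """
    env = os.environ if environ is None else environ
    # Drop a trailing slash so callers can append paths consistently.
    return env.get("DATATRACKER_BASE_DOMAIN", DEFAULT_BASE_DOMAIN).rstrip("/")
```

With something like this, a local dev Datatracker could be selected in the docker-compose environment instead of by editing the file.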
Otherwise there should be no issues. The test script can be passed a local BibXML service instance’s URL:
mkdir -p reports && \
python test_paths.py \
http://localhost:8000/public/rfc \
/path/to/local/bibxml-data-archive \
--dirname bibxml3 --verbosity 2 --reports-dir reports --randomize
For the datatracker, you can build a local dev copy quickly: just clone the datatracker repo and run (cd docker; ./run). There's more on the GitHub project page.
IETF Internet-Drafts (bibxml3, bibxml-id)
Legacy pattern(s) to implement:
We need to parse the pattern to return the appropriate BibXML content.