ietf-tools / bibxml-service

Django-based Web service implementing IETF BibXML APIs
https://bib.ietf.org
BSD 3-Clause "New" or "Revised" License

Links for draft (I-D) documents now steer to a plaintext file instead of the Datatracker #266

Closed lbartholomew-rpc closed 1 year ago

lbartholomew-rpc commented 2 years ago

Describe the issue

I don't know if this is a new design requirement, but some desired behavior is now missing, so I'm treating it as a bug.

Links for draft (Internet-Draft / I-D) documents steer to a plaintext file now -- via a path that has "archive" in it whether the draft is active or expired -- instead of the informative Datatracker page for the draft in question. The Datatracker page is an "htmlized" copy of the draft, and it also provides an informative launching point for more information -- document status (via the "[Tracker]" link), history, workgroup ("[WG]"), etc. Please use the links below to compare the behavior of the "xml2rfc.html" hyperlinks for draft references versus the "bibtest.html" hyperlinks for draft references.

Please see the following output files:

Current behavior:

https://www.rfc-editor.org/v3test/rfc9053-xml2rfc.xml https://www.rfc-editor.org/v3test/rfc9053-xml2rfc.txt https://www.rfc-editor.org/v3test/rfc9053-xml2rfc.html

bibxml:

https://www.rfc-editor.org/v3test/rfc9053-bibtest.xml https://www.rfc-editor.org/v3test/rfc9053-bibtest.txt https://www.rfc-editor.org/v3test/rfc9053-bibtest.html

https://www.rfc-editor.org/v3test/rfc9053-xml2rfc-vs-bibtest-rfcdiff.html

ronaldtse commented 2 years ago

Thanks @lbartholomew-rpc . The reason for using the archive link is that the authoritative information from the datatracker provides the following link only:

<reference>
  <front>
    <!-- ... -->
  </front>
  <seriesInfo name="Internet-Draft" value="draft-ietf-core-oscore-groupcomm-14" />
  <format type="TXT" target="https://www.ietf.org/archive/id/draft-ietf-core-oscore-groupcomm-14.txt" />
</reference>

If the Datatracker link is desired, then the Datatracker's BibXML source should provide the HTML link in addition to the TXT link.
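For illustration, here is a minimal Python sketch of how a consumer might read the <format> links out of a reference like the one above. The format_links helper is a hypothetical name (it is not part of the BibXML service or xml2rfc), and the sample is abbreviated from the snippet above.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Abbreviated copy of the reference above; <front> is left empty here
# because its content is elided in the original snippet as well.
BIBXML = """
<reference>
  <front/>
  <seriesInfo name="Internet-Draft" value="draft-ietf-core-oscore-groupcomm-14" />
  <format type="TXT" target="https://www.ietf.org/archive/id/draft-ietf-core-oscore-groupcomm-14.txt" />
</reference>
"""


def format_links(reference_xml: str) -> Dict[str, str]:
    """Map each <format> type (e.g. "TXT") to its target URL."""
    root = ET.fromstring(reference_xml)
    return {
        fmt.get("type", ""): fmt.get("target", "")
        for fmt in root.findall("format")
    }


links = format_links(BIBXML)
print(links)
```

With the data shown above, only a TXT entry comes back, which is why no HTML link is available to pass along.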

FYI @andrew2net @strogonoff

lbartholomew-rpc commented 2 years ago

Hi again, @ronaldtse. Thanks for this info. as well.

Not sure that I understand correctly how this works and who controls what, but would it be possible to change this Datatracker info. for the draft documents as follows?

OLD: <format type="TXT" target="https://www.ietf.org/archive/id/draft-ietf-core-oscore-groupcomm-14.txt" />

NEW (not sure if "HTML" is correct syntax for format type): <format type="HTML" target="https://www.ietf.org/archive/id/draft-ietf-core-oscore-groupcomm-14.html" />

ronaldtse commented 2 years ago

@lbartholomew-rpc The metadata of all Internet-Drafts come directly from the Datatracker. Today, the Datatracker's BibXML only provides the TXT link. The BibXML service does not "add" information on top of what is given by the authoritative source.

While we could add that HTML link ourselves, the BibXML service would break if the Datatracker later changed the location of the HTML link.

The correct way forward here is to have Datatracker provide the BibXML information that includes an HTML link.

Ping @rjsparks @kesara -- can we have that?

rjsparks commented 2 years ago

This runs us directly into the conflict we have with pre-v3 era htmlization (what Lynne is used to at /doc/html/) vs v3 era HTML output. If we change the output of the datatracker's bibxml3 to include other format types, we will need to distinguish HTML from HTMLIZED, and HTMLIZED is what Lynne is looking for.

After providing this, we would have to teach xml2rfc to look for it and use it when rendering reference elements, so this is not going to be something that happens quickly.

So, Lynne, for the moment, if you want to preserve the behavior of pointing to the htmlized variant of the document at the datatracker for docs in need of immediate publication, you would need to edit the references.

But this is not something new to the shift to the bibxml service - the bibxml returned from xml2rfc.tools.ietf.org for internet drafts has been a copy of what the datatracker has been serving for a large number of months, so if you are seeing a recent change in behavior, something else is at the root of it.

rjsparks commented 2 years ago

This may also be caught up in changes to how xml2rfc is invoked: we should look at what's currently being provided on the command line for the *base-reference-url arguments.

rjsparks commented 2 years ago

Yes, some early poking suggests that Lynne's issue is likely rooted in the xml2rfc command invocation arguments. Let's give this some time to be investigated further.

We can look into providing alternate formats in the bibxml for ids as a future enhancement.

lbartholomew-rpc commented 2 years ago

@rjsparks and @ronaldtse -- thanks for the notes and info.! Robert, I'll see if the RPC want to edit the I-D references to point to HTMLized (which I think a lot of authors and readers would prefer; I would too, but I'll see what the group says).

rjsparks commented 2 years ago

@ronaldtse

Right now, as you note, the datatracker provides <format type="TXT" elements.

The bibxml service is not carrying that data forward as <format - there are no format tags in what is returned at, e.g., https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.gont-tcpm-tcp-seq-validation.xml.

How are you populating the target attribute for the <reference element? As long as the RPC's policy is to point to the datatracker's /doc/html pages for those references, that's probably what that attribute should be set to. How hard would it be to do that now?

But did we have a discussion already about using target instead of <format? (I have a vague recollection that we might have). If so, can you point to it easily?

ronaldtse commented 2 years ago

@rjsparks

But did we have a discussion already about using target instead of <format? (I have a vague recollection that we might have). If so, can you point to it easily?

Here it is: https://github.com/ietf-tools/bibxml-service/issues/62#issuecomment-1030038488

@lbartholomew-rpc has also previously provided a template from the RPC that uses target instead of <format>: https://github.com/ietf-tools/bibxml-service/issues/239#issuecomment-1192053641

The bibxml service is not carrying that data forward as <format - there are no format tags in what is returned at, e.g., https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.gont-tcpm-tcp-seq-validation.xml.

It should be easy to get rid of the target attribute and have the content within <format> tags. Should we do that?

Ping @strogonoff @stefanomunarini

rjsparks commented 2 years ago

It would be better to return what the datatracker is returning (which would, I expect, be the source of the template Lynne pointed to ) if we're going to include format tags now.

ronaldtse commented 2 years ago

Got it. @stefanomunarini can you please help here? Thanks.

alicerusso commented 2 years ago

Regarding the mention of xml2rfc options (i.e., "issue is likely rooted in the xml2rfc command invocation arguments" in (https://github.com/ietf-tools/bibxml-service/issues/266#issuecomment-1213238336)) - When testing, the undesired URL (the .txt URL) is in the output, regardless of the xml2rfc options used. (The options commonly used by the RPC are documented here.)

In other words: When I run xml2rfc with a bib.ietf.org ref to an I-D, the output contains the target (the .txt URL) whether or not the option --id-reference-base-url="https://datatracker.ietf.org/doc/html/" is used. If I use the ref but remove the target attribute, it yields the desired output -- i.e., this pull (https://github.com/ietf-tools/bibxml-service/pull/272) seems like an accurate fix.
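The experiment described above (dropping the target attribute from the reference before handing it to xml2rfc) can be sketched roughly as follows. This is only an illustration and not the code in the pull request; the strip_reference_target helper and the draft-example-00 sample are made up for the example.

```python
import xml.etree.ElementTree as ET


def strip_reference_target(reference_xml: str) -> str:
    """Return the reference with its target attribute removed, if present."""
    root = ET.fromstring(reference_xml)
    if root.tag == "reference":
        # pop() with a default avoids a KeyError when there is no target.
        root.attrib.pop("target", None)
    return ET.tostring(root, encoding="unicode")


# Hypothetical reference; the draft name and URL are placeholders.
SAMPLE = (
    '<reference anchor="I-D.example" '
    'target="https://www.ietf.org/archive/id/draft-example-00.txt">'
    "<front><title>Example</title></front>"
    '<seriesInfo name="Internet-Draft" value="draft-example-00"/>'
    "</reference>"
)

stripped = strip_reference_target(SAMPLE)
print(stripped)
```

Everything except the target attribute is left untouched, so the seriesInfo value is still there for the renderer to work from.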

Side note: Carsten reported the same issue, and presumably he was not using the same xml2rfc options as the RPC. On July 28 on tools-discuss, Carsten wrote:

  • The targets for the Internet-Drafts have changed from HTML to TXT (a regression that I have also noticed earlier).

strogonoff commented 2 years ago

Here it is: #62 (comment)

Another related issue: https://github.com/ietf-tools/bibxml-service/issues/142

ronaldtse commented 2 years ago

@rjsparks @lbartholomew-rpc I would like to further clarify, is this behavior of having links in <format>:

rjsparks commented 2 years ago

We will eventually populate multiple format lines for Internet-Drafts and RFCs. We have no plans at this time to do so for other datasets, but it is conceptually possible that some other source may provide multiple formats that we would want to point to.

To be clear - at some point in the future, we will probably also return to using the target attribute for references for I-Ds and RFCs, but we're going to have to get through a long community conversation about what it should point to first.

ronaldtse commented 2 years ago

Thanks for the clarification.

We shall await the results of the community conversation for further action.

rjsparks commented 2 years ago

@ronaldtse : to make sure we haven't talked past each other - we're expecting a change that removes target for now for bibxml-ids. When do we expect that to land?

ronaldtse commented 2 years ago

@rjsparks that is being done in https://github.com/relaton/relaton-py/issues/44 , we need to confirm that other than Internet-Drafts, all other bibliographic sources will also use <format> instead of target. Can we confirm this?

rjsparks commented 2 years ago

This only needs to happen for Internet-Drafts. It is reasonable to keep RFC behavior in sync. Other formats do not need to change (and I think should not be changed).

ronaldtse commented 2 years ago

Thanks @rjsparks for the confirmation. We will apply this new behavior for I-Ds, RFCs, and the RFC subseries. For consistency though if an external bibliographic item provides multiple formats, they ought to be properly represented using <format>?

Note that we have an inconsistency whether to use target or <format> now.

rjsparks commented 2 years ago

Thanks @rjsparks for the confirmation. We will apply this new behavior for I-Ds, RFCs, and the RFC subseries. For consistency though if an external bibliographic item provides multiple formats, they ought to be properly represented using <format>?

No urgency. Nothing in the world yet would know to try to look for or use such a thing. Do we even have any existence proofs yet?

Note that we have an inconsistency whether to use target or <format> now.

Yes. And as I note, we will move towards returning to using target (in addition to format) for drafts and RFCs in the future.

ronaldtse commented 2 years ago

No urgency. Nothing in the world yet would know to try to look for or use such a thing. Do we even have any existence proofs yet?

Understand. We will try applying the new approach to the other data sources as feasible. Thanks!

ajeanmahoney commented 1 year ago

@ronaldtse What's the status of this issue? The relaton dependency (https://github.com/relaton/relaton-py/issues/44) has been closed. Thanks!

ajeanmahoney commented 1 year ago

The target for an I-D is now of the form: https://datatracker.ietf.org/api/v1/doc/document/<draftname>/ (e.g., https://datatracker.ietf.org/api/v1/doc/document/draft-ietf-pce-pcep-ifit/), which is an XML file that contains the title and abstract plus other information about group and state (?). It's not clear what this file is. This seems to be a regression.

Example:

<reference anchor="I-D.ietf-pce-pcep-ifit" target="https://datatracker.ietf.org/api/v1/doc/document/draft-ietf-pce-pcep-ifit/">
<front>
<title>
Path Computation Element Communication Protocol (PCEP) Extensions to Enable IFIT
</title>
<author fullname="Hang Yuan"/>
<author fullname="Xuerong Wang"/>
<author fullname="Pingan Yang"/>
<author fullname="Weidong Li"/>
<author fullname="Giuseppe Fioccola"/>
<date day="3" month="August" year="2022"/>
<abstract>
<t>
In-situ Flow Information Telemetry (IFIT) refers to network OAM data plane on-path telemetry techniques, in particular In-situ OAM (IOAM) and Alternate Marking. This document defines PCEP extensions to allow a Path Computation Client (PCC) to indicate which IFIT features it supports, and a Path Computation Element (PCE) to configure IFIT behavior at a PCC for a specific path in the stateful PCE model. The PCEP extensions described in this document are defined for use with Segment Routing (SR). They could be generalized for all path types, but that is out of scope of this document.
</t>
</abstract>
</front>
<seriesInfo name="Internet-Draft" value="draft-ietf-pce-pcep-ifit-01"/>
<format target="https://datatracker.ietf.org/api/v1/doc/document/draft-ietf-pce-pcep-ifit/" type="TXT"/>
</reference>

rjsparks commented 1 year ago

@ronaldtse - how did that happen? That's absolutely not the right place for a reference to be pointing.
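As a rough idea of how a regression like this could be caught automatically, a check along the following lines could flag targets that point at the Datatracker API rather than at a document page. This is a sketch only: the api_targets helper is hypothetical, and treating any path that starts with /api/ as wrong is an assumption made for the example, not an agreed rule.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlparse


def api_targets(reference_xml: str) -> List[str]:
    """List target URLs (on <reference> or <format>) whose path starts with /api/."""
    root = ET.fromstring(reference_xml)
    flagged = []
    for element in [root] + root.findall("format"):
        target = element.get("target")
        # Assumption: an API endpoint is never a suitable reader-facing link.
        if target and urlparse(target).path.startswith("/api/"):
            flagged.append(target)
    return flagged


# Abbreviated from the example reported above (front matter omitted).
REPORTED = (
    '<reference anchor="I-D.ietf-pce-pcep-ifit" '
    'target="https://datatracker.ietf.org/api/v1/doc/document/draft-ietf-pce-pcep-ifit/">'
    '<seriesInfo name="Internet-Draft" value="draft-ietf-pce-pcep-ifit-01"/>'
    '<format target="https://datatracker.ietf.org/api/v1/doc/document/draft-ietf-pce-pcep-ifit/" type="TXT"/>'
    "</reference>"
)

flagged = api_targets(REPORTED)
print(flagged)
```

For the reported example, both the reference target and the format target are flagged.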

ronaldtse commented 1 year ago

Sorry for the delayed reply. @strogonoff @stefanomunarini: can you help answer @rjsparks and @ajeanmahoney 's enquiry here?

reschke commented 1 year ago

I still don't get why we need the format element, given the fact it never was rendered by xml2rfc v2 in TXT output mode.

Before it gets added again (for some value of "again") in references, can we please (a) describe what it is used for (and why existing elements/attributes do not cover the use case), and (b) specify how it is rendered (and that consistently in all output formats)?

strogonoff commented 1 year ago

@reschke I believe a clarification regarding <format> vs. reference target was requested by Ronald in #142 previously.

From my understanding,

  1. <format> had a benefit in that it allows multiple links with different renderings of a standard, unlike target which is an attribute and hence a singular value.
  2. Regardless, <format> had been deprecated in RFC 7991.

It is possible to omit format elements, if it will not cause any tooling external to BibXML service to regress.

Tangentially, this makes me think we should make more extensive use of comments in XML output by the service to aid data rendering issues. For example, if we render some bibliographic data and have a number of representations, we pick one for the target but list possible candidates in a comment. (As long as properly formatted XML comments won’t break any API consumers.) cc @stefanomunarini

(The fact that this hasn’t occurred to BibXML service developers/maintainers could perhaps be partly attributed to a habit of working with JSON APIs—and it was implemented before XML as well here—which natively doesn’t support comments.)

reschke commented 1 year ago

I agree with the "multiple targets" statement - but why does it matter in practice?

strogonoff commented 1 year ago

Well, perhaps it could be used to link to multiple representations of a standard (e.g., plain text, HTML, PDF), but since the element is deprecated it’s probably a moot point…

reschke commented 1 year ago

In case additional links are needed, there's always the \ elements.

strogonoff commented 1 year ago

In case additional links are needed, there's always the elements.

Fair enough. We can stop using the format elements and also start using other capabilities of RFCXML spec (including annotations), as long as it doesn’t break consumers—which I wonder if we could test for, perhaps by invoking those external tools against XML API on CI before it goes live…

reschke commented 1 year ago

Well.

I would actually go back to what tools.ietf.org used to serve, and make any change to that a conscious decision.

ronaldtse commented 1 year ago

Before any change is actually made, please note that it is the RPC that requested that the <format> elements be provided, and that links for multiple formats (TXT, HTML) be provided.

Regardless, <format> had been deprecated in RFC 7991.

As @rjsparks has explicitly noted, the RFC 799X specifications no longer represent the latest "desired practice" since it is somewhat a moving target depending on needs of users -- especially the RPC.

We cannot decide to remove the element without input from the RPC.

reschke commented 1 year ago

So where is the update to the RFC Style Guide (be it in a draft or in a web page) that asks for this?

And how exactly is a link to an info page a valid use of the format element?

ronaldtse commented 1 year ago

@reschke I believe @lbartholomew-rpc, @ajeanmahoney or @rjsparks are best positioned to answer these questions.

strogonoff commented 1 year ago

As @rjsparks has explicitly noted, the RFC 799X specifications no longer represent the latest "desired practice" since it is somewhat a moving target depending on needs of users -- especially the RPC.

I stand corrected, this slipped my mind.

Edit: perhaps (contrary to the original RFP) the interface of the service calls for per-consumer API accommodations? RPC could use one, some tools could use another, there could be a “utopian” flavour that makes the most use out of RFCXML spec at a cost of being perhaps less compatible with some tooling, etc. This service’s architecture doesn’t make that overly difficult to implement: we can have multiple serializers (currently we have only one xml), which by default are enabled via a parameter but could be exposed behind separate API roots/prefixes to accommodate tooling.
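To make the idea concrete, here is a minimal sketch of what selecting a serializer by name could look like. It is not the service's actual code: the registry, the "xml-format" flavour name, the function names, and the dict-shaped item are all hypothetical and only illustrate the shape of the idea.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Hypothetical registry: flavour name -> function that turns an item into XML.
SERIALIZERS: Dict[str, Callable[[dict], str]] = {}


def register(name: str):
    """Decorator that records a serializer under the given flavour name."""
    def decorator(func: Callable[[dict], str]) -> Callable[[dict], str]:
        SERIALIZERS[name] = func
        return func
    return decorator


@register("xml")
def with_target_attribute(item: dict) -> str:
    # Flavour that puts the link in the reference's target attribute.
    ref = ET.Element("reference", anchor=item["anchor"], target=item["url"])
    return ET.tostring(ref, encoding="unicode")


@register("xml-format")
def with_format_element(item: dict) -> str:
    # Flavour that puts the link in a <format> child element instead.
    ref = ET.Element("reference", anchor=item["anchor"])
    ET.SubElement(ref, "format", type="TXT", target=item["url"])
    return ET.tostring(ref, encoding="unicode")


def serialize(item: dict, flavour: str = "xml") -> str:
    """Serialize an item with the requested flavour (e.g. from a query parameter)."""
    if flavour not in SERIALIZERS:
        raise ValueError(f"unknown serializer flavour: {flavour!r}")
    return SERIALIZERS[flavour](item)


ITEM = {"anchor": "I-D.example", "url": "https://example.org/draft-example-00.txt"}
print(serialize(ITEM))
print(serialize(ITEM, flavour="xml-format"))
```

The same lookup could sit behind separate API prefixes instead of a parameter; the registry itself would not need to change.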

rjsparks commented 1 year ago

Ultimately this is affecting what link is placed in the reference in html renderings. Other threads are pushing more strongly for this to point to html/htmlized documents (for Internet-Drafts) at least.

@ronaldtse, @strogonoff - for bibxml3 (bibxml-ids) - what should come out of the bibxml service is what the datatracker provides. There should be something in testing that raises a fatal error when they are different. Let me reinforce that trying to explore what you can do with the xml vs what has been done to date is causing us pain. We are in a period where we are figuring out how we will make such changes going forward - please don't make changes other than those that are explicitly asked for (and be prepared for friction around some of those).
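One way such a test could be written is sketched below in Python: both documents are canonicalized before comparison, so differences in attribute order or indentation do not trigger a failure but any real difference does. The function names are hypothetical, and how the two XML strings are fetched is left out.

```python
import xml.etree.ElementTree as ET


def canonical(xml_text: str) -> str:
    """Canonical (C14N 2.0) form with surrounding whitespace in text stripped."""
    return ET.canonicalize(xml_text, strip_text=True)


def assert_same_bibxml(datatracker_xml: str, service_xml: str) -> None:
    """Raise AssertionError if the service output differs from the datatracker's."""
    if canonical(datatracker_xml) != canonical(service_xml):
        raise AssertionError("bibxml-service output differs from datatracker bibxml")


# Hypothetical inputs: same content, different attribute order and whitespace.
UPSTREAM = '<reference><format type="TXT" target="https://example.org/a.txt"/></reference>'
SERVED = """<reference>
  <format target="https://example.org/a.txt" type="TXT" />
</reference>"""

assert_same_bibxml(UPSTREAM, SERVED)  # passes: only formatting differs
```

ET.canonicalize requires Python 3.8 or later.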

To be clear, the core question of this ticket should be addressed at the datatracker and the bibxml service should produce what the datatracker provides.

(I'll return with a link to the conversation about where the links in the html/htmlized references should be pointing)

strogonoff commented 1 year ago

My apologies, I did not mean to imply any unilateral changes, my edit was meant to be mostly about service interface requirements. Though implementation difficulty of hypothetical new requirements was considered, my understanding is that only an update in requirements should lead to any functional changes.

rjsparks commented 1 year ago

The datatracker now provides a target that points to what the RPC wants. I'll close this issue.