Open ronaldtse opened 2 years ago
@stefanomunarini can we add tests to validate bibitems (selection of tests across all datasets) against the BibXML schema?
bibxml-service reference: https://bib.ietf.org/public/rfc/bibxml-doi/reference.DOI.10.1145/2975159.xml current tools.ietf.org reference: http://xml2rfc.tools.ietf.org/public/rfc/bibxml-doi/reference.DOI.10.1145/2975159.xml
New bibxml-service output's title is incomplete.
Also it lacks <seriesInfo name="Communications of the ACM" value="Vol. 59, pp. 88-97"/>.
Incomplete title
Original: "Jupiter rising: a decade of clos topologies and centralized control in Google's datacenter network"
New: "Jupiter rising"
Right, this needs to be fixed (@strogonoff ). I think it may be fixed by #215 (@stefanomunarini ).
<seriesInfo name="Communications of the ACM" value="Vol. 59, pp. 88-97"/>.
@kesara while this could be useful, in <seriesInfo>, the "name" attribute value is explicitly invalid according to RFC 7991:
2.47.3. "name" Attribute (Mandatory)
The name of the series. The currently known values are "RFC",
"Internet-Draft", and "DOI". The RFC Series Editor may change this
list in the future.
Yes, we don’t adapt any dates from Crossref to Relaton format yet. It looks like #215 needs to expand on that ASAP.
<title>: it looks like IETF xml2rfc tools concatenated title and subtitle using a colon. Relaton-py could do that if that’s reliable. Currently, relaton-py’s bibxml serializer doesn’t do any such title adaptation and ends up using the first available title when serializing to BibXML. We can either change BibXML serialization in relaton-py, or change the way we format the main title when parsing Crossref data in bibxml-service.
“Communications of the ACM” is apparently taken from container-title, we could use that when creating a bibliographic item from Crossref data if that’s always how it should be parsed.
Edit: It appears that two pending PRs by @stefanomunarini to bibxml-service and relaton-py make it so that container-title is used to define bibliographic item locality, and locality is used by relaton.serializers.bibxml to generate <seriesInfo>. Which looks like what we want! I think it’s on me that they have not been merged yet…
Can anyone point to preexisting IETF’s xml2rfc tools Crossref API handler (i.e., what code runs under /public/rfc/bibxml-doi/)? https://github.com/ietf-tools/xml2rfc-bibxml doesn’t seem to have it🤔
What you're looking for is in the RFP, in the section for bibxml7.
@strogonoff the bibxml-doi code is here: https://github.com/ietf-tools/xml2rfc-website/tree/56c0be788c4fd22ae475302dcd399439815927f0/public/rfc/bibxml-doi
It uses doilit, which we have already reimplemented:
https://github.com/ietf-tools/xml2rfc-website/blob/56c0be788c4fd22ae475302dcd399439815927f0/public/rfc/bibxml-doi/nph-index.cgi#L189
I'm a little perplexed: our doi2ietf already implements dates, so why are they not serialised into BibXML?
- Yes, we don’t adapt any dates from Crossref to Relaton format yet. It looks like Feat/crossref integration expansion #215 needs to expand on that ASAP.
Yes we need to adopt the dates from the Crossref API and map them to the Relaton model.
Relaton supports these date/time types:
Crossref metadata includes the following date/times:
- indexed: ignore
- created: Relaton created
- deposited: Relaton created
- issued: Relaton issued
- published: Relaton published
- published-online: ignore
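The mapping above could be sketched roughly as follows. This is only an illustration, not code from relaton-py or bibxml-service: `CROSSREF_TO_RELATON`, `format_date_parts` and `map_crossref_dates` are hypothetical names, the input is assumed to use Crossref's `date-parts` structure, and letting `created` win over `deposited` when both map to the same Relaton type is an assumption.

```python
from typing import Dict, List, Optional

# Hypothetical table mirroring the list above; None means "ignore".
CROSSREF_TO_RELATON: Dict[str, Optional[str]] = {
    "indexed": None,
    "created": "created",
    "deposited": "created",
    "issued": "issued",
    "published": "published",
    "published-online": None,
}


def format_date_parts(parts: List[int]) -> str:
    """Turn [2016, 8, 24] into '2016-08-24'; partial dates stay partial."""
    year, *rest = parts
    return "-".join([f"{year:04d}"] + [f"{p:02d}" for p in rest])


def map_crossref_dates(work: dict) -> List[dict]:
    """Return one date per Relaton type, in table order (first match wins)."""
    dates: List[dict] = []
    seen_types = set()
    for field, relaton_type in CROSSREF_TO_RELATON.items():
        if relaton_type is None or relaton_type in seen_types:
            continue
        date_parts = (work.get(field) or {}).get("date-parts") or []
        # Skip missing or empty dates such as [[None]].
        if not date_parts or not date_parts[0] or date_parts[0][0] is None:
            continue
        dates.append({"type": relaton_type, "value": format_date_parts(date_parts[0])})
        seen_types.add(relaton_type)
    return dates
```

Whether `deposited` should really fall back to Relaton `created`, or be dropped when `created` is present, is a design choice the thread does not settle.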
<title>: it looks like IETF xml2rfc tools concatenated title and subtitle using a colon. Relaton-py could do that if that’s reliable. Currently, relaton-py’s bibxml serializer doesn’t do any such title adaptation and ends up using the first available title when serializing to BibXML. We can either change BibXML serialization in relaton-py, or change the way we format the main title when parsing Crossref data in bibxml-service.
We should concatenate the Crossref title and subtitle at the doi2ietf level.
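A minimal sketch of that concatenation, assuming the Crossref work record carries `title` and `subtitle` as lists of strings. `build_title` is a hypothetical helper, not an existing doi2ietf or relaton-py function, and the colon separator follows what the old xml2rfc tools output appears to do.

```python
from typing import Optional


def build_title(work: dict) -> Optional[str]:
    """Join the first Crossref title and subtitle as 'title: subtitle'."""
    titles = work.get("title") or []
    subtitles = work.get("subtitle") or []
    if not titles:
        return None
    title = titles[0].strip()
    subtitle = subtitles[0].strip() if subtitles else ""
    # Only add the colon when a non-empty subtitle is present.
    return f"{title}: {subtitle}" if subtitle else title


example = {
    "title": ["Jupiter rising"],
    "subtitle": [
        "a decade of clos topologies and centralized control "
        "in Google's datacenter network"
    ],
}
print(build_title(example))
```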
- “Communications of the ACM” is apparently taken from container-title, we could use that when creating a bibliographic item from Crossref data if that’s always how it should be parsed.
As I pointed out in https://github.com/ietf-ribose/bibxml-service/issues/228#issuecomment-1175771552 , we really want explicit permission from @rjsparks that this is correct usage of <seriesInfo>. Thanks.
@ronaldtse
our doi2ietf already implements dates, so why are they not serialised into BibXML?
We are not using doi2ietf for at least these two reasons:
With that in mind, it was faster to bypass doi2ietf-py and implement this directly in bibxml-service and relaton-py.
Yes we need to adopt the dates from the Crossref API and map them to the Relaton model.
Yes, @stefanomunarini’s PRs should take care of all that. They aim to port the requisite functionality from doi2ietf-py into both the bibxml-service Crossref DOI parser and the relaton-py serializer. I’ll merge them once we confirm that the new <seriesInfo name> values are acceptable, because the PRs include that change as well.
@ronaldtse It is expected that seriesinfo will have more than the 3 possible names listed in 7991. We will make sure that gets clarified in 7991bis. A better thing to read at the moment is the seriesInfo entry at https://authors.ietf.org/en/rfcxml-vocabulary
Note that the RPC uses seriesInfo for documents that are part of a series and have a unique value. Examples of document series include RFC, IEEE Std, ITU Recommendation, DOI, 3GPP TR, 3GPP TS, ISO/IEC, and FIPS PUB. The RPC uses refcontent to capture journal or conference proceedings information: journal or conference title, volumes, pages, conference location, etc. For example,
<refcontent>Communications of the ACM, Vol. 59, pp. 88-97</refcontent>
Thanks @rjsparks @ajeanmahoney .
<seriesInfo> "name"
Is the seriesInfo value from a controlled vocabulary or free form text? If the former, it would be great to have the specifications.
https://authors.ietf.org/en/rfcxml-vocabulary seems to describe the "name" attribute as the name of the standardization organization outside of IETF ("other names such as "ISO", "W3C" for exist for other standardisation organisations")
Is "name" supposed to take the "series name" or the "organization name"?
From the illustrative list provided it looks like it is the "series name" (which makes sense given the element name), not the "organization name".
Some questions regarding the example list:
The item in question has source metadata provided through this Crossref link:
Notice that "Communications of the ACM" exists in container-title.
As specified by @ajeanmahoney , this information is to be in <refcontent>, not <seriesInfo>, and should look like this:
<refcontent>Communications of the ACM, Vol. 59, pp. 88-97</refcontent>
This formatted reference string can only be built from the raw Crossref metadata, by also including these elements:
"page":"88-97",
"volume":"59"
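As a rough sketch, the string could be assembled from those Crossref fields like this. `build_refcontent` is a hypothetical helper, not part of bibxml-service or relaton-py; it assumes `container-title` is a list and `volume` and `page` are strings, and it always uses the "Vol." and "pp." prefixes from the example above.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_refcontent(work: dict) -> Optional[str]:
    """Build 'Journal, Vol. N, pp. A-B' from a Crossref work record."""
    container = work.get("container-title") or []
    if not container:
        return None
    parts = [container[0]]
    if work.get("volume"):
        parts.append(f"Vol. {work['volume']}")
    if work.get("page"):
        parts.append(f"pp. {work['page']}")
    return ", ".join(parts)


work = {
    "container-title": ["Communications of the ACM"],
    "volume": "59",
    "page": "88-97",
}
refcontent = ET.Element("refcontent")
refcontent.text = build_refcontent(work)
print(ET.tostring(refcontent, encoding="unicode"))
```

A single-page item would arguably need "p." rather than "pp."; that case is left out of this sketch.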
I would like to confirm with @ajeanmahoney that:
- refcontent, not seriesInfo.
- refcontent using Crossref metadata. This is about citation rendering.
Thanks!
seriesInfo name and value attributes take freeform text. The name attribute holds the name of the series. The RPC uses the following seriesInfo names:
These are what we have identified so far. We will be discussing this list this week.
Thanks @ajeanmahoney , since there's going to be a discussion, if you don't mind, let us provide some additional input 😉
Basis:
Questions:
- seriesInfo name should support all series that BibXML service supports today (as part of the ietf-tools suite), including those published by the following organizations:
- seriesInfo name (for organizations external to IAB/IETF) represents the name of the SDO, or a document type of the SDO. Developers and users would certainly prefer a consistent application. Amongst values supported today:
  - 3GPP TS, 3GPP TR, ITU Recommendation, IEEE Std
  - ISO/IEC
  - FIPS PUB (published by the Department of Commerce as executed by NIST)
Thanks!
Tests are failing because reference.DOI.10.1145/2975159 doesn't have date element under front element. This violates rfc2629.dtd.
Originally posted by @kesara in ietf-tools/xml2rfc#804 (comment)
Can I clarify where is <date> required? It’s not in this spec.
@ronaldtse While this particular issue may have been resolved, since we can rely on DOI to provide at least one date, we cannot be so sure with some other sources.
For example, we have recently found that some 3GPP documents are lacking dates, and this may be the case with other sources.
There are some cases where a date is never provided in a bib entry (IANA registry entries, for instance).
Sometimes, an author points to a landing page for a spec (a 3GPP or IEEE entry may fall into this category). Those kinds of entries don't have dates. I haven't looked to see if the bibxml-service datastore contains landing-page references.
References without dates are syntactically legal and appropriate in cases like the ones Jean calls out above. But when the document does have a publication date (as with the original DOI the ticket was opened with), the date must be provided, well formed, in the reference.
I think I've pointed this out in other places, but rfc2629.dtd is not v3 rfcxml - it is strict v2, and while we want to be v2 backwards compatible as much as we can be, there are many RFCs in the v2 era that were published with references that didn't contain dates. In short, date cannot be treated as a mandatory element here.
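In serializer terms, treating the date as optional could look roughly like the sketch below. It uses only the standard library; `build_front` is a hypothetical function, not the relaton-py serializer, and it leaves out the other children of <front> (such as <author>) for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_front(title: str,
                year: Optional[str] = None,
                month: Optional[str] = None) -> ET.Element:
    """Build a <front> element; emit <date> only when a year is known."""
    front = ET.Element("front")
    ET.SubElement(front, "title").text = title
    if year:
        attrs = {"year": year}
        if month:
            attrs["month"] = month
        ET.SubElement(front, "date", attrs)
    return front


dated = build_front("Jupiter rising", year="2016", month="September")
undated = build_front("Some registry landing page")
print(ET.tostring(dated, encoding="unicode"))
print(ET.tostring(undated, encoding="unicode"))
```

Tests for sources that do publish dates could then assert that the <date> child is present, while sources like registries would accept its absence.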
From @kesara https://github.com/ietf-tools/xml2rfc/pull/804#issuecomment-1175684226