relaton / relaton-ietf

RFCBib: retrieve RFC Standards for bibliographic use using the BibliographicItem model
BSD 2-Clause "Simplified" License

Remove `docid` with `scope=anchor` #87

Status: Open (opened by strogonoff 2 years ago)

strogonoff commented 2 years ago

Example: https://github.com/ietf-ribose/relaton-data-ids/blob/99b3eadf8840bd841a0044effa87faa4d4e65fce/data/draft-3k1n-6tisch-alice0-00.yaml#L18-L20

Just in case, I think we should first check what Robert says in his further replies in this thread: https://github.com/ietf-ribose/bibxml-service/issues/206#issuecomment-1136058412

cc @ronaldtse

andrew2net commented 2 years ago

@strogonoff @ronaldtse Should we still have an anchor in the BibXML output? If yes, can we use the primary docid as the anchor as-is (without replacing spaces and slashes)? If not, where should we render the primary docid in the BibXML output?

ronaldtse commented 2 years ago

What does the "primary docid" look like right now? Why does it have spaces and slashes?

strogonoff commented 2 years ago

@andrew2net I believe we were discussing this question with Robert, i.e. “do we need to maintain all old anchors as is, or can we generate them from the docid even if this means they will sometimes differ a bit?” (see https://github.com/ietf-ribose/bibxml-service/issues/206).

If yes, can we use the primary docid as the anchor as-is (without replacing spaces and slashes)?

Some character replacement may be needed, but the app that generates the XML can take care of it.
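As an illustration of the kind of replacement an XML-generating app could do, here is a minimal sketch in Python. The helper name `docid_to_anchor` is hypothetical, and the rules (keep ASCII letters, digits, `.`, `-` and `_`; collapse every other run of characters into `_`; prefix `_` when the result doesn't start with a letter or underscore) are assumptions chosen so the result is a valid XML ID. They are not taken from Relaton or xml2rfc.

```python
import re

# Assumption: anchors are restricted to ASCII letters, digits, '.', '-' and '_'.
# Any run of other characters (spaces, slashes, colons, ...) is unsafe.
_UNSAFE_RUN = re.compile(r"[^A-Za-z0-9._-]+")


def docid_to_anchor(primary_docid: str) -> str:
    """Hypothetical helper: derive an XML-ID-safe anchor from a primary docid."""
    # Collapse each run of unsafe characters into a single underscore.
    anchor = _UNSAFE_RUN.sub("_", primary_docid.strip())
    # An XML ID must start with a letter or '_', so prefix '_' when it doesn't
    # (e.g. a docid starting with a digit) or when the result is empty.
    if not anchor or not (anchor[0].isalpha() or anchor[0] == "_"):
        anchor = "_" + anchor
    return anchor


if __name__ == "__main__":
    for docid in ["RFC 4", "BCP 14", "ISO/IEC 10646-1:1993/AMD 2:1996"]:
        print(docid, "->", docid_to_anchor(docid))
```

With these rules, `RFC 4` becomes `RFC_4` rather than a legacy-style anchor, which is the kind of small difference from old anchors that the question to Robert is about.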

What does the "primary docid" look like right now? Why does it have spaces and slashes?

@ronaldtse, primary IDs for RFCs and RFC subseries have spaces (example 1, example 2).

Apart from that, I suspect primary (citeable) identifiers of non-IETF documents in misc often should have spaces (but don’t): for example, I feel like ISO 10646-1-AD2.1996 should actually be ISO/IEC 10646-1:1993/AMD 2:1996. (I think we are going to do something with such documents, e.g. replace them with documents from authoritative datasets.)

And, non-IETF documents in non-IETF Relaton sources often contain slashes (example).

andrew2net commented 2 years ago

@strogonoff @ronaldtse @opoudjis In the Relaton model we have a docnumber attribute. If I'm right, Metanorma uses this attribute for sorting purposes. What do you think about saving the anchor as the docnumber attribute? Note: for Internet-Drafts the anchor isn't unique; all versions of a document have the same anchor.

strogonoff commented 2 years ago
  1. Should we introduce an IETF RFC-specific property that contains the anchor? Similar to how 3GPP has special release.project_end and similar keys, we could have a special bibxml.anchor key or similar. (After that, it’d be safe to drop docid with type=anchor, because no data would be lost.) cc @ronaldtse
  2. Currently, the docid with type=anchor seems incorrect. For example, it’s RFC4 here, but it should be RFC0004 (see the xml2rfc tools output as an example). Did we change it unintentionally? 🤔 cc @andrew2net
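The difference in point 2 comes down to zero-padding the RFC number. A minimal sketch of a hypothetical helper, `legacy_rfc_anchor`, that produces the legacy-style anchor; it assumes the pattern is `RFC` followed by the number padded to at least four digits, inferred only from the RFC0004 example above, and is not relaton-ietf code.

```python
import re

# Assumption: legacy BibXML anchors are "RFC" + the number zero-padded to
# at least four digits (inferred from the RFC0004 example, not from a spec).
_RFC_ID = re.compile(r"^\s*RFC\s*(\d+)\s*$", re.IGNORECASE)


def legacy_rfc_anchor(docid: str) -> str:
    """Hypothetical helper: turn "RFC 4", "RFC4" or "RFC0004" into "RFC0004"."""
    match = _RFC_ID.match(docid)
    if match is None:
        raise ValueError(f"not an RFC identifier: {docid!r}")
    # int() drops any existing leading zeros; ":04d" pads to four digits
    # and leaves longer numbers unchanged.
    return f"RFC{int(match.group(1)):04d}"
```

For example, `legacy_rfc_anchor("RFC 4")` gives `RFC0004`, while `legacy_rfc_anchor("RFC 8446")` gives `RFC8446`.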
strogonoff commented 2 years ago

What do you think about saving the anchor as the docnumber attribute?

I feel like we could put anchors in docnumber, but I suspect there’s a semantic difference between a docnumber and an anchor… I’m not sure.

andrew2net commented 2 years ago
  1. @strogonoff Since we started fetching documents from the https://www.rfc-editor.org/rfc-index.xml dataset, which isn't in BibXML format and doesn't contain anchors, we can remove anchors from relaton-data-rfcs without any problem.
  2. We still have a BibXML parser. I don't know who uses it, or whether it needs to reproduce the anchors in Relaton's BibXML output. cc @ronaldtse
strogonoff commented 2 years ago

@andrew2net Interesting:

we started fetching documents from the https://www.rfc-editor.org/rfc-index.xml dataset

Maybe this is why the anchor has changed?

andrew2net commented 2 years ago

Maybe this is why the anchor has changed?

@strogonoff Yes, that's definitely the reason. BTW, we still parse BibXML to create documents for relaton-data-ids, so the IDs still have anchors.

ronaldtse commented 2 years ago

I think the questions have all been answered here.

As for:

BibXML to create documents for relaton-data-ids, so the IDs still have anchors

@strogonoff We can strip off the original anchors on the bibitems sourced from the datatracker: these are the only ones that provide anchors.
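Stripping those anchors could amount to filtering each bibitem's `docid` list. A minimal sketch, assuming the bibitem has already been loaded from YAML into a Python dict and that each `docid` entry is a mapping. Because both `scope=anchor` (issue title) and `type=anchor` (comments) appear in this thread, the sketch treats either marker as an anchor; the actual keys in relaton-data-ids may differ, and the sample values below are made up.

```python
from typing import Any


def is_anchor(docid: dict[str, Any]) -> bool:
    # Assumption: anchor docids are tagged with type "anchor" or scope "anchor".
    return docid.get("type") == "anchor" or docid.get("scope") == "anchor"


def strip_anchor_docids(bibitem: dict[str, Any]) -> dict[str, Any]:
    """Return a shallow copy of a bibitem dict without anchor docid entries."""
    cleaned = dict(bibitem)  # the input dict itself is left untouched
    if "docid" in bibitem:
        cleaned["docid"] = [d for d in bibitem["docid"] if not is_anchor(d)]
    return cleaned


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    item = {
        "id": "draft-example-00",
        "docid": [
            {"id": "draft-example-00", "type": "Internet-Draft", "primary": True},
            {"id": "I-D.example", "type": "IETF", "scope": "anchor"},
        ],
    }
    print(strip_anchor_docids(item)["docid"])
```

Returning a copy instead of mutating in place keeps the original record available, for example for diffing before and after the cleanup.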

andrew2net commented 2 years ago

@strogonoff @ronaldtse can we close this issue?

ronaldtse commented 2 years ago

@andrew2net I'm still hoping we can get rid of the anchors in the Relaton data. @strogonoff can we?