Document type repeats in document ID

strogonoff commented 2 years ago

Example: https://demo.bibxml.org/ieee/IEEE_P1752_D-1.2020-04/

Document type is IEEE, document ID is IEEE P1752/D-1.2020-04. Seems redundant?

This is causing confusion exemplified by https://github.com/ietf-ribose/bibxml-service/issues/48.

ronaldtse commented 2 years ago

Unfortunately it is not redundant because IEEE offers other document identifiers that do not start with "IEEE". To the normal user it seems to be, but not in terms of correctness.

My issue was more of the user experience.

strogonoff commented 2 years ago

Unfortunately it is not redundant because IEEE offers other document identifiers that do not start with "IEEE".

But do the rest of identifiers ever clash?

strogonoff commented 2 years ago

I think this is an issue of what the standardized identifier shape is for this document type. If we could have any supporting reference, I can see it being useful for eliminating such questions.

ronaldtse commented 2 years ago

That's why we are developing a PubID scheme for IEEE. There is no official reference today.

strogonoff commented 2 years ago

Maybe we could make that identifier not start with IEEE? Or is that a non-starter (sorry for the pun)?

strogonoff commented 2 years ago

The rest of the identifier is the actual document ID, so they surely aren’t going to clash.

ronaldtse commented 2 years ago

Maybe we could make that identifier not start with IEEE? Or is that a non-starter (sorry for the pun)?

But we need to distinguish the different IEEE docs. They have publications from the "former IEEE" called "IRE" and also "ANSI" standards that use those prefixes instead of "IEEE".

strogonoff commented 2 years ago

Shouldn’t we just use type for that? That way if IEEE gets renamed again we can just start using the new name for future citations; otherwise we have to update the document type on all preexisting IEEE citations.

ronaldtse commented 2 years ago

Yes we should use type for that. Technically each document ID should belong to some schema/type.

strogonoff commented 2 years ago

That’d be nice, so then we’d have { type: IEEE, id: P1752/D-1.2020-04 } instead of { type: IEEE, id: IEEE P1752/D-1.2020-04 }.

I think it’d be better to make this transition earlier rather than later, to avoid breaking any links or API consumers.

As an aside, this applies to NIST as well, and possibly some other doctypes.
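To make the proposed shape concrete, here is a minimal sketch of the transformation, assuming a docid is a plain mapping with `type` and `id` keys as in the example above. The helper name `strip_type_prefix` is hypothetical and is not part of Relaton or the BibXML service.

```python
def strip_type_prefix(docid: dict) -> dict:
    """Return a copy of docid whose id no longer repeats its own type.

    Assumption: the redundant prefix is exactly the type followed by one space.
    Identifiers that do not start with the type are returned unchanged.
    """
    doc_type = docid["type"]
    doc_id = docid["id"]
    prefix = doc_type + " "
    if doc_id.startswith(prefix):
        doc_id = doc_id[len(prefix):]
    return {"type": doc_type, "id": doc_id}


before = {"type": "IEEE", "id": "IEEE P1752/D-1.2020-04"}
after = strip_type_prefix(before)
print(after)
```

Because the function only looks at the type, the same sketch would cover NIST or any other doctype whose identifiers repeat the type as a prefix.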

ronaldtse commented 2 years ago

That’d be nice, so then we’d have { type: IEEE, id: P1752/D-1.2020-04 } instead of { type: IEEE, id: IEEE P1752/D-1.2020-04 }.

@andrew2net could we do this? Let me know what you think.

andrew2net commented 2 years ago

That’d be nice, so then we’d have { type: IEEE, id: P1752/D-1.2020-04 } instead of { type: IEEE, id: IEEE P1752/D-1.2020-04 }.

@andrew2net could we do this? Let me know what you think.

@ronaldtse I discussed this question with Nick a few months ago. He confirmed that the id should include the type.

ronaldtse commented 2 years ago

I think this warrants further discussion. And this discussion is a Relaton one. I can see that both approaches can be correct.

Approach 1:

docidentifier represents the "schema" and the "label". In the case of "ISO 1234:2020", schema is "iso" or "ISO", the label is "ISO 1234:2020". (For those uninitiated, we can have "ISO/TR 1234:2020", "ISO/TR/WD 1234:2020", etc.)

Approach 2:

docidentifier represents the "schema" and "components of the label". i.e. there is never a full label. In the case of "ISO 1234:2020", schema is "iso" or "ISO", "number" is "1234", "edition" is "2020", type is "international standard".

So this is actually a comparison on whether we have a full string representation or a structured identifier.

Clearly:

For users who cannot understand the ISO DocID schema, they must rely on a fully formatted string.
For users who understand the ISO DocID schema, they can use the components to generate the string, or just to locate information about it.

Then the answer is, we should have both.
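A rough sketch of what "having both" could look like, assuming a structured identifier that can render its own full label. The class name, field names, and the abbreviation table are illustrative assumptions; this is not the Relaton data model or an official ISO identifier grammar, and it only covers the two simplest label shapes mentioned above.

```python
from dataclasses import dataclass

# Hypothetical mapping from document type to the abbreviation used in the label.
TYPE_ABBREVIATIONS = {
    "international standard": "",
    "technical report": "TR",
}


@dataclass(frozen=True)
class StructuredId:
    schema: str                                # e.g. "ISO"
    number: str                                # e.g. "1234"
    edition: str                               # e.g. "2020"
    doc_type: str = "international standard"

    def label(self) -> str:
        """Approach 1: the full string, generated from the components."""
        abbr = TYPE_ABBREVIATIONS[self.doc_type]
        publisher = f"{self.schema}/{abbr}" if abbr else self.schema
        return f"{publisher} {self.number}:{self.edition}"

    def to_docidentifier(self) -> dict:
        """Both approaches together: schema, full label, and components."""
        return {
            "schema": self.schema,
            "label": self.label(),
            "components": {
                "number": self.number,
                "edition": self.edition,
                "type": self.doc_type,
            },
        }


print(StructuredId("ISO", "1234", "2020").to_docidentifier())
print(StructuredId("ISO", "1234", "2020", "technical report").label())
```

Users who do not know the schema read `label`; users who do can work from `components` directly.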

strogonoff commented 2 years ago

I see. The first “schema” interpretation makes sense to me. There are precedents without such redundancy, e.g. ISBNs and DOIs don’t have the extra prefix, but if such is the schema for other types then so be it.

strogonoff commented 2 years ago

It seems that type is used as a schema/namespace in some identifiers, and as publisher in others. Also, I’ve been told that the first identifier is something called “PubID”, and only it can be used by formattedref on relations… and formattedref doesn’t use type at all. It looks like we are giving different document identifiers different semantics.

ronaldtse commented 2 years ago

Also, I’ve been told that the first identifier is something called “PubID”

"PubID" is not a name (yet) and not a defined scheme that works across all publishers... Formally, the first identifier is the authoritative identifier (authoritative i.e. the publisher's identifier).

"formattedref" is a "string representation of the authoritative identifier". There is technically no "type" because this is a "string", but I'm sure @andrew2net can provide the "schema name" of the authoritative identifier.

It looks like we are giving different document identifiers different semantics.

Yes. Different document identifiers have different semantics.

It seems that type is used as a schema/namespace in some identifiers, and as publisher in others.

This seems to be the case but is not. Most publishers only have one identifier scheme, therefore the latter. Some publishers have multiple identifier schemes, e.g. IEEE. In this case we must differentiate the identifier schemes amongst the same publisher (e.g. ieee vs ieee:aiee vs ieee:iso).
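As an illustration of that distinction, the sketch below splits a scheme name into a publisher and an optional sub-scheme. The colon-separated form is taken from the examples above; whether identifiers are actually stored this way is an assumption, and `split_scheme` is a hypothetical helper.

```python
from typing import Optional, Tuple


def split_scheme(scheme: str) -> Tuple[str, Optional[str]]:
    """Split 'publisher:subscheme' into its parts; the sub-scheme may be absent."""
    publisher, _, sub = scheme.partition(":")
    return publisher, (sub or None)


for name in ["ieee", "ieee:aiee", "ieee:iso"]:
    publisher, sub = split_scheme(name)
    print(name, "->", publisher, sub)
```

With a single-scheme publisher the type and the publisher coincide, which is why the type can look like a publisher name in those cases.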

Ping @andrew2net

andrew2net commented 2 years ago

The only PubID schema spec I know about is https://github.com/metanorma/nist-pubid

strogonoff commented 2 years ago

Yes. Different document identifiers have different semantics.

My point was that the docid field should have consistent shared semantics for its contents. Individual identifiers can have meaningful differences, but they must share some core thing in common, otherwise they should not reside under the same property. So what’s that core thing is the question.

ronaldtse commented 2 years ago

So what’s that core thing is the question.

Is the question about a particular document identifier scheme ("pubid scheme")? Every pubid scheme has different semantics.

strogonoff commented 2 years ago

Right now I am fine with the primary attribute at least helping distinguish between the two kinds of identifiers, even if we mix concerns under docid.

I will hide docid.type for primary identifiers in the UI, and show them as citeable.

We can probably close this issue.

strogonoff commented 2 years ago

@ronaldtse no, it was about different uses for different elements. Right now, marked as primary identifiers are citeable (maybe we should have called them that), and others are not, so that works for me.

strogonoff commented 2 years ago

Until now, we had a mix with no way for me (or another generic consumer) to tell which is which except by maintaining a hard-coded list of types.

ronaldtse commented 2 years ago

I see what you mean by "citable" now -- an identifier that people can cite a document with. Maybe we should adopt this notion for the "primary identifier"!

strogonoff commented 2 years ago

I believe that was what we agreed on with Nick during the latest internal chat about document identifiers. Not sure if you had to drop off before that part…

ronaldtse commented 2 years ago

@strogonoff I probably already dropped off, but this is good and we should document this...

strogonoff commented 2 years ago

Nick said that no one cites by DOI or ISBN, so they will not be primary. (And my question would be shouldn’t we keep them under another attribute entirely, but at least the primary flag helps.)

strogonoff commented 2 years ago

I am bundling a mini-spec with the IETF BibXML service docs, but want to create one separately as well. But ideally that one should be generated from LutaML.

ronaldtse commented 2 years ago

Agree with Nick's comment in rendering (but it is helpful for people to "locate/fetch a citation using DOI/ISBN"). Maybe another attribute could be used but there is an advantage of keeping all the "identifiers" under the same array.

Re: mini-spec, agree.

strogonoff commented 2 years ago

I will keep all identifiers searchable, but show only primary as citeable.

Maybe searchability can be their shared value, but it seems wrong as a data-level concern: any app can choose to search by whatever attributes it wants.

Anyway, I am fine with this degree of denormalization, since we agreed to clearly signify at least which is which with the primary marker.
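A minimal sketch of how a generic consumer might apply that rule, assuming each docid is a mapping with `type`, `id`, and a boolean `primary` key. The function names are hypothetical, and the DOI value is a made-up placeholder rather than a real identifier for this document.

```python
docids = [
    {"type": "IEEE", "id": "IEEE P1752/D-1.2020-04", "primary": True},
    {"type": "DOI", "id": "10.0000/example", "primary": False},  # made-up value
]


def citeable(ids: list) -> list:
    """Only primary identifiers are shown as citeable; their type is hidden."""
    return [d["id"] for d in ids if d.get("primary")]


def matches(ids: list, query: str) -> bool:
    """Every identifier stays searchable, primary or not."""
    needle = query.lower()
    return any(needle in d["id"].lower() for d in ids)


print(citeable(docids))
print(matches(docids, "10.0000"))
```

No hard-coded list of types is needed here: the `primary` flag alone decides what is displayed as citeable.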

ietf-tools / relaton-data-ieee

Document type repeats in document ID #5