metanorma / pubid-ieee

PubID spec and implementation for IEEE deliverables
BSD 2-Clause "Simplified" License

Decide and implement numbers format for identifiers (dash vs dots) #45

Closed: mico closed this issue 2 years ago

mico commented 2 years ago

I had the impression that most IEEE identifiers have the format IEEE {number}.{part}-{year}:

ANSI N42.35-2016
IEEE Std C37.20.1-2015
IEEE Std 1900.1-2019
IEEE Std P802.11a/D7

or IEEE {number}-{year}, for example IEEE Std 515-2011.

As I can see now, it's not always like that:

IEEE Std 11073-10420-2020
IEEE P11073-10101a/D1, April 2015
IEC/IEEE P60076-57-129_D26

For IEEE document numbers starting with "P" followed by 4 digits, a "." is always used before the next number, for example IEEE P1666.1/D4, October 2015. The only exception is when the next number is a year, for example IEEE P2410-2020/D5, December 2020.

For "P" followed by 5 digits, a "-" is always used, with the exception of only a few identifiers:

IEEE P15288.1/D3.0, June 2014
IEEE P15288.1/D4.0, August 2014
IEEE P15288.1/D4.1, September 2014
IEEE P15288.2/D5.1, August 2014
IEEE P15288.2/D5.2, September 2014

I suspect that most of the cases where a "-" separates the number and the part come from the ISO format.

This is not complete research; I just picked random identifiers and looked in particular at numbers with the "P" prefix.
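To make the observation above concrete, here is a minimal Python sketch of the heuristic for "P" numbers: 4 digits followed by ".", unless "-" introduces a year, and 5 digits followed by "-". It assumes the number has already been isolated from the rest of the identifier (publisher, "Std", draft and date); the `classify` function and its labels are hypothetical and only encode this sample, not an IEEE rule.

```python
import re

# Hypothetical helper: encodes the heuristic observed in this sample of
# identifiers, not an official IEEE numbering rule.
P_NUMBER = re.compile(r"^P(?P<base>\d+)(?P<sep>[.-])(?P<rest>\d+)")
YEAR = re.compile(r"^(19|20)\d{2}$")


def classify(number: str) -> str:
    """Classify the separator that follows the digits of a "P" number."""
    m = P_NUMBER.match(number)
    if m is None:
        return "none"            # no "." or "-" directly after the digits
    base, sep, rest = m.group("base", "sep", "rest")
    if sep == "-" and YEAR.match(rest):
        return "year"            # e.g. P2410-2020
    if len(base) == 4:
        return "expected" if sep == "." else "exception"
    if len(base) == 5:
        return "expected" if sep == "-" else "exception"
    return "unknown"             # the heuristic says nothing about other lengths


for n in ["P1666.1/D4", "P2410-2020/D5", "P11073-10101a/D1", "P15288.1/D3.0"]:
    print(n, classify(n))
```

On the identifiers from this comment, `classify("P1666.1/D4")` gives "expected", `classify("P2410-2020/D5")` gives "year", and `classify("P15288.1/D3.0")` gives "exception", which lines up with the short list of 5-digit exceptions above.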

_Originally posted by @mico in https://github.com/metanorma/pubid-ieee/pull/44#discussion_r835093748_

ronaldtse commented 2 years ago

I have a few points to add here:

  1. I think we need to differentiate the numbering styles:

These:

ANSI N42.35-2016
IEEE Std C37.20.1-2015

Are "ANSI-style" codes. They follow the pattern "Lnn.nn-yyyy". We have seen some of them in the NIST publications too.

  2. In IEEE, the pattern "nnnn.n" is used for "working groups" within committees.

For example:

A standard number can re-use the name of the working group.

For example:

Also, notice that this standard is not a Part standard:

[Screenshot, 2022-03-26 12:58:58 PM]

  3. It is unclear whether a "hyphen" or a "dot" indicates a part, and sometimes the PubID doesn't even indicate that it is a part!

For example,

In the modelling and parsing, perhaps we should just identify what is a "standards identifier" without splitting parts. It is clear that we cannot know what is a part vs not a part just from the PubID.

In any case, the trailing -yyyy is definitely the year, so at least we can extract that.
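A minimal Python sketch of that approach, assuming the number has already been separated from prefixes such as "IEEE Std": the number is kept whole and only a trailing "-yyyy" is split off as the year. Restricting years to 19xx/20xx is an assumption, and `split_year` and `Parsed` are hypothetical names, not pubid-ieee API.

```python
import re
from typing import NamedTuple, Optional


class Parsed(NamedTuple):
    number: str           # kept whole, e.g. "C37.20.1" or "11073-10420"
    year: Optional[int]   # the trailing -yyyy, if any


# Lazily match the number, then an optional trailing "-19xx" or "-20xx".
TRAILING_YEAR = re.compile(r"^(?P<number>.+?)(?:-(?P<year>(?:19|20)\d{2}))?$")


def split_year(identifier_number: str) -> Parsed:
    """Split off a trailing year without trying to detect parts."""
    m = TRAILING_YEAR.match(identifier_number)
    if m is None:
        raise ValueError("empty number")
    year = m.group("year")
    return Parsed(m.group("number"), int(year) if year else None)


print(split_year("C37.20.1-2015"))     # Parsed(number='C37.20.1', year=2015)
print(split_year("11073-10420-2020"))  # Parsed(number='11073-10420', year=2020)
print(split_year("15026-2"))           # Parsed(number='15026-2', year=None)
```

Like any pattern-only approach, this cannot tell a 4-digit part number starting with 19 or 20 from a year.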

mico commented 2 years ago

> In the modelling and parsing, perhaps we should just identify what is a "standards identifier" without splitting parts. It is clear that we cannot know what is a part vs not a part just from the PubID.

You're suggesting we parse numbers like "PC62.42.5" or "P15026-2" as a whole, without splitting them, with years (-yyyy) as the only exception, right?

ronaldtse commented 2 years ago

Yes. What do you think?

mico commented 2 years ago

> Yes. What do you think?

OK, let's see how it works out.

mico commented 2 years ago

@ronaldtse We have IEEE 15026-2-2011 (https://standards.ieee.org/ieee/15026-2/5234/) but also IEEE Std 1244-5.2000 (https://ieeexplore.ieee.org/document/882673). Should we keep the original delimiter between the number and the year, or use "-" (or ".") for all identifiers with a year?

ronaldtse commented 2 years ago

@mico Out of 6,187 entries (matching -\d\d\d\d[\s\n\)]), there are only 4 instances of \.\d\d\d\d:

IEEE Std 581.1978
IEEE Std 1244-5.2000
ANSI/IEEE C37.30.1971
ANSI/IEEE Std C37.26.1972

We should assume they are mistakes. Let's use "-". Thanks!
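Following that decision, a minimal Python sketch of normalizing the year delimiter: a trailing ".yyyy" is rewritten to "-yyyy" before further parsing. The function name and the 19xx/20xx restriction are assumptions for illustration.

```python
import re

# A dot followed by a 19xx/20xx year at the very end of the number.
DOT_YEAR = re.compile(r"\.((?:19|20)\d{2})$")


def normalize_year_delimiter(number: str) -> str:
    """Treat a trailing ".yyyy" as a typo for "-yyyy"."""
    return DOT_YEAR.sub(r"-\1", number)


for raw in ["581.1978", "1244-5.2000", "C37.30.1971", "C37.26.1972", "1900.1"]:
    print(raw, "->", normalize_year_delimiter(raw))
```

Anchoring the pattern at the end of the number keeps dotted numbers such as 1900.1 or C37.20.1 untouched.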