metanorma / pubid-ieee

PubID spec and implementation for IEEE deliverables
BSD 2-Clause "Simplified" License

Year is parsed as part number #73

Open andrew2net opened 1 year ago

andrew2net commented 1 year ago

Identifiers like "IEEE 1234-1991" or "IEEE 1234-2021" are parsed incorrectly: the year ends up as the part number of an embedded ISO identifier.

Pubid::Ieee::Identifier.parse "IEEE 1234-1991"
=> #<Pubid::Ieee::Identifier:0x00007fa9cc15cdb8
 @iso_identifier=#<Pubid::Iso::Identifier:0x00007fa9cc157390 @number="1234", @part="1991", @publisher="IEEE">,
 @number=nil,
 @proposal=false,
 @publisher="IEEE",
 @revision=nil>

Pubid::Ieee::Identifier.parse "IEEE 1234-2021"
=> #<Pubid::Ieee::Identifier:0x00007fa9de04b408
 @iso_identifier=#<Pubid::Iso::Identifier:0x00007fa9db197d50 @number="1234", @part="2021", @publisher="IEEE">,
 @number=nil,
 @proposal=false,
 @publisher="IEEE",
 @revision=nil>

A trailing 19xx or 20xx should be treated as a year, not as a part number.
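A minimal sketch of the proposed rule, using plain string handling rather than the gem's real parser. `split_number_and_year` is a hypothetical helper: it treats a trailing hyphenated 19xx/20xx component as the year, and anything between the number and the year as the part.

```ruby
# Hypothetical helper, not part of pubid-ieee.
# A trailing component matching 19xx or 20xx is taken as the year.
YEAR = /\A(?:19|20)\d{2}\z/

def split_number_and_year(id)
  # Drop the publisher prefix: "IEEE 1234-1991" -> "1234-1991"
  body = id.sub(/\AIEEE\s+/, "")
  number, *rest = body.split("-")

  # Take the last component as the year only if it looks like one
  year = rest.last&.match?(YEAR) ? rest.pop : nil

  # Whatever remains between the number and the year is the part
  part = rest.empty? ? nil : rest.join("-")
  { number: number, part: part, year: year }
end

split_number_and_year("IEEE 1234-1991")   # number "1234", part nil, year "1991"
split_number_and_year("IEEE 1234-5-2021") # number "1234", part "5", year "2021"
split_number_and_year("IEEE 1234-5")      # number "1234", part "5", year nil
```

The trade-off is that a genuine part number between 1900 and 2099 would be misread as a year, which is the compromise this thread accepts.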

ronaldtse commented 1 year ago

This task is blocking https://github.com/ietf-tools/bibxml-service/issues/18

ronaldtse commented 1 year ago

As @andrew2net explained, let's treat 19xx and 20xx as years. The IEEE pubid scheme cannot cope with this...

mico commented 1 year ago


How should the year be represented in PubID? Normally it would be something like "IEEE 1234:1991", but we don't have identifiers that consist of a number without a part. So for this case (no part number) I could represent the year as a part ("IEEE 1234-1991"), but that muddles the identifier standard.

Update: I just looked into the code, and it seems "IEEE {number}-{optional-part}-{year}" is the standard form for IEEE. Identifiers like "IEEE No 1234-1991" already have their year parsed correctly. The only problem is that accessing year returns the value prepared for rendering, with "-" as a prefix:

0> Pubid::Ieee::Identifier.parse("IEEE No 1234-1996")
=> #<Pubid::Ieee::Identifier:0x000000010cab92f0 @number="1234"@8, @proposal=false, @revision=nil, @publisher="IEEE", @year="1996">

0> Pubid::Ieee::Identifier.parse("IEEE No 1234-1996").year
=> "-1996"
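One way to keep the separator out of the attribute, sketched with a hypothetical `IeeeIdSketch` class rather than the gem's real `Pubid::Ieee::Identifier`: store the bare year and add the "-" only when rendering. The "Std" keyword used in `to_s` is also an assumption for illustration.

```ruby
# Hypothetical sketch, not pubid-ieee's actual class.
# The year attribute holds only digits; "-" is added at render time.
class IeeeIdSketch
  attr_reader :publisher, :number, :year

  def initialize(publisher:, number:, year: nil)
    @publisher = publisher
    @number = number
    # Defensively strip a rendering prefix such as "-1996"
    @year = year&.delete_prefix("-")
  end

  def to_s
    s = "#{publisher} Std #{number}"
    s += "-#{year}" if year # separator belongs to rendering only
    s
  end
end

id = IeeeIdSketch.new(publisher: "IEEE", number: "1234", year: "-1996")
id.year # "1996"
id.to_s # "IEEE Std 1234-1996"
```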

Another observation: when there is no "No" or "Std" prefix before the number, the identifier is recognized as an ISO identifier and parsed with pubid-iso, so the year gets parsed as a part:

0> Pubid::Ieee::Identifier.parse("IEEE 1234-1996")
=> #<Pubid::Ieee::Identifier:0x000000010cd1a850 @number=nil, @proposal=false, @revision=nil, @publisher="IEEE", @iso_identifier=#<Pubid::Iso::Identifier:0x000000010cd19518 @publisher="IEEE", @number="1234", @part="1996">>

But we have some IEEE identifiers without "No" or "Std", and they are parsed incorrectly (because pubid-iso handles them), e.g.:

IEEE 1070-1995
IEEE 278-1967
IEEE 730-1989
IEEE 751-1991

But these identifiers are parsed correctly:

IEEE 2030.101-2018
IEEE C57.12.26-1992
IEEE C57.121-1988
IEEE C62.31-1987
IEEE C62.92.1-1987

I believe the solution for identifiers like "IEEE 1070-1995" could be to parse the last part as a year in pubid-iso, because these identifiers are represented in the ISO PubID style.

mico commented 1 year ago

OK, "IEEE 1070-1995" is in a mixed format (neither ISO nor IEEE PubID). For IEEE PubID, the "No" or "Std" is missing here. For ISO, the year should be separated with ":" (IEEE 1070:1995). There are not many identifiers like this:

IEEE 1070-1995
IEEE 278-1967
IEEE 730-1989
IEEE 751-1991
IEEE 1023-2020 (Revision of IEEE Std 1023-2004)

So I can just add a replacement for these identifiers.
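A minimal sketch of such a replacement step, assuming it runs on the raw string before parsing. `MIXED_IDENTIFIERS` and `apply_replacements` are hypothetical names; the table just inserts the "Std" keyword into the identifiers listed above.

```ruby
# Hypothetical replacement table, not pubid-ieee's actual mechanism.
# Maps each known prefix-less identifier to its "IEEE Std" form.
MIXED_IDENTIFIERS = %w[1070-1995 278-1967 730-1989 751-1991 1023-2020]
  .to_h { |num| ["IEEE #{num}", "IEEE Std #{num}"] }
  .freeze

def apply_replacements(id)
  MIXED_IDENTIFIERS.each do |from, to|
    # Anchor at the start so "(Revision of IEEE Std 1023-2004)" is untouched;
    # the lookahead stops an entry from matching a longer run of digits.
    pattern = /\A#{Regexp.escape(from)}(?!\d)/
    return id.sub(pattern, to) if id.match?(pattern)
  end
  id
end

apply_replacements("IEEE 1070-1995")    # "IEEE Std 1070-1995"
apply_replacements("IEEE C57.121-1988") # unchanged
```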

mico commented 1 year ago

https://github.com/metanorma/pubid-ieee/blob/main/spec/fixtures/pubid-parsed.txt: @ronaldtse @andrew2net, is this list complete? My observations were based on this list.

mico commented 1 year ago

From Skype with @ronaldtse:

But IEEE 1070-1995 is an IEEE format. Exactly as you described. Technically the correct format is “IEEE Std 1070-1995”, but IEEE often omits the Std keyword.

It’s hard to distinguish; only the year is not ISO style.

But it starts with the IEEE publisher, so it is preferable to use the IEEE parsing rules.

It could be very tricky for the parser to distinguish them. For example, there are identifiers like IEC/IEEE 60214-2:2019: it contains IEEE, but it is in ISO format. I'll check whether it's complicated or not. The parser there is already overloaded and close to the limit of what is manageable.

Well, technically this is the "IEC" format, but ISO and IEC do share a core.

ronaldtse commented 1 year ago


Instead of considering these identifiers "ISO style", I would consider "IEEE nnnn-yyyy" to be "IEEE style" identifiers, for two reasons:

  1. This identifier starts with "IEEE", so it should really be an IEEE-style document.
  2. ISO/IEC-style identifiers always give the edition year as :yyyy, if a year is given. IEEE always uses -yyyy.
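The two reasons above suggest a simple dispatch heuristic. This is a sketch under the assumption that the edition-year separator is the deciding signal; `identifier_style` is a hypothetical helper, not the gem's real dispatch logic.

```ruby
# Hypothetical heuristic, not pubid-ieee's actual dispatch.
# 1. A ":yyyy" edition year marks ISO/IEC style, whatever the publisher.
# 2. Otherwise, an identifier starting with plain "IEEE" uses IEEE rules.
def identifier_style(id)
  return :iso_iec if id.match?(/:(?:19|20)\d{2}\b/)
  return :ieee if id.start_with?("IEEE ")
  :unknown
end

identifier_style("IEC/IEEE 60214-2:2019") # :iso_iec
identifier_style("IEEE 1070-1995")        # :ieee
identifier_style("IEEE Std 24748-3:2012") # :iso_iec
```

Note that "IEEE Std 24748-3:2012" lands in the ISO/IEC bucket, which matches the kind of identifier proposed for manual correction.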

Let's consider this:

I would rather manually correct the ISO-style identifiers published by IEEE into proper IEEE style, e.g. these:

IEEE Std 24748-3:2012                 => IEEE Std 24748-3-2012
IEC/IEEE P62271-37-013:2015 D13.4     => IEC/IEEE P62271-37-013-2015/D13.4
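The two corrections above follow a mechanical pattern, sketched here with a hypothetical `iso_style_to_ieee_style` helper: replace the ":yyyy" edition year with "-yyyy", and attach a trailing draft number with "/" instead of a space. It has only been checked against these two examples.

```ruby
# Hypothetical helper, not part of pubid-ieee.
def iso_style_to_ieee_style(id)
  id.sub(/:((?:19|20)\d{2})\b/, '-\1')     # ":2012" -> "-2012"
    .sub(/\s+(D\d+(?:\.\d+)*)\z/, '/\1')   # " D13.4" -> "/D13.4"
end

iso_style_to_ieee_style("IEEE Std 24748-3:2012")
# "IEEE Std 24748-3-2012"
iso_style_to_ieee_style("IEC/IEEE P62271-37-013:2015 D13.4")
# "IEC/IEEE P62271-37-013-2015/D13.4"
```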