metanorma / pubid-ieee

PubID spec and implementation for IEEE deliverables
BSD 2-Clause "Simplified" License

List of IEEE publication `normtitle` #3

Closed ronaldtse closed 2 years ago

ronaldtse commented 3 years ago

Full list. A lot cleaner than stdnumber.

UPDATED: pubid-sorted.txt.zip

ronaldtse commented 3 years ago

We should parse this list instead of #2.

Especially for the dual/triple published standards, only the normtitle provides that information.

andrew2net commented 3 years ago

Regular expression to parse the normtitles: https://regex101.com/r/yz98W9/4

andrew2net commented 3 years ago

@ronaldtse

ronaldtse commented 3 years ago
  • These 2 normtitles look identical:
EEE Std 1671.1-2017 (Revision of IEEE Std 1671.1‐2009)
EEE Std 1671.1-2017 (Revision of IEEE Std 1671.1-2009)

These seem to be two identical documents, and notice that they have a typo: "EEE" should have been "IEEE". Let's correct them as special cases.
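As quoted in this thread, the two strings appear to differ only in the hyphen before "2009": one uses U+2010 (HYPHEN) and the other the ASCII hyphen-minus. A minimal sketch of folding such characters during cleanup, assuming that is the only difference; the function name and the set of code points are illustrative, not part of pubid-ieee:

```python
import re

# Hyphen-like code points folded to ASCII "-" (an illustrative, non-exhaustive set).
HYPHENS = re.compile("[\u2010\u2011\u2012\u2013\u2212]")

def normalize_hyphens(normtitle: str) -> str:
    """Replace Unicode hyphen variants with the ASCII hyphen-minus."""
    return HYPHENS.sub("-", normtitle)

# The two normtitles from the thread, with the hyphen written as an explicit escape.
a = "EEE Std 1671.1-2017 (Revision of IEEE Std 1671.1\u20102009)"
b = "EEE Std 1671.1-2017 (Revision of IEEE Std 1671.1-2009)"

print(a == b)
print(normalize_hyphens(a) == normalize_hyphens(b))
```

After normalization the two entries compare equal, so they can be deduplicated before the "EEE" typo is corrected.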

  • There are 2 normtitles, IEEE Std P1671/D5, June 2006 and IEEE Std P1671/D5, Jun 2006, associated with 2 distinct documents. The only difference is that one uses the full month name and the other the abbreviated one. I think we need to distinguish them in some other way.

The first is 04067148.xml, the second is 041524522.xml.

The first one is broken because it has this:

    <standard_id>0</standard_id>

The second one has this:

    <standard_id>3721</standard_id>

I think we want to drop the first one.

  • There are also 2 normtitles, IEEE Std PC37.100.1/D8, Dec 2006 and IEEE Std PC37.100.1/D8, Dec2006. The only difference is the space between month and year.

Similarly, the first one is 04141261.xml, the second is 04152567.xml.

The first one is broken because it has this:

    <standard_id>0</standard_id>

The second one has this:

    <standard_id>4169</standard_id>

Let's also get rid of the first one. We should drop all items with <standard_id> value == 0 (and please document this).
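The rule above could be sketched roughly as follows. This assumes each file is a well-formed XML document containing a `<standard_id>` element somewhere in its tree; the `is_broken` helper and the `<publication>` wrapper element are hypothetical, and files with no `<standard_id>` at all are kept, since the thread only asks to drop the value 0.

```python
import xml.etree.ElementTree as ET

def is_broken(xml_text: str) -> bool:
    """Return True when the first <standard_id> in the document equals 0."""
    root = ET.fromstring(xml_text)
    # iter() also visits the root element itself, unlike find(".//standard_id").
    node = next(root.iter("standard_id"), None)
    if node is None or node.text is None:
        return False  # assumption: a missing or empty id is not treated as broken
    return node.text.strip() == "0"

# Hypothetical wrapper element; only <standard_id> is taken from the thread.
docs = {
    "04141261.xml": "<publication><standard_id>0</standard_id></publication>",
    "04152567.xml": "<publication><standard_id>4169</standard_id></publication>",
}
kept = [name for name, xml_text in docs.items() if not is_broken(xml_text)]
print(kept)
```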

  • The ANSI/IEEE Std identifier doesn't have a number. How should we handle it?

Which one? Can you be more specific? Many documents start with ANSI/IEEE Std.

  • Should the EEE Std 488.2-1992 be IEEE Std 488.2-1992?

Yes. We should replace all /^EEE\s/ with "IEEE " (IEEE plus a trailing space).

  • Should the year in IEEE 1076-CONC-I99O be 1990, not I99O?

Yes, this is clearly a typo. We should fix this in a "cleaning" step.


Given the data errors I think we should first run a separate "cleaning stage" and then parse the identifiers.
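A minimal sketch of what such a cleaning stage might look like, run before any identifier parsing. The `clean_normtitle` function is hypothetical; it only encodes the two fixes discussed in this thread (the leading "EEE" and the 1076-CONC year) and is not the actual pubid-ieee implementation.

```python
import re

def clean_normtitle(normtitle: str) -> str:
    """Apply known data fixes to a raw normtitle before it is parsed."""
    # Restore the missing leading "I" in titles such as "EEE Std 488.2-1992".
    cleaned = re.sub(r"^EEE\s", "IEEE ", normtitle)
    # Special case: letters "I" and "O" typed in place of digits in the year.
    cleaned = cleaned.replace("1076-CONC-I99O", "1076-CONC-1990")
    return cleaned

for raw in ["EEE Std 488.2-1992", "IEEE 1076-CONC-I99O", "ANSI/IEEE Std 1-1986"]:
    print(clean_normtitle(raw))
```

Keeping the fixes in one function makes each special case easy to document and lets the parser assume well-formed input.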

andrew2net commented 3 years ago
  • The ANSI/IEEE Std identifier doesn't have a number. How should we handle it?

Which one? Can you be more specific? Many documents start with ANSI/IEEE Std.

Yes, many start with it, but only one ends with it:

ANSI/IEEE PC63.7/D rev17, December 2014
ANSI/IEEE STD 185-1975 (Revision of IEEE Std 165-1947)
ANSI/IEEE Std
ANSI/IEEE Std 1-1986
ANSI/IEEE Std 1000-1987

ronaldtse commented 2 years ago

Done for the moment. Going to create a new task.