metanorma / pubid-ieee

PubID spec and implementation for IEEE deliverables
BSD 2-Clause "Simplified" License

= IEEE publication identifiers ("IEEE PubID")

== Purpose

Implements a mechanism to parse and utilize IEEE publication identifiers.

== Historic identifier patterns

For historical reasons, there are at least two major "pattern series" of identifiers: old (type I) and new (type II). This implementation aims to support both patterns.

== Use cases to support

== Elements of the PubID

=== Publisher

|===
| Name | Abbrev

| Institute of Electrical and Electronics Engineers | IEEE
|===

=== Report number

{number} - is a set of one or more digits and optional letters

=== Part

{part} - is a set of digits and optional letters; starts with a digit; if a letter or letters are present then they are in the end; optional

=== Subpart

{subpart} - is a set of digits and optional letters; optional; several subparts may be present

=== Year

{year} - is a set of 4 digits; optional
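As a rough illustration, the element definitions above can be written as Ruby regular expressions that validate a single, already-extracted element. This is a minimal sketch that reads the prose literally; the module `PubIdElements`, its constants and the `subparts?` helper are hypothetical and not taken from the pubid-ieee code, and treating multiple subparts as dot-separated is an assumption.

```ruby
# Hypothetical constants: a literal reading of the element definitions.
# Each regex validates one element value that has already been extracted.
module PubIdElements
  # {number}: one or more digits, optionally mixed with letters
  NUMBER = /\A[A-Za-z]*\d[A-Za-z\d]*\z/
  # {part}: starts with a digit; any letters come at the end
  PART = /\A\d+[A-Za-z]*\z/
  # {subpart}: digits with optional letters (read the same way as {number})
  SUBPART = /\A[A-Za-z]*\d[A-Za-z\d]*\z/
  # {year}: exactly four digits
  YEAR = /\A\d{4}\z/

  # Several subparts are allowed; this sketch assumes they are dot-separated.
  # The -1 limit keeps empty trailing items so that "1." is rejected.
  def self.subparts?(value)
    items = value.split(".", -1)
    !items.empty? && items.all? { |item| SUBPART.match?(item) }
  end
end

puts PubIdElements::NUMBER.match?("C37")  # prints true
puts PubIdElements::PART.match?("3a")     # prints true
puts PubIdElements::PART.match?("a3")     # prints false
puts PubIdElements::YEAR.match?("98")     # prints false
puts PubIdElements.subparts?("1.2b")      # prints true
```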

=== Corrigendum & Amendment

{cor} - is a corrigendum or an amendment, with the pattern Cor {cornum}-{year} or Amd {cornum}:{year}, where {cornum} is a set of digits; optional
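A strict reading of the `{cor}` definition can be sketched in Ruby as follows. The constant `COR_PATTERN` and the method `parse_cor` are hypothetical. The type I and type II expressions below accept more separator variants (for example `:-`), so this sketch is deliberately narrower than they are.

```ruby
# Hypothetical helper for the two shapes given in the definition:
#   "Cor {cornum}-{year}"  and  "Amd {cornum}:{year}"
COR_PATTERN = /\A(?<kind>Cor|Amd) (?<cornum>\d+)(?<sep>[-:])(?<year>\d{4})\z/

# Returns a Hash with :kind, :number and :year, or nil when the text does not
# follow the definition (including Cor/Amd paired with the other separator).
def parse_cor(text)
  m = COR_PATTERN.match(text)
  return nil unless m

  expected_sep = m[:kind] == "Cor" ? "-" : ":"
  return nil unless m[:sep] == expected_sep

  { kind: m[:kind], number: m[:cornum].to_i, year: m[:year].to_i }
end

puts parse_cor("Cor 1-2017")[:year]  # prints 2017
puts parse_cor("Amd 2:2019")[:kind]  # prints Amd
p parse_cor("Cor 1:2017")            # prints nil
```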

== Type I pattern

[source]
----
{publisher} {type} {series} {number}{part}.{subpart}{year} {edition}/{conform}/{correction}
----

(*) - optional

An identifier can be composed of two other identifiers separated by a space. Only the first identifier must contain the publisher; in the second one it is optional.

The following regular expression parses 100% of the identifiers in the type I https://xml2rfc.tools.ietf.org/public/rfc/bibxml-ieee/[dataset]:

[source,regex]
----
{
  ^IEEE\s
  ((?Standard|Std|Draft(\sStandard|\sSupplement)?)\s)?
  ((?ISO\/IEC(\/IEEE)?)\s)?
  (?[A-Z]?\d+[[:alpha:]]?)
  (.-)?
  (.(?\d[[:alpha:]]?))?
  (?([-:]|\s-\s|,\s)\d{4})?
  (\s(IEEE\s(?Std)\s)?(?[A-Z]?\d+[[:alpha:]]?)
    (.-)?
    (.)?
    (?([-:.]|_-|\s-\s|,\s)\d{4})?)?
  (\s(?Edition(\s([^)]+))?|First\sedition\s[\d-]+))?
  (\/(?Conformance\d{2})-(?\d{4}))?
  (\/(?(Cor\s?|(Amd.)\d{1,2})
    (?(:|-|:-)\d{4}))?$
}x
----
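Ruby named groups (`(?<name>...)`) let each parsed element be read from the resulting `MatchData` by name. The sketch below is a deliberately simplified pattern, not the full expression above: it only handles identifiers shaped like `IEEE [Std|Standard] {number}[.{part}][-{year}]`, and the constant `TYPE_I_SIMPLE` and its group names are hypothetical.

```ruby
# Hypothetical, deliberately simplified type I pattern with named groups.
# It covers only "IEEE [Std|Standard] number[.part][-year]".
TYPE_I_SIMPLE = %r{
  \A IEEE\s
  (?:(?<type>Std|Standard)\s)?        # optional document type
  (?<number>[A-Z]?\d+[[:alpha:]]?)    # e.g. 802 or C37
  (?:\.(?<part>\d+[[:alpha:]]?))?     # optional part after a dot
  (?:-(?<year>\d{4}))?                # optional year after a hyphen
  \z
}x

m = TYPE_I_SIMPLE.match("IEEE Std 802.11-2016")
puts m[:type]    # prints Std
puts m[:number]  # prints 802
puts m[:part]    # prints 11
puts m[:year]    # prints 2016

# Groups that did not participate in the match return nil.
m = TYPE_I_SIMPLE.match("IEEE 1003-1988")
p m[:type]       # prints nil
p m[:part]       # prints nil
```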

== Parsing PubID elements from type II identifiers

PubID elements of type II identifiers can be parsed with the following regular expression:

[source,regex]
----
{
  ^IEEE\s(?\w+(.[A-Z]\d|\sHBK)?)
  (?(.|\s)\d{1,4}[[:alpha:],]{0,7}|-\d?[A-Z]+|-\d(?=[-.]))?
  (?.\d{1,3}[a-z]?|-\d{5}[a-z]?|-\d+(?=[-:]))?
  (?.\d|-\d+(?=-))?
  (?([-:.]|-|\s-)\d{4})?
  (\/(?([A-Z]?\d+[a-z]?|Conformance\d+))
    ((.|-)(?\d{1,3}[a-z]?)(?!\d))?
    (.(?\d{1,2}))?)?
  (\/(?\d+)(.(?\d))?)?
  (?([-:.]|-|\s-)\d{4})?
  ((\/||-|\s\/)(?(Cor|(?i)Amd(?-i))(\s|.|.\s)?\d{1,2})
    (?(:|-|:-|[A-Z][a-z]{2})\d{4}(-\d{4})?)?)?$
}x
----

This regular expression covers 99% of the identifiers in the type II bibxml-ieee https://xml2rfc.tools.ietf.org/public/rfc/bibxml-ieee-new/[dataset].
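Ruby named groups make it straightforward to pull individual elements, including a trailing corrigendum or amendment, out of a type II identifier. The sketch below handles only a simplified shape, `IEEE {number}[.{part}][-{year}][/Cor|Amd {cornum}-{coryear}]`, and is not the expression above; the constant `TYPE_II_SIMPLE`, its group names and the sample input are hypothetical.

```ruby
# Hypothetical, deliberately simplified type II pattern with named groups.
# Shape: "IEEE number[.part][-year][/Cor|Amd cornum-coryear]"
TYPE_II_SIMPLE = %r{
  \A IEEE\s
  (?<number>\w+)                        # e.g. 1234
  (?:\.(?<part>\d{1,4}[[:alpha:]]*))?   # optional part after a dot
  (?:-(?<year>\d{4}))?                  # optional publication year
  (?:/(?<correction>Cor|Amd)\s?(?<cornum>\d{1,2})
     [-:](?<coryear>\d{4}))?            # optional Cor or Amd with its year
  \z
}x

m = TYPE_II_SIMPLE.match("IEEE 1234.5-2020/Cor 1-2021")
puts m[:number]      # prints 1234
puts m[:part]        # prints 5
puts m[:year]        # prints 2020
puts m[:correction]  # prints Cor
puts m[:cornum]      # prints 1
puts m[:coryear]     # prints 2021
```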

== File name generator

For type I identifiers, file names are generated by replacing each of the characters /, \, comma, ', ", (, ) and the space character with an underscore (_). Runs of consecutive underscores are then squeezed into a single underscore.
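The type I rule translates directly into a substitution followed by a squeeze. In this Ruby sketch, the names `TYPE_I_REPLACED` and `type_i_filename` are hypothetical, and "breakspace" is taken to mean the ordinary space character.

```ruby
# Characters replaced by "_": / \ , ' " ( ) and the space character.
TYPE_I_REPLACED = %r{[/\\,'"() ]}

# Hypothetical helper: replace each listed character, then squeeze runs of "_".
def type_i_filename(identifier)
  identifier.gsub(TYPE_I_REPLACED, "_").squeeze("_")
end

puts type_i_filename("IEEE Std 1003.1-2008/Cor 1-2013")
# prints IEEE_Std_1003.1-2008_Cor_1-2013
puts type_i_filename("IEEE Std 802.11, 1999 Edition (R2003)")
# prints IEEE_Std_802.11_1999_Edition_R2003_
```

The trailing `_` in the second result is kept because the rule does not mention trimming.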

For type II identifiers, the PubID elements must first be parsed and then joined in the following order:


[source]
----
IEEE.{number1}{part1}.{subpart11}.{subpart12}-{year1}{number2}{part2}.{subpart21}{number3}{part3}-{year2}{correction}-{coryear}
----
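A minimal Ruby sketch of this join, assuming the elements have already been parsed into a Hash keyed by the placeholder names (`number1`, `part1`, `subpart11` and so on), and assuming that a missing element is dropped together with the literal text (`.` or `-`) that precedes it in the template. Whether parsed values carry their own separators is not specified here, so values are inserted verbatim. The names `TYPE_II_TEMPLATE` and `type_ii_filename` are hypothetical.

```ruby
# Hypothetical element keys named after the placeholders in the template.
# Each pair is [literal text that precedes the element, element key].
TYPE_II_TEMPLATE = [
  ["", :number1], ["", :part1], [".", :subpart11], [".", :subpart12],
  ["-", :year1], ["", :number2], ["", :part2], [".", :subpart21],
  ["", :number3], ["", :part3], ["-", :year2], ["", :correction],
  ["-", :coryear]
].freeze

# Assumption: a missing element is skipped together with the literal text
# that precedes it; present values are appended verbatim, in template order.
def type_ii_filename(elements)
  TYPE_II_TEMPLATE.each_with_object(+"IEEE.") do |(prefix, key), name|
    value = elements[key].to_s
    next if value.empty?

    name << prefix << value
  end
end

puts type_ii_filename(number1: "1547", year1: "2018")
# prints IEEE.1547-2018
puts type_ii_filename(number1: "1547", subpart11: "1", year1: "2018")
# prints IEEE.1547.1-2018
```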