metanorma / pubid-itu

Parser for ITU-T and ITU-R publication identifiers
BSD 2-Clause "Simplified" License

Implement pubid-itu #1

Open ronaldtse opened 2 years ago

ronaldtse commented 1 year ago

As stated by @andrew2net:

Please start with code from this, which is also written using Parslet:

mico commented 1 year ago

@ronaldtse @andrew2net could you help me find a list of ITU identifiers? It should be as complete as possible, but around 100 identifiers from different categories and in different formats will be enough to start building the parser.

mico commented 1 year ago

@ronaldtse @andrew2net any documents describing the categories and formats would also be very helpful.

ronaldtse commented 1 year ago

@mico ITU has 3 bureaus that issue different identifiers:

mico commented 1 year ago

@ronaldtse I also found there are document types for each sector (ITU-R, ITU-T, ITU-D) like:

But I don't understand how this is reflected in the identifier.

ronaldtse commented 1 year ago

Right now we only care about ITU Recommendations. All 3 bureaus publish "Recommendations" (which are "standards").

mico commented 1 year ago

> Right now we only care about ITU Recommendations. All 3 bureaus publish "Recommendations" (which are "standards").

@ronaldtse I found "question" type identifiers here (https://github.com/relaton/relaton-data-itu-r/blob/master/data/ITU-R_SG07_110-2.yaml). So I should skip identifiers like this for now, right?

ronaldtse commented 1 year ago

@mico actually we should parse all the identifiers from relaton-data-itu-r. The thing is the "Question" identifiers may differ per bureau (we have not fully analyzed the patterns).

andrew2net commented 1 year ago
  • ITU-T: ping @andrew2net if we have them

Here are all the cases that I have: https://github.com/relaton/relaton-itu/blob/main/spec/relaton_itu/pubid_spec.rb

ronaldtse commented 1 year ago

@mico can you please help document the different patterns used for different identifier types? Thanks!

mico commented 1 year ago

@ronaldtse @andrew2net I'm trying to differentiate "question", "recommendation", "handbook" and other types for identifiers. Right now we have:

There are other types like "reports" and "opinions", but I didn't find them in relaton-data-itu-r (https://github.com/relaton/relaton-data-itu-r). "Reports" also use series similar to those of "recommendations", so ITU-R SA.364-6 could be a valid identifier for a "report" as well.

Handbooks don't have a series in the identifier. The website (https://www.itu.int/pub/R-HDB) lists them under series very similar to those of questions (SG00, SG01, SG02...), but these don't appear in the identifiers, see https://github.com/relaton/relaton-data-itu-r/blob/master/data/ITU-R_63-201.yaml

Upd.: for the “resolution” type, the identifier looks like ITU-R R.9-6. The “R” before the numbers means “resolution”; this is the only type that is clearly indicated in the identifier.

How should we deal with that? Should we provide the identifier type together with the identifier to parse?
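To make the second option concrete, here is a rough sketch of a parser that accepts the type as a hint. It is plain Ruby with a regular expression, not a Parslet grammar, and `ItuId`, `PATTERN`, `parse_itu` and the `type:` keyword are hypothetical names that do not exist in pubid-itu. It assumes the number after the hyphen is a revision and that an identifier without a hint defaults to a recommendation. Only the two shapes mentioned in this thread (`ITU-R SA.364-6` and `ITU-R R.9-6`) are covered.

```ruby
# Hypothetical sketch: not the pubid-itu API and not Parslet-based.
ItuId = Struct.new(:sector, :series, :number, :revision, :type, keyword_init: true)

# Matches "ITU-<sector> <SERIES>.<number>[-<revision>]".
# Treating the trailing number as a revision is an assumption.
PATTERN = /\AITU-(?<sector>[RTD])\s+(?<series>[A-Z]+)\.(?<number>\d+)(?:-(?<revision>\d+))?\z/

def parse_itu(text, type: nil)
  m = PATTERN.match(text.strip)
  raise ArgumentError, "unrecognized identifier: #{text.inspect}" unless m

  # "R" in the series position of an ITU-R identifier marks a resolution.
  # Every other series is ambiguous, so the caller's hint is used instead.
  inferred = m[:sector] == "R" && m[:series] == "R" ? :resolution : nil

  ItuId.new(
    sector: m[:sector],
    series: m[:series],
    number: m[:number].to_i,
    revision: m[:revision]&.to_i,
    type: inferred || type || :recommendation # default is an assumption
  )
end

parse_itu("ITU-R SA.364-6")                # type defaults to :recommendation
parse_itu("ITU-R SA.364-6", type: :report) # same text, read as a report
parse_itu("ITU-R R.9-6")                   # type inferred as :resolution
```

With this shape the caller only has to supply a type when the identifier itself is ambiguous. Whether a silent default is acceptable, or an ambiguous identifier should raise instead, is left open here.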

ronaldtse commented 1 year ago

ITU PubIDs have the following patterns:

We should definitely separate the identifier classes.
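As a rough illustration of what separate identifier classes could look like, the sketch below gives each type its own class on top of a shared base. The module and class names (`ItuSketch`, `Identifier`, `Recommendation`, `Report`, `Resolution`) and the `revision` field are assumptions made for this example only; they are not the actual pubid-itu design, and only the identifier shapes quoted earlier in the thread are modelled.

```ruby
# Hypothetical class layout for illustration; not the pubid-itu implementation.
module ItuSketch
  class Identifier
    attr_reader :sector, :series, :number, :revision

    def initialize(sector:, series:, number:, revision: nil)
      @sector = sector
      @series = series
      @number = number
      @revision = revision
    end

    # Derive the type name from the class, e.g. Report -> :report.
    def type
      self.class.name.split("::").last.downcase.to_sym
    end

    def to_s
      base = "ITU-#{sector} #{series}.#{number}"
      revision ? "#{base}-#{revision}" : base
    end

    # Two identifiers are equal only if both class and rendering match.
    def ==(other)
      other.class == self.class && other.to_s == to_s
    end
  end

  class Recommendation < Identifier; end
  class Report < Identifier; end

  # Resolutions always render with "R" in the series position.
  class Resolution < Identifier
    def initialize(sector:, number:, revision: nil)
      super(sector: sector, series: "R", number: number, revision: revision)
    end
  end
end

rec = ItuSketch::Recommendation.new(sector: "R", series: "SA", number: 364, revision: 6)
rep = ItuSketch::Report.new(sector: "R", series: "SA", number: 364, revision: 6)
res = ItuSketch::Resolution.new(sector: "R", number: 9, revision: 6)
```

Here `rec` and `rep` render to the same string but stay distinct objects of different classes, which is the ambiguity raised above for ITU-R SA.364-6.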