metanorma / pubid-iso

Implementation of ISO pubid
BSD 2-Clause "Simplified" License

URGENT! Parsing some new IDs fails #240

Closed: andrew2net closed this issue 7 months ago

andrew2net commented 8 months ago

We have new identifiers:

ISO/IEC JTC 1 DIR

ISO/IEC JTC 1 Directives. This is the undated reference. Note that these are "internal directives" of ISO/IEC JTC 1. This series existed until 2007, after which it became "ISO/IEC Directives — JTC 1 Supplement".

> Pubid::Iso::Identifier.parse('ISO/IEC JTC 1 DIR')
Pubid::Core::Errors::ParseError: Failed to match sequence (stage:'Fpr'? 'WD/'? (type:GUIDE_PREFIX SPACE)? (stage:STAGE SPACE)? (stage:TYPED_STAGE SPACE)? (ORIGINATOR (SPACE / '/'))? (TC_DOCUMENT_BODY / STD_DOCUMENT_BODY / DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?)) at line 1 char 9.
cause: Failed to match sequence (stage:'Fpr'? 'WD/'? (type:GUIDE_PREFIX SPACE)? (stage:STAGE SPACE)? (stage:TYPED_STAGE SPACE)? (ORIGINATOR (SPACE / '/'))? (TC_DOCUMENT_BODY / STD_DOCUMENT_BODY / DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?)) at line 1 char 9.
`- Expected one of [TC_DOCUMENT_BODY, STD_DOCUMENT_BODY, DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?] at line 1 char 9.
   |- Failed to match sequence ((tctype:TCTYPE '/'?){0, } SPACE tcnumber:DIGITS ('/' ((sctype:SCTYPE SPACE scnumber:DIGITS '/')? wgtype:WGTYPE SPACE wgnumber:DIGITS / sctype:SCTYPE (SPACE / '/' wgtype:WGTYPE SPACE) scnumber:DIGITS))? SPACE 'N' SPACE? number:DIGITS) at line 1 char 15.
   |  `- Expected "N", but got "D" at line 1 char 15.
   |- Failed to match sequence ((TYPE / stage:STAGE iteration:DIGITS?)? SPACE? ((stage:STAGE / stage:TYPED_STAGE / TYPE) SPACE)? number:DIGITS ('|' joint_document:(publisher:'IDF' SPACE number:DIGITS))? PART? ITERATION? (SPACE? (':' / DASH) YEAR)? SUPPLEMENT? EXTRACT? ADDENDUM? EDITION? LANGUAGE?) at line 1 char 9.
   |  `- Expected at least 1 of \\d at line 1 char 9.
   |     `- Failed to match \\d at line 1 char 9.
   `- Failed to match sequence (DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?) at line 1 char 14.
      `- Extra input after last repetition at line 1 char 14.
         `- Failed to match sequence (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY)) at line 1 char 14.
            `- Expected " + ", but got " DI" at line 1 char 14.

ISO/IEC JTC 1 DIR:{yyyy}

Available edition years are: 2004 (5th Edition), 2005 (5th Edition, Version 1.0), 2006 (5th Edition, Version 2.0), 2007 (5th Edition, Version 3.0).

> Pubid::Iso::Identifier.parse('ISO/IEC JTC 1 DIR:2004')
Pubid::Core::Errors::ParseError: Failed to match sequence (stage:'Fpr'? 'WD/'? (type:GUIDE_PREFIX SPACE)? (stage:STAGE SPACE)? (stage:TYPED_STAGE SPACE)? (ORIGINATOR (SPACE / '/'))? (TC_DOCUMENT_BODY / STD_DOCUMENT_BODY / DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?)) at line 1 char 9.
cause: Failed to match sequence (stage:'Fpr'? 'WD/'? (type:GUIDE_PREFIX SPACE)? (stage:STAGE SPACE)? (stage:TYPED_STAGE SPACE)? (ORIGINATOR (SPACE / '/'))? (TC_DOCUMENT_BODY / STD_DOCUMENT_BODY / DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?)) at line 1 char 9.
`- Expected one of [TC_DOCUMENT_BODY, STD_DOCUMENT_BODY, DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?] at line 1 char 9.
   |- Failed to match sequence ((tctype:TCTYPE '/'?){0, } SPACE tcnumber:DIGITS ('/' ((sctype:SCTYPE SPACE scnumber:DIGITS '/')? wgtype:WGTYPE SPACE wgnumber:DIGITS / sctype:SCTYPE (SPACE / '/' wgtype:WGTYPE SPACE) scnumber:DIGITS))? SPACE 'N' SPACE? number:DIGITS) at line 1 char 15.
   |  `- Expected "N", but got "D" at line 1 char 15.
   |- Failed to match sequence ((TYPE / stage:STAGE iteration:DIGITS?)? SPACE? ((stage:STAGE / stage:TYPED_STAGE / TYPE) SPACE)? number:DIGITS ('|' joint_document:(publisher:'IDF' SPACE number:DIGITS))? PART? ITERATION? (SPACE? (':' / DASH) YEAR)? SUPPLEMENT? EXTRACT? ADDENDUM? EDITION? LANGUAGE?) at line 1 char 9.
   |  `- Expected at least 1 of \\d at line 1 char 9.
   |     `- Failed to match \\d at line 1 char 9.
   `- Failed to match sequence (DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?) at line 1 char 14.
      `- Extra input after last repetition at line 1 char 14.
         `- Failed to match sequence (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY)) at line 1 char 14.
            `- Expected " + ", but got " DI" at line 1 char 14.

ISO/IEC DIR 1 + IEC SUP:2016-05

> Pubid::Iso::Identifier.parse('ISO/IEC DIR 1 + IEC SUP:2016-05')
Pubid::Core::Errors::ParseError: Failed to match sequence (stage:'Fpr'? 'WD/'? (type:GUIDE_PREFIX SPACE)? (stage:STAGE SPACE)? (stage:TYPED_STAGE SPACE)? (ORIGINATOR (SPACE / '/'))? (TC_DOCUMENT_BODY / STD_DOCUMENT_BODY / DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?)) at line 1 char 9.
cause: Failed to match sequence (stage:'Fpr'? 'WD/'? (type:GUIDE_PREFIX SPACE)? (stage:STAGE SPACE)? (stage:TYPED_STAGE SPACE)? (ORIGINATOR (SPACE / '/'))? (TC_DOCUMENT_BODY / STD_DOCUMENT_BODY / DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?)) at line 1 char 9.
`- Expected one of [TC_DOCUMENT_BODY, STD_DOCUMENT_BODY, DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?] at line 1 char 9.
   |- Failed to match sequence ((tctype:TCTYPE '/'?){0, } SPACE tcnumber:DIGITS ('/' ((sctype:SCTYPE SPACE scnumber:DIGITS '/')? wgtype:WGTYPE SPACE wgnumber:DIGITS / sctype:SCTYPE (SPACE / '/' wgtype:WGTYPE SPACE) scnumber:DIGITS))? SPACE 'N' SPACE? number:DIGITS) at line 1 char 9.
   |  `- Expected " ", but got "D" at line 1 char 9.
   |- Failed to match sequence ((TYPE / stage:STAGE iteration:DIGITS?)? SPACE? ((stage:STAGE / stage:TYPED_STAGE / TYPE) SPACE)? number:DIGITS ('|' joint_document:(publisher:'IDF' SPACE number:DIGITS))? PART? ITERATION? (SPACE? (':' / DASH) YEAR)? SUPPLEMENT? EXTRACT? ADDENDUM? EDITION? LANGUAGE?) at line 1 char 14.
   |  `- Extra input after last repetition at line 1 char 14.
   |     `- Failed to match sequence ('(' language:(([a-z]{1, } ','? / ('E' / 'F' / 'A' / 'R') '/'?){0, }) ')') at line 1 char 14.
   |        `- Expected "(", but got " " at line 1 char 14.
   `- Failed to match sequence (DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?) at line 1 char 14.
      `- Don't know what to do with "-05" at line 1 char 29
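
To re-check all three identifiers in one pass after a fix, a small harness along these lines could help. It is only a sketch: `collect_parse_failures` is a hypothetical helper, not part of pubid-iso, and the parser is passed in as a callable so the snippet runs without the gem installed.

```ruby
# Hypothetical helper (not part of pubid-iso): runs a parser over a list of
# identifiers and returns a hash of identifier => first line of the error
# message for every identifier that raises.
def collect_parse_failures(ids, parser)
  ids.each_with_object({}) do |id, failures|
    parser.call(id)
  rescue StandardError => e
    # Parslet-style traces span many lines; keep only the summary line.
    failures[id] = e.message.lines.first.to_s.chomp
  end
end

# The three identifiers reported in this issue.
NEW_IDS = [
  "ISO/IEC JTC 1 DIR",
  "ISO/IEC JTC 1 DIR:2004",
  "ISO/IEC DIR 1 + IEC SUP:2016-05",
].freeze

# With the gem installed, the real parser could be passed in
# (assumption: the gem is loaded with `require "pubid-iso"`):
#
#   require "pubid-iso"
#   failures = collect_parse_failures(NEW_IDS, Pubid::Iso::Identifier.method(:parse))
#   failures.each { |id, message| puts "#{id}: #{message}" }
```

An empty result hash would mean every identifier in the list now parses.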

This blocks https://github.com/relaton/relaton-iso/issues/160

mico commented 8 months ago

ISO/IEC JTC 1 DIR

@andrew2net Should I return "ISO/IEC DIR 1 JTC 1 SUP" after parsing this one?

ISO/IEC JTC 1 DIR:{yyyy}

Should it become "ISO/IEC DIR JTC 1 SUP:{yyyy}"?

andrew2net commented 8 months ago

ISO/IEC JTC 1 DIR

@andrew2net Should I return "ISO/IEC DIR 1 JTC 1 SUP" after parsing this one?

No. We have ISO/IEC JTC 1 DIR:{yyyy} documents, but a reference can omit the year: ISO/IEC JTC 1 DIR. A reference without a year should match document IDs with any year, so we need to select the documents that match the reference and ignore the year when the reference has none. I don't see ISO/IEC DIR 1 JTC 1 SUP in our dataset. Do you mean ISO/IEC DIR JTC 1 SUP? ISO/IEC JTC 1 DIR:{year} and ISO/IEC DIR JTC 1 SUP are different documents.
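
The matching rule above can be sketched in plain Ruby. This is a minimal illustration working on identifier strings only; `split_year` and `matching_docs` are hypothetical helpers, not pubid-iso API, and the real implementation would presumably compare parsed identifiers rather than strings.

```ruby
# Hypothetical helpers, not part of pubid-iso.

# A trailing ":YYYY" or ":YYYY-MM" is treated as the edition date.
YEAR_SUFFIX = /:(\d{4})(?:-\d{2})?\z/

# Splits an identifier string into [base, year]; year is nil when undated.
def split_year(id)
  match = YEAR_SUFFIX.match(id)
  return [id, nil] unless match

  [match.pre_match, match[1]]
end

# Selects the document IDs a reference points to. An undated reference
# matches every year; a dated reference matches only that year.
def matching_docs(reference, doc_ids)
  ref_base, ref_year = split_year(reference)
  doc_ids.select do |doc_id|
    base, year = split_year(doc_id)
    base == ref_base && (ref_year.nil? || ref_year == year)
  end
end

docs = [
  "ISO/IEC JTC 1 DIR:2004",
  "ISO/IEC JTC 1 DIR:2005",
  "ISO/IEC DIR JTC 1 SUP", # a different document, must never match
]

p matching_docs("ISO/IEC JTC 1 DIR", docs)
p matching_docs("ISO/IEC JTC 1 DIR:2005", docs)
```

The undated reference selects both dated ISO/IEC JTC 1 DIR entries, the dated one selects only the 2005 entry, and ISO/IEC DIR JTC 1 SUP is excluded in both cases because its base differs.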

ISO/IEC JTC 1 DIR:{yyyy}

Should it become "ISO/IEC DIR JTC 1 SUP:{yyyy}"?

No, they are different docs.

mico commented 7 months ago

@ronaldtse why do we have "ISO/IEC DIR JTC 1 SUP" but "ISO/IEC JTC 1 DIR"? "JTC" and "DIR" appear in a different order in these identifiers. Should I keep the order as in the original?

ronaldtse commented 7 months ago

in history:

These are completely different documents, though the second supersedes the first.

andrew2net commented 7 months ago

@mico what is the progress on this issue?