metanorma / pubid-bsi

BSI Publication Identifiers
BSD 2-Clause "Simplified" License

Fails to parse publication identifiers of "combined documents" ("PAS 2035/2030:2019") #39

Closed: andrew2net closed this issue 1 year ago

andrew2net commented 1 year ago

BSI publishes combined documents such as PAS 2035/2030:2019+A1:2022, and this gem fails to parse their identifiers:

Pubid::Bsi::Identifier.parse "PAS 2035/2030:2019+A1:2022"
Failed to match sequence (stage:'Fpr'? 'WD/'? (type:GUIDE_PREFIX SPACE)? (stage:STAGE SPACE)? (stage:TYPED_STAGE SPACE)? (ORIGINATOR (SPACE / '/'))? (TC_DOCUMENT_BODY / STD_DOCUMENT_BODY / DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?)) at line 1 char 1.
cause: Failed to match sequence (stage:'Fpr'? 'WD/'? (type:GUIDE_PREFIX SPACE)? (stage:STAGE SPACE)? (stage:TYPED_STAGE SPACE)? (ORIGINATOR (SPACE / '/'))? (TC_DOCUMENT_BODY / STD_DOCUMENT_BODY / DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?)) at line 1 char 1.
`- Expected one of [TC_DOCUMENT_BODY, STD_DOCUMENT_BODY, DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?] at line 1 char 1.
   |- Failed to match sequence ((tctype:TCTYPE '/'?){0, } SPACE tcnumber:DIGITS ('/' ((sctype:SCTYPE SPACE scnumber:DIGITS '/')? wgtype:WGTYPE SPACE wgnumber:DIGITS / sctype:SCTYPE (SPACE / '/' wgtype:WGTYPE SPACE) scnumber:DIGITS))? SPACE 'N' SPACE? number:DIGITS) at line 1 char 1.
   |  `- Expected " ", but got "P" at line 1 char 1.
   |- Failed to match sequence ((TYPE / stage:STAGE)? SPACE? ((stage:STAGE / stage:TYPED_STAGE / TYPE) SPACE)? number:DIGITS ('|' joint_document:(publisher:'IDF' SPACE number:DIGITS))? PART? ITERATION? (SPACE? (':' / '-') YEAR)? SUPPLEMENT? ADDENDUM? EDITION? LANGUAGE?) at line 1 char 9.
   |  `- Extra input after last repetition at line 1 char 9.
   |     `- Failed to match sequence ('(' language:(([a-z]{1, } ','? / ('E' / 'F' / 'A' / 'R') '/'?){0, }) ')') at line 1 char 9.
   |        `- Expected "(", but got "/" at line 1 char 9.
   `- Failed to match sequence (DIR_DOCUMENT_BODY (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY))?) at line 1 char 1.
      `- Extra input after last repetition at line 1 char 1.
         `- Failed to match sequence (' + ' dir_joint_document:(ORIGINATOR SPACE DIR_DOCUMENT_BODY)) at line 1 char 1.
            `- Expected " + ", but got "PAS" at line 1 char 1.
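
The deepest failure in the trace is at line 1 char 9. As a quick check of what that position holds, the snippet below indexes into the identifier with plain Ruby string operations (no pubid-bsi calls). It assumes the parser's character positions are 1-based, which matches the trace reporting "P" at char 1.

```ruby
id = "PAS 2035/2030:2019+A1:2022"

# The trace counts characters from 1, so "char 9" is string index 8.
reported_char = 9
consumed = id[0, reported_char - 1]   # input matched before the failure
failing_char = id[reported_char - 1]  # character the grammar could not accept

puts "consumed: #{consumed.inspect}, stuck on: #{failing_char.inspect}"
```

So the standard document body rule matches "PAS 2035" and then stops at the slash, because none of the optional parts that may follow the number starts with "/".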

This blocks https://github.com/relaton/relaton-bsi/issues/24

ronaldtse commented 1 year ago

This is one tricky issue.

@mico, could you please check whether the parser can support a "dual number" document?
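
To make the "dual number" idea concrete, here is a minimal sketch of the identifier shape using only a Ruby regular expression. It is not the pubid-bsi grammar or API: the constant COMBINED_ID, the method parse_combined, and the capture names (second_number, amendment_year, and so on) are all hypothetical, and the pattern covers only the simple "TYPE number[/number][:year][+An:year]" form seen in this issue.

```ruby
# Hypothetical sketch, NOT the pubid-bsi grammar: an optional second
# number after "/" is the only addition over a single-number identifier.
COMBINED_ID = /\A
  (?<type>[A-Z]+)\s
  (?<number>\d+)
  (?:\/(?<second_number>\d+))?
  (?::(?<year>\d{4}))?
  (?:\+(?<amendment>A\d+):(?<amendment_year>\d{4}))?
\z/x

# Returns a hash of the parts that were present, or nil if the
# string does not have the assumed shape.
def parse_combined(id)
  match = COMBINED_ID.match(id)
  return nil unless match

  # Drop captures for optional groups that did not participate.
  match.named_captures.compact.transform_keys(&:to_sym)
end

p parse_combined("PAS 2035/2030:2019+A1:2022")
p parse_combined("PAS 2035:2019")
```

For the identifier from this issue the result contains both number ("2035") and second_number ("2030"); for a single-number identifier the second_number key is simply absent. Whether the real grammar should model the second number as a separate field or as a joint document is a design question this sketch does not settle.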