metanorma / pubid-iso

Implementation of ISO pubid
BSD 2-Clause "Simplified" License
1 stars 0 forks source link

PubID creation for Stages and Amendments #102

Closed ronaldtse closed 1 year ago

ronaldtse commented 2 years ago

These use cases have to work. All of them don't work -- I am merely typing out the expected output.

Case 1

# Generates Amd pubid despite not providing Amendment Class as input
> Pubid::Iso::Identifier.new(number: 17301, part: 1, publisher: "ISO", language: "en", year: 2016, amendments: [{number: 1, stage: "NP"}]).to_s
=> "ISO 17301-1:2016/NP Amd 1(en)"

# Generates Amd pubid with year despite not providing Amendment Class as input
> Pubid::Iso::Identifier.new(number: 17301, part: 1, publisher: "ISO", language: "en", year: 2016, amendments: [{number: 1, stage: "NP", year: 2017}]).to_s
=> "ISO 17301-1:2016/NP Amd 1:2017(en)"

Case 2

# Generates "DAmd"
> Pubid::Iso::Identifier.new(number: 17301, part: 1, publisher: "ISO", language: "en", year: 2016, amendments: [{number: 1, stage: "DIS"}]).to_s
=> "ISO 17301-1:2016/DAmd 1(en)"

Case 3

# Amd at IS generates pubid
> Pubid::Iso::Identifier.new(number: 17301, part: 1, publisher: "ISO", language: "en", year: 2016, amendments: [{number: 1, stage: "IS"}]).to_s
=> "ISO 17301-1:2016/Amd 1(en)"

# Amd at IS generates URN
> Pubid::Iso::Identifier.new(number: 17301, part: 1, publisher: "ISO", language: "en", year: 2016, amendments: [{number: 1, stage: "IS"}]).urn
=> "urn:iso:std:iso:17301:-1:stage-60.00:amd:1:en"

# Amd with IS but no year
> Pubid::Iso::Identifier.new(number: 17301, part: 1, publisher: "ISO", language: "en", year: 2016, edition: 1, amendments: [{number: 1, stage: "IS"}]).urn
=> "urn:iso:std:iso:17301:-1:ed-1:amd:1:en"

# Amd with IS and year
> Pubid::Iso::Identifier.new(number: 17301, part: 1, publisher: "ISO", language: "en", year: 2016, edition: 1, amendments: [{number: 1, stage: "IS", year: 2017}]).urn
=> "urn:iso:std:iso:17301:-1:ed-1:amd:1:en"

# Amd with IS, year and edition
> Pubid::Iso::Identifier.new(number: 17301, part: 1, publisher: "ISO", language: "en", year: 2016, edition: 1, amendments: [{number: 1, stage: "IS", year: 2017, edition: 1}]).urn
=> "urn:iso:std:iso:17301:-1:ed-1:amd:1:en"

# Amd without edition, generates pubid
> Pubid::Iso::Identifier.new(number: 17301, part: 1, publisher: "ISO", year: 2016, amendments: [{number: 1, year: 2017}]).to_s
=> "ISO 17301-1:2016/Amd 1:2017"

# Amd with edition, generates pubid
> Pubid::Iso::Identifier.new(number: 17301, part: 1, publisher: "ISO", language: "en", year: 2016, edition: 1, amendments: [{number: 1, year: 2017}]).urn
=> "ISO 17301-1:2016/Amd 1:2017"

# URN succeeds when Amd is created with edition of base document
> Pubid::Iso::Identifier.new(number: 17301, part: 1, publisher: "ISO", language: "en", year: 2016, edition: 1, amendments: [{number: 1}]).urn
=> "urn:iso:std:iso:17301:-1:ed-1:amd:1:en"

# URN fails when Amd is created without edition of base document
> Pubid::Iso::Identifier.new(number: 17301, part: 1, publisher: "ISO", language: "en", year: 2016, amendments: [{number: 1}]).urn
=> raise Error because base document must have edition

Case 4

> Pubid::Iso::Identifier.new(number: 17301, part: 1, publisher: "ISO", language: "en", year: 2016, amendments: [{number: 1, stage: "IS", iteration: 2}])
=> raise Error because IS stage does not support iterations.

Case 5

# Generates DTS
> Pubid::Iso::Identifier.new(number: 17301, part: "1-1", publisher: "ISO", language: "en", year: 2016, type: "TS", stage: "DIS", copublisher: ["IEC","IEEE"]).to_s
=> "ISO/IEC/IEEE DTS 17301-1-1:2016(en)"

Case 6

# Generates FDTS
> Pubid::Iso::Identifier.new(number: 17301, part: "1-1", publisher: "ISO", language: "en", year: 2016, type: "TS", stage: "FDIS", copublisher: ["IEC","IEEE"]).to_s
=> "ISO/IEC/IEEE FDTS 17301-1-1:2016(en)"

Task list

ronaldtse commented 2 years ago

Please add these cases to the RSpec tests.

mico commented 2 years ago
# URN fails when Amd is created without edition of base document
> Pubid::Iso::Identifier.new(number: 17301, part: 1, publisher: "ISO", language: "en", year: 2016, amendments: [{number: 1}]).urn
=> raise Error because base document must have edition

"year" is an edition here, it will generate "ISO 17301-1:2016" as a base document id.

ronaldtse commented 2 years ago

@mico what I mean is, without an edition, there cannot be an amendment URN. When there is no edition number, it is possible to create a PubID in string. However, RFC 5141 URNs for amendments require existence of an edition number. Therefore an error should be raised.

mico commented 2 years ago
# Generates "DAmd"
> Pubid::Iso::Identifier.new(number: 17301, part: 1, publisher: "ISO", language: "en", year: 2016, amendments: [{number: 1, stage: "DIS"}]).to_s
=> "ISO 17301-1:2016/DAmd 1(en)"

Already have test for that https://github.com/metanorma/pubid-iso/blob/main/spec/pubid_iso/identifier_spec.rb#L868

mico commented 2 years ago

@ronaldtse We have PubID identifiers with edition included in PubID, examples:

Now we also include edition to generated PubID when we have one of these identifiers as input. Should we keep this behaviour? If yes, it's conflicting with this rule:

# Amd with edition, generates pubid
> Pubid::Iso::Identifier.new(number: 17301, part: 1, publisher: "ISO", language: "en", year: 2016, edition: 1, amendments: [{number: 1, year: 2017}]).urn
=> "ISO 17301-1:2016/Amd 1:2017"

Where edition is not included in PubID output but included to URN.

ronaldtse commented 2 years ago

Good point @mico. Can we provide a PubID rendering option to whether show Edition or not?

opoudjis commented 2 years ago
Pubid::Iso::Errors::SupplementWithoutYearError:
  Cannot apply supplement to document without edition year

We have been generating undated identifiers for all ISO documents, including amendments. So both ISO 123:2001 and ISO 123, and therefore both ISO 123:2001/Amd 1:2002 and ISO 123/Amd 1.

However, I don't think you are wrong in raising this error. ISO 123 is much more meaningful than ISO 123/Amd 1 is: ISO 123 refers to the set of all editions of ISO 123; ISO 123/Amd 1 is much less well-defined.

So for now, I'm going to not generate undated identifiers for amendments. @ronaldtse @Intelligent2013 let me know if this raises concerns.

ronaldtse commented 2 years ago

So for now, I'm going to not generate undated identifiers for amendments.

Yes this error is correct indeed.

We are now disallowing the PubID "ISO 123/Amd 1" because a supplement must be based on a base publication with either edition of year.

We should not generate undated identifiers for amendments.

mico commented 2 years ago

Good point @mico. Can we provide a PubID rendering option to whether show Edition or not?

@ronaldtse we already have this option (with_edition):

> Pubid::Iso::Identifier.new(number: 17301, part: 1, publisher: "ISO", year: 2016, amendments: [{number: 1, year: 2017}]).to_s(with_edition: false)
=> "ISO 17301-1:2016/Amd 1:2017"

but without this option, result will be:

> Pubid::Iso::Identifier.new(number: 17301, part: 1, publisher: "ISO", year: 2016, amendments: [{number: 1, year: 2017}]).to_s
=> "ISO 17301-1:2016/Amd 1:2017 ED1"

Is it enough for now or we should change something?

opoudjis commented 2 years ago

Lots of minor discrepancies between what I'd implemented and the current code, now that I have the code working. Posting to confirm:

Before: ISO 17301-1:2016/PreNP Amd 1. After: ISO 17301-1:2016/NP Amd 1.3:2017

Before ISO 17301-1:2030/CD Cor.3. After: ISO 17301-1:2030/CD Cor 3

Before: ISO/PreWD3 1000-1. After: ISO WD 1000-1.3

Before: IEC/IETF/ISO/TR 1000-1-1:2001. After: IEC/IETF/ISO TR 1000-1-1:2001

Before: ISO 1000. After: ISO PRF 1000

ronaldtse commented 2 years ago

Before: ISO 17301-1:2016/PreNP Amd 1. After: ISO 17301-1:2016/NP Amd 1.3:2017

  • We are ignoring draft status (Pre)

This is a question to be discussed in #76

we are not ignoring iterations in amendments.

This is correct.

Before ISO 17301-1:2030/CD Cor.3. After: ISO 17301-1:2030/CD Cor 3

  • We are not abbreviating Corrigendum as "Cor.", we are putting space between "Cor" and iteration

Correct. "Cor." is legacy practice that has been replaced with "Cor" (no period).

Before: ISO/PreWD3 1000-1. After: ISO WD 1000-1.3

  • We are appending iterations, not making them part of the status.

Correct. Iteration is after the document/part number prefixed with the full stop.

We are not delimiting ISO from status with a slash.

This is wrong. It should have been "ISO/WD 1000-1.3".

Before: IEC/IETF/ISO/TR 1000-1-1:2001. After: IEC/IETF/ISO TR 1000-1-1:2001

  • We are not delimiting publishers from document type with a slash.

This is correct. The previous IEC/IETF/ISO/TR pattern is wrong. If there are copublishers, copublishers are separated by slashes, and the document type appears after the copublishers and a space. If there are no copublishers, the document type is placed after a slash after "ISO".

Before: ISO 1000. After: ISO PRF 1000

  • We are not ignoring stage PRF in identifier rendering.

This is usage dependent:

This means we need to have some option of specifying behavior here.

opoudjis commented 2 years ago

I would still rather we be able to specify stage as numbers rather than codes; as it is, the stage parameters are abstract anyway (we specify "FDIS" to get "FDTS" for tech specs, so we might as well just specify "50".)

OK, a couple of things to fix still, but this is substantially improved.

ronaldtse commented 2 years ago

@ronaldtse we already have this option (with_edition):

Can we reverse the default behavior i.e. with_edition: false by default, only render edition when with_edition: true? Thanks!

ronaldtse commented 2 years ago

I would still rather we be able to specify stage as numbers rather than codes; as it is, the stage parameters are abstract anyway (we specify "FDIS" to get "FDTS" for tech specs, so we might as well just specify "50".)

I agree that we should also allow specifying stages via stage codes (and substage codes).

When using it though understand that there will be inconsistencies: stage codes for PRF and FDIS are identical, so anything 50.00 will be rendered by default as FDIS unless you specify the PRF stage. (@mico can you put this in the README? Thanks.)

opoudjis commented 2 years ago

Because the pubid-iso gem is now going to act as my oracle for stage abbreviations (why should the lookup tables for abbreviations be implemented twice?),

I will need to be able to look up the actual abbreviation for a stage, as applied to the particular document type. So if the style FDIS is applied to a TS document, I need a function that will return me "FDTS" given the initialised identifier. id.stage.abbr still returns me "FDIS". I request it be updated to return FTDS. Similarly, I'd like for it to return "IS" for 60.60, even though we suppress display of the "IS" stage identifier.

This will allow me to remove my own mapping to stage abbreviations, so that there is only one source of truth for abbreviations.

opoudjis commented 2 years ago

Pubid::Iso::Identifier.new(**{:number=>"1000", :language=>"en", :type=>"DIR", :publisher=>"ISO", :copublisher=>[], :urn_stage=>"60.60"}).to_s(with_language_code: :single)

outputs

"ISO DIR 1000"

and not the expected

"ISO DIR 1000(E)"

opoudjis commented 2 years ago

We are not delimiting ISO from status with a slash.

This is wrong. It should have been "ISO/WD 1000-1.3".

But note that this entire venture was triggered by https://github.com/metanorma/metanorma-iso/issues/657 :

In ISO, the document identifier pattern for "copublished" documents is "IEC/ISO {stage}" not "IEC/ISO/{stage}", e.g. https://www.iso.org/standard/56572.html.

mico commented 1 year ago

Pubid::Iso::Identifier.new(**{:number=>"1000", :language=>"en", :type=>"DIR", :publisher=>"ISO", :copublisher=>[], :urn_stage=>"60.60"}).to_s(with_language_code: :single)

outputs

"ISO DIR 1000"

and not the expected

"ISO DIR 1000(E)"

We don't have DIR documents with language code. (https://github.com/metanorma/pubid-iso/blob/main/spec/fixtures/iso-pubid-directives.txt) Should we support it? @ronaldtse

ronaldtse commented 1 year ago

But note that this entire venture was triggered by metanorma/metanorma-iso#657 :

In ISO, the document identifier pattern for "copublished" documents is "IEC/ISO {stage}" not "IEC/ISO/{stage}", e.g. https://www.iso.org/standard/56572.html.

See this, this is correct behavior:

This is correct. The previous IEC/IETF/ISO/TR pattern is wrong. If there are copublishers, copublishers are separated by slashes, and the document type appears after the copublishers and a space. If there are no copublishers, the document type is placed after a slash after "ISO".

We don't have DIR documents with language code.

@mico sure, let's support DIR with language codes as well. Thanks.

ronaldtse commented 1 year ago

I will need to be able to look up the actual abbreviation for a stage, as applied to the particular document type. So if the style FDIS is applied to a TS document, I need a function that will return me "FDTS" given the initialised identifier. id.stage.abbr still returns me "FDIS". I request it be updated to return FTDS. Similarly, I'd like for it to return "IS" for 60.60, even though we suppress display of the "IS" stage identifier.

@opoudjis I need a bit more context here. The return value requested is inconsistent in these two cases:

Maybe the "doctype + stage" abbreviation can be called the "document number prefix", but notice that there is a challenge:

What should these return? Unless you include the "ISO..." bit the values will be inconsistent.

opoudjis commented 1 year ago

At the moment, these are being shoved into stage/@abbreviation in markup, and they are being used by @Intelligent2013 in the process of creating the cover page for PDF. I was indeed giving

ISO/TR => "TR" ISO/FDTR => "FDTR" ISO/IEC TR => "TR" ISO/IEC FDTR => "FDTR"

.... I don't see that consistency is that important. I think it is far more important that we not destroy a PDF workflow that has been stable for the past two years out of pedantry. I also don't particularly see where else this information should be stored: it is a stage abbreviation, albeit specific to a document type. So?

If your concern is what pubid-iso calls this, as opposed to what Metanorma XML does with it: what pubid-iso calls this data element of course is up to pubid-iso, as long as I can obtain it to pass it on. You can call it type_stage_abbrev if you like.

The inconsistency just means it is specific to the type/stage combination, but that does not impact anything in identifier generation anyway. I just want the string, so that Alex can put the string in the PDF front cover.

ronaldtse commented 1 year ago

ISO/TR => "TR" ISO/FDTR => "FDTR" ISO/IEC TR => "TR" ISO/IEC FDTR => "FDTR" I just want the string, so that Alex can put the string in the PDF front cover.

This inconsistency is a real issue because it impacts @Intelligent2013 as well. The PDF workflow should not need to manually reconstruct these prefixes.

What I am stating here is: the publisher, document type and stage abbreviations are LINKED and dependent on each other. They need to be represented as one string.

Ping @Intelligent2013 on how this works for you?

Intelligent2013 commented 1 year ago

I don't clear understand the changes. I.e. instead of <stage abbreviation="TR" language="" will be <stage abbreviation="ISO/TR" language="" or <stage abbreviation="ISO/IEC TR" language=""? Then I can extract TR, no problem. Currently, stage/@abbreviation is using to determination the PDF layout for cover page (there are 4 layouts) and some initial pages.

Intelligent2013 commented 1 year ago

The text representation of abbreviation XSLT reads from stage with non-empty @language:

<stage abbreviation="DIS" language="en">Draft International Standard</stage>
opoudjis commented 1 year ago

This inconsistency is a real issue because it impacts @Intelligent2013 as well. The PDF workflow should not need to manually reconstruct these prefixes.

You are completely misunderstanding the requirement, @ronaldtse. This is nothing to do with document identifiers. I am providing the stage abbreviation, for him (and me) to display on the coverpage for unpublished documents: IT SAYS, on the cover, "FDTR stage" in the warning box. That string is unlinked to the publisher and the document type: this is not the document identifier. I am asking for this gem to provide me that abbreviation, so I don't have to reconstruct it myself any more, and maintain a separate source of truth on those type-specific stage abbreviations.

opoudjis commented 1 year ago

I still need the stage output. Moreover, whatever the incompatibility is between pubid-iso and relaton-iso needs to be resolved. https://github.com/relaton/relaton-iso/commit/5b6a684a8704eb48fbf5cc38824b9bcfb42e641b @andrew2net

This is now holding up tomorrow's release.

The stage output, I can wait on. The incompatibility between pubid-iso and relaton-iso, I cannot.

ronaldtse commented 1 year ago

@opoudjis thanks for the explanation. We could provide the "FDTR" abbreviation and full name for a pubid. The same happens for TR, PAS.

How about pubid.typed_stage_abbev for "FDTR" and pubid.typed_stage_name for "Final Draft Technical Report"?

ronaldtse commented 1 year ago

@mico can we implement this soon? thanks.

ronaldtse commented 1 year ago

The remaining issues have been dealt with by using the TypedStages concept.

Once pubid-iso is released this can be used by relaton-iso.

ronaldtse commented 1 year ago

Closing this.