relaton / relaton-iec

IecBib: retrieve IEC/CIE Standards for bibliographic use using the BibliographicItem model
MIT License

Support IEC Harmonised API for projects and publications #45

Closed ronaldtse closed 1 year ago

ronaldtse commented 2 years ago

We have received IEC approval for their Harmonised API to access IEC projects and publications.

https://api-portal.iec.ch/my-apps

@andrew2net has the necessary credentials to test it out before we make it officially available. The goal is to create a relaton-data-iec dataset that we can rebuild by crawling daily.

ronaldtse commented 2 years ago

From IEC:

You can use the publicationDateFrom and publicationDateTo filters to find all publications between two dates.

Here are two examples with curl:

curl --location --request GET 'https://api.iec.ch/harmonized/publications?publicationDateFrom=2022-03-01&page=0&size=100&debug=Y' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer XXXXXXXXXXXXXXXX'

curl --location --request GET 'https://api.iec.ch/harmonized/publications?publicationDateFrom=2022-03-01&publicationDateTo=2022-06-30&page=0&size=100&debug=Y' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer XXXXXXXXXXXXXXXX'

We can use this mechanism to incrementally fetch IEC publication metadata between GHA runs.
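
The same request could be built from Ruby roughly as follows. This is a minimal sketch, not the gem's implementation: the endpoint, query parameters, and headers come from the curl examples above, while `publications_uri` and `fetch_page` are hypothetical helper names, the `debug=Y` flag is left out, and the response is returned unprocessed because its shape is not described in this thread.

```ruby
require "date"
require "json"
require "net/http"
require "uri"

# Endpoint taken from the curl examples above.
ENDPOINT = "https://api.iec.ch/harmonized/publications"

# Build the request URI for one page of results (hypothetical helper).
# `from` and `to` are Date objects; `to` is optional, as in the first curl example.
def publications_uri(from:, to: nil, page: 0, size: 100)
  params = { "publicationDateFrom" => from.iso8601 }
  params["publicationDateTo"] = to.iso8601 if to
  params["page"] = page
  params["size"] = size
  uri = URI(ENDPOINT)
  uri.query = URI.encode_www_form(params)
  uri
end

# Fetch one page and return the parsed JSON body as-is (hypothetical helper).
def fetch_page(uri, token)
  request = Net::HTTP::Get.new(uri)
  request["Accept"] = "application/json"
  request["Authorization"] = "Bearer #{token}"
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
  raise "IEC API returned #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  JSON.parse(response.body)
end
```

A GHA run would call `publications_uri` with the date of the previous run as `from`, then request successive pages until the API stops returning results.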

andrew2net commented 2 years ago

@ronaldtse currently, the gem uses the following status mapping. From the API I got documents with the status REVISED. Which stage should this status map to?

ACD:
  status: Approved for CD
  stage: '20.99'
ACDV:
  status: Approved for CDV
  stage: '30.99'
ADISSB:
  status: Preparation of text subcontracted to CO
  stage: '40.95'
ADTR:
  status: Approved for DTR
  stage: '40.99'
ADTS:
  status: Approved for DTS
  stage: '40.99'
AFDIS:
  status: Approved for FDIS
  stage: '40.99'
AMW:
  status: Document under revision
  stage: '92.20'
ANW:
  status: Registration of new project
  stage: '20.00'
APUB:
  status: Approved for publication
  stage: '50.99'
APUBSB:
  status: Preparation of text subcontracted to CO
  stage: '50.95'
BPUB:
  status: Being published
  stage: '60.00'
BWG:
  status: Return to drafting phase or redefine project
  stage: '30.92'
CAN:
  status: Draft cancelled
  stage: '20.98'
CCDV:
  status: Draft circulated as CDV
  stage: '40.00'
CD:
  status: Draft circulated as CD
  stage: '30.00'
CDISH:
  status: Draft circulated as DISH
  stage: '50.20'
CDM:
  status: CD to be discussed at meeting
  stage: '30.20'
CDPAS:
  status: Draft circulated as DPAS
  stage: '50.20'
CDTR:
  status: Draft circulated as DTR
  stage: '50.20'
CDTS:
  status: Draft circulated as DTS
  stage: '50.20'
CDVM:
  status: Rejected CDV to be discussed at a meeting
  stage: '40.93'
CFDIS:
  status: Draft circulated as FDIS
  stage: '50.20'
DECDISH:
  status: DISH at editing check
  stage: '40.99'
DECFDIS:
  status: FDIS at editing check
  stage: '50.60'
DECPUB:
  status: Publication at editing check
  stage: '60.00'
DEL:
  status: Deleted/abandoned
  stage: '00.99'
DELPUB:
  status: Deleted publication
  stage: '90.99'
DREJ:
  status: Abandon
  stage: '30.98'
DTRM:
  status: Rejected DTR to be discussed at meeting
  stage: '50.92'
DTSM:
  status: Rejected DTS to be discussed at meeting
  stage: '50.92'
MERGED:
  status: Fragment merged
  stage: '30.97'
NADIS:
  status: Repeat enquiry
  stage: '40.93'
NCDV:
  status: CDV rejected
  stage: '40.98'
NDTR:
  status: DTR rejected
  stage: '50.98'
NDTS:
  status: DTS rejected
  stage: '50.98'
NFDIS:
  status: FDIS rejected
  stage: '50.98'
PCC:
  status: Preparation of CC
  stage: '30.92'
PNW:
  status: New work item proposal
  stage: '10.00'
PPUB:
  status: Publication issued
  stage: '60.60'
PRVC:
  status: Preparation of RVC
  stage: '40.92'
PRVD:
  status: Preparation of RVD
  stage: '40.92'
PRVDISH:
  status: Preparation of RVDISH
  stage: '40.92' # ?
PRVDPAS:
  status: Preparation of RVDPAS
  stage: '40.92'
PRVDTR:
  status: Preparation of RVDTR
  stage: '40.92'
PRVDTS:
  status: Preparation of RVDTS
  stage: '40.92'
PRVN:
  status: Preparation of RVN
  stage: '40.92'
PWI:
  status: Preliminary work item
  stage: '00.00'
RDIS:
  status: Registration for formal approval
  stage: '50.00'
RDISH:
  status: DISH received and registered
  stage: '50.00'
RFDIS:
  status: FDIS received and registered
  stage: '50.00'
RPUB:
  status: Publication received and registered
  stage: '60.60'
SPE:
  stage: SPE # ?
SPLIT:
  status: Project Fragmented
  stage: SPLIT # ?
SRP:
  stage: SRP
SUSPENDED:
  status: Project Suspended
  stage: SUSPENDED # ?
TCDV:
  status: Translation of CDV
  stage: '50.00'
TDISH:
  status: Translation of DISH
  stage: '50.00' # ?
TDTR:
  status: Translation of DTR
  stage: '50.00'
TDTS:
  status: Translation of DTS
  stage: '50.00'
TFDIS:
  status: Translation of FDIS
  stage: '50.00'
TPUB:
  status: Translation of publication
  stage: '60.00'
WPUB:
  status: Publication withdrawn
  stage: '95.99'
preCD:
  status: Preparation of CD document
  stage: preCD
preCDPAS:
  status: Preparation of DPAS
  stage: preCDPAS
preDISH:
  status: Preparation of DISH
  stage: preDISH
preDTR:
  status: Preparation of DTR document
  stage: preDTR
prePNW:
  status: Preparation of NP document
  stage: prePNW
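
A lookup against this table might look like the sketch below. It assumes the mapping is loaded from YAML into a Hash; only three entries are copied here for brevity, and `lookup_status` is a hypothetical helper name. It returns `nil` for codes that are not in the table, which is what happens with REVISED.

```ruby
require "yaml"

# Three entries copied from the mapping above; the real table is much longer.
STATUSES = YAML.safe_load(<<~YAML)
  ACD:
    status: Approved for CD
    stage: '20.99'
  PPUB:
    status: Publication issued
    stage: '60.60'
  SPE:
    stage: SPE
YAML

# Return [status, stage] for a code, or nil when the code is not in the table.
# Entries without a status (such as SPE) yield nil in the first position.
def lookup_status(code)
  entry = STATUSES[code]
  return nil unless entry

  [entry["status"], entry["stage"]]
end
```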

ronaldtse commented 2 years ago

@andrew2net the documentation for this API is here: https://bitbucket.org/sdo-hapi/data-models/src/master/

REVISED is this line: https://bitbucket.org/sdo-hapi/data-models/src/25109b2477eccc318794642b58f56785c6e626ef/schema/xsd/publications/harmonized-publications.xsd#lines-66

This means that IN_DEVELOPMENT, DRAFT, PUBLISHED, REPLACED, REVISED, WITHDRAWN are "project statuses", not stage codes.

The stage codes are here: https://bitbucket.org/sdo-hapi/data-models/src/25109b2477eccc318794642b58f56785c6e626ef/schema/xsd/publications/harmonized-publications.xsd#lines-123
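
One way to keep the two concepts apart in code is to check a value against the project statuses before consulting any stage table. The sketch below assumes the six values listed above are the complete set; `classify` and its `stage_table` argument are hypothetical and not part of the gem.

```ruby
# Values named in this thread as project statuses in the harmonized schema.
PROJECT_STATUSES = %w[IN_DEVELOPMENT DRAFT PUBLISHED REPLACED REVISED WITHDRAWN].freeze

# Classify a value returned by the API.
# `stage_table` is a hypothetical Hash keyed by stage abbreviation (e.g. "ACD", "PPUB").
def classify(value, stage_table)
  return :project_status if PROJECT_STATUSES.include?(value)
  return :stage if stage_table.key?(value)

  :unknown
end
```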

andrew2net commented 2 years ago

> The stage codes are here: https://bitbucket.org/sdo-hapi/data-models/src/25109b2477eccc318794642b58f56785c6e626ef/schema/xsd/publications/harmonized-publications.xsd#lines-123

@ronaldtse I can't find any stage codes in the documentation.

ronaldtse commented 2 years ago

The documentation at that line says:

@ISO: uses exclusively "CD", "DIS", "FDIS" and "IS"; they matches the rows 30, 40, 50, and 60 of the harmonized stage codes matrix. In case the project's stages are re-iterated, only the last contents is exposed as a publication. Notice that this value also appears in the publication urn (see samples).

I guess they don't use stage codes.

You can see a sample here, although it is from ISO: https://bitbucket.org/sdo-hapi/data-models/src/master/samples/publications/iso_pub_PUB100367.xml

They also support JSON (which is better to use): https://bitbucket.org/sdo-hapi/data-models/src/master/samples/publications/iso_pub_std_FDIS_62085.json

IEC sample: https://bitbucket.org/sdo-hapi/data-models/src/master/samples/publications/publications_iec.xml

andrew2net commented 2 years ago

@ronaldtse there are date types that I don't know how to map:

In the relaton model, we have the following date types:

"published" | "accessed" | "created" | "implemented" | "obsoleted" | "confirmed" | "updated" | "issued" | "transmitted" | "copied" | "unchanged" | "circulated" | "adapted" | "vote-started" | "vote-ended" | "announced"

https://github.com/relaton/relaton-models/blob/ee413e3fb6e8f098972a0a0f528141e1db730633/grammars/biblio.rnc#L307

ronaldtse commented 2 years ago

opoudjis commented 2 years ago

We have three more flavour-specific date types:

IEEE: feedback-ended
NIST: superseded, abandoned

But stability isn't on there (though I was convinced this has come up before.)

This does seem legitimate as a global addition. I want to preserve the pattern of past participles, but I don't see how.

ronaldtse commented 2 years ago

So:

To be added to the model?

andrew2net commented 2 years ago

> But stability isn't on there (though I was convinced this has come up before.)

@opoudjis maybe it has never come up. It's just in the xsd schema

UPD: there are stabilityDates in the data provided by the API

opoudjis commented 2 years ago

Dates are all point events in the Relaton grammar currently, and your request to randomly add intervals to that listing is denied. You are starting to think that the Metanorma grammars can randomly be changed to satisfy your aesthetics, @ronaldtse, and that is not going to happen on my watch. The stability of the grammars remains paramount, and the inclination to change them when it is not absolutely necessary is one I shall combat without mercy.

Flavour-specific modifications of dates are going to be kept specific to those flavours (and ideally stripped out when they are consumed outside of those flavours).
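
As an illustration of that stripping rule, the sketch below filters a document's dates down to the types a given consumer accepts. The base list is the one quoted from biblio.rnc earlier in this thread and the flavour extensions are the IEEE and NIST types mentioned above; the `{ type: ... }` hash shape and the `dates_for` helper are assumptions made for the example, not the Relaton API.

```ruby
# Date types from the shared grammar, as quoted earlier in this thread.
BASE_DATE_TYPES = %w[
  published accessed created implemented obsoleted confirmed updated issued
  transmitted copied unchanged circulated adapted vote-started vote-ended announced
].freeze

# Flavour-specific extensions mentioned in this thread.
FLAVOUR_DATE_TYPES = {
  "ieee" => %w[feedback-ended],
  "nist" => %w[superseded abandoned],
}.freeze

# Keep only the dates a consumer of the given flavour understands.
# With no flavour, every flavour-specific type is dropped.
def dates_for(dates, flavour: nil)
  allowed = BASE_DATE_TYPES + FLAVOUR_DATE_TYPES.fetch(flavour, [])
  dates.select { |d| allowed.include?(d[:type]) }
end
```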

Therefore:

BibliographicDate type remains as is for NIST and IEEE, to which those local type extensions have already been added.

vote_started and vote_ended remain as is. feedback_ended remains as is.

stable-until does not need to be a time interval, since the start of the time interval is always going to be the most recent of published, updated, or unchanged. The start of the interval is always going to be specified in a pre-existing date.
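
The last point can be sketched as follows: given point dates only, the stability interval is recovered by taking the most recent of published, updated, or unchanged as its start. The `{ type:, on: }` hash shape and the `stability_interval` helper are hypothetical, chosen only to make the reasoning concrete.

```ruby
require "date"

# Date types that can open the stability interval, per the comment above.
START_TYPES = %w[published updated unchanged].freeze

# `dates` is a hypothetical array of { type:, on: } hashes with Date values.
# Returns a Range of Dates, or nil when either end cannot be determined.
def stability_interval(dates)
  stable_until = dates.find { |d| d[:type] == "stable-until" }
  starts = dates.select { |d| START_TYPES.include?(d[:type]) }
  return nil if stable_until.nil? || starts.empty?

  start = starts.map { |d| d[:on] }.max
  start..stable_until[:on]
end
```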

andrew2net commented 2 years ago

@ronaldtse the guideline recommends querying only the latest changes. I suggest that the daily script fetch only the latest changes. To update all the documents, we can use a GHA workflow dispatch. What do you think?
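
A rough sketch of how a script could choose between the two modes. It relies on the `GITHUB_EVENT_NAME` variable that GitHub Actions sets for each run; `run_mode` and `fetch_window` are hypothetical helpers, and whether the API accepts a request with no date filter for a full refresh is an assumption that would need checking.

```ruby
require "date"

# Scheduled runs are incremental; a manual workflow dispatch refreshes everything.
def run_mode(event_name)
  event_name == "workflow_dispatch" ? :full : :incremental
end

# Decide which date window a run should request.
# A nil bound means "no filter on that side" (an assumption about the API).
def fetch_window(mode:, last_run:, today: Date.today)
  case mode
  when :incremental then { from: last_run, to: today }
  when :full then { from: nil, to: nil }
  else raise ArgumentError, "unknown mode: #{mode.inspect}"
  end
end
```

In a workflow, the script would call `run_mode(ENV["GITHUB_EVENT_NAME"])` and pass the result to `fetch_window` along with the stored date of the previous run.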

ronaldtse commented 2 years ago

I agree!

ronaldtse commented 1 year ago

This has been implemented. Thanks @andrew2net !