Mapping F14: reconciling TED and OCDS data update models

ColinMaudry commented 5 years ago

TED update mechanism: F14, the "Corrigendum" notice

TED form F14 enables European buyers to amend a previously published notice.

It works this way:

The amended notice is referenced by its OJ number
The amended fields and their values are listed in <CHANGE> elements
Each <CHANGE> element contains
- a <WHERE> element that identifies the amended field via its section number (e.g. IV.2.2)) and its localized label (e.g. Plazo para la recepción de ofertas o solicitudes de participación, for a notice in Spanish).
- an <OLD_VALUE> element with the value to correct. Its content is typed (it can be <TEXT>, <DATE>, <TIME>, or various forms of CPV codes).
- a <NEW_VALUE> element with the corrected value. Its content is also typed.
The buyer can describe the change in plain text in the <INFO_ADD> element

Here is a full example taken from 216944_2019.xml:

<F14_2014 CATEGORY="ORIGINAL" FORM="F14" LG="ES">
  <LEGAL_BASIS VALUE="32014L0024"/>
  <CONTRACTING_BODY>
    [...]
  </CONTRACTING_BODY>
  <OBJECT_CONTRACT>
    [...]
  </OBJECT_CONTRACT>
  <COMPLEMENTARY_INFO>
    <DATE_DISPATCH_NOTICE>2019-05-07</DATE_DISPATCH_NOTICE>
    <NOTICE_NUMBER_OJ>2019/S 068-159723</NOTICE_NUMBER_OJ>
  </COMPLEMENTARY_INFO>
  <CHANGES>
    <CHANGE PUBLICATION="YES">
      <WHERE>
        <SECTION>IV.2.2)</SECTION>
        <LABEL>Plazo para la recepción de ofertas o solicitudes de participación</LABEL>
      </WHERE>
      <OLD_VALUE>
        <DATE>2019-05-07</DATE>
        <TIME>23:59</TIME>
      </OLD_VALUE>
      <NEW_VALUE>
        <DATE>2019-05-22</DATE>
        <TIME>19:00</TIME>
      </NEW_VALUE>
    </CHANGE>
    <INFO_ADD>
      <P>Se amplía el plazo de licitación dado que, habiéndose efectuado preguntas con 12 días de antelación a la finalización del plazo de licitación, no han sido respondidas. En virtud del artículo 136.2 LCSP, por el que se indica que se debe ampliar el plazo inicial de presentación de ofertas.</P>
    </INFO_ADD>
  </CHANGES>
</F14_2014>
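
As an illustration of how the change data could be read, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes a trimmed, namespace-free copy of the notice above (published TED files declare an XML namespace, which would have to be handled as well), and the names extract_changes and typed_values are hypothetical helpers, not part of any TED or OCDS tooling.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the notice above. Assumption: no XML namespace, for brevity.
NOTICE = """
<F14_2014 CATEGORY="ORIGINAL" FORM="F14" LG="ES">
  <COMPLEMENTARY_INFO>
    <NOTICE_NUMBER_OJ>2019/S 068-159723</NOTICE_NUMBER_OJ>
  </COMPLEMENTARY_INFO>
  <CHANGES>
    <CHANGE PUBLICATION="YES">
      <WHERE>
        <SECTION>IV.2.2)</SECTION>
        <LABEL>Plazo para la recepción de ofertas o solicitudes de participación</LABEL>
      </WHERE>
      <OLD_VALUE><DATE>2019-05-07</DATE><TIME>23:59</TIME></OLD_VALUE>
      <NEW_VALUE><DATE>2019-05-22</DATE><TIME>19:00</TIME></NEW_VALUE>
    </CHANGE>
  </CHANGES>
</F14_2014>
"""


def typed_values(element):
    """Return (tag, text) pairs for the typed children of OLD_VALUE or NEW_VALUE."""
    if element is None:
        return []
    return [(child.tag, (child.text or "").strip()) for child in element]


def extract_changes(xml_text):
    """List every CHANGE with its location (section, label) and old/new values."""
    root = ET.fromstring(xml_text)
    changes = []
    for change in root.findall("./CHANGES/CHANGE"):
        changes.append({
            "language": root.get("LG"),
            "amended_notice": root.findtext("./COMPLEMENTARY_INFO/NOTICE_NUMBER_OJ"),
            "section": change.findtext("./WHERE/SECTION"),
            "label": change.findtext("./WHERE/LABEL"),
            "old": typed_values(change.find("OLD_VALUE")),
            "new": typed_values(change.find("NEW_VALUE")),
        })
    return changes
```

Note that the output carries only the section number and the localized label as the location of the change; no XML element name of the amended notice appears anywhere.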

OCDS update mechanism, the release system

The OCDS update mechanism is well documented in the standard documentation.

In short, all the releases that bear the same ocid describe the same contracting procedure. Each release adds new data (e.g. when the tender is awarded) or amends previously published data (e.g. when the duration of a contract is modified).

To compute the current state of a contracting procedure, one merges all its releases in chronological order and retains the latest value for each OCDS field.
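
A rough Python sketch of that "latest value wins" idea follows. It is a simplification made for illustration: the merging rules in the OCDS documentation also cover arrays of objects matched by id and other cases, which this sketch ignores. The releases, their ocid and their dates are made-up example data, and sorting on the date string assumes all dates use the same format and timezone.

```python
import copy


def merge(compiled, update):
    """Recursively overlay `update` on `compiled`; values from `update` win."""
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(compiled.get(key), dict):
            merge(compiled[key], value)
        else:
            compiled[key] = value
    return compiled


def compile_releases(releases):
    """Apply releases oldest first, so the latest value of each field is kept."""
    compiled = {}
    for release in sorted(releases, key=lambda r: r["date"]):
        merge(compiled, copy.deepcopy(release))
    return compiled


# Made-up releases for one contracting procedure, deliberately out of order.
EXAMPLE_RELEASES = [
    {"ocid": "ocds-213czf-000-00001", "date": "2019-05-07T10:00:00Z",
     "tender": {"tenderPeriod": {"endDate": "2019-05-22T19:00:00Z"}}},
    {"ocid": "ocds-213czf-000-00001", "date": "2019-04-05T10:00:00Z",
     "tender": {"title": "Example",
                "tenderPeriod": {"endDate": "2019-05-07T23:59:00Z"}}},
]
```

With this data, compile_releases(EXAMPLE_RELEASES) keeps the title from the first release and the tender period end date from the second, later one.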

Reconciling the two mechanisms

How to identify the field amended by the TED Corrigendum notice?

TED data fields are almost entirely mapped to OCDS fields. This mapping goes from TED XML element to OCDS JSON field. That means that, for a given TED XML element, we can tell where the equivalent value can be found in the OCDS data structure.

However, as detailed above, a Corrigendum notice identifies the amended field with

the form section where the field is located
the localized label of the amended field

Someone who wants to transform Corrigendum data into an OCDS release, in order to amend TED data previously transformed to OCDS, would lack an unambiguous way to identify the OCDS field to amend, since the amended TED XML element is not indicated. There would be no common key.

ColinMaudry commented 5 years ago

Since the section usually designates a group of form fields, we must use the localized field label as the mapping key to determine the corresponding TED XML element, which in turn leads to the mapping instructions for an OCDS field.

We consequently must build a reference table with all TED XML elements and, for each, all the localized labels.

jpmckinney commented 5 years ago

This sounds good. I can generate that table using our CSVs and add it to the website as a reference page. Can you propose a mapping where one step is "look up XYZ in the reference table"?

ColinMaudry commented 5 years ago

I'm on Section III, as it seems more urgent. I'll do it next.

jpmckinney commented 5 years ago

I am generating the table, and it is raising questions for me.

https://ted.europa.eu/udl?uri=TED:NOTICE:357151-2019:TEXT:EN:HTML has "Place of text to be modified: Date limite de réception des offres ou des demandes de participation". It is easy to generate a table to determine that the French text corresponds to "Time limit for receipt of tenders or requests to participate" (the EC provides a Form labels Excel file, which is probably better than a table on the web).

However, on F02, "Time limit for receipt of tenders or requests to participate" corresponds to two XML elements: /PROCEDURE/DATE_RECEIPT_TENDERS and /PROCEDURE/TIME_RECEIPT_TENDERS. Furthermore, the mapping of many fields is not simple and might involve creating objects, setting id's, etc. So, we can't easily make a table that allows a lookup of a French label to find the mapping guidance.
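
To make the "one label, two XML elements" case concrete, here is a small Python sketch for the deadline example. It assumes that the guidance for this label ends up writing a single date-time value (to be checked against the actual F02 guidance), and it does not add a timezone, which the Corrigendum does not provide. The function name combine_date_time is hypothetical.

```python
from datetime import datetime


def combine_date_time(date_text, time_text):
    """Join a TED <DATE> and <TIME> pair into one ISO 8601 date-time string.

    The result is a local time without a UTC offset; an implementer still
    has to decide which timezone applies before publishing it.
    """
    combined = datetime.strptime(f"{date_text} {time_text}", "%Y-%m-%d %H:%M")
    return combined.isoformat()
```

For the new value in the example notice, combine_date_time("2019-05-22", "19:00") gives "2019-05-22T19:00:00".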

I think, instead, the guidance for the form might be to find the English label using the Excel file, then apply the guidance related to that English label on the appropriate form.

In terms of implementation, an implementer would check the possible values for "Place of text to be modified", and then implement rules for each based on the guidance on other forms.

I think it'll be a lot of work to implement, but I don't think there is another way.

ColinMaudry commented 5 years ago

To summarize @jpmckinney's plan: instead of looking up the element name based on the EU "English label-element" mapping, we use the OCDS "English label-element" mapping.

But how does that prevent the issue of having several XML fields (and several guidance paragraphs) matching a single English label?

jpmckinney commented 5 years ago

I think the process is:

Get the language in /@LG.
Get the English label for the value in Place of text to be modified (/CHANGES/CHANGE/WHERE/LABEL).
Find the OCDS guidance for the English label.
- Use the Section number (/CHANGES/CHANGE/WHERE/SECTION) to disambiguate if there are multiple matches for the English label.
- If the English label has no guidance, take the guidance for the labels that follow. For example, for Estimated total value, take the guidance for Value excluding VAT and Currency.
If the OCDS guidance pertains to a lot, use the Lot No (/CHANGES/CHANGE/WHERE/LOT_NO) to get the Lot object in tender.lots with a matching .id.
Apply the OCDS guidance.
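
The lookup part of this process could be sketched in Python as follows. Everything in the three tables is a hypothetical excerpt with placeholder guidance text: in practice the label table would come from the EC form labels file and the guidance from the OCDS mapping. The lot step and the final "apply" step are left out, and find_guidance is an illustrative name only.

```python
# Hypothetical excerpt: (language, localized label) -> English label.
ENGLISH_LABELS = {
    ("ES", "Plazo para la recepción de ofertas o solicitudes de participación"):
        "Time limit for receipt of tenders or requests to participate",
    ("EN", "Estimated total value"): "Estimated total value",
}

# Hypothetical guidance index: English label -> {section number: guidance}.
GUIDANCE = {
    "Time limit for receipt of tenders or requests to participate": {
        "IV.2.2": "placeholder guidance for IV.2.2",
    },
    "Value excluding VAT": {"II.1.5": "placeholder guidance for the amount"},
    "Currency": {"II.1.5": "placeholder guidance for the currency"},
}

# Labels with no guidance of their own -> the labels that follow them.
FOLLOWING_LABELS = {"Estimated total value": ["Value excluding VAT", "Currency"]}


def find_guidance(language, localized_label, section):
    """Return the guidance entries to apply for one CHANGE/WHERE element."""
    english = ENGLISH_LABELS[(language, localized_label.strip())]
    # TED writes sections like "IV.2.2)"; drop the trailing parenthesis.
    section = section.strip().rstrip(")")
    # Fall back to the labels that follow when the label has no guidance.
    labels = [english] if english in GUIDANCE else FOLLOWING_LABELS.get(english, [])
    results = []
    for label in labels:
        by_section = GUIDANCE[label]
        if len(by_section) == 1:
            results.append(next(iter(by_section.values())))
        else:
            # Several matches for the label: disambiguate with the section.
            results.append(by_section[section])
    return results
```

An unknown label or section raises a KeyError here, which seems preferable to silently amending the wrong field.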

ColinMaudry commented 5 years ago

OK, that's the "take the guidance for the labels that follow" bit that I was missing. Thanks!

ColinMaudry commented 5 years ago

I guess that by "English label" you meant "label-key" (e.g. address_phone) to determine the mapping guidance, since the English labels are not present in OCDS mapping CSVs.

jpmckinney commented 5 years ago

I was assuming the user would be looking things up in HTML tables, but they can indeed do the same using the CSVs, in which case they need to use the label key.
