open-contracting / ocdskit

A suite of command-line tools for working with OCDS data
https://ocdskit.readthedocs.io
BSD 3-Clause "New" or "Revised" License

mapping-sheet: Confirm use cases #102

Closed jpmckinney closed 5 years ago

jpmckinney commented 5 years ago

In #42, there was a discussion about the two rows that mapping-sheet produces for the same schema path: one row for the array and one row for the object in the array.

Why do we need two rows? I'm centralizing the code that analyzes schemas into ocdskit (so that we have the same underlying methods for generating the schema tables in Extension Explorer, the schema tables in OCDS docs, and for the mapping-sheet command).

Right now, I don't see why we should have two rows instead of one. Can you explain?

duncandewhurst commented 5 years ago

FYI, the discussion in #42 didn't lead to the two-row approach for arrays and objects. That was existing behaviour; the issue was about adding similar behaviour for references in the schema.

However, I think the use cases given in #42 apply in either case: the two rows ensure that the description of the array (or reference) is not overwritten by the description of the objects in the array (or of the object being referenced).

For example, the description of tender.tenderers is:

All parties who submit a bid on a tender. More detailed information on bids and the bidding organization can be provided using the bid extension.

But the description of the objects in the array is:

The id and name of the party being referenced. Used to cross-reference to the parties section

So, if the former were overwritten with the latter in the output of mapping-sheet, the definition of what a tenderer is would be lost.
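To make the point concrete, here is a minimal Python sketch of the two-row behaviour. It is not ocdskit's actual code: the schema fragment is hand-written (only the two descriptions are taken from the quotes above), and the function name `rows_for_array` and the row keys are hypothetical.

```python
# Hand-written fragment for illustration only; not loaded from the real OCDS schema.
schema_fragment = {
    "tenderers": {
        "type": "array",
        "description": (
            "All parties who submit a bid on a tender. More detailed information "
            "on bids and the bidding organization can be provided using the bid extension."
        ),
        "items": {
            "type": "object",
            "description": (
                "The id and name of the party being referenced. "
                "Used to cross-reference to the parties section"
            ),
        },
    }
}


def rows_for_array(path, field):
    """Return one row for the field itself, plus one row for the objects in the array.

    Hypothetical helper: the name and the row layout are assumptions.
    """
    rows = [
        {"path": path, "type": field["type"], "description": field.get("description", "")}
    ]
    items = field.get("items", {})
    if field["type"] == "array" and items.get("type") == "object":
        # Second row, same path: keeps the description of the objects in the array
        # without overwriting the description of the array.
        rows.append(
            {"path": path, "type": "object", "description": items.get("description", "")}
        )
    return rows


rows = rows_for_array("tender.tenderers", schema_fragment["tenderers"])
for row in rows:
    print(row["path"], row["type"], row["description"], sep=" | ")
```

Both rows share the path `tender.tenderers`, so neither description replaces the other.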

jpmckinney commented 5 years ago

Yes, I know all that :) But what is the use case for two rows in terms of using the sheet to perform a mapping (or whatever other uses this CSV is put towards)?

There is more than one way to preserve the two titles/definitions: additional columns, concatenation, etc.

By describing the ultimate use cases, we'll know which options can be pursued.
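As a rough illustration of the two alternatives mentioned above (an additional column, or concatenation), here is a minimal Python sketch. Nothing in it comes from ocdskit: the function names, the `items_description` column name and the ` | ` separator are all assumptions made for the example.

```python
# The two descriptions quoted earlier in this thread.
ARRAY_DESCRIPTION = (
    "All parties who submit a bid on a tender. More detailed information "
    "on bids and the bidding organization can be provided using the bid extension."
)
ITEMS_DESCRIPTION = (
    "The id and name of the party being referenced. "
    "Used to cross-reference to the parties section"
)


def as_extra_column(path, array_description, items_description):
    """One row per path; the second description goes in its own (hypothetical) column."""
    return {
        "path": path,
        "description": array_description,
        "items_description": items_description,
    }


def as_concatenation(path, array_description, items_description, separator=" | "):
    """One row per path; both descriptions are joined into a single cell."""
    parts = [text for text in (array_description, items_description) if text]
    return {"path": path, "description": separator.join(parts)}


wide = as_extra_column("tender.tenderers", ARRAY_DESCRIPTION, ITEMS_DESCRIPTION)
joined = as_concatenation("tender.tenderers", ARRAY_DESCRIPTION, ITEMS_DESCRIPTION)
```

Either form keeps both descriptions in a single row; which one is usable depends on how the CSV is consumed downstream, which is why the use cases matter.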

duncandewhurst commented 5 years ago

As far as I'm aware, the ultimate use case is "making sure that users of the output of the mapping-sheet command have access to all the information in the schema". This matters because we know that publishers often use that output as the basis for their own mappings (rather than using the full mapping template).

From a helpdesk perspective, we've also used the output of the mapping-sheet command to partially automate generation of the field-level mapping template: the array-level rows serve as col-spanned headers, and the subsequent rows, for the properties of the object in the array, are used for the actual mapping. That said, I'm sure we could work with alternative flattened representations of the schema.

jpmckinney commented 5 years ago

I found a way to preserve the functionality while still having common underlying code.