open-contracting / ocdskit

A suite of command-line tools for working with OCDS data
https://ocdskit.readthedocs.io
BSD 3-Clause "New" or "Revised" License
17 stars 7 forks source link

mapping-sheet: add functionality related to extensions and tracking #55

Closed odscjames closed 5 years ago

odscjames commented 6 years ago

We need a new library function.

Currently:

mapping-sheet takes a schema version and produces a list of fields as a CSV

We need the same, but with:

input:

(We are passing extension JSON object rather than URL so that it can be cached by calling code, so this function is simpler and does less and so it can be used on extensions that are not online)

output: Python data structure with a list of field paths, and for each field path:

Also add a CLI command to call this to OCDSKit. But make sure it's a Python function so it can be called from external stuff like ocdsdata. (see https://github.com/open-contracting/ocdsdata/pull/68 )

This could be a new command, or it could update the "mapping-sheet" command. That is left for discussion.

This is needed in OCDSData. Also it turns out other people have written bash versions of this, so it come up before.

(This discussed at Aug Montreal retreat)

jpmckinney commented 6 years ago

Sounds good in terms of business requirements. I leave specific implementation up to the assignee, though my preference is to keep a single mapping-sheet command.

yolile commented 5 years ago

For:

Also add a CLI command to call this to OCDSKit. But make sure it's a Python function so it can be called from external stuff like ocdsdata. (see open-contracting/kingfisher#68 )

The cli command should generete a csv output with the same columns that the actual command has? Because the requested method only returns the path and the scope of the field

jpmckinney commented 5 years ago

I haven't read the code thoroughly, but if we want to make a Python function available, I figure we can create an ocdskit/mapping_sheet.py file, and move the functions that are currently nested under handle into it. The display_properties function would be the one that other packages call (though we will probably want to rename the functions to be more self-explanatory).

Update: While moving the code, you can merge the find_md_links, remove_links, and display_links functions into where they are called from, as they are only called from one place – and they really aren't reusable functions on their own. We should also move the JsonRef.replace_refs call into display_properties.

robredpath commented 5 years ago

@pindec @mrshll1001 @duncandewhurst is this kind of coverage reporting still useful? If so, we can see if @yolile can get this moved forward and then look at integrating it with the work we're doing on Kingfisher at the moment

jpmckinney commented 5 years ago

I think a minimal version is needed, in particular for OCDS 1.2 research, to measure the level of usage of each extension's fields. We don't need to satisfy the following though, as it should never occur (or only occur in cases where we don't care to disambiguate):

if a field is defined in 2 places (eg a field is deprcated in core and made active again in extension, or not required in core but required in an extension) then we need to track that the field is defined in both places

robredpath commented 5 years ago

@yolile have you had a chance to look at this?

yolile commented 5 years ago

@robredpath my progress on this are in the mapping-sheet-with-extensions branch

jpmckinney commented 5 years ago

Here's @yolile's branch: https://github.com/open-contracting/ocdskit/compare/mapping-sheet-with-extensions?w=1

I took a different approach, building on code in https://github.com/open-contracting/extension_registry.py. The idea is for mapping-sheet to remain a simple command that can accept any JSON Schema, without having to understand how to extend a base schema with extensions, determine dependencies, etc. That's what ocdsextensionregistry is for.

With the GitHub versions of ocdskit and ocdsextensionregistry, you can now run a short script like:

import sys

import jsonref

from ocdskit.mapping_sheet import mapping_sheet
from ocdsextensionregistry import ProfileBuilder

builder = ProfileBuilder('1__1__4', [
    'https://github.com/open-contracting-extensions/ocds_coveredBy_extension/archive/master.zip',
    # Or, a metadata URL:
    # 'https://raw.githubusercontent.com/open-contracting-extensions/ocds_coveredBy_extension/master/extension.json',
    # Or, a base URL:
    # 'https://raw.githubusercontent.com/open-contracting-extensions/ocds_coveredBy_extension/master/',
])

# Patch OCDS 1.1.4 with the above extensions, and add the name of the extension
# in an "extension" field in each definition and field.
schema = builder.patched_release_schema(extension_field='extension')

dereferenced_schema = JsonRef.replace_refs(schema)

# Output a mapping sheet with an "extension" column, with the values from the
# "extension" field added above.
mapping_sheet(dereferenced_schema, sys.stdout, infer_required=True, extension_field='extension')
jpmckinney commented 5 years ago

@yolile @duncandewhurst Do we need a repeatable --extension URL option for the mapping-sheet command, or is it enough for our purposes to use short scripts like the above? If there's demand for this feature, we can add it – I'm just not sure what the demand is. I just wrote this because I needed a mapping sheet with all extensions for Colin Maudry's work.

jpmckinney commented 5 years ago

I realized it'd only be a few more lines to add this feature. It's now in the unreleased version.