Currently, the metadata.yaml file (properties defined in koza/model/config/source_config.py) is not being used in any way other than documentation, and has fields with misleading names relative to how they're being used:
id simply isn't clear
name appears to be getting used as the name of the data source, which is redundant with ingest_title
ingest_title implies the title of the ingest itself, not the name of the data source, which is what it seems to be expected to hold.
ingest_url has the same problem: I would expect this to be the link to the ingest repo, not the data source.
We should consider renaming these to reflect what we expect the user to put here.
source / provided_by: are these not the same thing? We should add a docstring making the distinction clear.
It seems like source has been deprecated in favor of provided_by and is no longer being used. Can we remove it from the model?
rights / license: are these duplicates, or do they refer to different things? Again, a docstring clarifying this would be good.
license does not appear to be used in any of our ingests. Can we remove it from the model?
One option is to remove the definition and usage of metadata within Koza, and simply allow it to exist for documentation purposes alongside ingest files.
Another would be to allow Koza to read the metadata file, and use the data contained within as default values for various fields during the transform process, possibly writing to an output metadata file, or as columns in the transform output.
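To make the second option more concrete, here is a rough sketch of what "metadata as default values" could look like. It is not Koza's actual API: the function name `apply_metadata_defaults`, the `DEFAULT_FIELD_MAP` mapping, and the sample values are all hypothetical, and the metadata is assumed to have already been parsed from metadata.yaml into a plain dict.

```python
from typing import Any, Dict, Iterable, Iterator, Mapping

# Hypothetical mapping from metadata.yaml keys to output column names.
# Neither the mapping nor the column names are taken from Koza itself.
DEFAULT_FIELD_MAP: Dict[str, str] = {
    "provided_by": "provided_by",
    "rights": "rights",
}


def apply_metadata_defaults(
    rows: Iterable[Mapping[str, Any]],
    metadata: Mapping[str, Any],
    field_map: Mapping[str, str] = DEFAULT_FIELD_MAP,
) -> Iterator[Dict[str, Any]]:
    """Yield copies of rows, filling empty columns from the metadata dict."""
    for row in rows:
        merged = dict(row)  # copy so the caller's rows are not mutated
        for meta_key, column in field_map.items():
            # Only fill in a value when the transform did not set one.
            if meta_key in metadata and merged.get(column) in (None, ""):
                merged[column] = metadata[meta_key]
        yield merged


# Made-up values standing in for a parsed metadata.yaml.
metadata = {
    "name": "example-source",
    "provided_by": "example-provider",
    "rights": "https://example.org/terms",
}

rows = [
    {"id": "A:1", "provided_by": None},
    {"id": "A:2", "provided_by": "explicit-provider"},
]

result = list(apply_metadata_defaults(rows, metadata))
```

In this sketch, values set explicitly by the transform always win and the metadata only fills gaps. Whether that precedence is what we want is one of the things to decide.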
I do think our move towards modularized ingests adds some importance to sorting this out.
Maybe we can add this as an agenda item for one of our data calls?