psychoinformatics-de / datalad-tabby

DataLad extension package for the "tabby" dataset metadata specification

Metadata override feature #47

Closed mih closed 1 year ago

mih commented 1 year ago

The key purpose of this feature is to enrich metadata with additional properties (keys), and to replace the values of existing properties (keys) with other values. The "amend" use case (adding values to a particular key) is only supported via the proxy of replacing all values with a copy of the original values that has the new items appended (because @mih thinks that this case is not common in typical metadata enrichment scenarios).

The current implementation allows an unchanged TSV (author-provided) to be combined with an evolving override (curator-provided) to inject additional properties (e.g., @type or @id) into the metadata record without altering the structure or content of the author-provided documents, thereby avoiding any friction or incompatibilities with the workflows or processes that yielded them in the first place.

This makes it easier to feed back corrections to the original authors (or ask for them), and does not require any party to adjust to a different workflow.

Also:

codecov-commenter commented 1 year ago

Codecov Report

Merging #47 (56d400f) into main (314b5a8) will increase coverage by 0.16%. The diff coverage is 100.00%.

:exclamation: Current head 56d400f differs from pull request most recent head 2c718b9. Consider uploading reports for the commit 2c718b9 to get more accurate results

@@            Coverage Diff             @@
##             main      #47      +/-   ##
==========================================
+ Coverage   99.08%   99.24%   +0.16%     
==========================================
  Files          11       12       +1     
  Lines         218      265      +47     
==========================================
+ Hits          216      263      +47     
  Misses          2        2              
Impacted Files                              Coverage Δ
datalad_tabby/io/__init__.py                99.15% <100.00%> (+0.18%) :arrow_up:
datalad_tabby/io/tests/test_overrides.py    100.00% <100.00%> (ø)


christian-monch commented 1 year ago

Nice. Tested it locally, works as expected

mih commented 1 year ago

Thanks!

mslw commented 1 year ago

I had trouble understanding how the overrides work from the docs description alone (and even tests), but with a few examples it became clear.

Assuming a doi field (single value):

doi    https://doi.org/10.nnnnnn/example

the override can be a literal

{
  "doi": true
}
# produces
{'doi': True}

a string:

{
  "doi": "I am become {doi[0]}"
}
# produces
{'doi': 'I am become https://doi.org/10.nnnnnn/example'}

or a list:

{
    "doi": ["Identifier", "doi", "{doi[0]}"]
}
# produces
{'doi': ['Identifier', 'doi', 'https://doi.org/10.nnnnnn/example']}

Note that an object counts as a JSON literal, so no string substitution is applied inside it:

{
  "doi": {
    "type": "identifier",
    "name": "doi",
    "value": "{doi[0]}"
  }
}
# produces
{'doi': {'name': 'doi', 'type': 'identifier', 'value': '{doi[0]}'}}

In the example above, we used the same key (doi), replacing its value. But we could add a new key (e.g. doi-modified) in the same way.
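
To make the rules above concrete, here is a minimal Python sketch of the behavior as I understand it from these examples. It is not the actual datalad-tabby code: the function name `apply_override` is hypothetical, and I am assuming that each TSV field is available as a list of values (hence `doi[0]`) and that substitution follows Python's `str.format` syntax.

```python
def apply_override(record, override):
    """Return a copy of `record` with keys from `override` added or replaced.

    Hypothetical helper, written only to illustrate the examples above:
    - strings are interpolated against the original record
    - lists have their string items interpolated one by one
    - any other JSON value (bool, number, object, null) is taken as-is
    """
    def interpolate(value):
        if isinstance(value, str):
            return value.format_map(record)
        if isinstance(value, list):
            return [
                item.format_map(record) if isinstance(item, str) else item
                for item in value
            ]
        # literals, including objects (dicts), are not touched
        return value

    result = dict(record)
    for key, value in override.items():
        result[key] = interpolate(value)
    return result


# assumption: a single-value TSV field is held as a one-item list
record = {"doi": ["https://doi.org/10.nnnnnn/example"]}

print(apply_override(record, {"doi": "I am become {doi[0]}"}))
# {'doi': 'I am become https://doi.org/10.nnnnnn/example'}

print(apply_override(record, {"doi": {"value": "{doi[0]}"}}))
# {'doi': {'value': '{doi[0]}'}}
```

Using a new key such as doi-modified in the override leaves the original doi entry in place and adds the new one next to it.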


Thinking of vertical tables, I can imagine, e.g., a file listing mice in an experiment, where the strain is entered (for brevity) as a numerical JAX code:

id    strain_jax    ...
01    018280
...

This could be overridden with the following (three new fields, two derived and one fixed):

{
    "RRID": "RRID:IMSR_JAX:{strain_jax[0]}",
    "url": "https://www.jax.org/strain/{strain_jax[0]}",
    "schema": "https://custom_schema.org/mouseExperiment"
}
# produces
[
  {"id": "01", "strain_jax": "018280", "RRID": "RRID:IMSR_JAX:018280", "url": "https://www.jax.org/strain/018280", "schema": "https://custom_schema.org/mouseExperiment"},
# ...
]
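
The per-row effect can be reproduced with a short standalone Python sketch. Again, this is only an illustration and not how datalad-tabby loads tables: `override_row` is a made-up helper, the TSV is read with the standard `csv` module, and each cell is wrapped in a one-item list so that `{strain_jax[0]}` resolves to the full code (which means the untouched fields come out as lists here, unlike in the simplified output above).

```python
import csv
import io

# the single row from the example above, as tab-separated text
TSV = "id\tstrain_jax\n01\t018280\n"

OVERRIDE = {
    "RRID": "RRID:IMSR_JAX:{strain_jax[0]}",
    "url": "https://www.jax.org/strain/{strain_jax[0]}",
    "schema": "https://custom_schema.org/mouseExperiment",
}


def override_row(row, override):
    """Apply string templates from `override` to one table row (hypothetical helper)."""
    # assumption: every cell is exposed to the templates as a one-item list
    source = {key: [value] for key, value in row.items()}
    result = dict(source)
    for key, template in override.items():
        # always interpolate against the original row, not the partial result
        result[key] = template.format_map(source)
    return result


reader = csv.DictReader(io.StringIO(TSV), delimiter="\t")
rows = [override_row(row, OVERRIDE) for row in reader]

print(rows[0]["RRID"])  # RRID:IMSR_JAX:018280
print(rows[0]["url"])   # https://www.jax.org/strain/018280
```

Because the cells are read as strings, the leading zeros in id and strain_jax survive, which matters for codes like 018280.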