proycon / codemetapy

A Python package for generating and working with codemeta
https://codemeta.github.io/
GNU General Public License v3.0

codemetapy output is not static across identical runs #26

Closed matthewfeickert closed 1 year ago

matthewfeickert commented 1 year ago

It seems that in codemetapy v2.0+ the output is not stable across identical runs. This makes it impossible to diff the output between runs without first sorting it manually.

Example:

> docker run --rm -ti python:3.10 /bin/bash
root@68d255b7e087:/# python -m venv venv && . venv/bin/activate
(venv) root@68d255b7e087:/# python -m pip --quiet install --upgrade pip setuptools wheel
(venv) root@68d255b7e087:/# python -m pip --quiet install --pre 'pyhf==0.7.0rc4'
(venv) root@68d255b7e087:/# python -m pip --quiet install 'codemetapy==2.2.1'
(venv) root@68d255b7e087:/# codemetapy --inputtype python --no-extras pyhf > codemeta_run1.json
...
(venv) root@68d255b7e087:/# codemetapy --inputtype python --no-extras pyhf > codemeta_run2.json
...
(venv) root@68d255b7e087:/# apt update && apt install -y jq
(venv) root@68d255b7e087:/# diff <(jq -S .softwareRequirements codemeta_run1.json) <(jq -S .softwareRequirements codemeta_run2.json)
11,18d10
<     "@id": "/dependency/pyyaml-ge-5.1",
<     "@type": "SoftwareApplication",
<     "identifier": "pyyaml",
<     "name": "pyyaml",
<     "runtimePlatform": "Python 3",
<     "version": ">=5.1"
<   },
<   {
26a19,26
>     "@id": "/dependency/importlib-resources-ge-1.4.0",
>     "@type": "SoftwareApplication",
>     "identifier": "importlib-resources",
>     "name": "importlib-resources",
>     "runtimePlatform": "Python 3",
>     "version": ">=1.4.0"
>   },
>   {
35c35
<     "@id": "/dependency/scipy-ge-1.1.0",
---
>     "@id": "/dependency/tqdm-ge-4.56.0",
37,38c37,38
<     "identifier": "scipy",
<     "name": "scipy",
---
>     "identifier": "tqdm",
>     "name": "tqdm",
40c40
<     "version": ">=1.1.0"
---
>     "version": ">=4.56.0"
43c43
<     "@id": "/dependency/jsonschema-ge-4.15.0",
---
>     "@id": "/dependency/pyyaml-ge-5.1",
45,46c45,46
<     "identifier": "jsonschema",
<     "name": "jsonschema",
---
>     "identifier": "pyyaml",
>     "name": "pyyaml",
48c48
<     "version": ">=4.15.0"
---
>     "version": ">=5.1"
51c51
<     "@id": "/dependency/importlib-resources-ge-1.4.0",
---
>     "@id": "/dependency/scipy-ge-1.1.0",
53,54c53,54
<     "identifier": "importlib-resources",
<     "name": "importlib-resources",
---
>     "identifier": "scipy",
>     "name": "scipy",
56c56
<     "version": ">=1.4.0"
---
>     "version": ">=1.1.0"
59c59
<     "@id": "/dependency/tqdm-ge-4.56.0",
---
>     "@id": "/dependency/jsonschema-ge-4.15.0",
61,62c61,62
<     "identifier": "tqdm",
<     "name": "tqdm",
---
>     "identifier": "jsonschema",
>     "name": "jsonschema",
64c64
<     "version": ">=4.56.0"
---
>     "version": ">=4.15.0"
(venv) root@68d255b7e087:/#

In codemetapy v0.3.5 the output was reproducible across runs. Is this something that could be supported again? Or should users sort the JSON manually if they want stable output?
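For anyone who does need to sort manually in the meantime, a minimal sketch of normalizing the output before diffing might look like the following. It is not part of codemetapy; the helper names `normalize` and `canonical_dump` are hypothetical, and it assumes that list order carries no meaning in the file being compared (which may not hold for every codemeta property).

```python
import json


def normalize(node):
    """Recursively sort every list so element order no longer depends on the run.

    Assumption: list order is not meaningful in the document being compared.
    """
    if isinstance(node, dict):
        return {key: normalize(value) for key, value in node.items()}
    if isinstance(node, list):
        items = [normalize(item) for item in node]
        # Sort on a canonical JSON string so lists of dicts and lists of
        # plain strings can be ordered with the same key.
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    return node


def canonical_dump(path):
    """Load a JSON file and return a stable, key-sorted text rendering of it."""
    with open(path, encoding="utf-8") as handle:
        return json.dumps(normalize(json.load(handle)), indent=2, sort_keys=True)
```

Writing `canonical_dump("codemeta_run1.json")` and `canonical_dump("codemeta_run2.json")` to two files and diffing those should then show only real differences, not reordering.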

For an example of how this affects workflows, see https://github.com/scikit-hep/pyhf/pull/2002/

proycon commented 1 year ago

Good point. I agree that we want the output to be deterministic wherever possible, even though formally it makes no difference, of course (it describes the same RDF graph). I'll look into this and will probably implement alphabetical sorting for the dependencies to solve it, though the problem might still pop up in other places too.

proycon commented 1 year ago

I just fixed this and released v2.2.2 (with some other fixes too). I can't guarantee yet that it's always deterministic with all input, but at least lists should end up sorted now, which fixes the problem you mentioned.

PS: One thing I notice with pyhf is that the version number doesn't land correctly in the codemeta.json anymore (I see only 0.0.0). But this bug seems to be outside codemetapy because this 0.0.0 already literally appears in the generated pyhf.egg-info/PKG-INFO which codemetapy uses as source.
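To check what version the installed package metadata reports, independently of codemetapy, something like the sketch below could be used. It relies on the standard library's `importlib.metadata`; the helper name `reported_version` is hypothetical, and whether this reads exactly the same file codemetapy uses is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version


def reported_version(distribution):
    """Return the version recorded in the installed metadata, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    # "pyhf" is the package discussed in this thread; any installed name works.
    print(reported_version("pyhf"))
```

If this also prints 0.0.0 for pyhf, that would support the idea that the wrong version originates in the package metadata rather than in codemetapy.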

proycon commented 1 year ago

Another unrelated PS: by default, some extensions on top of codemeta are enabled. If you don't want those, you can specify --strict. You may also want to specify --released to get an accurate codemeta:developmentStatus.