softwarepub / hermes

Implementation of the HERMES workflow
https://docs.software-metadata.pub

Provide processed version number #67

Open sdruskat opened 2 years ago

sdruskat commented 2 years ago

There are potentially many "version identifiers" for a single version (timestamp, version string (v1.0.3), etc.). For publication, we need one version identifier.

Includes deduplication, e.g., cff-provided version vs. tag vs. ...

This processing step defines that version identifier. It should also be configurable so that a reserved tag (or similar) can trigger the workflow, which then extracts the real version from the tag. This real version will be used for publication, and could also be used in post-processing to tag the published commit.
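
To make the "reserved tag" idea concrete, here is a minimal sketch, not HERMES code. It assumes a hypothetical convention where tags starting with `publish/` trigger the workflow and the rest of the tag carries the real version; the prefix, the pattern, and the function name `version_from_tag` are all illustrative assumptions.

```python
import re

# Assumption: tags with this reserved prefix trigger publication.
RESERVED_PREFIX = "publish/"

# Assumption: the real version is an optional "v", dotted numbers,
# and an optional suffix such as "-rc.1" or "+build5".
VERSION_PATTERN = re.compile(
    r"^v?(?P<version>\d+(?:\.\d+)*(?:[-+.][0-9A-Za-z.-]+)?)$"
)


def version_from_tag(tag, prefix=RESERVED_PREFIX):
    """Return the version embedded in a reserved tag, or None."""
    if not tag.startswith(prefix):
        return None  # not a trigger tag at all
    match = VERSION_PATTERN.match(tag[len(prefix):])
    return match.group("version") if match else None


print(version_from_tag("publish/v1.0.3"))
```

The extracted string could then be fed into publication and reused in post-processing to tag the published commit.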

led02 commented 1 year ago

With #154 almost merged, we can continue here...

Thought: This could also be implemented in an abstract way as a processor (e.g., making not only the precedence configurable but also the key to which this rule applies). Also, this could already be decided during the merging of the models.

[process]
use = [ "select_alternative" ]

[[process.select_alternative]]
path = "version"
precedence = [ "cff", "codemeta" ]
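
A rough sketch of what such a generic processor could do, assuming the harvested data is available as one plain dict per harvester. The function name mirrors the config above, but the signature and data layout are assumptions for illustration, not the existing HERMES API.

```python
def select_alternative(harvested, path, precedence):
    """Pick the value for `path` from the first harvester that has one.

    harvested:  dict mapping harvester name -> dict of harvested properties
    path:       property key to resolve, e.g. "version"
    precedence: harvester names, highest priority first
    Returns (source, value), or (None, None) if no harvester has a value.
    """
    for source in precedence:
        value = harvested.get(source, {}).get(path)
        if value is not None:
            return source, value
    return None, None


# Hypothetical harvested data with conflicting versions.
harvested = {
    "codemeta": {"version": "1.0.2"},
    "cff": {"version": "1.0.3"},
}
print(select_alternative(harvested, "version", ["cff", "codemeta"]))
```

Because the key is a parameter, the same rule would apply to any property, not just the version.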

jkelling commented 1 year ago

I think the selection of the prioritized version number should be considered part of the curation step, not the processing.

@led02's idea of allowing users to select harvester precedence on a per-property basis sounds good. This would be a kind of automated pre-curation, or choices made during curation that can be pushed back into the processing step to provide custom pre-selection on subsequent processing and curation runs. I think there should not be specific code for the version number.

poikilotherm commented 1 year ago

Reopening because we still need to discuss how to do this. (Closed by accident)

Also, we need to take into account version numbers from the context/environment (CI provides some of this data when using tags), the git harvester, and (potentially) project files (Maven POM, pyproject.toml, etc.).
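
As an illustration of the CI part, a minimal sketch that reads the tag name from the environment. `GITHUB_REF_TYPE`/`GITHUB_REF_NAME` (GitHub Actions) and `CI_COMMIT_TAG` (GitLab CI) are variables those systems set; the function `tag_from_ci` itself is hypothetical and not part of HERMES.

```python
import os


def tag_from_ci(env=None):
    """Return the tag name the CI run was triggered for, or None."""
    env = os.environ if env is None else env

    # GitHub Actions: the ref name is only a tag if the ref type says so.
    if env.get("GITHUB_REF_TYPE") == "tag" and env.get("GITHUB_REF_NAME"):
        return env["GITHUB_REF_NAME"]

    # GitLab CI: CI_COMMIT_TAG is only set in tag pipelines.
    return env.get("CI_COMMIT_TAG") or None


print(tag_from_ci({"GITHUB_REF_TYPE": "tag", "GITHUB_REF_NAME": "v1.0.3"}))
```

The result is just one more candidate value that would still have to be reconciled with what the other sources report.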

jkelling commented 1 year ago

> Reopening because we still need to discuss how to do this. (Closed by accident)

I think we should move this issue to the alpha milestone then, because the basics are already there. The discussion will be about more advanced configuration options to automate part of the curation step.

> Also, we need to take into account version numbers from the context/environment (CI provides some of this data when using tags), git harvester and (potentially) project files (Maven POM, pyproject.toml, etc)

Taking other sources into account does not look like an open question to me: the way to add support for each thing you listed is to add a corresponding harvester. Automatic selection among conflicting data sources is part of the curation-automation question.

sdruskat commented 1 year ago

Moved to Alpha milestone.