proycon / codemeta-harvester

Harvest and aggregate codemeta/schema.org software metadata from source repositories and service endpoints, automatically converting from known metadata schemes in the process
GNU General Public License v3.0

Aggregation of software in groups #10

Closed proycon closed 2 years ago

proycon commented 2 years ago

Codemeta describes software at the schema:SoftwareSourceCode level. These entries constitute the units in the index of our HTML visualisation (via codemetapy & codemeta-server, cf. https://tools.dev.clariah.nl). Additionally, we already offer a services view on the data that puts the schema:targetProduct central when it is a schema:WebApplication.

The need for more aggregation was expressed. There may be a software project that consists of various schema:SoftwareSourceCode entries, for instance one being a CLI tool, one a web application layer around the tool, and one a Python binding. It can be said that these are all components of some higher-order software project. Ideally, the dependency relations (schema:softwareRequirements) make it explicit that the components are related. However, extracting these relations and determining the clusters of such software projects automatically is not always feasible.

To keep the aggregation as simple as possible, we can reuse the existing schema:applicationSuite property, which takes a simple textual value: the name of the software suite that unites the components. In the HTML index, tools in the same suite will then be grouped together under the appropriate header. (If a tool belongs to multiple application suites, it will be rendered multiple times in the index.)
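To illustrate the grouping behaviour described above, here is a minimal sketch in Python. It is not the actual codemeta-server code: the function name `group_by_suite`, the `(ungrouped)` fallback header and the example tool names are all made up for illustration, and the entries are plain dicts standing in for the real metadata.

```python
from collections import defaultdict


def group_by_suite(entries, ungrouped="(ungrouped)"):
    """Group tool names under their schema:applicationSuite value(s).

    Hypothetical helper: `entries` are plain dicts with a "name" key and
    an optional "applicationSuite" key (a string or a list of strings).
    """
    groups = defaultdict(list)
    for entry in entries:
        suites = entry.get("applicationSuite") or [ungrouped]
        if isinstance(suites, str):
            suites = [suites]
        # A tool that belongs to several suites is listed once per suite.
        for suite in suites:
            groups[suite].append(entry["name"])
    return dict(groups)


# Made-up example data, not taken from any real registry.
tools = [
    {"name": "mytool", "applicationSuite": "MySuite"},
    {"name": "mytool-webservice", "applicationSuite": ["MySuite", "WebSuite"]},
    {"name": "standalone"},
]
index = group_by_suite(tools)
```

Here `mytool-webservice` ends up under both `MySuite` and `WebSuite`, which mirrors the "rendered multiple times" behaviour.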

Given the difficulty of automating this, the clustering is a manual effort, and the tool source registry plays a role here: a simple group key can be added to the YAML entries in the tool source registry. The harvester will translate it to schema:applicationSuite and propagate it to the actual metadata. So whereas software metadata is by definition under the authorship of the developers themselves, this suite property may be expressed at the level of the source registry (though the authors may also express it themselves), giving the tool registry providers a (final) say in how participating software is grouped. I think that is a fair solution, since it is also up to the tool registry provider to decide which tools participate in their registry in the first place.
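The translation step could look roughly like the sketch below. This is an assumption-laden illustration, not the harvester's implementation: `apply_group` is a hypothetical function, the registry entry is shown as an already-parsed dict rather than YAML, the `source` key and URL are placeholders, and the choice to keep an author-supplied suite alongside the registry's group is my own assumption (how the two are merged is not specified here).

```python
def apply_group(registry_entry, metadata):
    """Return a copy of `metadata` with the registry's group added as
    schema:applicationSuite.

    Hypothetical sketch: an author-supplied applicationSuite is kept and
    the registry-level group is appended next to it (an assumption).
    """
    result = dict(metadata)
    group = registry_entry.get("group")
    if not group:
        return result  # nothing to propagate

    existing = result.get("applicationSuite", [])
    if isinstance(existing, str):
        existing = [existing]
    suites = list(existing)
    if group not in suites:
        suites.append(group)

    # Keep a plain string when there is only a single suite.
    result["applicationSuite"] = suites[0] if len(suites) == 1 else suites
    return result


# Placeholder registry entry (as it might look after parsing the YAML).
entry = {"source": "https://example.org/mytool.git", "group": "MySuite"}

plain = apply_group(entry, {"name": "mytool"})
merged = apply_group(entry, {"name": "mytool", "applicationSuite": "AuthorSuite"})
```

In this sketch `plain` gets the single suite `MySuite`, while `merged` carries both the author's and the registry's suite, so the tool would show up under both headers in the index.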

proycon commented 2 years ago

Implemented and released