proycon / codemeta-harvester

Harvest and aggregate codemeta/schema.org software metadata from source repositories and service endpoints, automatically converting from known metadata schemes in the process
GNU General Public License v3.0

Use License URI rather than literal #11

Open ddeboer opened 1 year ago

ddeboer commented 1 year ago

The harvester produces:

"license": "EUPL-1.2",

while perhaps

"license": "https://spdx.org/licenses/EUPL-1.2",

should be preferred. Of course, some extra work may need to be done to determine this from the LICENSE file.
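A minimal Python sketch of the conversion being asked for, assuming the literal is already a valid SPDX identifier. The helper name `to_spdx_uri` and the small `KNOWN_SPDX_IDS` set are hypothetical and only for illustration; this is not how codemeta-harvester or codemetapy actually implement it.

```python
# Hypothetical helper for illustration; not part of codemeta-harvester or codemetapy.
SPDX_BASE = "https://spdx.org/licenses/"

# Small illustrative subset (assumption); the real SPDX license list is much longer.
KNOWN_SPDX_IDS = {"EUPL-1.2", "MIT", "Apache-2.0", "GPL-3.0-only", "GPL-3.0-or-later"}


def to_spdx_uri(license_value: str) -> str:
    """Return an SPDX URI for a known identifier, else the value unchanged."""
    value = license_value.strip()
    # Values that are already URIs are left alone.
    if value.startswith(("http://", "https://")):
        return value
    # Known SPDX identifiers are expanded to their canonical URI.
    if value in KNOWN_SPDX_IDS:
        return SPDX_BASE + value
    # Unknown literals are passed through rather than guessed at.
    return value


print(to_spdx_uri("EUPL-1.2"))
```

Passing unknown values through unchanged avoids emitting a URI that does not resolve to a real SPDX entry.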

proycon commented 1 year ago

Hmm, I'm surprised it didn't do that already. Things should indeed be set up to prefer SPDX URIs wherever possible. Do you have the relevant harvester log for this project?

ddeboer commented 1 year ago

[harvester info] codemeta-harvester 0.3.1 (outputdir=/data, cachedir=/tmp/codemeta-harvester.cache/, opts=)
[harvester info] No configuration provided, harvesting current project
[harvester info] Attempting to guess source repo
[harvester info] Source repo is https://github.com/netwerk-digitaal-erfgoed/network-of-terms.git
[harvester info] Git reference:
[harvester info] Scanning directory /data for harvestable resources...
[harvester info] found package.json (NodeJS) for network-of-terms, converting to codemeta
[harvester info] Looking for license....
[harvester info] No license file found
[harvester info] Getting contributors from git...
[harvester info] Extracting last and first commit date from git log....
[harvester info] Date created: 2020-04-17T00:22:24Z+0200, date modified: 2022-12-19T11:11:55Z+0000
[harvester info] Querying Github/GitLab API (https://github.com/netwerk-digitaal-erfgoed/network-of-terms)
[harvester info] Found README.md
[harvester info] Looking for TRL information in README.md...
[harvester info] Looking for repostatus information in README.md...
[harvester info] Looking for continuous integration information in README.md...
[harvester info] Looking for documentation links in README.md...
[harvester info] Inferring repostatus information from git activity (used only as a fallback if not explicitly provided)...
[harvester info] Inferred repostatus https://www.repostatus.org/#active
[harvester info] Reconciliating: codemetapy  --identifier "network-of-terms" --codeRepository "https://github.com/netwerk-digitaal-erfgoed/network-of-terms.git" --released --enrich --textv "" -O /data/network-of-terms.codemeta.json /tmp/codemeta-harvester.cache//tmp/99-repostatus.network-of-terms.codemeta.json /tmp/codemeta-harvester.cache//tmp/41-readme.network-of-terms.codemeta.json /tmp/codemeta-harvester.cache//tmp/40-gitapi.network-of-terms.codemeta.json /tmp/codemeta-harvester.cache//tmp/39-gitdate.network-of-terms.codemeta.json /tmp/codemeta-harvester.cache//tmp/32-contributors.network-of-terms.codemeta.json /tmp/codemeta-harvester.cache//tmp/22-npm.network-of-terms.codemeta.json

-- begin log --
Passed 6 files/sources but specified 0 input types! Automatically guessing types...
Detected input types: [('/tmp/codemeta-harvester.cache//tmp/99-repostatus.network-of-terms.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/41-readme.network-of-terms.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/40-gitapi.network-of-terms.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/39-gitdate.network-of-terms.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/32-contributors.network-of-terms.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/22-npm.network-of-terms.codemeta.json', 'json')]
Note: You did not specify a --baseuri so we will not provide identifiers (IRIs) for your SoftwareSourceCode resources (and others)
Initial URI automatically generated, may be overriden later: file:///network-of-terms
Processing source #1 of 6
Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/99-repostatus.network-of-terms.codemeta.json
    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...
    Injected (possibly temporary) URI file:///network-of-terms
[CODEMETA COMPOSITION (file:///network-of-terms)] processed 1 new triples, total is now 2
Processing source #2 of 6
Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/41-readme.network-of-terms.codemeta.json
    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...
    Injected (possibly temporary) URI file:///network-of-terms
[CODEMETA COMPOSITION (file:///network-of-terms)] processed 1 new triples, total is now 3
Processing source #3 of 6
Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/40-gitapi.network-of-terms.codemeta.json
    Injected (possibly temporary) URI file:///network-of-terms
[CODEMETA COMPOSITION (file:///network-of-terms)] processed 17 new triples, total is now 19
Processing source #4 of 6
Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/39-gitdate.network-of-terms.codemeta.json
    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...
    Injected (possibly temporary) URI file:///network-of-terms
[CODEMETA COMPOSITION (file:///network-of-terms)] overriding old http://schema.org/dateCreated (2020-04-16T19:29:33Z -> 2020-04-17T00:22:24Z+0200)
[CODEMETA COMPOSITION (file:///network-of-terms)] overriding old http://schema.org/dateModified (2022-12-19T11:18:32Z -> 2022-12-19T11:11:55Z+0000)
[CODEMETA COMPOSITION (file:///network-of-terms)] processed 2 new triples, total is now 19
Processing source #5 of 6
Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/32-contributors.network-of-terms.codemeta.json
    Injected (possibly temporary) URI file:///network-of-terms
[CODEMETA COMPOSITION (file:///network-of-terms)] processed 44 new triples, total is now 62
Processing source #6 of 6
Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/22-npm.network-of-terms.codemeta.json
    Injected (possibly temporary) URI file:///network-of-terms
[CODEMETA COMPOSITION (file:///network-of-terms)] processed 49 new triples, total is now 109
[CODEMETA VALIDATION (network-of-terms)] author not set
-- end log --
[harvester info] Output written to /data/network-of-terms.codemeta.json
[harvester info] Output renamed to /data/codemeta.json
[harvester info] No license file found

Not sure why it reports that no license file was found. We have LICENSE.md rather than LICENSE.
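If the file extension is indeed the cause (an assumption; the log does not show how the harvester looks for the file), a more tolerant lookup could be sketched in Python as follows. The function name `find_license_file` and the set of accepted base names are hypothetical, not taken from the harvester.

```python
from pathlib import Path
from typing import Optional

# Accepted base names, compared case-insensitively (assumption for illustration).
LICENSE_STEMS = {"license", "licence", "copying"}


def find_license_file(repo_dir: str) -> Optional[Path]:
    """Return the first top-level license file, whatever its extension."""
    for entry in sorted(Path(repo_dir).iterdir()):
        # Compare only the part before the first dot, so LICENSE,
        # LICENSE.md and LICENSE.txt all match.
        if entry.is_file() and entry.name.split(".")[0].lower() in LICENSE_STEMS:
            return entry
    return None
```

With a check like this, a repository containing only LICENSE.md would no longer be reported as having no license file.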