proycon / codemeta-harvester

Harvest and aggregate codemeta/schema.org software metadata from source repositories and service endpoints, automatically converting from known metadata schemes in the process
GNU General Public License v3.0

Test case: support for C++ projects #5

Open broeder-j opened 2 years ago

broeder-j commented 2 years ago

Here is an example project that already has quite a lot of metadata: https://jugit.fz-juelich.de/ped-dyn-emp/petrack

It has DOIs in badges, citation and dependency descriptions in the README, and a .zenodo.json file. I am putting it here as a test case.

codemeta-harvester only gets the basics, because it does not pick up the 'ReadMe.md'. With the patch in the merge request it does pick it up, but then it fails: codemetapy cannot parse the git URL (this is not a Python repo) and tries to fetch manifest.json from a malformed address.

Output

$ codemeta-harvester
[harvester info] codemeta-harvester 0.2.3 (outputdir=/home/j.broeder/work/git/petrack, cachedir=/tmp/codemeta-harvester.cache/, opts=)
[harvester info] No configuration provided, harvesting current project
[harvester info] Attempting to guess source repo
[harvester info] Source repo is https://jugit.fz-juelich.de/ped-dyn-emp/petrack.git
[harvester info] Scanning directory /home/j.broeder/work/git/petrack for harvestable resources...
[harvester info] Looking for license....
[harvester info] Found license GPL-3.0-only
[harvester info] Getting contributors from git...
[harvester info] Extracting last and first commit date from git log....
[harvester info] Date created: 2022-09-22T11:43:53Z+0200, date modified: 2022-09-22T11:43:53Z+0200
[harvester info] Querying Github/GitLab API (https://jugit.fz-juelich.de:ped-dyn-emp/petrack)
-- begin log --
URI automatically generated, may be overriden later: /petrack
Processing source #1 of 1
Traceback (most recent call last):
  File "/home/j.broeder/work/git/codemeta-harvester/env/lib/python3.8/site-packages/requests/models.py", line 434, in prepare_url
    scheme, auth, host, port, path, query, fragment = parse_url(url)
  File "/home/j.broeder/work/git/codemeta-harvester/env/lib/python3.8/site-packages/urllib3/util/url.py", line 397, in parse_url
    return six.raise_from(LocationParseError(source_url), None)
  File "<string>", line 3, in raise_from
urllib3.exceptions.LocationParseError: Failed to parse: https://jugit.fz-juelich.de:ped-dyn-emp/-/manifest.json

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/j.broeder/work/git/codemeta-harvester/env/bin/codemetapy", line 11, in <module>
    load_entry_point('CodeMetaPy==2.2.3', 'console_scripts', 'codemetapy')()
  File "/home/j.broeder/work/git/codemeta-harvester/env/lib/python3.8/site-packages/codemeta/codemeta.py", line 139, in main
    g, res, args, contextgraph = build(**args.__dict__)
  File "/home/j.broeder/work/git/codemeta-harvester/env/lib/python3.8/site-packages/codemeta/codemeta.py", line 364, in build
    inputtype = codemeta.parsers.gitapi.get_repo_kind(source)
  File "/home/j.broeder/work/git/codemeta-harvester/env/lib/python3.8/site-packages/codemeta/parsers/gitapi.py", line 44, in get_repo_kind
    response = requests.get(test_url)
  File "/home/j.broeder/work/git/codemeta-harvester/env/lib/python3.8/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/home/j.broeder/work/git/codemeta-harvester/env/lib/python3.8/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/j.broeder/work/git/codemeta-harvester/env/lib/python3.8/site-packages/requests/sessions.py", line 573, in request
    prep = self.prepare_request(req)
  File "/home/j.broeder/work/git/codemeta-harvester/env/lib/python3.8/site-packages/requests/sessions.py", line 484, in prepare_request
    p.prepare(
  File "/home/j.broeder/work/git/codemeta-harvester/env/lib/python3.8/site-packages/requests/models.py", line 368, in prepare
    self.prepare_url(url, params)
  File "/home/j.broeder/work/git/codemeta-harvester/env/lib/python3.8/site-packages/requests/models.py", line 436, in prepare_url
    raise InvalidURL(*e.args)
requests.exceptions.InvalidURL: Failed to parse: https://jugit.fz-juelich.de:ped-dyn-emp/-/manifest.json
-- end log --
[harvester error] conversion from Github/GitLab API query failed for petrack (https://jugit.fz-juelich.de:ped-dyn-emp/petrack) (codemetapy failed)
[harvester info] Found README.md
[harvester info] Looking for repostatus information in README...
[harvester info] Looking for continuous integration information in README...
[harvester info] Looking for documentation links in README...
[harvester info] Reconciliating: codemetapy  --identifier "petrack" --codeRepository "https://jugit.fz-juelich.de/ped-dyn-emp/petrack.git" --released --enrich --textv "" -O /home/j.broeder/work/git/petrack/petrack.codemeta.json /tmp/codemeta-harvester.cache//tmp/41-readme.petrack.codemeta.json /tmp/codemeta-harvester.cache//tmp/39-gitdate.petrack.codemeta.json /tmp/codemeta-harvester.cache//tmp/32-contributors.petrack.codemeta.json /tmp/codemeta-harvester.cache//tmp/29-license.petrack.codemeta.json 
-- begin log --
Passed 4 files/sources but specified 0 input types! Automatically guessing types...
Detected input types: [('/tmp/codemeta-harvester.cache//tmp/41-readme.petrack.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/39-gitdate.petrack.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/32-contributors.petrack.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/29-license.petrack.codemeta.json', 'json')]
URI derived from explicitly passed codeRepository: https://jugit.fz-juelich.de/ped-dyn-emp/petrack.git
Processing source #1 of 4
Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/41-readme.petrack.codemeta.json
    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...
Traceback (most recent call last):
  File "/home/j.broeder/work/git/codemeta-harvester/env/bin/codemetapy", line 11, in <module>
    load_entry_point('CodeMetaPy==2.2.3', 'console_scripts', 'codemetapy')()
  File "/home/j.broeder/work/git/codemeta-harvester/env/lib/python3.8/site-packages/codemeta/codemeta.py", line 139, in main
    g, res, args, contextgraph = build(**args.__dict__)
  File "/home/j.broeder/work/git/codemeta-harvester/env/lib/python3.8/site-packages/codemeta/codemeta.py", line 348, in build
    prefuri = codemeta.parsers.jsonld.parse_jsonld(g, res, getstream(source), args)
  File "/home/j.broeder/work/git/codemeta-harvester/env/lib/python3.8/site-packages/codemeta/parsers/jsonld.py", line 54, in parse_jsonld
    return parse_jsonld_data(g,res, data, args)
  File "/home/j.broeder/work/git/codemeta-harvester/env/lib/python3.8/site-packages/codemeta/parsers/jsonld.py", line 95, in parse_jsonld_data
    data['@context'] = rewrite_context(data['@context'])
  File "/home/j.broeder/work/git/codemeta-harvester/env/lib/python3.8/site-packages/codemeta/parsers/jsonld.py", line 22, in rewrite_context
    raise Exception(f"Refusing to load non-authorized local context: {v}")
Exception: Refusing to load non-authorized local context: file:///tmp/repostatus.jsonld
-- end log --
[harvester error] Failed to consolidate metadata petrack

proycon commented 2 years ago

Thanks, that's a good example, and it shows off the .zenodo.json you mentioned.

The failure seems to be caused by an incorrect query to the GitLab API. I suspect the address was converted from a git+ssh address to https://jugit.fz-juelich.de:ped-dyn-emp/petrack, where the colon after the host name should obviously be a slash. So this is indeed a bug (either here or, more probably, in codemetapy).
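To illustrate the suspected conversion problem, here is a minimal Python sketch of how an scp-style git address could be turned into a well-formed https URL. It is not the actual code from codemeta-harvester or codemetapy: the helper name `git_remote_to_https`, the regular expression, and the example ssh address are assumptions made up for this illustration.

```python
import re
from urllib.parse import urlsplit

# scp-like syntax accepted by git: [user@]host:path (no scheme).
# The colon separates host and path here; it is NOT a port separator.
SCP_LIKE = re.compile(r"^(?:[^@/\s]+@)?(?P<host>[^:/\s]+):(?P<path>\S+)$")


def git_remote_to_https(remote: str) -> str:
    """Hypothetical helper: map a git remote address to an https:// URL."""
    remote = remote.strip()
    if "://" in remote:
        # Proper URL forms such as ssh://, git+ssh:// or https://
        parts = urlsplit(remote)
        host = parts.hostname or ""
        path = parts.path
    else:
        match = SCP_LIKE.match(remote)
        if match is None:
            raise ValueError(f"Unrecognised git remote: {remote!r}")
        host = match.group("host")
        path = match.group("path")
    if not host:
        raise ValueError(f"No host found in git remote: {remote!r}")
    path = path.lstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    # Join host and path with a slash, never with the scp-style colon
    return f"https://{host}/{path}"


if __name__ == "__main__":
    # Assumed ssh form of the repository address from this issue
    print(git_remote_to_https("git@jugit.fz-juelich.de:ped-dyn-emp/petrack.git"))
```

The point of the sketch is only that the scp-style colon has to be replaced by a slash when building the https address; simply prefixing `https://` to `host:path` yields the kind of URL that `requests` rejects in the traceback above.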

Exception: Refusing to load non-authorized local context: file:///tmp/repostatus.jsonld

This second error was a temporary glitch in the git master tree and has been fixed in the meantime.