inveniosoftware / invenio-vocabularies

Invenio module for managing vocabularies.
https://invenio-vocabularies.readthedocs.io
MIT License
2 stars 42 forks

Importing the funders-zenodo-ror-data as funders.yaml crashes #218

Open chriz-uniba opened 2 years ago

chriz-uniba commented 2 years ago

Package version (if known): invenio-app-rdm 9.1.3; invenio-vocabularies 0.11.6

Describe the bug

After downloading https://zenodo.org/api/files/25d4f93f-6854-4dd4-9954-173197e7fad7/v1.1-2022-06-16-ror-data.zip and converting it into funders.yaml, trying to import the funders vocabulary using a vocabularies-future.yaml and this funders.yaml fails with a TypeError and a traceback.

Steps to Reproduce

Following the documentation here: https://inveniordm.docs.cern.ch/customize/vocabularies/funding/#funders-ror

curl https://zenodo.org/api/files/25d4f93f-6854-4dd4-9954-173197e7fad7/v1.1-2022-06-16-ror-data.zip -o funders.zip
# invenio vocabularies convert --vocabulary funders --origin funders.zip --target funders.yaml
Vocabulary funders converted. Total items 102742.
102742 items succeeded
0 contained errors
0 were filtered.
# cat vocabularies-future.yaml
names:
  readers:
    - type: yaml
  writers:
    - type: funders-service
      args:
        service_or_name: funders
        identity: system_identity
# invenio vocabularies import --vocabulary funders --filepath ./vocabularies-future.yaml --origin funders.yaml
Traceback (most recent call last):
  File "/usr/bin/invenio", line 8, in <module>
    sys.exit(cli())
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/flask/cli.py", line 357, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/cli.py", line 126, in import_vocab
    config = get_config_for_ds(vocabulary, filepath, origin)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/cli.py", line 45, in get_config_for_ds
    config["readers"][0]["args"]["origin"] = origin
TypeError: 'NoneType' object is not subscriptable
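The traceback is consistent with the per-vocabulary lookup returning None: the only top-level key in this vocabularies-future.yaml is names, so there is nothing under funders. The sketch below mimics that step in plain Python. lookup_vocabulary_config is a hypothetical stand-in, not the actual code of get_config_for_ds, and the YAML is written out as a dict to avoid a PyYAML dependency.

```python
from typing import Optional


def lookup_vocabulary_config(all_configs: dict, vocabulary: str) -> Optional[dict]:
    """Hypothetical stand-in for the per-vocabulary lookup in get_config_for_ds."""
    return all_configs.get(vocabulary)


# vocabularies-future.yaml from the report, expressed as a Python dict.
reported_config = {
    "names": {
        "readers": [{"type": "yaml"}],
        "writers": [
            {
                "type": "funders-service",
                "args": {"service_or_name": "funders", "identity": "system_identity"},
            }
        ],
    }
}

config = lookup_vocabulary_config(reported_config, "funders")
print(config)  # None, because there is no top-level "funders" key

try:
    # Same statement as line 45 of cli.py in the traceback.
    config["readers"][0]["args"]["origin"] = "funders.yaml"
except TypeError as exc:
    print(f"TypeError: {exc}")  # TypeError: 'NoneType' object is not subscriptable
```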

Expected behavior

Importing should work (although we have probably identified four broken ISNIs in this funders.yaml).

Additional Notes

The same happens if the file names are wrapped in double quotes:

# invenio vocabularies import --vocabulary funders --filepath "./vocabularies-future.yaml" --origin "./funders.yaml" 
Traceback (most recent call last):
  File "/usr/bin/invenio", line 8, in <module>
    sys.exit(cli())
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/flask/cli.py", line 357, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/cli.py", line 126, in import_vocab
    config = get_config_for_ds(vocabulary, filepath, origin)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/cli.py", line 45, in get_config_for_ds
    config["readers"][0]["args"]["origin"] = origin
TypeError: 'NoneType' object is not subscriptable

chriz-uniba commented 2 years ago

There is an open PR for the documentation: https://github.com/inveniosoftware/docs-invenio-rdm/pull/398/files

cat vocabularies-future.yaml 
funders:
  readers:
    - type: yaml
      args:
          orgin: "funders.yaml"
  writers:
    - type: funders-service
      args:
        service_or_name: funders
        identity: system_identity

The top-level key names needs to be funders, and the origin needs to be given.
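Both problems can be caught before starting a long import by checking the parsed configuration. The sketch below is a hypothetical helper, check_vocab_config, whose rules are inferred from this thread (one top-level key per vocabulary, an args mapping on each reader, an origin key inside it); it is not part of invenio-vocabularies. Line 45 of cli.py in the first traceback suggests that --origin is written into readers[0]["args"], so the args mapping apparently has to exist even when --origin is passed. The check also flags near-misses such as orgin, the spelling used in the config posted in this comment.

```python
import difflib
from typing import List


def check_vocab_config(all_configs: dict, vocabulary: str) -> List[str]:
    """Return a list of problems in a parsed vocabularies-future.yaml.

    Hypothetical helper: the rules are inferred from this issue thread,
    not taken from invenio-vocabularies.
    """
    config = all_configs.get(vocabulary)
    if config is None:
        return [f"no top-level key {vocabulary!r}; found {sorted(all_configs)}"]

    problems = []
    for i, reader in enumerate(config.get("readers", [])):
        args = reader.get("args")
        if not args:
            problems.append(f"readers[{i}] has no 'args' section")
            continue
        if "origin" not in args:
            # Look for a misspelled key such as 'orgin'.
            close = difflib.get_close_matches("origin", list(args), n=1)
            hint = f"; found {close[0]!r}, did you mean 'origin'?" if close else ""
            problems.append(f"readers[{i}].args has no 'origin'{hint}")
    return problems


# The config posted in this comment, written as a Python dict.
config_from_comment = {
    "funders": {
        "readers": [{"type": "yaml", "args": {"orgin": "funders.yaml"}}],
        "writers": [
            {
                "type": "funders-service",
                "args": {"service_or_name": "funders", "identity": "system_identity"},
            }
        ],
    }
}
print(check_vocab_config(config_from_comment, "funders"))
# ["readers[0].args has no 'origin'; found 'orgin', did you mean 'origin'?"]
```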

Then the following command seems to work[^1]:

invenio vocabularies import --vocabulary funders --filepath ./vocabularies-future.yaml --origin funders.yaml

When the --origin option is removed from the import command, it crashes:

# invenio vocabularies import --vocabulary funders --filepath ./vocabularies-future.yaml
Traceback (most recent call last):
  File "/usr/bin/invenio", line 8, in <module>
    sys.exit(cli())
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/flask/cli.py", line 357, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/cli.py", line 127, in import_vocab
    success, errored, filtered = _process_vocab(config, num_samples)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/cli.py", line 81, in _process_vocab
    for result in ds.process():
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/datastreams.py", line 50, in process
    for stream_entry in self.read():
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/datastreams.py", line 85, in read
    yield from pipe_gen(read_gens)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/datastreams.py", line 70, in pipe_gen
    for item in current_gen_func(piped_item):
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/readers.py", line 50, in read
    with open(self._origin, self._mode) as file:
TypeError: expected str, bytes or os.PathLike object, not NoneType
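This time the failure happens one step later, inside the YAML reader, because self._origin is None. In the config used here the reader argument is spelled orgin, so presumably no origin reaches the reader unless --origin is passed. The sketch models that precedence with a hypothetical resolve_origin helper (an assumption about the behaviour, not the library's code) and shows that open(None, "r") raises the same TypeError as the traceback above.

```python
from typing import Optional


def resolve_origin(reader_args: dict, cli_origin: Optional[str]) -> Optional[str]:
    """Hypothetical model: --origin, if given, overrides args['origin']."""
    if cli_origin is not None:
        return cli_origin
    return reader_args.get("origin")


# Reader args from the config in this comment: note the misspelled key.
args_with_typo = {"orgin": "funders.yaml"}

print(resolve_origin(args_with_typo, "funders.yaml"))  # funders.yaml
print(resolve_origin(args_with_typo, None))            # None

try:
    open(resolve_origin(args_with_typo, None), "r")
except TypeError as exc:
    # Same TypeError as raised at line 50 of readers.py.
    print(f"TypeError: {exc}")
```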

When passing the wrong YAML file as --origin, we get the following:

# invenio vocabularies import --vocabulary funders --filepath ./vocabularies-future.yaml --origin app_data/vocabularies/subjects_oecd_fos.yaml 
FundersServiceWriter: [{'ValidationError': {'name': ['Missing data for required field.'], 'scheme': ['Unknown field.'], 'subject': ['Unknown field.']}}]
FundersServiceWriter: [{'ValidationError': {'name': ['Missing data for required field.'], 'scheme': ['Unknown field.'], 'subject': ['Unknown field.']}}]
FundersServiceWriter: [{'ValidationError': {'name': ['Missing data for required field.'], 'scheme': ['Unknown field.'], 'subject': ['Unknown field.']}}]
FundersServiceWriter: [{'ValidationError': {'name': ['Missing data for required field.'], 'scheme': ['Unknown field.'], 'subject': ['Unknown field.']}}]
FundersServiceWriter: [{'ValidationError': {'name': ['Missing data for required field.'], 'scheme': ['Unknown field.'], 'subject': ['Unknown field.']}}]
FundersServiceWriter: [{'ValidationError': {'name': ['Missing data for required field.'], 'scheme': ['Unknown field.'], 'subject': ['Unknown field.']}}]
FundersServiceWriter: [{'ValidationError': {'name': ['Missing data for required field.'], 'scheme': ['Unknown field.'], 'subject': ['Unknown field.']}}]
FundersServiceWriter: [{'ValidationError': {'name': ['Missing data for required field.'], 'scheme': ['Unknown field.'], 'subject': ['Unknown field.']}}]
FundersServiceWriter: [{'ValidationError': {'name': ['Missing data for required field.'], 'scheme': ['Unknown field.'], 'subject': ['Unknown field.']}}]
FundersServiceWriter: [{'ValidationError': {'name': ['Missing data for required field.'], 'scheme': ['Unknown field.'], 'subject': ['Unknown field.']}}]
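Every record is rejected because entries in the subjects file have a different shape from funder entries. The sketch reproduces the error dictionaries in this log with a tiny hand-written check. FUNDER_FIELDS is an assumption based on the sample funder entry posted further down in this thread (id, country, name, title); the real funders schema may accept more fields, and the subject entry is a made-up example.

```python
from typing import Dict, List

# Assumed field set, taken from the sample funder entry in this thread;
# the real funders schema may allow additional fields.
FUNDER_FIELDS = {"id", "country", "name", "title"}
REQUIRED_FUNDER_FIELDS = {"name"}


def funder_errors(entry: dict) -> Dict[str, List[str]]:
    """Mimic the error dict that FundersServiceWriter prints for a bad entry."""
    keys = set(entry)
    errors: Dict[str, List[str]] = {}
    for field in sorted(REQUIRED_FUNDER_FIELDS - keys):
        errors[field] = ["Missing data for required field."]
    for field in sorted(keys - FUNDER_FIELDS):
        errors[field] = ["Unknown field."]
    return errors


# Hypothetical entry shaped like a record of subjects_oecd_fos.yaml.
subject_entry = {"id": "example-id", "scheme": "FOS", "subject": "Example subject"}
print(funder_errors(subject_entry))
# {'name': ['Missing data for required field.'], 'scheme': ['Unknown field.'], 'subject': ['Unknown field.']}

# The sample funder entry from this thread passes the check.
funder_entry = {"id": "202100-2585", "country": "SE", "name": "name", "title": {"en": "name"}}
print(funder_errors(funder_entry))  # {}
```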

When the wrong YAML file is set in vocabularies-future.yaml:

cat vocabularies-future.yaml 
funders:
  readers:
    - type: yaml
      args:
          orgin: "app_data/vocabularies/subejects_oecd_fos.yaml"
  writers:
    - type: funders-service
      args:
        service_or_name: funders
        identity: system_identity

and the correct origin is passed on the command line, it seems to work[^1]:

# invenio vocabularies import --vocabulary funders --filepath ./vocabularies-future.yaml --origin funders.yaml

[^1]: Note: "It seems to work" means that I do not get an immediate error message. I did not let the whole import run to completion, since it takes quite a while.

Samk13 commented 2 years ago

In vocabularies-future.yaml, origin should contain the full relative path, not only the file name:

funders:
  readers:
    - type: yaml
      args:
          origin: "app_data/vocabularies/funders.yaml"
  writers:
    - type: funders-service
      args:
        service_or_name: funders
        identity: system_identity
awards:
  readers:
    - type: yaml
      args:
          origin: "app_data/vocabularies/awards.yaml"
  writers:
    - type: awards-service
      args:
        service_or_name: awards
        identity: system_identity

Each entry in the funders file should look like this:

- id: 202100-2585
  country: SE
  name: name
  title:
    en: name

The command is:

invenio vocabularies import --vocabulary funders --filepath ./vocabularies-future.yaml

Please follow this recipe and let me know if it works.

chriz-uniba commented 2 years ago

My folder structure:

ls
app_data  docker                   docker-compose.yml  docker-services.yml  funders.zip  logs     Pipfile.lock  static     vocabularies-future.yaml
assets    docker-compose.full.yml  Dockerfile          funders.yaml         invenio.cfg  Pipfile  README.md     templates

So vocabularies-future.yaml and funders.yaml are at the same level, and I guess the relative path should be right.
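Whether "the same level" is enough depends on what a relative origin is resolved against. If the reader passes origin straight to open(), as line 50 of readers.py in the tracebacks suggests, a relative path is resolved against the current working directory of the invenio process, not against the directory of vocabularies-future.yaml. The sketch contrasts the two readings with pathlib; both helper names are hypothetical, and the example paths are the layout tried later in this thread.

```python
from pathlib import Path


def resolve_like_open(origin: str) -> Path:
    """open() resolves a relative path against the current working directory."""
    return Path.cwd() / origin


def resolve_relative_to_config(origin: str, config_path: str) -> Path:
    """Alternative reading: resolve against the config file's directory."""
    return Path(config_path).resolve().parent / origin


# The config lives in app_data/ and the import is started one level above it.
origin = "vocabularies/funders.yaml"
config_file = "app_data/vocabularies-future.yaml"
print(resolve_like_open(origin))                        # directly under the working directory
print(resolve_relative_to_config(origin, config_file))  # under app_data/
```

The two only coincide when the command is started from the directory that contains the config file.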

Samk13 commented 2 years ago

And it is still not working? How about putting vocabularies-future.yaml and funders.yaml inside app_data and app_data/vocabularies respectively, and adjusting the paths? Does it still not work?

chriz-uniba commented 2 years ago

So if we change the paths:

[root@d94b497621ee app_data]# ls
README.md  vocabularies  vocabularies-future.yaml  vocabularies.yaml

Setting the path relative to where the command is called (the import is run from src, which I have never done so far, because it doesn't make any sense to me):

[root@d94b497621ee app_data]# cat vocabularies-future.yaml 
funders:
  readers:
    - type: yaml
      args:
          orgin: "app_data/vocabularies/funders.yaml"
  writers:
    - type: funders-service
      args:
        service_or_name: funders
        identity: system_identity
[root@d94b497621ee app_data]# ls vocabularies/
affiliations_ror.yaml  funders.yaml  subjects_oecd_fos.yaml
[root@d94b497621ee src]# invenio vocabularies import --vocabulary funders --filepath ./app_data/vocabularies-future.yaml 
Traceback (most recent call last):
  File "/usr/bin/invenio", line 8, in <module>
    sys.exit(cli())
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/flask/cli.py", line 357, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/cli.py", line 127, in import_vocab
    success, errored, filtered = _process_vocab(config, num_samples)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/cli.py", line 81, in _process_vocab
    for result in ds.process():
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/datastreams.py", line 50, in process
    for stream_entry in self.read():
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/datastreams.py", line 85, in read
    yield from pipe_gen(read_gens)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/datastreams.py", line 70, in pipe_gen
    for item in current_gen_func(piped_item):
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/readers.py", line 50, in read
    with open(self._origin, self._mode) as file:
TypeError: expected str, bytes or os.PathLike object, not NoneType

Alternatively, setting the path relative to vocabularies-future.yaml (something we have already done successfully in several other places):

[root@d94b497621ee src]# cat app_data/vocabularies-future.yaml 
funders:
  readers:
    - type: yaml
      args:
          orgin: "./vocabularies/funders.yaml"
  writers:
    - type: funders-service
      args:
        service_or_name: funders
        identity: system_identity
[root@d94b497621ee src]# invenio vocabularies import --vocabulary funders --filepath ./app_data/vocabularies-future.yaml 
Traceback (most recent call last):
  File "/usr/bin/invenio", line 8, in <module>
    sys.exit(cli())
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/flask/cli.py", line 357, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/cli.py", line 127, in import_vocab
    success, errored, filtered = _process_vocab(config, num_samples)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/cli.py", line 81, in _process_vocab
    for result in ds.process():
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/datastreams.py", line 50, in process
    for stream_entry in self.read():
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/datastreams.py", line 85, in read
    yield from pipe_gen(read_gens)
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/datastreams.py", line 70, in pipe_gen
    for item in current_gen_func(piped_item):
  File "/usr/lib/python3.9/site-packages/invenio_vocabularies/datastreams/readers.py", line 50, in read
    with open(self._origin, self._mode) as file:
TypeError: expected str, bytes or os.PathLike object, not NoneType

Samk13 commented 2 years ago

You should check your paths: maybe you are adding ./, or you put the path in quotes ("app_data/vocabularies-future.yaml") in the command. Double-check your paths; I don't think there are any other issues besides that.

chriz-uniba commented 2 years ago

Okay, I will not go on testing paths here, but will sum up: