inveniosoftware / invenio-vocabularies

Invenio module for managing vocabularies.
https://invenio-vocabularies.readthedocs.io
MIT License

datastreams: add ROR HTTP reader (later needed for funders) #315

Closed: ptamarit closed this 1 month ago

ptamarit commented 1 month ago

:heart: Thank you for your contribution!

Description

Checklist

Ticks in all boxes and 🟢 on all GitHub Actions status checks are required to merge:

Frontend

Reminder

By using GitHub, you have already agreed to GitHub’s Terms of Service, including that:

  1. You license your contribution under the same terms as the current repository’s license.
  2. You agree that you have the right to license your contribution under the current repository’s license.
ptamarit commented 1 month ago

I've tested this on a clean install with `pipenv run invenio vocabularies import --vocabulary funders --origin ror-http`, and it works. It would be nice to print some terminal output when the download starts and when it completes, since at the moment the command looks like it's hanging. There is a more general issue with datastreams not giving any feedback while they are processing, but that's probably beyond the scope of this improvement.
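As a rough illustration of the kind of feedback suggested here, a download can be read in chunks and report when it starts, as bytes arrive, and when it finishes. This is a minimal sketch, not the reader added in this PR: the function name `download_with_progress`, the `report` callback, and the chunk size are hypothetical, and it uses the standard library's `urllib.request` rather than whatever HTTP client invenio-vocabularies actually uses.

```python
import urllib.request
from typing import Callable


def download_with_progress(
    url: str,
    report: Callable[[str], None] = print,
    chunk_size: int = 1024 * 1024,
) -> bytes:
    """Download `url` in chunks, reporting start, progress and completion.

    Hypothetical helper for illustration; not part of invenio-vocabularies.
    """
    report(f"Downloading {url} ...")
    chunks = []
    total = 0
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # empty bytes means the body is exhausted
                break
            chunks.append(chunk)
            total += len(chunk)
            report(f"  {total} bytes received")
    report(f"Finished downloading {url} ({total} bytes)")
    return b"".join(chunks)
```

Because `report` is just a callable taking a string, a CLI command could pass `click.secho` while non-CLI code passes a logger method, which sidesteps the question of calling `click` outside of `cli.py`.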

Since we might not want to use `click.secho` outside of `cli.py`, I tried to use `current_app.logger.info`. However, since most InvenioRDM installations use the default log level of `WARNING`, most users won't see any of the extra messages. I'm not sure what the right solution would be.

slint commented 1 month ago

> Since we might not want to use `click.secho` outside of `cli.py`, I tried to use `current_app.logger.info`. However, since most InvenioRDM installations will have the default log level of warning, most users won't see any extra information. Not sure what the right solution would be.

We also plan to have a separate logger that would integrate with the Jobs system (i.e. so that one can see the logs in the Run view of the admin UI). I would leave logging as is for now and see whether we later implement passing a logger parameter when running a datastream, which would also log everything to stdout when called from the CLI.
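The "pass a logger when running a datastream" idea can be sketched with the standard `logging` module. Everything named here is hypothetical: `run_datastream(entries, logger=None)` and `cli_logger()` are illustrative stand-ins, not the real invenio-vocabularies API. The default logger inherits the application's level, so with the usual `WARNING` default its `info` calls are dropped; the CLI path instead passes a logger with its own stdout handler at `INFO`. A Jobs-integrated logger could be injected the same way.

```python
import logging
import sys
from typing import Iterable, Optional


def run_datastream(entries: Iterable[dict], logger: Optional[logging.Logger] = None) -> int:
    """Process entries, logging progress to whichever logger the caller supplies.

    Hypothetical entry point for illustration only.
    """
    # Fall back to a module-level logger, which inherits the app's (often WARNING) level.
    logger = logger or logging.getLogger("datastreams")
    count = 0
    for entry in entries:
        count += 1
        logger.info("Processed entry %s", entry.get("id"))
    logger.info("Done: %d entries", count)
    return count


def cli_logger() -> logging.Logger:
    """Build a logger that always prints INFO messages to stdout (for CLI use)."""
    logger = logging.getLogger("datastreams.cli")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter("%(message)s"))
        logger.addHandler(handler)
    logger.propagate = False  # don't also emit through the root logger
    return logger
```

A CLI command would call `run_datastream(entries, logger=cli_logger())`, while background jobs keep the default and stay quiet below `WARNING`.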

ptamarit commented 1 month ago

FYI, I just removed the commit with the message "datastreams: use ROR HTTP reader for funders", which modifies `invenio_vocabularies/contrib/funders/datastreams.py` as follows:

```diff
 DATASTREAM_CONFIG = {
     "readers": [
+        {"type": "ror-http"},
         {
             "type": "zip",
             "args": {
```

Once inveniosoftware/invenio-app-rdm#2674 is merged, we could add this commit back if we want downloading the ZIP file to be the default behavior. I created #320 with that change.
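The `readers` list in the diff above reads as a chain: as the config implies, each reader presumably consumes the previous reader's output, so putting `ror-http` before `zip` means the archive is downloaded first and then unpacked. The sketch below shows that chaining with two stand-in readers. The class names, the `read(item)` method, and the stubbed download are all hypothetical and are not the actual invenio-vocabularies reader classes.

```python
import io
import zipfile
from typing import Iterator, Optional


class FakeHttpReader:
    """Stand-in for an HTTP reader: yields the raw bytes of a downloaded file.

    The download is stubbed with a fixed payload so the sketch runs offline.
    """

    def __init__(self, payload: bytes):
        self.payload = payload

    def read(self, item: Optional[bytes] = None) -> Iterator[bytes]:
        yield self.payload


class ZipMemberReader:
    """Stand-in for a ZIP reader: yields the contents of each archive member."""

    def read(self, item: bytes) -> Iterator[bytes]:
        with zipfile.ZipFile(io.BytesIO(item)) as archive:
            for name in archive.namelist():
                yield archive.read(name)


def _apply(reader, stream: Iterator) -> Iterator:
    """Feed every item of `stream` into `reader` and yield its outputs."""
    for item in stream:
        yield from reader.read(item)


def run_readers(readers: list) -> Iterator:
    """Chain readers: the first produces items, each later one consumes them."""
    stream = readers[0].read()
    for reader in readers[1:]:
        # A helper function (not a generator expression) binds `reader` eagerly.
        stream = _apply(reader, stream)
    return stream
```

Dropping the first entry would leave the ZIP reader to read a local file instead, which matches the choice discussed above between downloading by default or not.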