inveniosoftware / invenio-vocabularies

Invenio module for managing vocabularies.
https://invenio-vocabularies.readthedocs.io
MIT License

Funders: fix: exclude ROR schema v2 json #309

ptamarit closed this 1 month ago

ptamarit commented 1 month ago


Partially fixes #305

Description

:books: Quoting the ROR data dump documentation:

Beginning with release v1.45 on 11 April 2024, data releases contain JSON and CSV files formatted according to both schema v1 and schema v2. Version 2 files have _schema_v2 appended to the end of the filename, e.g., v1.45-2024-04-11-ror-data_schema_v2.json. In order to maintain compatibility with previous release, version 1 files have no version information in the filename, e.g., v1.45-2024-04-11-ror-data.json.

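:bomb: This breaks the funders convert script, which now reads both the v1 and the v2 JSON file from the archive. Every item is processed twice and the v2 copies fail, so exactly half of the items (109355 of 218710) contain errors: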

$ invenio vocabularies convert -v funders -o v1.46-2024-05-02-ror-data.zip -t output.yaml

[...]
RORTransformer: Name not found in ROR entry.
RORTransformer: Name not found in ROR entry.
RORTransformer: Name not found in ROR entry.
Vocabulary funders converted. Total items 218710. 
109355 items succeeded
109355 contained errors
0 were filtered.
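
To see why the item count doubles, one can list the JSON members of a release archive. The sketch below is only an illustration: it builds a small in-memory stand-in for the zip (member names follow the naming pattern from the ROR documentation quoted above; the contents and the `json_members` helper are made up) and shows that a plain `\.json$` filter matches both files.

```python
import io
import re
import zipfile


def json_members(archive_bytes, pattern=r"\.json$"):
    """Return the names of archive members whose name matches `pattern`."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return [name for name in archive.namelist() if re.search(pattern, name)]


# Hypothetical stand-in for a ROR data dump; the real archive is much larger.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("v1.46-2024-05-02-ror-data.json", "[]")
    archive.writestr("v1.46-2024-05-02-ror-data_schema_v2.json", "[]")
    archive.writestr("v1.46-2024-05-02-ror-data.csv", "")

matched = json_members(buffer.getvalue())
print(matched)  # both the v1 and the v2 JSON file are selected
```

With both files selected, every organization is read once per schema version, which matches the doubled total in the output above.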

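:adhesive_bandage: This pull request excludes the ROR schema v2 JSON file (the one whose name ends in _schema_v2.json) from the files read by the funders convert script, using a regex on the filename.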

:heavy_check_mark: The funders convert script then works as expected:

$ invenio vocabularies convert -v funders -o v1.46-2024-05-02-ror-data.zip -t output.yaml
Vocabulary funders converted. Total items 109355. 
109355 items succeeded
0 contained errors
0 were filtered.

:information_source: Remark: if and when we move to v2, the regex can easily be changed to "regex": "_schema_v2\\.json$".
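
The exact pattern used for the v1-only filter is not shown in this description, so the following is only a sketch of how both filters could be written with Python's `re` module: a negative lookbehind keeps files that do not end in `_schema_v2.json`, and the second pattern is the v2 regex from the remark above, written as a raw string. The names `V1_ONLY` and `V2_ONLY` are hypothetical.

```python
import re

# Assumption: a negative lookbehind is one way to exclude the v2 file.
V1_ONLY = re.compile(r"(?<!_schema_v2)\.json$")
# The v2 pattern from the remark ("_schema_v2\\.json$" in a regular string).
V2_ONLY = re.compile(r"_schema_v2\.json$")

# Filenames taken from the ROR documentation quoted in the description.
names = [
    "v1.45-2024-04-11-ror-data.json",
    "v1.45-2024-04-11-ror-data_schema_v2.json",
]

v1_files = [name for name in names if V1_ONLY.search(name)]
v2_files = [name for name in names if V2_ONLY.search(name)]

print(v1_files)
print(v2_files)
```

Each pattern selects exactly one of the two files, so switching schema versions later would only require swapping the regex.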
