zenodo / zenodo-rdm

Zenodo, powered by InvenioRDM
https://zenodo.org
GNU General Public License v2.0

Valid ISO 639-2 codes for certain languages are rejected by REST API as invalid #802

Open rhigman opened 5 months ago

rhigman commented 5 months ago

Related to zenodo/zenodo#1483.

The REST API documentation for language metadata states "Specify the main language of the record as ISO 639-2 or 639-3 code, see Library of Congress ISO 639 codes list."

The Library of Congress ISO 639 codes list linked in the documentation includes both the "terminological" and "bibliographic" versions of the codes. For example, the ISO 639-2 code for German is given as both ger (bibliographic) and deu (terminological).

However, attempting to submit metadata including language: ger returns the following error:

{'status': 400, 'message': 'Invalid value ger.'}

Since much of the data submitted to Zenodo is bibliographic in nature, it would make sense to accept the bibliographic versions of the codes as well.
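Until that is supported, a client-side workaround might look like the sketch below. It maps the 20 ISO 639-2 bibliographic codes that differ from their terminological counterparts onto the terminological form before submission. The helper name `to_terminological` is hypothetical, and the sketch assumes the API accepts the terminological code (e.g. deu), which this report implies but has not verified for every language.

```python
# ISO 639-2 languages that have distinct bibliographic (B) and
# terminological (T) codes. Keys are B codes, values are T codes.
B_TO_T = {
    "alb": "sqi", "arm": "hye", "baq": "eus", "bur": "mya",
    "chi": "zho", "cze": "ces", "dut": "nld", "fre": "fra",
    "geo": "kat", "ger": "deu", "gre": "ell", "ice": "isl",
    "mac": "mkd", "mao": "mri", "may": "msa", "per": "fas",
    "rum": "ron", "slo": "slk", "tib": "bod", "wel": "cym",
}


def to_terminological(code: str) -> str:
    """Return the terminological form of an ISO 639-2 code.

    Hypothetical helper: codes that have no separate bibliographic
    variant are returned unchanged (lowercased and stripped).
    """
    normalized = code.strip().lower()
    return B_TO_T.get(normalized, normalized)


# Assumption: the metadata dict is what gets sent to the REST API.
metadata = {"language": to_terminological("ger")}
```

This only sidesteps the problem on the submitting side; records exported from bibliographic systems would still need this step until the validation itself accepts both forms.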

If the validation relies on pycountry, as the similar module linked below does, note that pycountry does support the bibliographic versions.

https://github.com/zenodo/zenodo/blob/482ee72ad501cbbd7f8ce8df9b393c130d1970f7/zenodo/modules/records/serializers/schemas/common.py#L389

Incidentally, submitting a completely incorrect language code (e.g. language: german) gives a more helpful error message than submitting a valid albeit non-terminological one:

{'status': 400, 'message': 'A validation error occurred.', 'errors': [{'field': 'metadata.language', 'messages': ['Language must be either ISO-639-1 or 639-2 compatible.']}]}
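Because the two rejections come back in different shapes, a client has to handle both. The following is a minimal sketch of one way to do that, based only on the two payloads quoted in this report; the function name `describe_error` is hypothetical and other error shapes may exist.

```python
from __future__ import annotations


def describe_error(payload: dict) -> list[str]:
    """Flatten an error payload into human-readable lines.

    Prefers the per-field entries under 'errors' when present and
    falls back to the top-level 'message' otherwise.
    """
    detailed = [
        f"{err.get('field', '?')}: {msg}"
        for err in payload.get("errors") or []
        for msg in err.get("messages", [])
    ]
    return detailed or [payload.get("message", "Unknown error")]


# The two responses quoted in this report.
terse = {"status": 400, "message": "Invalid value ger."}
verbose = {
    "status": 400,
    "message": "A validation error occurred.",
    "errors": [
        {
            "field": "metadata.language",
            "messages": ["Language must be either ISO-639-1 or 639-2 compatible."],
        }
    ],
}

for response in (terse, verbose):
    print(describe_error(response))
```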