When a document record contains character encoding problems, for instance because the cataloguer pasted an abstract or other metadata from a PDF file, OAI-PMH is affected: every OAI-PMH request that includes that record fails with a server error:
All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
Improvement suggestion
The editor should prevent the submission of unauthorised characters or, ideally, correct them automatically.
Alternative:
OAI-PMH requests should not fail due to character encoding problems in a single record. Records should be checked for character encoding problems. Possible approaches are (1=worst ... 4=best):
1. During the OAI-PMH response: check each record for encoding problems and exclude it from the response, if needed
2. During the OAI-PMH response: check each record for encoding problems and automatically sanitize it, if needed, before including it in the response
3. During record creation: automatically sanitize the record before saving
4. During record creation: issue an error and prevent the record from being created (check server-side/client-side implications)
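
The check and the sanitizing step shared by the approaches above could look roughly like the following sketch. It assumes the offending characters are those outside the XML 1.0 character range (control characters other than tab, line feed and carriage return, lone surrogates, U+FFFE and U+FFFF) and that a record is available as nested dicts, lists and strings. The names has_invalid_xml_chars, sanitize_xml_text and sanitize_record are hypothetical and not part of any existing API.

```python
import re

# Characters that XML 1.0 does not allow: C0 control characters except
# tab (\x09), line feed (\x0A) and carriage return (\x0D), lone
# surrogates, and the non-characters U+FFFE and U+FFFF.
_INVALID_XML_CHARS = re.compile(
    "[\x00-\x08\x0B\x0C\x0E-\x1F\uD800-\uDFFF\uFFFE\uFFFF]"
)


def has_invalid_xml_chars(text):
    """Return True if the string cannot be serialized as XML 1.0 text."""
    return _INVALID_XML_CHARS.search(text) is not None


def sanitize_xml_text(text, replacement=""):
    """Replace every XML-incompatible character (removed by default)."""
    return _INVALID_XML_CHARS.sub(replacement, text)


def sanitize_record(value):
    """Recursively sanitize all strings in a record (hypothetical layout:
    nested dicts and lists); other value types are returned unchanged."""
    if isinstance(value, str):
        return sanitize_xml_text(value)
    if isinstance(value, dict):
        return {key: sanitize_record(item) for key, item in value.items()}
    if isinstance(value, list):
        return [sanitize_record(item) for item in value]
    return value
```

For approaches 1 and 4, has_invalid_xml_chars would be enough to decide whether to skip or reject a record. For approaches 2 and 3, sanitize_record would be applied before serializing or saving. Whether offending characters should be dropped or replaced by a placeholder is left open here.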