scieloorg / oai-pmh

OAI-PMH data provider for the SciELO Network
BSD 2-Clause "Simplified" License

Missing `xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"` #31

Open robertatakenaka opened 1 year ago

robertatakenaka commented 1 year ago

We've noticed a discrepancy between the response of your Scielo Brasil repository and the response expected under the OAI-PMH protocol. According to the guidelines at https://www.openarchives.org/OAI/openarchivesprotocol.html#Record (extract below), the XML response sent by the repository should include the following attribute in the metadata part of each record: xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

The following example shows an XML-encoding of a record and its components:

the metadata part. This consists of a single root tag - in the example the tag oai_dc:dc - with the nested tags belonging to the corresponding metadata format - in the example, Dublin Core elements such as dc:title. Note that the root tag within the metadata part includes a number of attributes that are common to all XML documents that use namespaces and schema validity:

- namespace declarations -- the declarations of the namespaces used within the metadata part, each of which is prefixed with xmlns. Namespace declarations within the metadata part fall into two categories:
  - metadata format specific namespace(s) - every metadata part must include one or more xmlns prefixed attributes that define the correspondence between a metadata format prefix -- e.g. dc -- and the namespace URI (as defined by the XML namespace specification) of the respective metadata format. Some metadata formats employ tags from multiple namespaces, requiring multiple xmlns prefixed attributes -- in the example, there are declarations for both oai_dc and dc.
  - xml schema namespace - every metadata part must include the attribute xmlns:xsi, the value of which must always be the URI shown in the example, which is the namespace URI for XML schema.

We've confirmed that the attribute is present in most repositories' responses (we'll use Scielo Spain below as an example). When we send a ListRecords request to the Scielo Spain repository (https://scielo.isciii.es/oai/scielo-oai.php?), for example:

https://scielo.isciii.es/oai/scielo-oai.php?verb=ListRecords&set=0213-1285&from=2022-11-30&metadataPrefix=oai_dc

we get the following response (XMLSchema-instance is declared in the metadata part, via xmlns:xsi):

<ListRecords>
  <record>
    <header>
      <identifier>oai:scielo:S0213-12852022000300001</identifier>
      <datestamp>2022-11-30</datestamp>
      <setSpec>0213-1285</setSpec>
    </header>
    <metadata>
      <oai-dc:dc xmlns:oai-dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">

However, when we send a ListRecords request to the Scielo Brasil repository (https://oaipmh.scielo.org/br/), for example:

https://oaipmh.scielo.org/br/?verb=ListRecords&set=0102-8650&from=2022-09-30&metadataPrefix=oai_dc

we get the following response (XMLSchema-instance is not declared in the metadata part):

<ListRecords>
  <record>
    <header>
      <identifier>oai:scielo:S0102-86502022000700200</identifier>
      <datestamp>2022-10-04</datestamp>
    </header>
    <metadata>
      <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">

The metadata root uses the xsi prefix (in xsi:schemaLocation) without declaring it, which makes it impossible for parsers that rely on a correct XML document to retrieve data from the Scielo Brasil repository.
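The failure can be reproduced with a minimal sketch, assuming a namespace-aware parser (here Python's standard xml.etree.ElementTree) and assuming the metadata root is parsed on its own, with no ancestor element declaring xsi. The two start tags are copied from the responses above and self-closed for brevity; `parses` is a hypothetical helper, not part of any harvester.

```python
import xml.etree.ElementTree as ET

# Start tags copied from the two responses, self-closed so that each
# string is a complete document on its own (an assumption of this sketch).
WITH_XSI = (
    '<oai-dc:dc xmlns:oai-dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ '
    'http://www.openarchives.org/OAI/2.0/oai_dc.xsd"/>'
)
WITHOUT_XSI = (
    '<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
    'xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ '
    'http://www.openarchives.org/OAI/2.0/oai_dc.xsd"/>'
)


def parses(xml_text):
    """Hypothetical helper: True if the namespace-aware parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # An undeclared prefix such as xsi is reported as a parse error.
        return False


print(parses(WITH_XSI))
print(parses(WITHOUT_XSI))
```

If the complete response declares xmlns:xsi on an enclosing element, a parser reading the whole document would accept it; the sketch only covers the case where the metadata part is handled in isolation.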

Is it possible to update Scielo Brasil to include the required xmlns:xsi attribute in each record's metadata part?

robertatakenaka commented 1 year ago

@gitnnolabs Try registering the namespaces:

Example:


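# Assumption: etree is lxml's; xml.etree.ElementTree also provides register_namespace.
from lxml import etree

# Prefix -> namespace URI pairs to register globally.
namespaces = {}
namespaces['xml'] = 'http://www.w3.org/XML/1998/namespace'
namespaces['xlink'] = 'http://www.w3.org/1999/xlink'
namespaces['mml'] = 'http://www.w3.org/1998/Math/MathML'

# Registered prefixes are reused when elements in these namespaces are serialized.
for namespace_id, namespace_link in namespaces.items():
    etree.register_namespace(namespace_id, namespace_link)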
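Applied to the xsi case from this issue, the same idea might look like the sketch below. It is an assumption-laden illustration: the thread does not show how the provider serializes records, so this uses the standard library's xml.etree.ElementTree, and `build_metadata_root` and the placeholder title are hypothetical. With lxml, passing an `nsmap` to the element constructor is another way to declare prefixes.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Register the prefixes so the serializer emits oai_dc:, dc: and xsi:
# instead of generated ones such as ns0:.
for prefix, uri in {"oai_dc": OAI_DC, "dc": DC, "xsi": XSI}.items():
    ET.register_namespace(prefix, uri)


def build_metadata_root():
    """Hypothetical helper: build an oai_dc:dc root carrying xsi:schemaLocation."""
    root = ET.Element(
        f"{{{OAI_DC}}}dc",
        {
            # A namespace-qualified attribute makes the serializer declare xmlns:xsi.
            f"{{{XSI}}}schemaLocation": (
                f"{OAI_DC} http://www.openarchives.org/OAI/2.0/oai_dc.xsd"
            )
        },
    )
    title = ET.SubElement(root, f"{{{DC}}}title")
    title.text = "Placeholder title"  # illustrative value only
    return root


serialized = ET.tostring(build_metadata_root(), encoding="unicode")
print(serialized)
```

Because the attribute is created with its namespace URI rather than as a literal "xsi:schemaLocation" string, the serializer writes the matching xmlns:xsi declaration on the root tag.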