Open pkiraly opened 2 years ago
@mielvds I have created an XML reader and a new option, `--recordAddress`, which takes an XPath expression that addresses the individual records. I also created `reader` and `writer` packages and moved the relevant classes there.
Here is a usage example.
Input file:
<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns:europeana="http://www.europeana.eu/schemas/ese/"
          xmlns:dcterms="http://purl.org/dc/terms/"
          xmlns="http://www.openarchives.org/OAI/2.0/"
          xmlns:doc="http://www.lyncode.com/xoai"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record>
    <dc:format>application/pdf</dc:format>
    <dc:identifier type="providerId">99900556</dc:identifier>
    <dc:identifier type="providerItemId">M.ch.f.91</dc:identifier>
    <dc:identifier type="URN">urn:nbn:de:bvb:20-mchf91-3</dc:identifier>
    <dc:type type="document">Einfache Handschrift</dc:type>
    <dc:date xml:lang="de">1391</dc:date>
    <dc:date xml:lang="de">1410</dc:date>
    <dcterms:created xml:lang="de">1391-1410 (14./15. Jahrhundert)</dcterms:created>
    <dcterms:location resource="http://d-nb.info/gnd/4067037-5">Würzburg</dcterms:location>
    <dc:title>Lectura super quinto libro Decretalium</dc:title>
    ...
  </record>
  <record>
    <dc:format>application/pdf</dc:format>
    <dc:identifier type="providerId">99900556</dc:identifier>
    <dc:identifier type="providerItemId">I.t.f.CCLXVI</dc:identifier>
    <dc:identifier type="URN">urn:nbn:de:bvb:20-itfcclxvi-3</dc:identifier>
    <dc:type type="document" resource="http://d-nb.info/gnd/4027041-5">Inkunabel</dc:type>
    <dc:date xml:lang="de">1476</dc:date>
    ...
  </record>
</metadata>
./mqa --schema dc-schema.yaml \
  --input sample.xml \
  --recordAddress '//oai:record' \
  --output result.csv \
  --measurements measurements.json \
  --outputFormat csv
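The XPath expression should use namespace-qualified element names, and every prefix it uses should be declared in the `namespaces` section of the schema. Here the input file's default namespace is bound to the `oai` prefix: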
format: xml
fields:
  ...
namespaces:
  doc: http://www.lyncode.com/xoai
  foaf: http://xmlns.com/foaf/0.1/
  europeana: http://www.europeana.eu/schemas/ese/
  dcterms: http://purl.org/dc/terms/
  dc: http://purl.org/dc/elements/1.1/
  oai: http://www.openarchives.org/OAI/2.0/
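To illustrate why the prefix is needed even though `<record>` carries no prefix in the input, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not how `mqa` reads the file; it only shows the general XPath behaviour under a default namespace. The trimmed `SAMPLE` document and the `NAMESPACES` mapping are assumptions made for the illustration, and ElementTree needs a relative path (`.//`) where the command above uses `//`.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the input file (assumption: only two fields kept).
SAMPLE = """<metadata xmlns="http://www.openarchives.org/OAI/2.0/"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record>
    <dc:title>Lectura super quinto libro Decretalium</dc:title>
  </record>
  <record>
    <dc:type>Inkunabel</dc:type>
  </record>
</metadata>"""

# Prefix-to-URI mapping, playing the role of the schema's namespaces section.
NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

root = ET.fromstring(SAMPLE)

# An unqualified name only matches elements in no namespace, so nothing is found.
unqualified = root.findall(".//record")

# The qualified name matches, because 'oai' maps to the document's default namespace.
qualified = root.findall(".//oai:record", NAMESPACES)

# Fields inside each record are addressed with their own prefixes.
titles = [t.text for r in qualified for t in r.findall("dc:title", NAMESPACES)]

print(len(unqualified), len(qualified), titles)
# prints: 0 2 ['Lectura super quinto libro Decretalium']
```

The prefix name itself is arbitrary; what matters is that it maps to the same namespace URI the document declares as its default.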
very nice!