infrae / pyoai

The oaipmh module is a Python implementation of an "Open Archives Initiative Protocol for Metadata Harvesting"
http://pypi.python.org/pypi/pyoai

How to create readers for other OAI Metadata Schemas? #49

Open bhavin2897 opened 2 years ago

bhavin2897 commented 2 years ago

I am trying to harvest datasets and metadata from OAI Servers.

I was able to retrieve oai_dc metadata using the available oai_dc_reader.

How can I create a new reader so that I can harvest records via GetRecord in other metadata schemas?

Here, I need metadata from DataCite.org, which uses oai_datacite. So far I have created a MetadataReader using the usual XPath syntax, but it fails to parse or retrieve any data.

oai_datacite_reader = MetadataReader(
fields={
    'title':       ('textList', '//resource/titles/title/text()'),
    'creator':     ('textList', '//resource/creator/creator/text()'),
    'subject':     ('textList', '//resource/subjects/subject/text()'),
    'description': ('textList', '//resource/descriptions/description/text()'),
    'publisher':   ('textList', '//resource/publisher/text()'),
    'contributor': ('textList', '//resource/contributors/contributor/text()'),
    'date':        ('textList', '//resource/dates/date/text()'),
    #'type':        ('textList', '//resource/type/text()'),
    'format':      ('textList', '//resource/format/text()'),
    'identifier':  ('textList', '//resource/identifier/text()'),
    #'source':      ('textList', '//resource/source/text()'),
    'language':    ('textList', '//resource/language/text()'),
    'relation':    ('textList', '//resource/relatedIdentifiers/relatedIdentifier/text()'),
    #'coverage':    ('textList', '//resource/coverage/text()'),
    'rights':      ('textList', '//resource/rights/text()'),
    'version':      ('textList', '//resource/version/text()'),
    'publicationYear': ('textList', '//resource/publicationYear/text()')
    },
    namespaces={'oai_datacite:' 'http://datacite.org/schema/kernel-4'}
)

All the fields are returned empty. Result:

{"title": [], "creator": [], "subject": [], "description": [], "publisher": [], "contributor": [], "date": [], "format": [], "identifier": [], "language": [], "relation": [], "rights": [], "version": [], "publicationYear": []}

Please give guidance on how to define a metadata reader for a new format.

Thanks in advance
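
For reference, here is a minimal sketch of one likely cause, not a confirmed diagnosis. Two things in the reader above look suspicious. First, `{'oai_datacite:' 'http://datacite.org/schema/kernel-4'}` is a Python set holding one concatenated string, not a prefix-to-URI dict. Second, the XPath steps carry no namespace prefix, while DataCite elements live in the kernel-4 namespace. The sketch uses only the standard library's ElementTree (so no `text()` step) on a hypothetical, heavily abbreviated record. The `datacite` prefix, the sample XML, and the `DATACITE_FIELDS` name are assumptions made for illustration. The namespace URI must match whatever `xmlns` the harvested records actually declare.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated DataCite payload; real oai_datacite records
# carry more wrapper elements and many more fields.
SAMPLE = """
<metadata>
  <resource xmlns="http://datacite.org/schema/kernel-4">
    <identifier identifierType="DOI">10.1234/example</identifier>
    <creators>
      <creator><creatorName>Doe, Jane</creatorName></creator>
    </creators>
    <titles>
      <title>Example dataset</title>
    </titles>
    <publisher>Example Repository</publisher>
    <publicationYear>2020</publicationYear>
  </resource>
</metadata>
"""

# Prefix -> URI mapping. The prefix is arbitrary; the URI has to match
# the xmlns declared on the <resource> element.
NAMESPACES = {"datacite": "http://datacite.org/schema/kernel-4"}

root = ET.fromstring(SAMPLE)


def texts(path, namespaces=None):
    """Return the text of every element matched by an ElementTree path."""
    return [el.text for el in root.findall(path, namespaces)]


# Unprefixed steps look for elements in no namespace, so nothing matches.
unprefixed = texts(".//resource/titles/title")

# Prefixed steps resolve against the DataCite namespace and do match.
prefixed = texts(".//datacite:resource/datacite:titles/datacite:title", NAMESPACES)
creators = texts(
    ".//datacite:resource/datacite:creators/datacite:creator/datacite:creatorName",
    NAMESPACES,
)

# The same idea written as field definitions in the shape MetadataReader
# expects (a sketch only; these expressions are not evaluated here).
R = "//datacite:resource/"
DATACITE_FIELDS = {
    "title": ("textList", R + "datacite:titles/datacite:title/text()"),
    "creator": ("textList", R + "datacite:creators/datacite:creator/datacite:creatorName/text()"),
    "publisher": ("textList", R + "datacite:publisher/text()"),
    "publicationYear": ("textList", R + "datacite:publicationYear/text()"),
    "identifier": ("textList", R + "datacite:identifier/text()"),
}

print(unprefixed)
print(prefixed)
print(creators)
```

Under these assumptions, the reader would be built as `MetadataReader(fields=DATACITE_FIELDS, namespaces=NAMESPACES)` and registered on the `MetadataRegistry` with `registerReader('oai_datacite', reader)`, mirroring how the bundled `oai_dc_reader` declares its `oai_dc` and `dc` prefixes. Note also that the DataCite schema nests several values one level deeper than the paths in the question assume, for example `creators/creator/creatorName`, `formats/format`, and `rightsList/rights`.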