sul-dlss / dlme-airflow

This is a new repository to capture the work related to the DLME ETL Pipeline and establish airflow
Apache License 2.0

Need to make resumption token configurable in OAI-PMH driver #173

Closed jacobthill closed 2 years ago

jacobthill commented 2 years ago

At least one collection uses resumption tokens, which can be set in Sickle like records = sickle.ListRecords(resumptionToken='0mods_no_ocr'). We need to pull this value from the config when it is available and, if it is not in the config, continue behaving as we do now.
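A minimal sketch of that behavior, not the driver's actual code. The resumption_token config key, the helper name list_records_kwargs, and the oai_dc fallback are assumptions made for illustration; only metadata_prefix and the Sickle call come from this thread.

```python
def list_records_kwargs(config: dict) -> dict:
    """Build keyword arguments for Sickle's ListRecords from a catalog entry.

    `resumption_token` is a hypothetical config key; `metadata_prefix` is the
    key mentioned in this thread. The "oai_dc" default is an assumption about
    what "behaving as is" means.
    """
    token = config.get("resumption_token")
    if token:
        # OAI-PMH treats resumptionToken as an exclusive argument, so it is
        # sent on its own, without metadataPrefix.
        return {"resumptionToken": token}
    # No token configured: fall back to a metadata prefix.
    return {"metadataPrefix": config.get("metadata_prefix", "oai_dc")}


# Intended use with Sickle (not executed here, needs the sickle package
# and network access):
#   records = Sickle(url).ListRecords(**list_records_kwargs(config))
```

With this shape, catalog entries without the extra key keep going through the metadata prefix path unchanged.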

edsu commented 2 years ago

I think we discovered in huddle yesterday that if metadataPrefix is used, then hard coding the resumption token should not be needed?

https://api.qdl.qa/api/oaipmh?verb=ListRecords&metadataPrefix=mods_no_ocr

aaron-collier commented 2 years ago

@jacobthill do you think the metadataPrefix is sufficient here since so far QNL is the only provider we have with that resumptionToken concern?

edsu commented 2 years ago

Yes, I think it makes more sense from an OAI-PMH perspective for us to configure metadataPrefix in the catalog (using the metadata_prefix key in the YAML) rather than hard-coding a resumption token.

To get QNL to work we will need to adjust dlme_airflow.drivers.OaiXmlSource to handle the MODS XML, but we have #175 for that (which I picked up). The QNL catalog entry is already set to metadata_prefix: mods_no_ocr, so I think this issue can be closed?

Oops, I just noticed the question was asked of @jacobthill :-) Sorry for jumping in. The resumption tokens are meant to be opaque, and possibly transient, identifiers for result sets. So I think we would be better off using the metadata prefix instead as a pattern here.
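To illustrate the pattern, here is a rough sketch of how the request URLs differ between the first ListRecords call and a follow-up call. The helper list_records_url and the token value abc123 are made up for the example; a real token is whatever the server returned in the previous response, which is why it should not live in the catalog.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build an OAI-PMH ListRecords URL (hypothetical helper, for illustration).

    The first request names the metadataPrefix. Follow-up requests carry only
    the resumptionToken handed back by the server in the previous response.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    elif metadata_prefix is not None:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    else:
        raise ValueError("either metadata_prefix or resumption_token is required")
    return f"{base_url}?{urlencode(params)}"


base = "https://api.qdl.qa/api/oaipmh"

# First page: configured via metadata_prefix in the catalog.
first_url = list_records_url(base, metadata_prefix="mods_no_ocr")

# Later pages: "abc123" is a placeholder for a server-issued token.
next_url = list_records_url(base, resumption_token="abc123")
```

Here first_url is the same URL as the one linked earlier in this thread. Harvesting clients such as Sickle normally follow the returned tokens on their own, so only the prefix needs to be configured.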

jacobthill commented 2 years ago

Sorry for the delay. I agree with @edsu.

aaron-collier commented 2 years ago

@edsu thank you, great info. Closing. If we find we need this, we can revisit later.