sul-dlss / dlme-airflow

This is a new repository to capture the work related to the DLME ETL Pipeline and establish airflow
Apache License 2.0

xml driver needs paging #428

Open jacobthill opened 1 year ago

jacobthill commented 1 year ago

The XML driver currently harvests only the first page of metadata. Shahre Farang is a collection with multiple pages, e.g. https://shahrefarang.com/en/feed/?paged=2

Newcastle also needs XML paging.

edsu commented 1 year ago

Since they are using Atom for linking, it would be great if they could use paging by adding a <link rel=next ..> like:

<atom:link href="https://shahrefarang.com/en/feed/?paged=2" rel="next" type="application/rss+xml" />

Then our XML driver could follow its nose to the next URL if paging is on. Otherwise, I guess we could add a URL pattern to the intake configuration.
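For the URL pattern option, here is a rough sketch of what the paging loop might look like. Everything in it is an assumption for illustration: `harvest_by_pattern`, `fetch`, the `{page}` placeholder and the `record_tag` parameter are hypothetical names, not part of the current driver or the intake configuration. It also assumes the server signals the end of the collection with either an HTTP error or a page that contains no records.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch(url: str) -> bytes:
    """Download one page of XML (hypothetical helper, not the driver's code)."""
    with urlopen(url) as response:
        return response.read()


def harvest_by_pattern(
    url_pattern: str,
    get: Callable[[str], bytes] = fetch,
    record_tag: str = "item",
    start: int = 1,
    max_pages: int = 1000,
) -> Iterator[bytes]:
    """Yield pages for url_pattern.format(page=N) until one is missing or empty."""
    for page_number in range(start, start + max_pages):
        url = url_pattern.format(page=page_number)
        try:
            page = get(url)
        except HTTPError:
            # Assumption: requesting a page past the end raises an HTTP error.
            break
        if ET.fromstring(page).find(f".//{record_tag}") is None:
            # Assumption: a page without any records also marks the end.
            break
        yield page
```

With a pattern such as `https://shahrefarang.com/en/feed/?paged={page}` this would request `?paged=1`, `?paged=2`, and so on. The `max_pages` cap is only there as a guard against a feed that never runs out of pages.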

Of course there would be work to do in the driver to follow the link, whichever way we choose to go.
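To give a sense of the driver-side work for the `rel="next"` option, here is a minimal sketch using only the Python standard library. It assumes the feed declares the standard Atom namespace (`http://www.w3.org/2005/Atom`) on its `atom:link` elements. `fetch`, `next_page_url` and `harvest_pages` are hypothetical names for illustration, not functions that exist in the current driver.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import urljoin
from urllib.request import urlopen

ATOM_NS = "http://www.w3.org/2005/Atom"


def fetch(url: str) -> bytes:
    """Download one feed page (hypothetical helper, not the driver's code)."""
    with urlopen(url) as response:
        return response.read()


def next_page_url(xml_bytes: bytes) -> Optional[str]:
    """Return the href of the first <atom:link rel="next">, or None."""
    root = ET.fromstring(xml_bytes)
    for link in root.iter(f"{{{ATOM_NS}}}link"):
        if link.get("rel") == "next":
            return link.get("href")
    return None


def harvest_pages(
    start_url: str,
    get: Callable[[str], bytes] = fetch,
    max_pages: int = 1000,
) -> Iterator[bytes]:
    """Yield each page of XML, following rel="next" links until none is left."""
    url: Optional[str] = start_url
    seen = set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)  # guard against a feed that links back to itself
        page = get(url)
        yield page
        href = next_page_url(page)
        # Resolve relative hrefs against the page they were found on.
        url = urljoin(url, href) if href else None
```

Calling `harvest_pages("https://shahrefarang.com/en/feed/")` would only walk past the first page once the feed actually includes the `rel="next"` link. The `get` parameter is there so the loop can be tested with canned pages instead of live requests.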