metafacture / metafacture-core

Core package of the Metafacture tool suite for metadata processing.
https://metafacture.org
Apache License 2.0

Add SitemapReader originally developed in OERSI #469

Open fsteeg opened 2 years ago

fsteeg commented 2 years ago

Reads a sitemap from a URL and sends each loc URL it contains to the receiver.

For example, "https://hoou.de/sitemap.xml" | read-sitemap | open-http ... in a Flux workflow processes every document linked in the sitemap.

Supports paging via the from= query string parameter in the sitemap URL.
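
To illustrate the core step described above (collecting the loc URLs from a sitemap), here is a minimal sketch using only the JDK's DOM parser. It is not the code from this pull request: the class name SitemapLocs and the method extractLocs are hypothetical, the sketch works on an XML string instead of fetching a URL, it assumes the standard sitemap namespace, and it does not cover from= paging.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; not part of metafacture-core.
final class SitemapLocs {

    // Namespace defined by the sitemaps.org protocol (assumed for the input).
    private static final String SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    private SitemapLocs() {
    }

    // Returns the trimmed text of every <loc> element in the sitemap namespace,
    // in document order. Wraps parser errors in an unchecked exception.
    static List<String> extractLocs(final String sitemapXml) {
        try {
            final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for namespace-based lookup
            final DocumentBuilder builder = factory.newDocumentBuilder();
            final Document doc = builder.parse(new InputSource(new StringReader(sitemapXml)));
            final NodeList locNodes = doc.getElementsByTagNameNS(SITEMAP_NS, "loc");
            final List<String> locs = new ArrayList<>();
            for (int i = 0; i < locNodes.getLength(); i++) {
                locs.add(locNodes.item(i).getTextContent().trim());
            }
            return locs;
        } catch (final Exception e) {
            throw new IllegalArgumentException("Could not parse sitemap XML", e);
        }
    }

    public static void main(final String[] args) {
        final String xml = "<urlset xmlns=\"" + SITEMAP_NS + "\">"
                + "<url><loc>https://example.org/a</loc></url>"
                + "<url><loc>https://example.org/b</loc></url>"
                + "</urlset>";
        // Stand-in for "sends each loc URL to the receiver": print each one.
        for (final String loc : extractLocs(xml)) {
            System.out.println(loc);
        }
    }
}
```

In a real module the loop in main would hand each URL to the downstream receiver instead of printing it; the example.org URLs are placeholders.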

Assigning @dr0i for code review due to the (albeit loose) paging relation to #464.

We don't have a dedicated issue for this; maybe @TobiasNx could do the functional review here?

fsteeg commented 2 years ago

Discussed in our planning meeting: we're putting this on hold to investigate whether we actually need this kind of specific module for reading sitemaps, or whether we can build something based on existing modules and the upcoming paging support (https://github.com/metafacture/metafacture-core/issues/464).

sonarcloud[bot] commented 1 year ago

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
1 Code Smell (rating A)

No coverage information
0.0% Duplication