soilwise-he / link-liveliness-assessment

MIT License

python linkchecker component #1

Open pvgenuchten opened 2 months ago

pvgenuchten commented 2 months ago

The Python linkchecker is a basic checker that crawls a series of web pages, identifies the links on them, and tries to resolve each one. It can report the results in several output formats (for example XML, sitemap, or SQL).
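To make the "identify links" step concrete, here is a minimal sketch using only the Python standard library. It is not how linkchecker itself is implemented; the names `LinkCollector` and `extract_links` are hypothetical, and it only looks at `href` and `src` attributes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href/src attributes of a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative references against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all links found in `html`, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

Each returned URL would then be requested to see whether it resolves; that part needs network access and is left out here.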

The checker can run as a CI/CD script in a container:

docker run --rm -it -u $(id -u):$(id -g) ghcr.io/linkchecker/linkchecker:latest --verbose https://www.example.com

Results can be written to a location where another process picks them up, for example as a set of SQL statements to be run against a database.
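A rough sketch of how a CI job could wrap this in Python. It assumes linkchecker is installed on the path and that the `-F TYPE/ENCODING/FILENAME` file-output option behaves as described in `linkchecker --help` (worth double-checking for the installed version). The function names and the default file name `linkchecker-out.sql` are hypothetical.

```python
import subprocess


def build_linkchecker_command(url, sql_file="linkchecker-out.sql", check_extern=False):
    """Build the argument list for a linkchecker run that also writes SQL output."""
    # -F TYPE/ENCODING/FILENAME asks linkchecker to write an extra output file.
    cmd = ["linkchecker", "--verbose", "-F", f"sql/utf-8/{sql_file}"]
    if check_extern:
        cmd.append("--check-extern")
    cmd.append(url)
    return cmd


def run_linkchecker(url, **kwargs):
    """Run linkchecker and return its exit code."""
    # check=False: a non-zero exit code is expected when broken links are found.
    result = subprocess.run(build_linkchecker_command(url, **kwargs), check=False)
    return result.returncode
```

Passing the arguments as a list (instead of a shell string) avoids quoting problems with URLs that contain `?` or `&`. A follow-up process could then pick up the SQL file and run it against the database.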

pvgenuchten commented 2 months ago

We have already identified some challenges with this tool (and other tools):

Example:

linkchecker "https://soilwise-he.containers.wurnet.nl/cat/collections/metadata:main/items?f=html" --verbose --check-extern

pvgenuchten commented 2 months ago

Since the SoilWise catalogue is not online yet, the suggestion is to use the EJP SOIL catalogue instead:

https://catalogue.ejpsoil.eu/collections/metadata:main/items?f=html

I am not sure whether linkchecker properly fetches every page of the paginated search results. If it does not, an option may be to check the links page by page.
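One way the per-page option could look: generate one URL per result page and hand each to the checker separately. This is only a sketch. It assumes the catalogue paginates with `limit` and `offset` query parameters and that the total number of records is known beforehand; both should be verified against the actual catalogue. `page_urls` is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_urls(items_url, total, limit=10):
    """Return one URL per result page of a paginated items endpoint."""
    parts = urlsplit(items_url)
    # Keep existing parameters (e.g. f=html) but drop any paging ones.
    base_query = [
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key not in ("limit", "offset")
    ]
    urls = []
    for offset in range(0, total, limit):
        query = urlencode(base_query + [("limit", limit), ("offset", offset)])
        urls.append(urlunsplit(parts._replace(query=query)))
    return urls
```

Each generated URL could then be checked on its own, probably with a limited recursion depth so the checker does not follow the "next page" links again.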

etc...

Goal is to: