soilwise-he / link-liveliness-assessment

MIT License

understand which record has bad link #27

Open pvgenuchten opened 3 weeks ago

pvgenuchten commented 3 weeks ago

Currently we do not seem to store which record contains a bad link, only whether the URL itself is broken. This makes it hard to see where an improvement is needed.

suggestion:

vgole001 commented 2 weeks ago

In the link extraction phase, attach to each link a reference to its source.

For each URL extracted, what makes sense to track as a reference?

  1. Use urllib to extract information about the URL itself:
    • Domain
    • Path structure
    • other
  2. Get JSON context where the URL was found:
    • Record id
    • Title
    • Provider name
    • other
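
A minimal sketch of what such a reference could look like, assuming each record arrives as a Python dict. The field names `id`, `title` and `provider`, the `LinkReference` class and the `build_reference` helper are all hypothetical and not taken from the project's code; only `urllib.parse.urlsplit` is a real standard-library call.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class LinkReference:
    """One extracted link plus the context it was found in (hypothetical shape)."""
    url: str
    domain: str      # network location of the URL, e.g. "example.org"
    path: str        # path component of the URL
    record_id: str   # id of the record the link was extracted from
    title: str
    provider: str


def build_reference(url: str, record: dict) -> LinkReference:
    """Combine URL parts from urllib with context from the source record.

    The keys "id", "title" and "provider" are assumptions about the
    record JSON; missing keys fall back to an empty string.
    """
    parts = urlsplit(url)
    return LinkReference(
        url=url,
        domain=parts.netloc,
        path=parts.path,
        record_id=str(record.get("id", "")),
        title=str(record.get("title", "")),
        provider=str(record.get("provider", "")),
    )
```

Keeping the record id next to the URL means a failed check can later be reported against the record rather than only against the URL.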

FYI @pvgenuchten

pvgenuchten commented 1 week ago

What matters here is the record (id or URL) on which a given link was found, for example https://soilwise-he.containers.wur.nl/cat/collections/metadata:main/items/10.1007/698_2022_928
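
As a rough sketch of how this could be reported, assuming link checks yield `(url, record_id, is_ok)` tuples: the function below groups broken URLs by the catalogue records that contain them. The tuple shape and the `records_by_broken_link` name are hypothetical, and the base URL is only inferred from the example above, so other records may not follow the same pattern.

```python
from collections import defaultdict

# Assumption: record URLs follow the pattern of the example record above.
CATALOGUE_ITEMS = (
    "https://soilwise-he.containers.wur.nl/cat/collections/metadata:main/items/"
)


def records_by_broken_link(checked):
    """Map each broken URL to the sorted record URLs it was found on.

    `checked` is an iterable of (url, record_id, is_ok) tuples; this
    shape is hypothetical and not taken from the project's code.
    """
    broken = defaultdict(set)
    for url, record_id, is_ok in checked:
        if not is_ok:
            broken[url].add(CATALOGUE_ITEMS + record_id)
    return {url: sorted(records) for url, records in broken.items()}
```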