internetarchive / iari

Import workflows for the Wikipedia Citations Database
GNU General Public License v3.0

As a patron, I want the links in the Responsible AI paper to be found and analyzed correctly by IARI so I can trust it #860

Open dpriskorn opened 1 year ago

dpriskorn commented 1 year ago

https://internetarchive.github.io/iare/?url=https://arxiv.org/pdf/2210.02667.pdf "A Human Rights-Based Approach to Responsible AI"

dpriskorn commented 1 year ago

One missing URL here is http://www.worldvaluessurvey.org/ on page 4. It is not present in the annotation links. It does appear in the text, but the regex does not extract it because the scheme is missing, so the "https?://" part of the pattern never matches.

Possible fix: add another regex that looks for "www." at the start of links, and let IARI/IARE guess the scheme.
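A minimal sketch of what that second regex could look like, not taken from the IARI codebase. The names `WWW_LINK` and `extract_schemeless_links` and the `default_scheme` parameter are hypothetical, and defaulting to `https` is an assumption; the real code might instead probe both schemes. The lookbehind skips any "www." that already follows a scheme, so links caught by the existing "https?://" regex are not reported twice.

```python
import re

# Hypothetical pattern: a bare "www." host with an optional path.
# The negative lookbehind rejects "www." preceded by "/", ".", "@", "-"
# or a word character, e.g. the "www." inside "https://www.example.com".
WWW_LINK = re.compile(
    r"(?<![/\w.@-])www\.[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+(?:/[^\s<>\"')\]]*)?"
)


def extract_schemeless_links(text, default_scheme="https"):
    """Return scheme-less "www." links found in text, with a guessed scheme."""
    links = []
    for match in WWW_LINK.finditer(text):
        # Drop sentence punctuation that got swept into the path.
        raw = match.group(0).rstrip(".,;:")
        links.append(f"{default_scheme}://{raw}")
    return links


sample = "See www.worldvaluessurvey.org/ and https://www.example.com/page."
print(extract_schemeless_links(sample))
```

Because the scheme is guessed, it may be worth flagging these links as such in the output so that a failed check can be retried with the other scheme.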