wo / paperscraper

tracking and parsing new philosophy papers on the internet

scraper crashes following link with invalid url #72

Open · wo opened this issue 8 years ago

wo commented 8 years ago

The page http://www.simonegozzano.com/?page_id=9 contains a single link to what appears to be a PDF file: http://http://www.simonegozzano.com/wp-content/uploads/2013/05/BRECURRI.pdf. Because of the doubled 'http://', urllib3 throws an exception when the scraper follows this link, and util.request_url(url) doesn't catch it, so the scraper crashes.
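A minimal sketch of one possible guard, assuming links are checked before they reach util.request_url. The helper `clean_url` and its regex are hypothetical, not part of paperscraper, and use only the standard library. The helper repairs a doubled scheme prefix and returns None for anything that still lacks an http(s) scheme and a hostname:

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern: one or more scheme prefixes that are immediately
# followed by another scheme, e.g. the first "http://" in "http://http://host".
_DOUBLED_SCHEME = re.compile(r'^(https?://)+(?=https?://)', re.IGNORECASE)


def clean_url(url):
    """Hypothetical helper: strip a doubled scheme, then reject URLs that
    still don't look requestable. Returns the cleaned URL, or None if the
    link should be skipped instead of passed to request_url."""
    url = url.strip()
    # Drop the redundant leading scheme(s), keeping the last one.
    url = _DOUBLED_SCHEME.sub('', url)
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit raises ValueError on some malformed input (e.g. bad IPv6 brackets).
        return None
    if parts.scheme.lower() not in ('http', 'https') or not parts.hostname:
        return None
    return url
```

The link from the page above would come back as `http://www.simonegozzano.com/wp-content/uploads/2013/05/BRECURRI.pdf`. This only covers this one kind of malformed link. Catching urllib3's exceptions inside request_url itself would still be needed so that other bad URLs get logged and skipped instead of crashing the scraper.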

This seems to happen when crawling from the source page http://lsa.umich.edu/philosophy/ (why is that page even in the list of sources?).