reinventalbany / esd-crawl

Web crawler to find data on Empire State Development site
MIT License

find broken links to PDFs #47

Closed afeld closed 1 year ago

afeld commented 1 year ago

TBD how we want those reported.

afeld commented 1 year ago

Broken PDF links appear to redirect (indirectly) to the homepage: first a 301 to cdn.esd.ny.gov, then a 302 to https://esd.ny.gov, which returns a 200 with HTML.

$ curl -I -L https://esd.ny.gov/broken.pdf
HTTP/1.1 301 Moved Permanently
Date: Thu, 17 Nov 2022 03:25:23 GMT
Content-Type: text/html; charset=iso-8859-1
Connection: keep-alive
X-Content-Type-Options: nosniff
Location: https://cdn.esd.ny.gov/broken.pdf
Cache-Control: max-age=1209600
Expires: Wed, 30 Nov 2022 21:56:05 GMT
X-Request-ID: v-77d9358a-65f9-11ed-826c-bb790a478789
Via: varnish
X-Cache: HIT
X-Cache-Hits: 3
CF-Cache-Status: HIT
Server: cloudflare
CF-RAY: 76b55bdfcada8c3b-EWR

HTTP/2 302
content-length: 141
content-type: text/html; charset=UTF-8
location: https://esd.ny.gov
server: Microsoft-IIS/10.0
x-powered-by: ASP.NET
x-frame-options: SAMEORIGIN
date: Thu, 17 Nov 2022 03:25:23 GMT

HTTP/1.1 200 OK
Date: Thu, 17 Nov 2022 03:25:24 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
X-Drupal-Cache: MISS
X-Frame-Options: SAMEORIGIN
Content-Language: en
X-Content-Type-Options: nosniff
Permissions-Policy: interest-cohort=()
X-Generator: Drupal 7 (http://drupal.org)
Link: <https://esd.ny.gov/>; rel="canonical",<https://esd.ny.gov/>; rel="shortlink"
Cache-Control: public, max-age=86400
Last-Modified: Wed, 16 Nov 2022 22:45:06 GMT
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Vary: Cookie,Accept-Encoding
X-Request-ID: v-51288b6e-6600-11ed-b4a3-f7404e49308f
X-AH-Environment: prod
Via: varnish
X-Cache: MISS
CF-Cache-Status: HIT
Server: cloudflare
CF-RAY: 76b55be10df08c3b-EWR
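
Since the final response is a 200, a status-code check alone would miss these links. One possible way to flag them, as a rough sketch and not necessarily how the crawler should do it, is to follow redirects and compare where the request ends up with what was asked for. The function names `looks_like_broken_pdf` and `check_pdf_link` are hypothetical, and the heuristic (landing on the site root, or getting a non-PDF content type) is an assumption based only on the single response chain above.

```python
import urllib.error
import urllib.request
from urllib.parse import urlparse


def looks_like_broken_pdf(requested_url, final_url, final_content_type):
    """Heuristic (assumed, not confirmed for every case on the site):
    a .pdf link is treated as broken if, after redirects, we land on
    the site root or receive something other than a PDF."""
    if not urlparse(requested_url).path.lower().endswith(".pdf"):
        return False  # only PDF links are in scope here

    landed_on_homepage = urlparse(final_url).path in ("", "/")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = final_content_type.split(";")[0].strip().lower()
    return landed_on_homepage or media_type != "application/pdf"


def check_pdf_link(url, timeout=10):
    """Fetch `url`, following redirects, and apply the heuristic.
    The response body is never read; only the headers are inspected."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return looks_like_broken_pdf(
                url, resp.geturl(), resp.headers.get_content_type()
            )
    except urllib.error.HTTPError:
        # A 4xx/5xx at the end of the chain also counts as broken.
        return True
```

Calling `check_pdf_link("https://esd.ny.gov/broken.pdf")` would be expected to return `True` if the site still behaves as in the curl output above. How the flagged links get reported is still open.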