#182: For PDFs, guess the Content-Type when none is supplied by probing the Content-Disposition filename

For the originally reported URL (https://www.nokia.com/phones/de_at/support/api/pdf/nokia-5310-user-guide), the following logs, response headers, and request headers were observed:

CHARSET: No Content-Type header detected for https://www.nokia.com/phones/de_at/support/api/pdf/nokia-5310-user-guide, adding one. background.js:655:28
CHARSET: Detected base type was text/html

HTTP/2 200 OK
server: nginx/1.23.1
x-frame-options: SAMEORIGIN
content-language: de-AT
content-disposition: inline; filename="user-guide-nokia-5310-user-guide.pdf"
x-varnish: 59445958
accept-ranges: bytes
date: Thu, 05 Jan 2023 05:44:28 GMT
content-length: 261166
set-cookie: nok_ip_locale=US; expires=Sat, 04-Feb-2023 05:44:28 GMT; path=/; secure; HttpOnly
set-cookie: nok_ip_region=americas; path=/; secure; HttpOnly
strict-transport-security: max-age=31536000
X-Firefox-Spdy: h2

GET /phones/de_at/support/api/pdf/nokia-5310-user-guide HTTP/2
Host: www.nokia.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:108.0) Gecko/20100101 Firefox/108.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://github.com/wingman-jr-addon/wingman_jr/issues/182
Connection: keep-alive
Cookie: nok_ip_locale=US; nok_ip_region=americas; AKA_A2=A
Upgrade-Insecure-Requests: 1
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: cross-site
TE: trailers

Unfortunately, apart from the Content-Disposition filename, there is not much here to indicate that this is actually a PDF rather than something we should be scanning. This runs into the same issue as charset detection, where you have to probe the actual content rather than relying on the header alone.
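For comparison, probing the content itself would mean checking the first bytes of the body for the PDF signature (`%PDF-`). A minimal sketch of that check, which this change does not implement; `looksLikePdf` is a hypothetical name, and how the leading bytes are obtained (for example via `webRequest.filterResponseData`) is left as an assumption:

```javascript
// Hypothetical helper, not part of the add-on: returns true when the
// start of a response body carries the PDF signature "%PDF-".
// `bytes` is a Uint8Array holding the first few bytes of the body.
function looksLikePdf(bytes) {
  const magic = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  if (bytes.length < magic.length) {
    return false;
  }
  return magic.every((expected, i) => bytes[i] === expected);
}

// Usage: the bytes for "%PDF-1.7" match; the bytes for "<html>" do not.
const pdfStart = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x37]);
const htmlStart = new Uint8Array([0x3c, 0x68, 0x74, 0x6d, 0x6c, 0x3e]);
console.log(looksLikePdf(pdfStart), looksLikePdf(htmlStart)); // true false
```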

As a pragmatic approach, fall back to the extension of the Content-Disposition filename so that we can keep relying on headers, and use this to add a PDF special case.
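The fallback can be sketched roughly as follows. This is an illustration rather than the code in `background.js`: the function names are hypothetical, the `text/html` default is taken from the log line above, and the header list is assumed to be the array of `{ name, value }` objects that `webRequest.onHeadersReceived` provides in `details.responseHeaders`.

```javascript
// Hypothetical helper names; not taken from the add-on's background.js.

// Pull the filename out of a Content-Disposition header value.
// Handles only the quoted and unquoted `filename=` forms; the extended
// `filename*=` form is deliberately ignored in this sketch.
function filenameFromContentDisposition(headerValue) {
  if (!headerValue) {
    return null;
  }
  const match = /(?:^|;)\s*filename\s*=\s*(?:"([^"]*)"|([^;\s]+))/i.exec(headerValue);
  if (!match) {
    return null;
  }
  return match[1] !== undefined ? match[1] : match[2];
}

// Guess a Content-Type when the response did not supply one.
// `headers` is an array of { name, value } objects.
function guessContentType(headers, fallbackType) {
  const find = (wanted) => headers.find((h) => h.name.toLowerCase() === wanted);

  // A real Content-Type header always wins.
  const contentType = find('content-type');
  if (contentType && contentType.value) {
    return contentType.value;
  }

  // PDF special case: trust a ".pdf" extension on the suggested filename.
  const disposition = find('content-disposition');
  const filename = filenameFromContentDisposition(disposition && disposition.value);
  if (filename && filename.toLowerCase().endsWith('.pdf')) {
    return 'application/pdf';
  }
  return fallbackType;
}

// Usage with the relevant headers from the reported response.
const observed = [
  { name: 'content-disposition', value: 'inline; filename="user-guide-nokia-5310-user-guide.pdf"' },
  { name: 'content-length', value: '261166' },
];
console.log(guessContentType(observed, 'text/html')); // application/pdf
```

Only the extension is inspected, so a server that mislabels the filename would still be misclassified; that trade-off is the price of staying header-only.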

Ideal? Hardly.
