wfleming / fourohfourfinder

MIT License

Handle non-HTML resources #1

Closed wfleming closed 8 years ago

wfleming commented 8 years ago

Currently the code blindly assumes that everything linked to is an HTML page and tries to parse it as such. As a result, fetches crash on non-HTML resources, e.g. PDF files.

PageAnalyzer should check the content type and only parse HTML responses. For any other type, it should skip parsing and just return a result based on the status code.
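A rough sketch of the proposed check, in Python for illustration only (this is not the project's actual code). The names `analyze`, `is_html`, and `PageResult`, the set of media types treated as HTML, and the choice to count only 2xx statuses as OK are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Assumption: these media types are the ones worth parsing for links.
HTML_TYPES = {"text/html", "application/xhtml+xml"}


class _LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


@dataclass
class PageResult:
    """Hypothetical result type: whether the fetch succeeded, plus any links found."""
    ok: bool
    links: list = field(default_factory=list)


def is_html(content_type):
    """Return True if a Content-Type header value names an HTML media type."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in HTML_TYPES


def analyze(status, content_type, body):
    """Parse the body only for successful HTML responses.

    Anything else (PDFs, images, error pages) is reported by status code alone.
    """
    ok = 200 <= status < 300  # assumption: only 2xx counts as OK
    if not ok or not is_html(content_type):
        return PageResult(ok=ok)
    collector = _LinkCollector()
    collector.feed(body.decode("utf-8", errors="replace"))
    return PageResult(ok=True, links=collector.links)
```

With this shape, a PDF that returns 200 is reported as OK without ever reaching the HTML parser, and a PDF that returns 404 is reported as broken.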