Handle non-HTML resources

Currently the code blindly assumes that everything linked to is an HTML page and tries to parse it as such. As a result, fetches crash on non-HTML resources such as PDF files.

PageAnalyzer should check the response's content type and parse only HTML documents. For any other type, it should skip parsing and return a result based on the status code alone.
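A minimal sketch of the proposed check, assuming the analyzer receives the status code, the raw `Content-Type` header and the body bytes. The names `analyze`, `media_type`, `LinkCollector` and the returned dict shape are hypothetical stand-ins, not PageAnalyzer's real interface. Decoding the body as UTF-8 and treating a missing header as non-HTML are also assumptions.

```python
from html.parser import HTMLParser

# Media types we are willing to parse as HTML (assumption: XHTML counts too).
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def media_type(content_type_header):
    """Return the bare media type, e.g. 'text/html' from 'text/html; charset=utf-8'."""
    if not content_type_header:
        return ""
    return content_type_header.split(";", 1)[0].strip().lower()


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def analyze(status, content_type_header, body):
    """Hypothetical stand-in for PageAnalyzer's entry point."""
    ok = 200 <= status < 300
    if media_type(content_type_header) not in HTML_TYPES:
        # Non-HTML (PDF, image, ...): never parse, report the status only.
        return {"ok": ok, "parsed": False, "links": []}
    collector = LinkCollector()
    # Assumption: body is bytes; undecodable bytes are replaced, not fatal.
    collector.feed(body.decode("utf-8", errors="replace"))
    collector.close()
    return {"ok": ok, "parsed": True, "links": collector.links}
```

For example, `analyze(200, "application/pdf", b"%PDF-1.4")` returns `{"ok": True, "parsed": False, "links": []}` without touching the parser. Comparing only the media type (ignoring parameters like `charset`, case-insensitively) avoids misclassifying `Text/HTML; charset=utf-8`.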