RSS Discovery Engine does HTTP requests in several different ways:
1) aiohttp
2) the get_response_content function
3) the get_request function
4) whatever feedparser uses internally
Only get_request was setting the User-Agent header. Not setting a user agent (or using the library's default one) is a fairly reliable way to trip bad-bot-detection alarms; I am reasonably sure this is part of why the engine struggles to work with Hacker News (HN).
This changeset ensures that the header is always set, regardless of which of these paths makes the request.
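A minimal sketch of the idea: one shared User-Agent constant and one header helper that every request path goes through. It assumes the two in-house functions build standard-library `urllib.request` requests, which the description does not actually say. `USER_AGENT`, `default_headers` and `build_request` are hypothetical names, not code from this changeset.

```python
import urllib.request
from typing import Mapping, Optional

# Hypothetical project-wide value; the real string used by the
# changeset is not shown in this description.
USER_AGENT = "rss-discovery-engine/0.1 (+https://example.invalid)"


def default_headers(extra: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return request headers that always include a User-Agent.

    Caller-supplied headers are merged on top, so an explicit
    "User-Agent" key from the caller replaces the default.
    """
    headers = {"User-Agent": USER_AGENT}
    if extra:
        headers.update(extra)
    return headers


def build_request(url: str, extra: Optional[Mapping[str, str]] = None) -> urllib.request.Request:
    # Hypothetical stand-in for what get_request / get_response_content
    # might do: every urllib request is built through the shared helper,
    # so none of them can go out without a User-Agent.
    return urllib.request.Request(url, headers=default_headers(extra))
```

For the other two paths, the same values can be handed to the libraries directly: `aiohttp.ClientSession(headers=default_headers())` sets a session-wide default header, and `feedparser.parse(url, agent=USER_AGENT)` overrides feedparser's own User-Agent for that call (assigning `feedparser.USER_AGENT` changes it globally instead).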
Drive-by: sort some imports and remove an unused one.