matrix-org / go-neb

Extensible matrix bot written in Go
Apache License 2.0

Some valid atom and rss feeds are not recognized #187

Open heyakyra opened 7 years ago

heyakyra commented 7 years ago

This atom feed is valid:

https://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fwww.qdep.org%2Ffeed%2Fatom%2F

but returns this error:

HTTP 500: Failed to register service: Failed to read URL http://www.qdep.org/feed/atom/: Failed to detect feed type

This rss feed seems to also be valid:

https://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fwww.qdep.org%2Ffeed%2F

but returns the same error

kegsay commented 7 years ago

We use https://github.com/mmcdole/gofeed to parse Atom/RSS feeds. I'll bring it up with them.
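For context on the "Failed to detect feed type" message: before a feed can be parsed, the parser has to decide whether the document is Atom or RSS. The sketch below shows one common way to do that, by inspecting the first XML start element. It uses only the standard library; `detectFeedType` is a hypothetical helper written for illustration and is not gofeed's actual implementation.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// detectFeedType is a hypothetical helper (not part of gofeed or go-neb).
// It reports "atom", "rss", or "unknown" based on the first start element.
func detectFeedType(doc string) string {
	dec := xml.NewDecoder(strings.NewReader(doc))
	dec.Strict = false // be lenient with sloppy markup
	for {
		tok, err := dec.Token()
		if err != nil {
			// EOF or a syntax error before any element was seen.
			return "unknown"
		}
		start, ok := tok.(xml.StartElement)
		if !ok {
			continue // skip XML declarations, comments, whitespace
		}
		switch strings.ToLower(start.Name.Local) {
		case "feed":
			return "atom"
		case "rss", "rdf":
			return "rss"
		default:
			return "unknown"
		}
	}
}

func main() {
	fmt.Println(detectFeedType(`<?xml version="1.0"?><feed xmlns="http://www.w3.org/2005/Atom"></feed>`))
	fmt.Println(detectFeedType(`<rss version="2.0"><channel></channel></rss>`))
	fmt.Println(detectFeedType(`<html><body>Access denied</body></html>`))
}
```

Under this approach, any response body that is not a feed document (an HTML error page, for example) ends up as "unknown" even when the URL itself serves a valid feed to other clients.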

mmcdole commented 7 years ago

I discussed this in more detail in https://github.com/mmcdole/gofeed/issues/75, but I wanted to circle back here.

It appears that the server in question is behind an Incapsula WAF, and its security settings block both `curl` requests and requests made by Go's `http.Client` (with its default settings / User-Agent).

kegsay commented 7 years ago

As @mmcdole states, it looks like this site is doing User-Agent sniffing and rejecting bots. We already set our own User-Agent, so I don't think there's much we can do about this.

heyakyra commented 7 years ago

Thanks for the update. Would this also be the reason that a Feedburner version of the feed gets rejected? At first the feed is accepted as valid, but when I check back on it later, it has a red error icon next to it: https://feeds.feedburner.com/QueerDetaineeEmpowermentProject

This also happens if you specify the feed type explicitly, such as https://feeds.feedburner.com/QueerDetaineeEmpowermentProject?format=atom

paboum commented 1 year ago

Also doesn't work for: https://github.com/matrix-org/go-neb/commits/master.atom