prof18 / RSS-Parser

A Kotlin Multiplatform library to parse a RSS Feed
Apache License 2.0

Unexplained 403 fetching some RSS feeds #110

Closed christate closed 1 year ago

christate commented 1 year ago

Describe the bug

Some fetches by RssParser, in particular of feeds hosted by Buzzsprout, get unexplained 403 responses, even though browsers and even naïve curl fetches work just fine, including with every header other than Host: suppressed. Unfortunately, the library as provided doesn't offer anything in the way of instrumentation hooks, header configuration, or tracing/logging to help diagnose and address the issue.

The link of the RSS Feed

Two major podcasts hosted on Buzzsprout both show the same behavior:

Maintenance Phase: https://feeds.buzzsprout.com/1411126.rss

You're Wrong About: https://feeds.buzzsprout.com/1112270.rss

Sample output of curl on the "You're Wrong About" feed:

$ curl -vvv -H"User-Agent:" -H"Accept:" https://feeds.buzzsprout.com/1112270.rss
*   Trying 104.19.159.48:443...
* Connected to feeds.buzzsprout.com (104.19.159.48) port 443 (#0)
* schannel: disabled automatic use of client certificate
* ALPN: offers http/1.1
* ALPN: server accepted http/1.1
> GET /1112270.rss HTTP/1.1
> Host: feeds.buzzsprout.com
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Tue, 07 Mar 2023 22:46:58 GMT
< Content-Type: text/xml; charset=utf-8
< Transfer-Encoding: chunked
< Connection: keep-alive
< X-XSS-Protection: 0
< X-Content-Type-Options: nosniff
< X-Download-Options: noopen
< X-Permitted-Cross-Domain-Policies: none
< Referrer-Policy: strict-origin-when-cross-origin
< Cache-Control: max-age=21600, public
< ETag: W/"2a1545b24112259eaf705b3554e4e8a8"
< X-Request-Id: 73a9201a-0c81-4e92-bace-2db36303f95c
< X-Runtime: 0.053443
< Strict-Transport-Security: max-age=63072000; includeSubDomains
< Vary: Origin
< CF-Cache-Status: HIT
< Age: 19593
< Server: cloudflare
< CF-RAY: 7a4660a7185ace3c-SJC
<
<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet href="https://feeds.buzzsprout.com/styles.xsl" type="text/xsl"?>
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:podcast="https://podcastindex.org/namespace/1.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
  <atom:link href="https://feeds.buzzsprout.com/1112270.rss" rel="self" type="application/rss+xml" />
  <atom:link href="https://pubsubhubbub.appspot.com/" rel="hub" xmlns="http://www.w3.org/2005/Atom" />
  <title>You&#39;re Wrong About</title>
....

christate commented 1 year ago

FYI, I'm seeing these 403s using RSS-Parser 5.0.2, imported via Gradle from Maven Central.

christate commented 1 year ago

I've decided to close this as not-a-bug, since RSS-Parser already provides an API that lets the app configure the underlying OkHttpClient however it needs to.
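
For anyone landing here with the same symptom, below is a minimal sketch of what such a client configuration might look like. It assumes OkHttp 4.x. The helper names `withUserAgent` and `diagnosticClient` and the User-Agent value are hypothetical, chosen only for illustration. The thread does not confirm what triggers the 403, so setting an explicit User-Agent is one thing to try, not a known fix. The network interceptor also prints the headers actually sent and received, which gives the kind of tracing the original report asked for.

```kotlin
import okhttp3.Interceptor
import okhttp3.OkHttpClient
import okhttp3.Request

// Hypothetical helper: returns a copy of the request with an explicit
// User-Agent header, replacing any value that was already set.
fun withUserAgent(request: Request, userAgent: String): Request =
    request.newBuilder()
        .header("User-Agent", userAgent)
        .build()

// Hypothetical helper: builds a client whose network interceptor overrides
// the User-Agent and prints request/response headers for diagnosis.
// A network interceptor is used so that headers OkHttp adds on its own
// (such as its default User-Agent) are visible and can be replaced.
fun diagnosticClient(userAgent: String): OkHttpClient =
    OkHttpClient.Builder()
        .addNetworkInterceptor(Interceptor { chain ->
            val request = withUserAgent(chain.request(), userAgent)
            println("--> ${request.method} ${request.url}\n${request.headers}")
            val response = chain.proceed(request)
            println("<-- ${response.code} ${response.message}\n${response.headers}")
            response
        })
        .build()
```

If the `Parser.Builder` in RSS-Parser 5.x accepts a custom client through `okHttpClient(...)`, as I understand it does, the result could be passed in with something like `Parser.Builder().okHttpClient(diagnosticClient("ExampleApp/1.0")).build()`. Comparing the printed request headers with the curl transcript above should show which header differs when the 403 comes back.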