kokarare1212 / librespot-python

Open Source Spotify Client
http://librespot-python.rtfd.io
Apache License 2.0
230 stars 43 forks

[BUG] Podcast episodes no longer load, missing external_playback_url in metadata #135

Closed Yetangitu closed 2 years ago

Yetangitu commented 2 years ago

For the past few days it has not been possible to download podcast episodes through librespot because a metadata item (external_playback_url) is missing:

https://github.com/Yetangitu/Spodcast/issues/13

The problem is most likely caused by some missing scope or other authorisation-related item in the authorisation token, the question is - which? A similar problem seems to exist in the rust-based version (https://github.com/librespot-org/librespot/issues/818) where the absence of the required URL in the stream is also mentioned.

I do notice that the metadata returned by Spotify to the web based player is contradictory in that it claims an episode to not be externally hosted while serving an externally hosted stream:

{
  "episodes" : [ {
   ...
   ...
    "external_playback_url" : "https://traffic.megaphone.fm/GLT8613565834.mp3?updated=1655823165",
   ...
   ...
    "is_externally_hosted" : false,

This change seems to have taken place on or around the 16th of June since that is the last day things worked as intended.

I notice the token generated by the web player is quite a bit longer than the one generated by librespot suggesting it contains some extra authorisations missing from the latter. The question is, which?

kokarare1212 commented 2 years ago

I think there is a good chance of that. However, I think it is also important to note that the API endpoints used by librespot-python are different.

Yetangitu commented 2 years ago

It used to work with the existing endpoints, now it no longer does - the endpoint still returns data but this no longer contains playable streams, only previews. Here's what used to be returned for Spotify-hosted podcasts:

API call to https://api-partner.spotify.com/pathfinder/v1/query?operationName=getEpisode&variables={"uri":"spotify:episode:{episodeId}"}...

      "audio": {
        "items": [
          {
            "url": "https://anon-podcast.scdn.co/d5fe1647e3fb98c7f074cf87c0a6fe6fbd3848b1",
            "format": "AAC_24",
            "fileId": "d5fe1647e3fb98c7f074cf87c0a6fe6fbd3848b1",
            "externallyHosted": false
          },
          {
            "url": "https://anon-podcast.scdn.co/47e7f56afca4719d33094c81dc826a63668a23a6",
            "format": "MP4_128",
            "fileId": "47e7f56afca4719d33094c81dc826a63668a23a6",
            "externallyHosted": false
          },
          {
            "url": "https://anon-podcast.scdn.co/ec2cd6625026c45273e8776c3779b80afc28ca36",
            "format": "MP4_128_DUAL",
            "fileId": "ec2cd6625026c45273e8776c3779b80afc28ca36",
            "externallyHosted": false
          },
          {
            "url": "https://anon-podcast.scdn.co/226e1bd59da25e023cbdb450485e43b6b1b9878b",
            "format": "OGG_VORBIS_96",
            "fileId": "226e1bd59da25e023cbdb450485e43b6b1b9878b",
            "externallyHosted": false
          }
        ]
      }

Notice the use of the anon-podcast.scdn.co domain for serving these streams.

Here's what used to be returned for externally-hosted podcasts:

  "audio": {
    "items": [
      {
        "url": "https://anon-podcast.scdn.co/b16d6ecd88942a0fac5dea4e560fc553e7ba90fb",
        "format": "AAC_24",
        "fileId": "b16d6ecd88942a0fac5dea4e560fc553e7ba90fb",
        "externallyHosted": false
      },
      {
        "url": "https://anon-podcast.scdn.co/63437b0b71f6d332555b041441d760641e00dba8",
        "format": "MP4_128_DUAL",
        "fileId": "63437b0b71f6d332555b041441d760641e00dba8",
        "externallyHosted": false
      },
      {
        "url": "https://anon-podcast.scdn.co/30f1f2170606c5bc6ccd367ea163b9a1039336af",
        "format": "MP4_128",
        "fileId": "30f1f2170606c5bc6ccd367ea163b9a1039336af",
        "externallyHosted": false
      },
      {
        "url": "https://anon-podcast.scdn.co/ce101e45379048a8d41d21e352dfc076aa1cdbf1",
        "format": "OGG_VORBIS_96",
        "fileId": "ce101e45379048a8d41d21e352dfc076aa1cdbf1",
        "externallyHosted": false
      },
      {
        "url": "https://www.podtrac.com/pts/redirect.mp3/pdst.fm/e/pdst.fm/e/traffic.megaphone.fm/BENT9388265894.mp3?updated=1645154429",
        "format": "UNKNOWN",
        "fileId": null,
        "externallyHosted": true
      }
    ]
  }

Notice the last object in the array which contains an external url.
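
A minimal sketch of how a client could pick the playable stream out of the older response shape shown above. The helper name `pick_stream_url` and the preference order are assumptions made for illustration; only the `items`, `url`, `format` and `externallyHosted` keys come from the output above.

```python
from typing import Optional


def pick_stream_url(audio: dict, preferred_format: str = "MP4_128") -> Optional[str]:
    """Return a stream URL from an `audio` object shaped like the output above."""
    items = audio.get("items", [])
    # An externally hosted item, when present, points at the publisher's own file.
    for item in items:
        if item.get("externallyHosted"):
            return item.get("url")
    # Otherwise fall back to a Spotify-hosted item in the preferred format.
    for item in items:
        if item.get("format") == preferred_format:
            return item.get("url")
    # Last resort: the first item, if there is one.
    return items[0].get("url") if items else None
```

With the current response shape this would return one of the p.scdn.co preview URLs, which is exactly the failure described in this report.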

This is what is returned now for the same API call:

      "audio": {
        "items": [
          {
            "url": "https://p.scdn.co/mp3-preview/cbe2227b6c1ad338d86bcdb5ef9d4770cfe86941",
            "format": "AAC_24",
            "fileId": "cbe2227b6c1ad338d86bcdb5ef9d4770cfe86941",
            "externallyHosted": false
          },
          {
            "url": "https://p.scdn.co/mp3-preview/88f8d32962f3e9695af48f80ff5ddca43871b98d",
            "format": "MP4_128",
            "fileId": "88f8d32962f3e9695af48f80ff5ddca43871b98d",
            "externallyHosted": false
          },
          {
            "url": "https://p.scdn.co/mp3-preview/c5adb66015480fc18305dd6c248dec0bf4718c2a",
            "format": "MP4_128_DUAL",
            "fileId": "c5adb66015480fc18305dd6c248dec0bf4718c2a",
            "externallyHosted": false
          },
          {
            "url": "https://p.scdn.co/mp3-preview/4b1203a2153ef2bf642526a0ee1b53afd5c9ba47",
            "format": "OGG_VORBIS_96",
            "fileId": "4b1203a2153ef2bf642526a0ee1b53afd5c9ba47",
            "externallyHosted": false
          }
        ]
      }

The domain used for podcast streams (anon-podcast.scdn.co) does not exist any more:

$ host anon-podcast.scdn.co
Host anon-podcast.scdn.co not found: 3(NXDOMAIN)

...so it does look like something has changed on Spotify's side. The returned data now only contains 'preview' URLs (which all return HTTP/1.1 404 Not Found when polled using the token from the web player, I guess these need special treatment in some way - are they Widevine-encumbered streams?). The actual playable stream has moved and is now found in the output from GET /v1/episodes/{episodeId}, in external_playback_url. Switching out the token returned by librespot for the one returned by the web player makes this url appear in the output, this seems to be the only difference:

Using a web player generated token:

curl -s 'https://api.spotify.com/v1/episodes?ids=5neDqDVLOPbISIt2IYXlc8' -H 'authorization: Bearer WARNING_HERE_BE_A_VERY_LONG_BEARER_TOKEN' | jq '.'
{
  "episodes": [
    {
      "audio_preview_url": "https://p.scdn.co/mp3-preview/8ce935cff039360f8c8e6bad7592641896139643",
      "content_type": "PODCAST_EPISODE",
      "description": "bla bla bla",
      "duration_ms": 1876349,
      "explicit": false,
      "external_playback_url": "https://traffic.megaphone.fm/GLT2438388645.mp3?updated=1652802576",
   ...
   ...

The external_playback_url is playable without any further problems but... it only appears when using a web player provided token. Using a librespot generated token produces the same output minus the external_playback_url, i.e. it does not provide a playable stream.
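
One way to check the "only difference" claim is to compare the key sets of the two responses. This is a rough sketch: the function name `episode_key_diff` is hypothetical, and it only assumes that both responses have the `{"episodes": [...]}` shape shown above.

```python
from typing import List


def episode_key_diff(with_web_token: dict, with_librespot_token: dict) -> List[List[str]]:
    """For each episode, list the keys present only in the first response."""
    diffs = []
    pairs = zip(
        with_web_token.get("episodes", []),
        with_librespot_token.get("episodes", []),
    )
    for web_ep, lib_ep in pairs:
        # Treat a null episode entry as an empty object.
        diffs.append(sorted(set(web_ep or {}) - set(lib_ep or {})))
    return diffs
```

If the observation above holds, the result for each episode would contain only `external_playback_url`.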

kokarare1212 commented 2 years ago

Apparently Spotify has added something called ClientToken. https://github.com/librespot-org/librespot-java/commit/08b7890ed0fadc072052958945ac64e784232ac5

kokarare1212 commented 2 years ago

4c4c5642d7e97cf56343d6c385af3c64796711ea...560c5000a8c8509c028951b055efd8f93ff02d75 This commit should have fixed it.

Yetangitu commented 2 years ago

It does not seem to work just yet, the external_playback_url is not returned when using a token generated by librespot-python while it is returned for exactly the same query when using one generated by the web client.

Can these tokens be decoded to show which scopes they include? Tokens generated by the web client (344 characters) are noticeably longer than those generated by librespot-python (291 characters) so it stands to reason that the former include something which is missing from the latter.
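
As a rough sketch of what "decoding" could mean here: if a token were a JWT (three dot-separated base64url segments), its payload could be read as below. Nothing in this thread establishes that Spotify access tokens are JWTs, so that is purely an assumption; for an opaque token the hypothetical helper `try_decode_jwt_payload` simply returns None and the scopes cannot be read on the client side.

```python
import base64
import json
from typing import Optional


def try_decode_jwt_payload(token: str) -> Optional[dict]:
    """Return the JSON payload if `token` looks like a JWT, otherwise None."""
    parts = token.split(".")
    if len(parts) != 3:
        return None  # not header.payload.signature, so treat it as opaque
    payload = parts[1]
    # base64url segments in a JWT omit padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    try:
        decoded = json.loads(base64.urlsafe_b64decode(payload))
    except ValueError:
        # Covers invalid base64, invalid UTF-8 and invalid JSON alike.
        return None
    return decoded if isinstance(decoded, dict) else None
```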

kokarare1212 commented 2 years ago

ApiClient#get_metadata_4_episode allows the use of external_url. Other API endpoints have not been tested and are probably not implemented in upstream repositories.

gid: "\260\231$\210\237}Z\274\244\344\334\250$%e\254"
name: "#BOOMERCRINGE 20"
duration: 1876349
audio {
  file_id: "\016\364\033q\273WS\225-+\276\270|&\223m\376o1\323"
  format: AAC_24
}
audio {
  file_id: "\246I\313\266O\177\346\351r3\252r\307_%\026\"jh!"
}
audio {
  file_id: "d\254\341\3424\236\254\250\331\202\177z\301w\376?$\000\322\207"
}
audio {
  file_id: "\210p\254~F\033\327Z\223(\3577\320[\376\010\200\037\202\205"
  format: OGG_VORBIS_96
}
description: "Warum nicht die Zeit nutzen und die F\303\274\303\237e machen, w\303\244hrend ungebremst Metalwissen aus Olli herausballert? Und apropos kosmetische Anwendungen: das Boomercringe-Review empfiehlt der deutschen Pr\303\244sentation des ESC im R\303\274ckblick ein Ganzk\303\266rperpeeling plus Neumodellage.\302\240 Learn more about your ad choices. Visit podcastchoices.com/adchoices"
publish_time {
  year: 2022
  month: 5
  day: 17
  hour: 21
  minute: 50
}
cover_image {
  image {
    file_id: "\253gec\000\000\366\215!\342w\357_\302\377\244I%M7"
    size: SMALL
    width: 64
    height: 64
  }
  image {
    file_id: "\253gec\000\000_\037!\342w\357_\302\377\244I%M7"
    size: DEFAULT
    width: 300
    height: 300
  }
  image {
    file_id: "\253gec\000\000\272\212!\342w\357_\302\377\244I%M7"
    size: LARGE
    width: 640
    height: 640
  }
}
language: "de"
explicit: false
show {
  gid: ";\302\000\344eXAR\235\212z=-\002~#"
  name: "Fest & Flauschig"
}
audio_preview {
  file_id: "\214\3515\317\36096\017\214\216k\255u\222d\030\226\023\226C"
  format: MP3_96
}
restriction {
  countries_forbidden: ""
  catalogue_str: "all"
  catalogue_str: "free"
  catalogue_str: "premium"
  catalogue_str: "shuffle"
  catalogue_str: "commercial"
}
allow_background_playback: false
external_url: "https://traffic.megaphone.fm/GLT2438388645.mp3?updated=1652802576"
type: FULL
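
The `file_id` values in this dump are raw bytes printed with octal escapes, whereas the web API output earlier in the thread shows `fileId` as a 40-character hex string. A small sketch for converting one into the other, assuming only the escapes visible in this dump (octal `\NNN`, `\n`, `\"`, `\\`); the helper name `file_id_to_hex` is made up for illustration:

```python
import ast


def file_id_to_hex(escaped: str) -> str:
    """Convert an escaped byte string from the dump (without outer quotes) to hex."""
    # The escapes seen in the dump are also valid inside a Python bytes
    # literal, so the literal parser can decode them without evaluating code.
    raw = ast.literal_eval('b"' + escaped + '"')
    return raw.hex()


# The OGG_VORBIS_96 file_id from the dump above, passed as a raw string.
print(file_id_to_hex(r"\210p\254~F\033\327Z\223(\3577\320[\376\010\200\037\202\205"))
# prints 8870ac7e461bd75a9328ef37d05bfe08801f8285
```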

Yetangitu commented 2 years ago

I'll have a look at that endpoint.

By the way, the web client does not seem to use the client-token header:

curl 'https://api.spotify.com/v1/episodes?ids=4ADWWq8hYGpXEhKyxtNOdL&market=from_token' \
-H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:103.0) Gecko/20100101 Firefox/103.0' \
-H 'Accept: */*' \
-H 'Accept-Language: en-GB,en-US;q=0.7,en;q=0.3' \
-H 'Accept-Encoding: gzip, deflate, br' \
-H 'Referer: https://open.spotify.com/' \
-H 'Origin: https://open.spotify.com' \
-H 'DNT: 1' \
-H 'Sec-Fetch-Dest: empty' \
-H 'Sec-Fetch-Mode: cors' \
-H 'Sec-Fetch-Site: same-site' \
-H 'authorization: Bearer A_344_CHARACTER_TOKEN_HERE...' \
-H 'Connection: keep-alive'

Still the data is returned when the web client-generated token is used while it is not when using a librespot-python generated one.

kokarare1212 commented 2 years ago

Sorry, this library is targeted at native apps. It is not intended to guarantee the operation of API endpoints for web clients. The API endpoints that are currently working are not web endpoints, but endpoints inside TCP sockets. So use ApiClient#get_metadata_4_episode to retrieve metadata.

episode_id = EpisodeId.from_uri("spotify:episode:5neDqDVLOPbISIt2IYXlc8")
session.api().get_metadata_4_episode(episode_id)

Yetangitu commented 2 years ago

OK. I'll refactor spodcast to use these endpoints

Yetangitu commented 2 years ago

The original problem - not being able to download episodes - is now solved, I assume partly due to the inclusion of the client-token, partly due to me refactoring spodcast to use librespot-python-provided API interfaces instead of raw web API access. Some problems remain to be fixed - e.g. there does not seem to be a way to get the total decrypted size (or 'disk file size', this in contrast to 'stream size') for a given file_id, which makes it impossible to check whether a previously downloaded episode should be downloaded again. This is not related to this bug report so I'll close this one and open a request for information on that subject.

MikeeI commented 2 years ago

gid: "\n\244H\200I\256T\275\232\330\214\377\371\014\274N"
name: "#1839 - Duncan Trussell"
duration: 16257278
audio {
  file_id: "\326#\010\202\272?\"\274Z\314\335yru\276\307}H2\317"
  format: AAC_24
}
audio {
  file_id: "\376\322\275;\300\274\363\010\336K\357:\000\217s\203\017\031\334#"
}
audio {
  file_id: "&\234s\316\241\325\235\245\216\2664[?\362\347g\361\324\253\237"
}
audio {
  file_id: "\220\344\252\363*\201\n\\lR\356\215\306\341\031\270^8\021O"
  format: OGG_VORBIS_96
}
description: "Duncan Trussell is a stand-up comic, writer, actor, and host of the \"Duncan Trussell Family Hour\" podcast. http://www.duncantrussell.com/"
publish_time {
  year: 2022
  month: 7
  day: 5
  hour: 17
  minute: 0
}
cover_image {
  image {
    file_id: "\253gec\000\000\366\215\366\2349lt\225=\273Q\302\261B"
    size: SMALL
    width: 64
    height: 64
  }
  image {
    file_id: "\253gec\000\000\272\212\366\2349lt\225=\273Q\302\261B"
    size: DEFAULT
    width: 640
    height: 640
  }
  image {
    file_id: "\253gec\000\000_\037\366\2349lt\225=\273Q\302\261B"
    size: XLARGE
    width: 300
    height: 300
  }
  image {
    file_id: "\253gec\000\000\272\212\366\2349lt\225=\273Q\302\261B"
    size: XLARGE
    width: 640
    height: 640
  }
}
language: "en-US"
explicit: true
show {
  gid: "\222*\266\023\320\367FR\233\364\370>\360\033Z\254"
  name: "The Joe Rogan Experience"
}
video {
  file_id: "\200I\305.\235QU\272\262\n\255}\241+*\237"
}
audio_preview {
  file_id: "\300f\020\227\01045C\2075\006\365\335l\260\376%?\253/"
  format: MP3_96
}
restriction {
  countries_forbidden: ""
  catalogue_str: "all"
  catalogue_str: "free"
  catalogue_str: "premium"
  catalogue_str: "shuffle"
  catalogue_str: "commercial"
}
allow_background_playback: true
type: FULL

external_url is missing for this Joe Rogan podcast episode: https://open.spotify.com/episode/0k503tOyovs71ddjCCfmHA

kokarare1212 commented 2 years ago

If external_url is not present, audio is used. https://github.com/kokarare1212/librespot-python/blob/8ac9e6cf0d300fd30141daeeef4ef82a765cc118/librespot/audio/__init__.py#L751-L759
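
The fallback described above could look roughly like the sketch below. It is not the code at the linked lines: `AudioFile` and `Episode` are plain stand-ins for the protobuf messages shown in the dumps, and `choose_source` and its default format are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AudioFile:
    file_id: bytes
    format: Optional[str] = None  # some entries in the dumps have no format


@dataclass
class Episode:
    audio: List[AudioFile] = field(default_factory=list)
    external_url: str = ""  # empty when the field is absent


def choose_source(episode: Episode, preferred_format: str = "OGG_VORBIS_96") -> Tuple[str, str]:
    """Prefer external_url; otherwise fall back to a Spotify-hosted audio file."""
    if episode.external_url:
        return ("external", episode.external_url)
    for audio in episode.audio:
        if audio.format == preferred_format:
            return ("audio", audio.file_id.hex())
    raise LookupError("no external_url and no audio file in the preferred format")
```

For the Joe Rogan episode above, which has no external_url, such a function would take the second branch and return the OGG_VORBIS_96 file_id.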