Implement Python Requests cookies.txt support to solve protected sites

Arcitec commented 4 years ago

@spaam Johan, I have an idea.

As you know, Dplay support is broken because they added recaptcha to their login page. But it is possible to reuse cookies from a browser to be logged into premium. It just needs a tiny change in svtplay-dl:

Add cookies.txt support to svtplay-dl. The Python Requests library, which I assumed you use (edit: yeah, you use it), already accepts a cookie jar loaded from such a file: https://stackoverflow.com/questions/14742899/using-cookies-txt-file-with-python-requests and https://stackoverflow.com/questions/8405096/python-3-2-cookielib. The flag could be --cookie-login <file>.
Tell users to log in to Dplay in their browser (takes care of captcha) and then to use an addon like https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg to export the dplay cookies to a Netscape Cookies.txt file.

That way svtplay-dl can login as premium by reusing those cookies.
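
A minimal sketch of the idea, using only the standard library. The `--cookie-login` option is just the name proposed above and does not exist in svtplay-dl yet; `build_parser`, `load_cookie_file` and `copy_cookies` are hypothetical helper names. A Requests session's `cookies` attribute is a `CookieJar` subclass, so the same `set_cookie` loop should work on it.

```python
import argparse
from http.cookiejar import CookieJar, MozillaCookieJar


def build_parser():
    # Hypothetical parser: only the proposed flag is shown here.
    parser = argparse.ArgumentParser(prog="svtplay-dl")
    parser.add_argument(
        "--cookie-login",
        metavar="FILE",
        default=None,
        help="Netscape cookies.txt exported from a logged-in browser",
    )
    return parser


def load_cookie_file(path):
    # ignore_discard / ignore_expires keep session cookies (expiry 0) too.
    jar = MozillaCookieJar(path)
    jar.load(ignore_discard=True, ignore_expires=True)
    return jar


def copy_cookies(source, target):
    # Copy every cookie into another jar, e.g. a Requests session jar.
    for cookie in source:
        target.set_cookie(cookie)
    return target
```

With Requests, the last step would be something like `copy_cookies(load_cookie_file(args.cookie_login), session.cookies)`, where `session` is a `requests.Session()`.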

Arcitec commented 4 years ago

Okay @spaam, after an hour of work I have a working prototype.

Changes:

Uses os.path.join() to combine the --output dir with "cookies.txt", to load cookies from <outputdir>\cookies.txt. This needs cleaning up, by adding a new parameter such as --cookie-login <file> as mentioned earlier.
Still checks for username and password parameters. Just pass dummy ones, -u foo -p bar on the command line, to make it read the cookie file. When --cookie-login has been added instead, this can be cleaned up so that -u -p are not required anymore.
I turned _token and _login into modified versions of their former selves. _token now does nothing. _login loads cookies.txt from disk and writes the cookies into the Requests session cookie jar (via .update()). It then checks whether it can read the user's favorites list without authentication errors. (The code also has an alternative example, where I show how to use cookies=cj to pass the MozillaCookieJar directly, but that is not as good as writing the cookies into the Requests session's own cookie jar, since you must then pass cookies= on every request.)
I dislike how often _login is called when you do a "get all episodes" run. It would be nice if you rewrote that so it only calls _login once, since that's a bit more stealthy/anti-detection. I also dislike that cookies.txt gets reloaded from disk every time, which means that any "evolving" cookies set by the website's responses between downloads keep getting overwritten with the old ones. So this should really be fixed to: 1. load the cookie jar ONCE, 2. run _login ONCE, and 3. look at the response status code of ALL other API calls (the regular "get episode info/get all episodes" ones) to detect when you are no longer logged in. That is better than constantly calling login.
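
The three points above could be structured roughly like this. It is only a sketch: `LoginState` is a hypothetical helper, not part of svtplay-dl, and treating 401/403 as "no longer logged in" is an assumption about how the API reports expired sessions.

```python
class LoginState:
    """Load cookies and verify the login at most once per run."""

    def __init__(self, load_cookies, check_login):
        self._load_cookies = load_cookies  # callable, e.g. fills the session jar
        self._check_login = check_login    # callable returning True if logged in
        self._premium = None               # None means "not attempted yet"

    def ensure(self):
        # First call loads the cookie jar and checks the login;
        # later calls reuse the cached result.
        if self._premium is None:
            self._load_cookies()
            self._premium = bool(self._check_login())
        return self._premium

    def note_response(self, status_code):
        # Call this for every other API response. Assumption: 401/403
        # means the cookies are no longer accepted.
        if status_code in (401, 403):
            self._premium = False
```

Both `get()` and `find_all_episodes()` could then share one `LoginState` instance and call `ensure()` instead of `_login()`, so cookies updated by the server during the run are never overwritten from disk.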

Anyway, that was an hour of pretty intense research, so have fun with this. If you wanna give me a shoutout as thanks for doing some significant work on this (it was a ton of research, since the exact method of joining the two cookie jars was not documented by anyone), then feel free to mention me in a small credit in your commit message later, heh. Take care!

# ex:ts=4:sw=4:sts=4:et
# -*- tab-width: 4; c-basic-offset: 4; indent-tabs-mode: nil -*-
from __future__ import absolute_import

import hashlib
import logging
import os
import random
import re
from http.cookiejar import MozillaCookieJar
from urllib.parse import urlparse

from svtplay_dl.error import ServiceError
from svtplay_dl.fetcher.hls import hlsparse
from svtplay_dl.service import Service
from svtplay_dl.subtitle import subtitle

country = {"sv": ".se", "da": ".dk", "no": ".no"}

class Dplay(Service):
    supported_domains = ["dplay.se", "dplay.dk", "dplay.no"]

    def get(self):
        parse = urlparse(self.url)
        self.domain = re.search(r"(dplay\.\w\w)", parse.netloc).group(1)

        if not self._token():
            logging.error("Something went wrong getting token for requests")

        if self.config.get("username") and self.config.get("password"):
            premium = self._login()
            if not premium:
                logging.warning("Wrong username/password. no support for recaptcha.")

        channel = False
        if "kanaler" in parse.path:
            match = re.search("kanaler/([^/]+)$", parse.path)
            path = "/channels/{}".format(match.group(1))
            url = "https://disco-api.{}/content{}".format(self.domain, path)
            channel = True
            self.config.set("live", True)
        elif "program" in parse.path:
            match = re.search("(programmer|program)/([^/]+)$", parse.path)
            path = "/shows/{}".format(match.group(2))
            url = "https://disco-api.{}/content{}".format(self.domain, path)
            res = self.http.get(url, headers={"x-disco-client": "WEB:UNKNOWN:dplay-client:0.0.1"})
            programid = res.json()["data"]["id"]
            qyerystring = (
                "include=primaryChannel,show&filter[videoType]=EPISODE&filter[show.id]={}&"
                "page[size]=100&sort=seasonNumber,episodeNumber,-earliestPlayableStart".format(programid)
            )
            res = self.http.get("https://disco-api.{}/content/videos?{}".format(self.domain, qyerystring))
            janson = res.json()
            vid = 0
            slug = None
            for i in janson["data"]:
                if int(i["id"]) > vid:
                    vid = int(i["id"])
                    slug = i["attributes"]["path"]
            if slug:
                url = "https://disco-api.{}/content/videos/{}".format(self.domain, slug)
            else:
                yield ServiceError("Cant find latest video on program url")
                return
        else:
            match = re.search("(videos|videoer)/(.*)$", parse.path)
            url = "https://disco-api.{}/content/videos/{}".format(self.domain, match.group(2))
        res = self.http.get(url, headers={"x-disco-client": "WEB:UNKNOWN:dplay-client:0.0.1"})
        janson = res.json()
        if "errors" in janson:
            yield ServiceError("Cant find any videos on this url")
            return

        if channel:
            name = janson["data"]["attributes"]["name"]
            self.output["title"] = name
        else:
            name = self._autoname(janson)
        if name is None:
            yield ServiceError("Cant find vid id for autonaming")
            return
        self.output["id"] = janson["data"]["id"]

        api = "https://disco-api.{}/playback/videoPlaybackInfo/{}".format(self.domain, janson["data"]["id"])
        res = self.http.get(api)
        if res.status_code >= 400:
            yield ServiceError("You dont have permission to watch this")
            return
        streams = hlsparse(
            self.config,
            self.http.request("get", res.json()["data"]["attributes"]["streaming"]["hls"]["url"]),
            res.json()["data"]["attributes"]["streaming"]["hls"]["url"],
            httpobject=self.http,
            output=self.output,
        )
        for n in list(streams.keys()):
            if isinstance(streams[n], subtitle):  # we get the subtitles from the hls playlist.
                if self.config.get("get_all_subtitles"):
                    yield streams[n]
                else:
                    if streams[n].subfix in country and country[streams[n].subfix] in self.domain:
                        yield streams[n]
            else:
                yield streams[n]

    def _autoname(self, jsondata):
        match = re.search("^([^/]+)/", jsondata["data"]["attributes"]["path"])
        self.output["title"] = match.group(1)
        self.output["season"] = int(jsondata["data"]["attributes"]["seasonNumber"])
        self.output["episode"] = int(jsondata["data"]["attributes"]["episodeNumber"])
        self.output["episodename"] = jsondata["data"]["attributes"]["name"]
        return self.output["title"]

    def find_all_episodes(self, config):
        parse = urlparse(self.url)
        self.domain = re.search(r"(dplay\.\w\w)", parse.netloc).group(1)

        match = re.search("^/(program|programmer|videos|videoer)/([^/]+)", parse.path)
        if not match:
            logging.error("Can't find show name")
            return None

        if not self._token():
            logging.error("Something went wrong getting token for requests")

        premium = False
        if self.config.get("username") and self.config.get("password"):
            premium = self._login()
            if not premium:
                logging.warning("Wrong username/password.")

        url = "https://disco-api.{}/content/shows/{}".format(self.domain, match.group(2))
        res = self.http.get(url)
        programid = res.json()["data"]["id"]
        seasons = res.json()["data"]["attributes"]["seasonNumbers"]
        episodes = []
        for season in seasons:
            qyerystring = (
                "include=primaryChannel,show&filter[videoType]=EPISODE&filter[show.id]={}&filter[seasonNumber]={}&"
                "page[size]=100&sort=seasonNumber,episodeNumber,-earliestPlayableStart".format(programid, season)
            )
            res = self.http.get("https://disco-api.{}/content/videos?{}".format(self.domain, qyerystring))
            janson = res.json()
            for i in janson["data"]:
                if not premium and "Free" not in i["attributes"]["packages"]:
                    continue
                episodes.append("https://www.{}/videos/{}".format(self.domain, i["attributes"]["path"]))
        if len(episodes) == 0:
            logging.error("Cant find any playable files")
        if config.get("all_last") > 0:
            return episodes[: config.get("all_last")]
        return episodes

    def _login(self):
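        # --output may be unset (None); os.path.join(None, ...) raises TypeError,
        # so fall back to the current directory in that case.
        cookiesFile = os.path.join(self.config.get("output") or ".", "cookies.txt")

        cj = MozillaCookieJar(cookiesFile)
        cj.load(ignore_discard=True, ignore_expires=True)  # Loads session cookies too (expirydate=0).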

        url = "https://disco-api.{}/users/me/favorites?include=default".format(self.domain)

        # Method 1: Always pass the custom cookie jar as a parameter on every request.
        #res = self.http.get(url, cookies=cj)

        # Method 2: Write the cookies to the Requests session jar (best method).
        self.http.cookies.update(cj)

        res = self.http.get(url)
        if res.status_code >= 400:
            return False
        return True
        # url = "https://disco-api.{}/login".format(self.domain)
        # login = {"credentials": {"username": self.config.get("username"), "password": self.config.get("password")}}
        # res = self.http.post(url, json=login)
        # if res.status_code >= 400:
        #     return False
        # return True

    def _token(self):
        return True
        # # random device id for cookietoken
        # deviceid = hashlib.sha256(bytes(int(random.random() * 1000))).hexdigest()
        # url = "https://disco-api.{}/token?realm={}&deviceId={}&shortlived=true".format(self.domain, self.domain.replace(".", ""), deviceid)
        # res = self.http.get(url)
        # if res.status_code >= 400:
        #     return False
        # return True

Example command line usage for this temporary patch:

svtplay-dl -o C:\Users\Foo\Desktop\dplaydownloads -u foo -p bar --subfolder --subtitle --all-subtitles --all-episodes https://www.dplay.se/program/alla-mot-alla-med-filip-och-fredrik

The cookies.txt must be in your output directory, so the -o option is required with this patch. To generate cookies.txt, use this addon for Chrome: https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg (there are cookies.txt exporters for most other browsers too, such as Firefox). Just log in to dplay, then export the cookies from that site and save the cookies.txt in your svtplay-dl output directory.
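
Before running the download, it may help to check that the exported file really is where the patch looks for it and that it contains dplay cookies. A small sketch, where `dplay_cookie_names` is a hypothetical helper and the default domain is only an example:

```python
import os
from http.cookiejar import MozillaCookieJar


def dplay_cookie_names(output_dir, domain="dplay.se"):
    """Return the names of cookies in <output_dir>/cookies.txt for `domain`."""
    path = os.path.join(output_dir, "cookies.txt")
    if not os.path.isfile(path):
        raise FileNotFoundError("no cookies.txt in {}".format(output_dir))
    jar = MozillaCookieJar(path)
    # Same flags as the patch, so session cookies are included.
    jar.load(ignore_discard=True, ignore_expires=True)
    return sorted(c.name for c in jar if c.domain.endswith(domain))
```

An empty list would suggest the export was made from the wrong site or before logging in.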

Arcitec commented 4 years ago

If someone else is reading this and wants this temp-patch right now without waiting, then follow these steps:

Install Python3.
Install everything from requirements.txt (pip install requests PySocks cryptography pyyaml)
Clone this project: git clone https://github.com/spaam/svtplay-dl.git.
Replace lib\svtplay_dl\service\dplay.py with my quoted code above.
To run the patched svtplay-dl, either follow the steps in this project's official README to build the packaged exe file, or just navigate to lib\ and run the source code directly by typing python -m svtplay_dl. ;-)
To download from dplay via cookies, follow all steps at the end of my previous message, to see command line examples and how to make cookies.txt. (Note: If you're running it directly, replace the word svtplay-dl at the start of the command, with python -m svtplay_dl instead, such as python -m svtplay_dl -o C:\etc...).

Sopor commented 4 years ago

I only get this error when I try to download something: TypeError: expected str, bytes or os.PathLike object, not NoneType

spaam / svtplay-dl

Implement Python Requests cookies.txt support to solve protected sites #1209