spaam / svtplay-dl

Small command-line program to download videos from some streaming sites.
https://svtplay-dl.se
MIT License

Implement Python Requests cookies.txt support to solve protected sites #1209

Open Arcitec opened 4 years ago

Arcitec commented 4 years ago

@spaam Johan, I have an idea.

As you know, Dplay support is broken because they added recaptcha to their login page. But it is possible to reuse cookies from a browser to be logged into premium. It just needs a tiny change in svtplay-dl:

  1. Add cookies.txt support to svtplay-dl. Requests, which svtplay-dl already uses, can take cookies loaded from such a file: https://stackoverflow.com/questions/14742899/using-cookies-txt-file-with-python-requests and https://stackoverflow.com/questions/8405096/python-3-2-cookielib (the flag could be --cookie-login <file>).

  2. Tell users to log in to Dplay in their browser (takes care of captcha) and then to use an addon like https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg to export the dplay cookies to a Netscape Cookies.txt file.

That way svtplay-dl can log in as premium by reusing those cookies.
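A minimal sketch of the loading step in this proposal, using only the standard library. The helper name `load_browser_cookies` and its `include_session_cookies` parameter are made up for illustration and are not part of svtplay-dl; `MozillaCookieJar` is the stdlib class that reads Netscape-format cookies.txt files.

```python
from http.cookiejar import MozillaCookieJar


def load_browser_cookies(path, include_session_cookies=True):
    """Load a Netscape-format cookies.txt file into a cookie jar.

    Hypothetical helper, not part of svtplay-dl. Some exporters write
    session cookies with an expiry of 0, which the loader treats as
    already expired unless ignore_expires=True is passed.
    """
    jar = MozillaCookieJar(path)
    jar.load(
        ignore_discard=include_session_cookies,
        ignore_expires=include_session_cookies,
    )
    return jar
```

With Requests, the returned jar can then be merged into a session with `session.cookies.update(jar)`, or passed on individual requests through the `cookies=` argument.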

Arcitec commented 4 years ago

Okay @spaam, after an hour of work I have a working prototype.

Changes:

Anyway, that was an hour of pretty intense research, so have fun with this. If you want to give me a shoutout for the work (it took a lot of research, since the exact method of joining the two cookie jars was not documented by anyone), feel free to mention me in a small credit in your commit message later, heh. Take care!

# ex:ts=4:sw=4:sts=4:et
# -*- tab-width: 4; c-basic-offset: 4; indent-tabs-mode: nil -*-
from __future__ import absolute_import

import hashlib
import logging
import os
import random
import re
from http.cookiejar import MozillaCookieJar
from urllib.parse import urlparse

from svtplay_dl.error import ServiceError
from svtplay_dl.fetcher.hls import hlsparse
from svtplay_dl.service import Service
from svtplay_dl.subtitle import subtitle

country = {"sv": ".se", "da": ".dk", "no": ".no"}

class Dplay(Service):
    supported_domains = ["dplay.se", "dplay.dk", "dplay.no"]

    def get(self):
        parse = urlparse(self.url)
        self.domain = re.search(r"(dplay\.\w\w)", parse.netloc).group(1)

        if not self._token():
            logging.error("Something went wrong getting token for requests")

        if self.config.get("username") and self.config.get("password"):
            premium = self._login()
            if not premium:
                logging.warning("Wrong username/password. no support for recaptcha.")

        channel = False
        if "kanaler" in parse.path:
            match = re.search("kanaler/([^/]+)$", parse.path)
            path = "/channels/{}".format(match.group(1))
            url = "https://disco-api.{}/content{}".format(self.domain, path)
            channel = True
            self.config.set("live", True)
        elif "program" in parse.path:
            match = re.search("(programmer|program)/([^/]+)$", parse.path)
            path = "/shows/{}".format(match.group(2))
            url = "https://disco-api.{}/content{}".format(self.domain, path)
            res = self.http.get(url, headers={"x-disco-client": "WEB:UNKNOWN:dplay-client:0.0.1"})
            programid = res.json()["data"]["id"]
            querystring = (
                "include=primaryChannel,show&filter[videoType]=EPISODE&filter[show.id]={}&"
                "page[size]=100&sort=seasonNumber,episodeNumber,-earliestPlayableStart".format(programid)
            )
            res = self.http.get("https://disco-api.{}/content/videos?{}".format(self.domain, querystring))
            janson = res.json()
            vid = 0
            slug = None
            for i in janson["data"]:
                if int(i["id"]) > vid:
                    vid = int(i["id"])
                    slug = i["attributes"]["path"]
            if slug:
                url = "https://disco-api.{}/content/videos/{}".format(self.domain, slug)
            else:
                yield ServiceError("Can't find latest video on program url")
                return
        else:
            match = re.search("(videos|videoer)/(.*)$", parse.path)
            url = "https://disco-api.{}/content/videos/{}".format(self.domain, match.group(2))
        res = self.http.get(url, headers={"x-disco-client": "WEB:UNKNOWN:dplay-client:0.0.1"})
        janson = res.json()
        if "errors" in janson:
            yield ServiceError("Can't find any videos on this url")
            return

        if channel:
            name = janson["data"]["attributes"]["name"]
            self.output["title"] = name
        else:
            name = self._autoname(janson)
        if name is None:
            yield ServiceError("Can't find vid id for autonaming")
            return
        self.output["id"] = janson["data"]["id"]

        api = "https://disco-api.{}/playback/videoPlaybackInfo/{}".format(self.domain, janson["data"]["id"])
        res = self.http.get(api)
        if res.status_code >= 400:
            yield ServiceError("You don't have permission to watch this")
            return
        streams = hlsparse(
            self.config,
            self.http.request("get", res.json()["data"]["attributes"]["streaming"]["hls"]["url"]),
            res.json()["data"]["attributes"]["streaming"]["hls"]["url"],
            httpobject=self.http,
            output=self.output,
        )
        for n in list(streams.keys()):
            if isinstance(streams[n], subtitle):  # we get the subtitles from the hls playlist.
                if self.config.get("get_all_subtitles"):
                    yield streams[n]
                else:
                    if streams[n].subfix in country and country[streams[n].subfix] in self.domain:
                        yield streams[n]
            else:
                yield streams[n]

    def _autoname(self, jsondata):
        match = re.search("^([^/]+)/", jsondata["data"]["attributes"]["path"])
        self.output["title"] = match.group(1)
        self.output["season"] = int(jsondata["data"]["attributes"]["seasonNumber"])
        self.output["episode"] = int(jsondata["data"]["attributes"]["episodeNumber"])
        self.output["episodename"] = jsondata["data"]["attributes"]["name"]
        return self.output["title"]

    def find_all_episodes(self, config):
        parse = urlparse(self.url)
        self.domain = re.search(r"(dplay\.\w\w)", parse.netloc).group(1)

        match = re.search("^/(program|programmer|videos|videoer)/([^/]+)", parse.path)
        if not match:
            logging.error("Can't find show name")
            return None

        if not self._token():
            logging.error("Something went wrong getting token for requests")

        premium = False
        if self.config.get("username") and self.config.get("password"):
            premium = self._login()
            if not premium:
                logging.warning("Wrong username/password.")

        url = "https://disco-api.{}/content/shows/{}".format(self.domain, match.group(2))
        res = self.http.get(url)
        programid = res.json()["data"]["id"]
        seasons = res.json()["data"]["attributes"]["seasonNumbers"]
        episodes = []
        for season in seasons:
            querystring = (
                "include=primaryChannel,show&filter[videoType]=EPISODE&filter[show.id]={}&filter[seasonNumber]={}&"
                "page[size]=100&sort=seasonNumber,episodeNumber,-earliestPlayableStart".format(programid, season)
            )
            res = self.http.get("https://disco-api.{}/content/videos?{}".format(self.domain, querystring))
            janson = res.json()
            for i in janson["data"]:
                if not premium and "Free" not in i["attributes"]["packages"]:
                    continue
                episodes.append("https://www.{}/videos/{}".format(self.domain, i["attributes"]["path"]))
        if len(episodes) == 0:
            logging.error("Can't find any playable files")
        if config.get("all_last") > 0:
            return episodes[: config.get("all_last")]
        return episodes

    def _login(self):
        cookiesFile = os.path.join(self.config.get("output"), "cookies.txt")

        cj = MozillaCookieJar(cookiesFile)
        cj.load(ignore_discard=True, ignore_expires=True)  # Loads session cookies too (expirydate=0).

        url = "https://disco-api.{}/users/me/favorites?include=default".format(self.domain)

        # Method 1: Always pass the custom cookie jar as a parameter on every request.
        # res = self.http.get(url, cookies=cj)

        # Method 2: Write the cookies to the Requests session jar (best method).
        self.http.cookies.update(cj)

        res = self.http.get(url)
        if res.status_code >= 400:
            return False
        return True
        # url = "https://disco-api.{}/login".format(self.domain)
        # login = {"credentials": {"username": self.config.get("username"), "password": self.config.get("password")}}
        # res = self.http.post(url, json=login)
        # if res.status_code >= 400:
        #     return False
        # return True

    def _token(self):
        return True
        # # random device id for cookietoken
        # deviceid = hashlib.sha256(bytes(int(random.random() * 1000))).hexdigest()
        # url = "https://disco-api.{}/token?realm={}&deviceId={}&shortlived=true".format(self.domain, self.domain.replace(".", ""), deviceid)
        # res = self.http.get(url)
        # if res.status_code >= 400:
        #     return False
        # return True

Example command line usage for this temporary patch:

svtplay-dl -o C:\Users\Foo\Desktop\dplaydownloads -u foo -p bar --subfolder --subtitle --all-subtitles --all-episodes https://www.dplay.se/program/alla-mot-alla-med-filip-och-fredrik

The cookies.txt file must be in your output directory. To generate it, use this add-on for Chrome: https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg (there are cookies.txt exporters for most other browsers, such as Firefox). Just log in to Dplay, then export the cookies from that site and save cookies.txt in your svtplay-dl output directory.
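Before starting a download, it can help to confirm that the exported file actually contains cookies for the site. A small sketch, assuming the export is in Netscape format; `cookie_names_for_domain` is a hypothetical helper and not something svtplay-dl provides.

```python
from http.cookiejar import MozillaCookieJar


def cookie_names_for_domain(cookies_path, domain):
    """Return sorted names of cookies in a cookies.txt that apply to `domain`.

    Hypothetical pre-flight check, not part of svtplay-dl.
    """
    jar = MozillaCookieJar(cookies_path)
    # Keep session cookies and already-expired entries so nothing is hidden.
    jar.load(ignore_discard=True, ignore_expires=True)
    wanted = domain.lstrip(".")
    names = []
    for cookie in jar:
        cookie_domain = cookie.domain.lstrip(".")
        # Accept cookies set for the domain itself or for any of its subdomains.
        if cookie_domain == wanted or cookie_domain.endswith("." + wanted):
            names.append(cookie.name)
    return sorted(names)
```

An empty result for `cookie_names_for_domain("cookies.txt", "dplay.se")` would suggest the export was taken from the wrong site or while logged out.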

Arcitec commented 4 years ago

If someone else is reading this and wants this temp-patch right now without waiting, then follow these steps:

  1. Install Python 3.
  2. Install everything from requirements.txt (pip install requests PySocks cryptography pyyaml)
  3. Clone this project: git clone https://github.com/spaam/svtplay-dl.git.
  4. Replace lib\svtplay_dl\service\dplay.py with my quoted code above.
  5. To run the patched svtplay-dl, either follow the steps in this project's official README to build the packaged exe file, or just navigate to lib\ and run the source code directly by typing python -m svtplay_dl. ;-)
  6. To download from Dplay via cookies, follow the steps at the end of my previous message, which show a command-line example and how to create cookies.txt. (Note: if you're running from source, replace svtplay-dl at the start of the command with python -m svtplay_dl, such as python -m svtplay_dl -o C:\etc...).

Sopor commented 4 years ago

I only get this error when I try to download something: TypeError: expected str, bytes or os.PathLike object, not NoneType
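This is the message `os.path.join` produces when one of its arguments is `None`. One plausible cause, assuming the error comes from `_login` in the patch above, is that `self.config.get("output")` returned `None`, for example because no `-o` output directory was given. A sketch of a more forgiving lookup follows; `resolve_cookies_path` is a hypothetical helper, and falling back to the current directory is a choice made here for illustration, not svtplay-dl behavior.

```python
import os


def resolve_cookies_path(output_dir, filename="cookies.txt"):
    """Return the path to the cookies file, tolerating a missing output dir.

    Hypothetical helper. The patch calls os.path.join(output, "cookies.txt")
    directly, which raises TypeError when output is None.
    """
    # Assumption: fall back to the current working directory when no
    # output directory was configured.
    base = output_dir if output_dir else os.getcwd()
    return os.path.join(base, filename)
```

With the unmodified patch, passing an output directory with `-o` (as in the example command) should avoid this particular failure, provided cookies.txt exists there.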