p4ul17 / voe-dl

A Python downloader for voe.sx videos
GNU General Public License v3.0

script not working anymore #18

Closed maveric213 closed 1 month ago

maveric213 commented 2 months ago

VOE reworked the way they provide the video streams. To me it even looks like interaction is necessary. There is a lot of weird JavaScript magic. I couldn't get on top of it yet. This was observed coming from s.to.

maveric213 commented 2 months ago

I figured it out in the meantime. If the details are of interest ... let me know. Unfortunately my Python is bad ... and I can only assist with a PowerShell implementation ...

p4ul17 commented 2 months ago

Yeah, details would be nice. I figured they hid the HLS stream inside a script tag and reversed it, then encoded it with Base64, so it looks like let '588hvgsk8g57bg' = '2hdbf5873hfbfck6993' and some gibberish like that. Could be that I saw an old version of the site and they changed that, but if my assumption turns out to be true, that's relatively easy to fix. I don't have much time though; next week I should have time to fix it.
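
If that reading is right, undoing it is just a Base64 decode plus a string reversal. A minimal round-trip sketch in Python (the JSON payload here is made up for illustration, not an actual VOE value):

import base64

# forward direction: what the page seems to do (reverse the JSON, then Base64-encode it)
plain = '{"file": "https://example.com/stream/master.m3u8"}'  # made-up payload
obfuscated = base64.b64encode(plain[::-1].encode("utf-8")).decode("ascii")

# reverse direction: what the downloader has to do (Base64-decode, then reverse)
recovered = base64.b64decode(obfuscated).decode("utf-8")[::-1]
assert recovered == plain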

dertuxmalwieder commented 2 months ago

It is a bit easier if you just parse the browser-side code; my yaydl does that. Alas, they seem to have even more protection mechanisms, because just fetching the video stream URL still gives an error 400. I suspect that the headers are checked as well now? @maveric213
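
One quick way to test that suspicion (a sketch; the m3u8 URL is a placeholder and the header/Referer choice is a guess, not confirmed against VOE):

import requests

m3u8_url = "https://example.com/engine/hls/master.m3u8"  # placeholder stream URL
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0",
    "Referer": "https://voe.sx/",  # guess: the embed page as Referer
}
print(requests.get(m3u8_url).status_code)                   # bare request: reproduces the 400?
print(requests.get(m3u8_url, headers=headers).status_code)  # with headers: 200 if they are checked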

maveric213 commented 2 months ago

what I did ...

#!/usr/bin/pwsh
PARAM (
    [Parameter(Mandatory = $true, Position=0)]
    [string] $url,
    [Parameter(Mandatory = $true, Position=1)]
    [string] $outfile,
    [Parameter(Mandatory = $true, Position=2)]
    [string] $outpath
)

# find /redirect on s.to
$url1="https://s.to"+(((invoke-webrequest -uri $url ).content.split(">")) | where { $_ -like "*redirect*" })[0].split("=")[4].split(" ")[0].replace('"','')
# get the VOE page through the redirect
$url2=(((Invoke-WebRequest -Uri $url1).content).split(";").split("{") -match "https").split("=")[1].trim().replace("'","")
# extract "let"
$data=((Invoke-WebRequest -Uri $url2).content.split([Environment]::NewLine) -match "let")[-1].split("=")[1].split(";")[0].trim().replace("'","")
# decode Base64 and reverse the string
# (the length of a Base64 string must be a multiple of 4, so pad with "=" as needed)
$data = $data + ("=" * ((4 - ($data.Length % 4)) % 4))
$data1=([System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String("$data")).split('"'))
$data2=$data1.toCharArray()
[array]::reverse($data2)
# extract and fix the URL from the decoded string
$url_m3u8=((-join($data2)).split(":")[6..7] -join":").replace("\/","/").replace(",fallback","")

# use ffmpeg to download stream
ffmpeg -n -hide_banner -analyzeduration 2147483647 -probesize 2147483647 -protocol_whitelist file,http,https,tcp,tls,crypto -i "$url_m3u8" -c copy -bsf:a aac_adtstoasc -safe 0 -map 0:v:0 -map 0:a:0 -flags global_header -metadata:s:a:0 language=ger "$outpath/$outfile"

I found issues when "just" connecting to the download page on s.to. They seem to authorize the IP over their homepage. So I do a connect to s.to prior to any download activity. This is not VOE related :-) ... I know.

Input URL would be any episode page like "https://s.to/serie/stream/an-archdemons-dilemma-how-to-love-your-elf-bride/staffel-1/episode-3" ...
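
In Python, that warm-up could be as simple as reusing one session for both requests (a sketch; whether a plain GET to the homepage is enough to authorize the IP is an assumption based on the observation above):

import requests

session = requests.Session()
# touch the homepage first so the IP/session gets authorized
session.get("https://s.to/")
# then fetch the episode page with the same session
page = session.get("https://s.to/serie/stream/an-archdemons-dilemma-how-to-love-your-elf-bride/staffel-1/episode-3")
print(page.status_code)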

dertuxmalwieder commented 2 months ago

They seem to authorize the IP over their homepage. So I do a connect to s.to prior to any download activity. (...) This is not VOE related :-)

Ah, so there are no additional header checks if you try to download a VOE video that is not on s.to? I'll take notes. Thank you.

maveric213 commented 2 months ago

With the help of Google ... here is a diff against the July 15th version of the script:

--- voe-dl.py (July 15th version)
+++ voe-dl.py (patched for the new "let" scheme)
@@ ... @@ def download(URL):
     html_page = requests.get(URL, headers=headers)
+    print("Download URL:"+URL)
 
     soup = BeautifulSoup(html_page.content, 'html.parser')
@@ ... @@ def download(URL):
     name = name.replace(" ","_")
     print("Name of file: " + name)
 
-    sources_find = soup.find_all(string = re.compile("var sources")) #searching for the script tag containing the link to the mp4
+    sources_find = soup.find_all(string = re.compile("let "))
     sources_find = str(sources_find)
     #slice_start = sources_find.index("const sources")
-    slice_start = sources_find.index("var sources")
+    slice_start = sources_find.index("let ")
     source = sources_find[slice_start:] #cutting everything before 'var sources' in the script tag
     slice_end = source.index(";")
     source = source[:slice_end] #cutting everything after ';' in the remaining String to make it ready for the JSON parser
+    slice_start = source.index("'")+1
+    source = source[slice_start:]
+    source = source.replace("\'","")
+    # decode base64
+    source = base64.b64decode(source)
+    # convert the bytes back to a string
+    source = source.decode("utf-8")
+    # reverse the string
+    source = source[::-1]
 
-    source = source.replace("var sources = ","")    #
     source = source.replace("\'","\"")                #Making the JSON valid
     source = source.replace("\\n","")                 #
     source = source.replace("\\","")                  #
 
-    strToReplace = ","
-    replacementStr = ""
-    source = replacementStr.join(source.rsplit(strToReplace, 1)) #complicated but needed replacement of the last comma in the source String to make it JSON valid
-
     source_json = json.loads(source) #parsing the JSON
     try:
         link = source_json["mp4"] #extracting the link to the mp4 file
         link = base64.b64decode(link)
         link = link.decode("utf-8")
         wget.download(link, out=f"{name}_SS.mp4") #downloading the file
     except KeyError:
         try:
-            link = source_json["hls"]
-            link = base64.b64decode(link)
-            link = link.decode("utf-8")
+            link = source_json["file"] #extracting the link to the mp4 file
 
             name = name +'_SS.mp4'

maveric213 commented 2 months ago

Just found out ... that works for s.to ... but not aniworld.to ... aniworld.to still uses the "var sources/hls" schema ... so the whole script needs to be reworked to suit both cases ... will look at that tomorrow ...

maveric213 commented 2 months ago

This works for me ... but it should be optimized by someone who does know Python ... :-)

# coding=utf-8
import sys, os, glob
import re
import requests
import json
import wget
from bs4 import BeautifulSoup
from yt_dlp import YoutubeDL
import base64 # import base64

def main():
    args = sys.argv #saving the cli arguments into args

    try:
        args[1]     #try if args has a value at index 1
    except IndexError:
        print("Please use a parameter. Use -h for Help") #if not, tells the user to specify an argument
        quit()

    if args[1] == "-h":     #if the first user argument is "-h" call the help function
        help()
    elif args[1] == "-u":   #if the first user argument is "-u" call the download function
        URL = args[2]
        download(URL)
    elif args[1] == "-l":   #if the first user argument is "-l" call the list_dl (list download) function
        doc = args[2]
        list_dl(doc)
    else:
        URL = args[1]       #if the first user argument is the <URL> call the download function
        download(URL)

def help():
    print("Version v1.2.4")
    print("")
    print("______________")
    print("Arguments:")
    print("-h shows this help")
    print("-u <URL> downloads the <URL> you specify")
    print("-l <doc> opens the <doc> you specify and downloads every URL line after line")
    print("<URL> just the URL as Argument works the same as with -u Argument")
    print("______________")
    print("")
    print("Credits to @NikOverflow, @cuitrlal and @cybersnash on GitHub for contributing")

def list_dl(doc):
    curLink = 0
    lines = open(doc).readlines()       #reads the lines of the given document and store them in the list "lines"
    for link in lines:                  #calls the download function for every link in the document
        curLink +=1
        print("Download %s / "%curLink + str(len(lines)))
        link = link.replace("\n","")
        print("echo Link: %s"%link)
        download(link)

def download(URL):
    URL = str(URL)
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
        "Accept-Language": "de,en-US;q=0.7,en;q=0.3",
        "Upgrade-Insecure-Requests": "1",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "none",
        "Sec-Fetch-User": "?1",
        "Priority": "u=1"
    }
    html_page = requests.get(URL, headers=headers)
    print("Download URL:"+URL)

    soup = BeautifulSoup(html_page.content, 'html.parser')
    if html_page.text.startswith("<script>"):
        START = "window.location.href = '"
        L = len(START)
        i0 = html_page.text.find(START)
        i1 = html_page.text.find("'",i0+L)
        url = html_page.text[i0+L:i1]
        return download(url)

    name_find = soup.find('meta', attrs={"name":"og:title"})
    name = name_find["content"]
    name = name.replace(" ","_")
    print("Name of file: " + name)

    dl_type = "unknown"
    sources_find = soup.find_all(string = re.compile("var sources")) #searching for the script tag containing the link to the mp4
    if len(str(sources_find)) > 2:
        dl_type = "hls"
    else:
        sources_find = soup.find_all(string = re.compile("let ")) #searching for the script tag containing the link to the mp4
        if len (str(sources_find)) > 2:
            dl_type = "file"
    if dl_type == "unknown":
        print("Download page type unknwon, please inform developer")
        print("URL:"+ URL)
        sys.exit(-1)

    sources_find = str(sources_find)       
    if dl_type == "file":
        slice_start = sources_find.index("let ")
        source = sources_find[slice_start:] #cutting everything before 'var sources' in the script tag
        slice_end = source.index(";")
        source = source[:slice_end] #cutting everything after ';' in the remaining String to make it ready for the JSON parser
        slice_start = source.index("'")+1
        source = source[slice_start:]
        source = source.replace("\'","")
        # the length of a Base64 string has to be a multiple of 4; if not, pad with "="
        source = source + "=" * (-len(source) % 4)
        # decode base64
        source = base64.b64decode(source)
        print(source)
        # convert the bytes back to a string
        source = source.decode("utf-8")
        # reverse the string
        source = source[::-1]

        source = source.replace("\'","\"")                #Making the JSON valid
        source = source.replace("\\n","")                 #
        source = source.replace("\\","")                  #

        source_json = json.loads(source) #parsing the JSON
        try:
            link = source_json["mp4"] #extracting the link to the mp4 file
            link = base64.b64decode(link)
            link = link.decode("utf-8")
            wget.download(link, out=f"{name}_SS.mp4") #downloading the file
        except KeyError:
            try:
                link = source_json["file"] #extracting the link to the mp4 file

                name = name +'_SS.mp4'

                ydl_opts = {'outtmpl' : name,}
                with YoutubeDL(ydl_opts) as ydl:
                    try:
                        ydl.download(link)
                    except Exception as e:
                        pass
                delpartfiles()

            except KeyError:
                print("Could not find downloadable URL. Voe might have change their site. Check that you are running the latest version of voe-dl, and if so file an issue on GitHub.")
                quit()

    if dl_type == "hls":
        slice_start = sources_find.index("var sources")
        source = sources_find[slice_start:] #cutting everything before 'var sources' in the script tag
        slice_end = source.index(";")
        source = source[:slice_end] #cutting everything after ';' in the remaining String to make it ready for the JSON parser

        source = source.replace("var sources = ","")    #
        source = source.replace("\'","\"")                #Making the JSON valid
        source = source.replace("\\n","")                 #
        source = source.replace("\\","")                  #

        strToReplace = ","
        replacementStr = ""
        source = replacementStr.join(source.rsplit(strToReplace, 1)) #complicated but needed replacement of the last comma in the source String to make it JSON valid

        source_json = json.loads(source) #parsing the JSON
        try:
            link = source_json["mp4"] #extracting the link to the mp4 file
            link = base64.b64decode(link)
            link = link.decode("utf-8")
            wget.download(link, out=f"{name}_SS.mp4") #downloading the file
        except KeyError:
            try:
                link = source_json["hls"]
                link = base64.b64decode(link)
                link = link.decode("utf-8")

                name = name +'_SS.mp4'

                ydl_opts = {'outtmpl' : name,}
                with YoutubeDL(ydl_opts) as ydl:
                    try:
                        ydl.download(link)
                    except Exception as e:
                        pass
                delpartfiles()

            except KeyError:
                print("Could not find downloadable URL. Voe might have change their site. Check that you are running the latest version of voe-dl, and if so file an issue on GitHub.")
                quit()
    print("\n")

def delpartfiles():
    path = os.getcwd()
    for file in glob.iglob(os.path.join(path, '*.part')):
        os.remove(file)

if __name__ == "__main__":
    main()
p4ul17 commented 1 month ago

Yeah, works for me too, thanks! I don't think the script looks that bad either; looks nice to me.

If someone wants to, feel free to improve it...
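
One possible cleanup would be to fold the two extraction paths into a single helper. A rough, untested sketch (the function name and regexes are assumptions that mirror the string slicing in the script above):

import base64
import json
import re

def extract_source_json(page_text):
    # old scheme: "var sources = {...};" holding a quasi-JSON object
    m = re.search(r"var sources\s*=\s*([^;]+);", page_text)
    if m:
        raw = m.group(1).replace("'", '"').replace("\\n", "").replace("\\", "")
        raw = re.sub(r",(\s*})", r"\1", raw)  # drop the trailing comma
        return json.loads(raw)
    # new scheme: "let x = '<Base64 of the reversed JSON>';"
    m = re.search(r"let [^;=]*=\s*'([^']+)'", page_text)
    if m:
        data = m.group(1)
        data = data + "=" * (-len(data) % 4)  # pad to a multiple of 4
        s = base64.b64decode(data).decode("utf-8")[::-1]
        s = s.replace("'", '"').replace("\\n", "").replace("\\", "")
        return json.loads(s)
    raise ValueError("unknown VOE page type")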

maveric213 commented 1 month ago

Resolved, can be closed