maveric213 closed this issue 1 month ago
I figured it out in the meantime. If the details are of interest ... let me know. Unfortunately my Python is bad ... and I can only assist with a PowerShell implementation ...
Yeah, details would be nice. I figured they hid the HLS stream inside a script tag and reversed it, then encoded it with Base64, so it looks like let '588hvgsk8g57bg' = '2hdbf5873hfbfck6993' and some gibberish like that. Could be that I saw an old version of the site and they changed that, but if my assumption turns out to be true, that's relatively easy to fix. I don't have much time though; next week I should have time to fix it.
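For illustration, undoing that scheme would boil down to something like this (a minimal Python sketch; the function name and the padding handling are mine):

import base64

def decode_voe_payload(payload: str) -> str:
    # Base64 needs a length that is a multiple of 4; pad with "=" if necessary
    payload += "=" * (-len(payload) % 4)
    # Base64-decode first, then reverse the result to undo the obfuscation
    return base64.b64decode(payload).decode("utf-8")[::-1]

The reversed string then contains the stream URL; after some quote/escape cleanup it even parses as JSON, as the scripts further down in this thread show.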
It is a bit easier if you just parse the browser-side code; my yaydl does that. Alas, they seem to have even more protection mechanisms, because just fetching the video stream URL still gives an error 400. I suspect that the headers are checked as well now? @maveric213
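If it really is a header check, a quick way to test that hypothesis would be to fetch the stream URL once bare and once with browser-like headers (a sketch; m3u8_url stands for the extracted stream URL, and the header values mirror the ones used in the script later in this thread):

import requests

m3u8_url = "..."  # the video stream URL extracted from the VOE page
browser_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
}
bare = requests.get(m3u8_url)                                   # reportedly 400
with_headers = requests.get(m3u8_url, headers=browser_headers)  # does this pass?
print(bare.status_code, with_headers.status_code)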
what I did ...
#!/usr/bin/pwsh
param (
    [Parameter(Mandatory = $true, Position = 0)]
    [string] $url,
    [Parameter(Mandatory = $true, Position = 1)]
    [string] $outfile,
    [Parameter(Mandatory = $true, Position = 2)]
    [string] $outpath
)
# find the /redirect link on the s.to episode page
$url1 = "https://s.to" + (((Invoke-WebRequest -Uri $url).Content.split(">")) | Where-Object { $_ -like "*redirect*" })[0].split("=")[4].split(" ")[0].replace('"','')
# follow the redirect to get the VOE page URL
$url2 = (((Invoke-WebRequest -Uri $url1).Content).split(";").split("{") -match "https").split("=")[1].trim().replace("'","")
# extract the "let" assignment that carries the obfuscated payload
$data = ((Invoke-WebRequest -Uri $url2).Content.split([Environment]::NewLine) -match "let")[-1].split("=")[1].split(";")[0].trim().replace("'","")
# decode Base64 and reverse the string
# (Base64 needs a length that is a multiple of 4; pad with "=" if necessary)
while ( ($data.Length % 4) -ne 0 ) { $data = $data + "=" }
$data1 = ([System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String("$data")).split('"'))
$data2 = $data1.toCharArray()
[array]::reverse($data2)
# extract the m3u8 URL from the decoded string and fix the escaping
$url_m3u8 = ((-join($data2)).split(":")[6..7] -join ":").replace("\/","/").replace(",fallback","")
# use ffmpeg to download the stream
ffmpeg -n -hide_banner -analyzeduration 2147483647 -probesize 2147483647 -protocol_whitelist file,http,https,tcp,tls,crypto -i "$url_m3u8" -c copy -bsf:a aac_adtstoasc -safe 0 -map 0:v:0 -map 0:a:0 -flags global_header -metadata:s:a:0 language=ger "$outpath/$outfile"
I found issues when "just" connecting to the download page on s.to. They seem to authorize the IP via their homepage, so I connect to s.to prior to any download activity. This is not VOE related :-) ... I know.
Input URL would be any episode page like "https://s.to/serie/stream/an-archdemons-dilemma-how-to-love-your-elf-bride/staffel-1/episode-3" ...
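In Python terms, that warm-up step could look like this (a sketch, assuming a shared requests.Session carries over whatever cookies or state the homepage visit sets):

import requests

session = requests.Session()
# hit the homepage first so the IP/session gets "authorized"
session.get("https://s.to")
# only then fetch the actual episode page
episode_page = session.get("https://s.to/serie/stream/an-archdemons-dilemma-how-to-love-your-elf-bride/staffel-1/episode-3")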
They seem to authorize the IP via their homepage. So I connect to s.to prior to any download activity. (...) This is not VOE related :-)
Ah, so there are no additional header checks if you try to download a VOE video that is not on s.to? I'll take notes. Thank you.
With the help of Google ... here is a diff, based on the July 15th version of the script (removed lines are prefixed with "-", added lines with "+"):
--- july 15th version
+++ patched version
@@ ... @@ def download(URL):
     html_page = requests.get(URL, headers=headers)
+    print("Download URL:"+URL)
     soup = BeautifulSoup(html_page.content, 'html.parser')
@@ ... @@ def download(URL):
     print("Name of file: " + name)
-    sources_find = soup.find_all(string = re.compile("var sources")) #searching for the script tag containing the link to the mp4
+    # sources_find = soup.find_all(string = re.compile("var sources")) #searching for the script tag containing the link to the mp4
+    sources_find = soup.find_all(string = re.compile("let "))
     sources_find = str(sources_find)
     #slice_start = sources_find.index("const sources")
-    slice_start = sources_find.index("var sources")
+    #slice_start = sources_find.index("var sources")
+    slice_start = sources_find.index("let ")
     source = sources_find[slice_start:] #cutting everything before 'var sources' in the script tag
     slice_end = source.index(";")
     source = source[:slice_end] #cutting everything after ';' in the remaining String to make it ready for the JSON parser
+    slice_start = source.index("'")+1
+    source = source[slice_start:]
+    source = source.replace("\'","")
+    # decode base64
+    source = base64.b64decode(source)
+    # convert Byte list back to string
+    source = source.decode("utf-8")
+    # reverse string
+    source = source[::-1]
+
-    source = source.replace("var sources = ","") #
+    #source = source.replace("var sources = ","") #
     source = source.replace("\'","\"") #Making the JSON valid
     source = source.replace("\\n","") #
     source = source.replace("\\","") #
-    strToReplace = ","
-    replacementStr = ""
-    source = replacementStr.join(source.rsplit(strToReplace, 1)) #complicated but needed replacement of the last comma in the source String to make it JSON valid
+    # strToReplace = ","
+    # replacementStr = ""
+    # source = replacementStr.join(source.rsplit(strToReplace, 1)) #complicated but needed replacement of the last comma in the source String to make it JSON valid
     source_json = json.loads(source) #parsing the JSON
     try:
         link = source_json["mp4"] #extracting the link to the mp4 file
         link = base64.b64decode(link)
         link = link.decode("utf-8")
         wget.download(link, out=f"{name}_SS.mp4") #downloading the file
     except KeyError:
         try:
-            link = source_json["hls"]
-            link = base64.b64decode(link)
-            link = link.decode("utf-8")
+            # link = source_json["hls"]
+            link = source_json["file"] #extracting the link to the mp4 file
+            # link = base64.b64decode(link)
+            # link = link.decode("utf-8")
             name = name +'_SS.mp4'
             ydl_opts = {'outtmpl' : name,}
(the rest of the script is unchanged)
Just found out that this works for s.to ... but not for aniworld.to ... aniworld still uses the "var sources"/"hls" schema ... so the whole script needs to be reworked to suit both cases ... will look at that tomorrow ...
This works for me ... but it should be optimized by someone who knows Python ... :-)
# coding=utf-8
import sys, os, glob
import re
import requests
import json
import wget
from bs4 import BeautifulSoup
from yt_dlp import YoutubeDL
import base64 # import base64
def main():
args = sys.argv #saving the cli arguments into args
try:
args[1] #try if args has a value at index 1
except IndexError:
print("Please use a parameter. Use -h for Help") #if not, tells the user to specify an argument
quit()
if args[1] == "-h": #if the first user argument is "-h" call the help function
help()
elif args[1] == "-u": #if the first user argument is "-u" call the download function
URL = args[2]
download(URL)
elif args[1] == "-l": #if the first user argument is "-l" call the list_dl (list download) function
doc = args[2]
list_dl(doc)
else:
URL = args[1] #if the first user argument is the <URL> call the download function
download(URL)
def help():
print("Version v1.2.4")
print("")
print("______________")
print("Arguments:")
print("-h shows this help")
print("-u <URL> downloads the <URL> you specify")
print("-l <doc> opens the <doc> you specify and downloads every URL line after line")
print("<URL> just the URL as Argument works the same as with -u Argument")
print("______________")
print("")
print("Credits to @NikOverflow, @cuitrlal and @cybersnash on GitHub for contributing")
def list_dl(doc):
curLink = 0
lines = open(doc).readlines() #reads the lines of the given document and store them in the list "lines"
for link in lines: #calls the download function for every link in the document
curLink +=1
print("Download %s / "%curLink + str(len(lines)))
link = link.replace("\n","")
print("echo Link: %s"%link)
download(link)
def download(URL):
URL = str(URL)
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Language": "de,en-US;q=0.7,en;q=0.3",
"Upgrade-Insecure-Requests": "1",
"Sec-Fetch-Dest": "document",
"Sec-Fetch-Mode": "navigate",
"Sec-Fetch-Site": "none",
"Sec-Fetch-User": "?1",
"Priority": "u=1"
}
html_page = requests.get(URL, headers=headers)
print("Download URL:"+URL)
soup = BeautifulSoup(html_page.content, 'html.parser')
if html_page.text.startswith("<script>"):
START = "window.location.href = '"
L = len(START)
i0 = html_page.text.find(START)
i1 = html_page.text.find("'",i0+L)
url = html_page.text[i0+L:i1]
return download(url)
name_find = soup.find('meta', attrs={"name":"og:title"})
name = name_find["content"]
name = name.replace(" ","_")
print("Name of file: " + name)
dl_type = "unknown"
sources_find = soup.find_all(string = re.compile("var sources")) #searching for the script tag containing the link to the mp4
    if len(str(sources_find)) > 2: # an empty result list stringifies to "[]" (length 2)
dl_type = "hls"
else:
sources_find = soup.find_all(string = re.compile("let ")) #searching for the script tag containing the link to the mp4
        if len(str(sources_find)) > 2:
dl_type = "file"
if dl_type == "unknown":
print("Download page type unknwon, please inform developer")
print("URL:"+ URL)
sys.exit(-1)
sources_find = str(sources_find)
if dl_type == "file":
slice_start = sources_find.index("let ")
source = sources_find[slice_start:] #cutting everything before 'var sources' in the script tag
slice_end = source.index(";")
source = source[:slice_end] #cutting everything after ';' in the remaining String to make it ready for the JSON parser
slice_start = source.index("'")+1
source = source[slice_start:]
source = source.replace("\'","")
        # Base64 needs a length that is a multiple of 4; if not, pad with "="
        while (len(source) % 4) != 0:
            source = source + "="
# decode base64
source = base64.b64decode(source)
        print(source) # debug output: the decoded payload
# convert Byte list back to string
source = source.decode("utf-8")
        # reverse string
source = source[::-1]
source = source.replace("\'","\"") #Making the JSON valid
source = source.replace("\\n","") #
source = source.replace("\\","") #
source_json = json.loads(source) #parsing the JSON
try:
link = source_json["mp4"] #extracting the link to the mp4 file
link = base64.b64decode(link)
link = link.decode("utf-8")
wget.download(link, out=f"{name}_SS.mp4") #downloading the file
except KeyError:
try:
link = source_json["file"] #extracting the link to the mp4 file
name = name +'_SS.mp4'
ydl_opts = {'outtmpl' : name,}
with YoutubeDL(ydl_opts) as ydl:
try:
ydl.download(link)
except Exception as e:
pass
delpartfiles()
except KeyError:
print("Could not find downloadable URL. Voe might have change their site. Check that you are running the latest version of voe-dl, and if so file an issue on GitHub.")
quit()
if dl_type == "hls":
slice_start = sources_find.index("var sources")
source = sources_find[slice_start:] #cutting everything before 'var sources' in the script tag
slice_end = source.index(";")
source = source[:slice_end] #cutting everything after ';' in the remaining String to make it ready for the JSON parser
source = source.replace("var sources = ","") #
source = source.replace("\'","\"") #Making the JSON valid
source = source.replace("\\n","") #
source = source.replace("\\","") #
strToReplace = ","
replacementStr = ""
source = replacementStr.join(source.rsplit(strToReplace, 1)) #complicated but needed replacement of the last comma in the source String to make it JSON valid
source_json = json.loads(source) #parsing the JSON
try:
link = source_json["mp4"] #extracting the link to the mp4 file
link = base64.b64decode(link)
link = link.decode("utf-8")
wget.download(link, out=f"{name}_SS.mp4") #downloading the file
except KeyError:
try:
link = source_json["hls"]
link = base64.b64decode(link)
link = link.decode("utf-8")
name = name +'_SS.mp4'
ydl_opts = {'outtmpl' : name,}
with YoutubeDL(ydl_opts) as ydl:
try:
ydl.download(link)
except Exception as e:
pass
delpartfiles()
except KeyError:
print("Could not find downloadable URL. Voe might have change their site. Check that you are running the latest version of voe-dl, and if so file an issue on GitHub.")
quit()
print("\n")
def delpartfiles():
path = os.getcwd()
for file in glob.iglob(os.path.join(path, '*.part')):
os.remove(file)
if __name__ == "__main__":
main()
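For anyone trying it out, usage follows the help text in the script (the file name is simply whatever you saved it as; voe-dl.py here is a placeholder):

python voe-dl.py -u <URL>
python voe-dl.py -l links.txt   # a text file with one URL per line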
Yeah, works for me too, thanks! I don't think the script looks that bad either; it looks fine to me.
If someone wants to, feel free to improve it...
Resolved, can be closed
VOE reworked the way they provide the video streams. To me it even looks like interaction is necessary; there is a lot of weird JavaScript magic. I couldn't get on top of it yet. This was observed coming from s.to.