Open mo-han opened 4 years ago
Dammit, I couldn't wait any longer, so I wrote a tiny Python script that fetches the uploader and renames the downloaded mp4 files. It turned out that the iwara web page can be parsed simply with the lxml library, and the uploader is very easy to extract. I'm not going to create a pull request, though, because such a piece of cake should be doable without any difficulty at all.
Here is the script I use myself, anyway.
#!/usr/bin/env python3
# encoding=utf8
import sys
from urllib.parse import urlparse
from lxml import html
from requests import get
from glob import glob
from os.path import split, splitext, join
from os import rename


class IwaraVideo:
    def __init__(self, url: str):
        self.urlparse = urlparse(url)
        if 'iwara' not in self.urlparse.hostname:
            raise ValueError(url)
        elif 'video' not in self.urlparse.path:
            raise ValueError(url)
        self.url = url
        self.html = None
        self.meta = {
            'id': self.urlparse.path.split('/')[-1],
        }

    def get_page(self):
        if not self.html:
            r = get(self.url)
            self.html = html.document_fromstring(r.text)
        return self.html

    def get_uploader(self):
        video_page = self.get_page()
        uploader = video_page.xpath('//div[@class="node-info"]//div[@class="submitted"]//a[@class="username"]')[0].text
        self.meta['uploader'] = uploader
        return uploader

    def find_files_by_id(self, search_in=''):
        id_tag = '[{}]'.format(self.meta['id'])
        self.meta['id_tag'] = id_tag
        mp4_l = glob(search_in + '*.mp4')
        r_l = []
        for i in mp4_l:
            if id_tag in i:
                r_l.append(i)
        return r_l

    def rename_files_from_ytdl_na_to_uploader(self, search_in=''):
        na_tag = '[NA]'
        path_l = self.find_files_by_id(search_in=search_in)
        id_tag = self.meta['id_tag']
        uploader = self.get_uploader()
        up_tag = '[{}]'.format(uploader)
        for p in path_l:
            dirname, basename = split(p)
            filename, extension = splitext(basename)
            if na_tag in filename:
                left, right = filename.split(id_tag, maxsplit=1)
                right = right.replace(na_tag, up_tag, 1)
                new_basename = left + id_tag + right + extension
                new_path = join(dirname, new_basename)
                rename(p, new_path)


if __name__ == '__main__':
    u = sys.argv[1]
    video = IwaraVideo(u)
    video.rename_files_from_ytdl_na_to_uploader()
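The renaming step of the script above can also be looked at on its own, without any network access. The following is a minimal sketch of that one step as a pure function; the function name `swap_na_for_uploader` and the sample file name are made up for illustration, and the file name assumes an output template along the lines of `%(title)s [%(id)s][%(uploader)s].%(ext)s`.

```python
from os.path import splitext


def swap_na_for_uploader(basename: str, video_id: str, uploader: str) -> str:
    """Replace the first '[NA]' that follows '[video_id]' with '[uploader]'.

    The name is returned unchanged if the id tag is missing or if no
    '[NA]' follows it.
    """
    id_tag = '[{}]'.format(video_id)
    na_tag = '[NA]'
    stem, extension = splitext(basename)
    if id_tag not in stem:
        return basename
    # Only touch the part after the id tag, so a title that happens to
    # contain '[NA]' is left alone.
    left, right = stem.split(id_tag, maxsplit=1)
    right = right.replace(na_tag, '[{}]'.format(uploader), 1)
    return left + id_tag + right + extension


if __name__ == '__main__':
    # Hypothetical file name and uploader, for illustration only.
    print(swap_na_for_uploader('Some title [abc123][NA].mp4', 'abc123', 'someone'))
```

Keeping the string manipulation separate from the page fetch and the actual `rename()` call makes it easy to check the result before any file is touched.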
Monkey-patch version:
from abc import ABCMeta

import youtube_dl

# get_html_element_tree() is a helper that is not shown in this post; it is
# expected to fetch the URL and return the parsed lxml HTML tree.


class YoutubeDLIwaraX(youtube_dl.extractor.iwara.IwaraIE, metaclass=ABCMeta):
    def _real_extract(self, url):
        html = get_html_element_tree(url)
        uploader = html.xpath('//div[@class="node-info"]//div[@class="submitted"]//a[@class="username"]')[0].text
        data = super(YoutubeDLIwaraX, self)._real_extract(url)
        data['uploader'] = uploader
        # print('#', 'uploader:', uploader)
        return data


def youtube_dl_main_x_iwara(argv=None):
    youtube_dl.extractor.IwaraIE = YoutubeDLIwaraX
    youtube_dl.main(argv)
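For readers who have not seen the pattern before, here is a rough sketch of the same monkey-patching idea with stand-ins instead of youtube-dl. Everything in it (`fake_extractor`, `BaseIE`, `PatchedIE`, `lookup_uploader`, `get_extractor`) is hypothetical and only mirrors the shape of the code above: subclass, call the parent, add the missing field, then rebind the name that the rest of the program looks up.

```python
import types

# Stand-in for a library module such as youtube_dl.extractor (hypothetical).
fake_extractor = types.SimpleNamespace()


class BaseIE:
    """Stand-in for the original extractor; it returns no uploader."""

    def _real_extract(self, url):
        return {'id': url.rstrip('/').split('/')[-1], 'uploader': None}


fake_extractor.BaseIE = BaseIE


def lookup_uploader(url):
    # Placeholder for the real page scrape; returns a fixed value here.
    return 'someone'


class PatchedIE(BaseIE):
    """Subclass that reuses the parent's result and fills in the missing field."""

    def _real_extract(self, url):
        data = super()._real_extract(url)
        data['uploader'] = lookup_uploader(url)
        return data


def get_extractor():
    # Code that looks the class up by name at call time sees the patched class.
    return fake_extractor.BaseIE()


# The monkey patch: rebind the name on the module-like object.
fake_extractor.BaseIE = PatchedIE

print(get_extractor()._real_extract('https://example.com/videos/abc123'))
```

The patch only takes effect for code that resolves the name after the rebinding; anything that stored a reference to the original class beforehand keeps using the original.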
Hi mo-han, in case this issue is never fixed, could you explain to a non-python-programmer how to use your code?
@ZYinMD
Refer to my own module as an example; it is really simple. youtube_dl_main_x_iwara is a modified version of the original youtube-dl main() function. Just call it and everything works the same as the original, except that the iwara extractor now provides uploader data.
Thanks! I'll try... By the way, since youtube-dl doesn't support downloading "channels" on iwara, how do you download all videos from one uploader? Do you write your own crawler? I know it's quite easy, but just wondering.
I don't have a "channel" extractor (nor does ytdl), and it is not "quite easy" for me: it needs to check the "private" flag of the videos, follow the "next page" links on the "all videos" result page, and so on. I haven't tried it, and there would definitely be a lot of problems and work involved.
As for your need, batch downloading from iwara.tv or similar sites: I do have a solution. It isn't fully automated, but it still saves a lot of copy-paste and mouse clicks.
First we need to get the URLs of the selected videos. I don't use Chrome, but Firefox has a feature called "View Selection Source". When anything is selected (or select everything with Ctrl+A), the feature appears in the right-click context menu. It opens a new tab containing the source code of the page, with the parts corresponding to the selection already selected for you. So we can use the mouse to select multiple videos (their thumbnails on the web page), or just select all elements on the page, then choose View Selection Source in the right-click menu, copy (Ctrl+C) the selected source code to the clipboard, and move on.
Secondly, we need to find all the video URLs in the clipboard. Plenty of tools and methods could do this job, but I wrote my own tool, called mykit.py. It's a CLI program with a lot of sub-commands, one of which is clipboard.findurl, also available under the aliases cb.url and cburl. The cburl sub-command extracts text strings that match a given pattern from a file or from the clipboard. The pattern is a regex, but we don't need to write our own because iwara's video URL pattern is already one of the presets. So a simple command, mykit cburl iwara, finds all of the video URLs in the source code in the clipboard and prints them out line by line; the results are also copied back to the clipboard.
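For anyone without mykit.py, the URL-finding step can be approximated in a few lines. This is only a sketch: the regex below is a guess at what an iwara video link looks like, not mykit's actual preset, and the function name `find_video_urls`, the `default_host` value and the sample HTML are all assumptions. Reading from and writing to the clipboard is left out; the function just takes the copied source as a string.

```python
import re

# Assumed pattern (not mykit's preset): an optional iwara host,
# followed by /videos/<alphanumeric id>.
VIDEO_URL = re.compile(
    r'(?P<host>https?://[\w.-]*iwara\.tv)?/videos/(?P<id>[0-9A-Za-z]+)'
)


def find_video_urls(source, default_host='https://www.iwara.tv'):
    """Return the video URLs found in source, in order, without duplicates."""
    urls = []
    for match in VIDEO_URL.finditer(source):
        # Relative links carry no host, so fall back to the assumed default.
        host = match.group('host') or default_host
        url = '{}/videos/{}'.format(host, match.group('id'))
        if url not in urls:
            urls.append(url)
    return urls


if __name__ == '__main__':
    sample = '''
    <a href="/videos/abc123">first</a>
    <a href="https://ecchi.iwara.tv/videos/def456?language=en">second</a>
    <a href="/videos/abc123">duplicate thumbnail link</a>
    <a href="/users/someone">not a video</a>
    '''
    for url in find_video_urls(sample):
        print(url)
```

Because relative links are accepted, a `/videos/...` path belonging to some other site would also match; a stricter pattern is needed if the copied source mixes several sites.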
Finally, just use those lines of URLs as arguments to launch multiple download processes. We could save the URL lines into a file and use a shell script to read them and run youtube-dl (or the modified version) with each line. Again, mykit.py can lend a hand, with a sub-command called run.from.lines (aliases runlines and rl), which reads lines from a file or from the clipboard and runs a command template once per line. What I do is type a single command, mykit.py rl ytdl {}, and it reads the URL lines from the clipboard and runs ytdl {url} for each.
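The "run a command template once per line" idea is small enough to sketch as well. This is not mykit's implementation; `build_commands` and `run_lines` are hypothetical names, the commands run one after another rather than in parallel, and `ytdl` in the example is simply whatever downloader command is on your PATH.

```python
import shlex
import subprocess


def build_commands(template, lines):
    """Fill '{}' in the template with each non-empty line, as argument lists."""
    commands = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split the template first and substitute afterwards, so a URL with
        # unusual characters still ends up as a single argument.
        commands.append([part.replace('{}', line) for part in shlex.split(template)])
    return commands


def run_lines(template, lines):
    """Run the commands sequentially and return their exit codes."""
    return [subprocess.run(cmd).returncode for cmd in build_commands(template, lines)]


if __name__ == '__main__':
    urls = ['https://www.iwara.tv/videos/abc123', '', 'https://www.iwara.tv/videos/def456']
    # Print instead of running, since 'ytdl' may not exist on this machine.
    for cmd in build_commands('ytdl {}', urls):
        print(cmd)
```

Passing argument lists to `subprocess.run` avoids going through a shell, so the URLs do not need any quoting.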
Not very automatic, but convenient enough, isn't it?
Or you could write your own "channel" extractor, if it's worth it.
Thanks so much!! I read all the code in those script files you mentioned, and they make perfect sense. As a python-noob and powershell-noob I still have questions about installing, I think I could open issues in your repo. Thanks and see you there!
@ZYinMD That's fine, I'm half a Python noob and a total PowerShell noob as well.
Description
Videos are downloaded successfully, but %(uploader)s is always replaced by NA (it's part of the filename format I want).
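The youtube-dl documentation notes that output-template fields an extractor does not provide are replaced with NA, which is what happens to the uploader here. The snippet below is only an illustration of that fallback, not youtube-dl's own code; the `render` function and the sample metadata are made up.

```python
from collections import defaultdict


def render(template, info):
    """Fill a %-style template, using 'NA' for any field missing from info."""
    # defaultdict supplies 'NA' for keys that info does not contain,
    # instead of raising KeyError.
    return template % defaultdict(lambda: 'NA', info)


if __name__ == '__main__':
    # Hypothetical metadata without an 'uploader' key.
    info = {'title': 'Some title', 'id': 'abc123', 'ext': 'mp4'}
    print(render('%(title)s [%(id)s][%(uploader)s].%(ext)s', info))
```

Once the extractor supplies an `uploader` value, the same template produces the expected name with no further changes.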