ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

[Cartoon network] Playlist support request #13578

Open keybounce opened 7 years ago

keybounce commented 7 years ago

This is a request for series playlist support.

URLs of the form:

http://www.cartoonnetwork.com/video/teen-titans-go/episodes/index.html
http://www.cartoonnetwork.com/video/teen-titans-go/episodes/season-4.html
http://www.cartoonnetwork.com/video/ben-10/episodes/season-1.html
http://www.cartoonnetwork.com/video/ben-10/index.html (Yep, that one has 4 episodes not on the current season page, go figure, and the numbers indicate that they missed a lot :-)
http://www.cartoonnetwork.com/video/nexo-knights/episodes/index.html
http://www.cartoonnetwork.com/video/nexo-knights/episodes/season-3.html (NB: I didn't even know that there was a 3rd season)

Etc.

keybounceMBP:CartoonNetwork michael$ youtube-dl -v http://www.cartoonnetwork.com/video/teen-titans-go/episodes/index.html
[debug] System config: []
[debug] User config: ['-k', '-o', '%(title)s.%(ext)s', '-f', '\nbest[ext=mp4][height>431][height<=576]/\nbestvideo[ext=mp4][height=480]+bestaudio[ext=m4a]/\nbest[ext=mp4][height>340][height<=431]/\nbestvideo[ext=mp4][height>360][height<=576]+bestaudio/\nbest[height>340][height<=576]/\nbestvideo[height>360][height<=576]+bestaudio/\nbestvideo[height=360]+bestaudio/\nbest[ext=mp4][height>=280][height<=360]/\nbest[height<=576]/\nworst', '--ap-mso', 'Dish', '--ap-username', 'PRIVATE', '--ap-password', 'PRIVATE', '--write-sub', '--write-auto-sub', '--sub-lang', 'en,enUS,en-us', '--sub-format', 'ass/srt/best', '--convert-subs', 'ass', '--embed-subs', '--mark-watched', '--download-archive', 'downloaded-videos.txt']
[debug] Custom config: []
[debug] Command-line args: ['-v', 'http://www.cartoonnetwork.com/video/teen-titans-go/episodes/index.html']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.07.02
[debug] Python version 3.6.1 - Darwin-13.4.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg 3.2.4, ffprobe 3.2.4, rtmpdump 2.4
[debug] Proxy map: {}
[generic] index: Requesting header
WARNING: Falling back on generic information extractor.
[generic] index: Downloading webpage
[generic] index: Extracting information
ERROR: Unsupported URL: http://www.cartoonnetwork.com/video/teen-titans-go/episodes/index.html
Traceback (most recent call last):
  File "/Users/michael/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 762, in extract_info
    ie_result = ie.extract(url)
  File "/Users/michael/bin/youtube-dl/youtube_dl/extractor/common.py", line 433, in extract
    ie_result = self._real_extract(url)
  File "/Users/michael/bin/youtube-dl/youtube_dl/extractor/generic.py", line 2824, in _real_extract
    raise UnsupportedError(url)
youtube_dl.utils.UnsupportedError: Unsupported URL: http://www.cartoonnetwork.com/video/teen-titans-go/episodes/index.html
siddht4 commented 7 years ago

@keybounce Cartoon Network falls under https://github.com/rg3/youtube-dl/blob/master/README.md#can-you-add-support-for-this-anime-video-site-or-site-which-shows-current-movies-for-free (see the README). They use the DMCA, so although support could be added, it would be copyright infringement against Turner Sports and Entertainment Digital Network, as mentioned at http://www.cartoonnetworkindia.com/trademark (CN India). Yours may be slightly different depending on geography.

keybounce commented 7 years ago

CartoonNetwork is the official site for the network. It is not a license breaking "for free" place.

I can download individual episodes no problem. This is a request for playlist support, so I can just have a script fetch the shows that I want, rather than having to copy/paste each episode URL from my browser into a file first.

siddht4 commented 7 years ago

If you can download the videos using youtube-dl, then I can partially help you. I can provide you with my Python embed script, which is just an extension of the normal embedding code in youtube-dl. A playlist request needs a lot of work.

keybounce commented 7 years ago

Sadly, even after looking over that github, I cannot really figure out how to use it. Other than as an example of calling youtube-dl from another program.

siddht4 commented 7 years ago

OK, instead save the URLs of all the videos you need in a text file, then run youtube-dl as "youtube-dl -a [text_file_name]"

siddht4 commented 7 years ago

> Sadly, even after looking over that github, I cannot really figure out how to use it. Other than as an example of calling youtube-dl from another program.

It's fine if you can't figure it out. It is merely a Python program that embeds youtube-dl as described here https://github.com/rg3/youtube-dl#embedding-youtube-dl and expands on it to meet my needs.

keybounce commented 7 years ago

> OK, instead save the URLs of all the videos you need in a text file, then run youtube-dl as "youtube-dl -a [text_file_name]"

Yes, but that is what I have to do, and why I'm asking for playlist support.

To help whoever does decide to do playlist support: The episode pages have usually one, sometimes two, giant json blocks. The key you are looking for is "seoFriendly".

What I used last night for partial automation is this:

while read -r url; do curl -s "$url" | grep seoFri >> episodes; done < showUrls

(showUrls is a list of URLs like http://www.cartoonnetwork.com/video/justice-league-action/episodes/index.html)

split -l 1 episodes (Put each json block in a separate file).

vim x??

Then, the following commands. NB: I don't know how to use 'sed' to insert a newline, or I'd have this whole thing in a shell script.

:s/","/",^M"/g -- break the json into lines
:1,$!grep seo -- filter out the key/values that we want
:g,^.*/vid,s,,http://www.cartoonnetwork.com/vid -- remove the "key" and fix up the URL
:g/"},{.*/s/// -- remove the end of line junk
:$s,"}];,, -- special end of line junk for the last line.
:wn -- next file

This gives me one playlist file per show, which I then shove into a per-show directory, and run youtube-dl on.
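
That last step (one playlist file per show, each in its own directory) could be scripted roughly as follows. This is only a sketch: show_slug and download_show are hypothetical helper names, and it assumes the show name is the path component right after /video/, as in the URLs listed in this issue. The only youtube-dl option used is -a (--batch-file), and youtube-dl is assumed to be on PATH.

```shell
#!/bin/bash

# Hypothetical helper: derive a directory name from a show URL, e.g.
# http://www.cartoonnetwork.com/video/ben-10/episodes/season-1.html -> ben-10
# Assumes the show name directly follows /video/ in the URL.
show_slug() {
    printf '%s\n' "$1" | sed -e 's,^.*/video/,,' -e 's,/.*$,,'
}

# Hypothetical helper: copy a playlist file into a per-show directory and
# download every URL listed in it from inside that directory.
download_show() {
    local show_url="$1" playlist="$2" dir
    dir="$(show_slug "$show_url")"
    mkdir -p "$dir" && cp "$playlist" "$dir/playlist.txt" &&
        (cd "$dir" && youtube-dl -a playlist.txt)
}
```

It would be called as, for example, download_show http://www.cartoonnetwork.com/video/ben-10/index.html xaa, where xaa is one of the cleaned-up files produced by split.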

siddht4 commented 7 years ago

@keybounce Playlist support is a lengthy process: the extractor needs to be updated and a crawler needs to be added, which is itself extensive work. For now, use the process you just described.
As far as I can see, a simple fix would be for the extractor to read the content of the URL the way your cat loop does, put all the URLs into a list, and apply the necessary regexes, as you mentioned:

:s/","/",^M"/g -- break the json into lines
:1,$!grep seo -- filter out the key/values that we want
:g,^.*/vid,s,,http://www.cartoonnetwork.com/vid -- remove the "key" and fix up the URL
:g/"},{.*/s/// -- remove the end of line junk
:$s,"}];,, -- special end of line junk for the last line.
:wn -- next file

What I provided was just a quick alternative solution to your problem. You can still repurpose the script to your liking. If a problem relates to my script, open an issue there; for everything else, post here.

> I don't know how to use 'sed' to insert a newline

sed ':a;N;$!ba;s/\n/ /g' file is your friend

siddht4 commented 7 years ago

@keybounce

> Sadly, even after looking over that github, I cannot really figure out how to use it. Other than as an example of calling youtube-dl from another program.

Can you point out what you were not able to figure out there, and how to reach you other than through this issue https://github.com/rg3/youtube-dl/issues/13578 so that I can prepare it accordingly?

keybounce commented 7 years ago

How to reach me: keybounce@gmail.com

... I went back over this thread, to find your link to your github, and it's now gone :-)

I have read the official "how to embed youtube-dl in another python program", and it makes enough sense. I just don't know python (never programmed in it, but general reading of code is general reading of code)

... and that sed statement ... if I'm reading it correctly (not sure that I am), it says:

  1. Label a.
  2. Read the next line into the pattern buffer
  3. Until the last line, loop back (read whole file into memory)
  4. ... and a substitute command I don't understand.

Still, reading the sed page for the ... I've lost track of how many times, I see this now (in the s command):

         A line can be split by substituting a newline character into it.  To specify a newline character in the replacement string, precede it with a backslash.

... so a \ would normally be interpreted by bash as "just combine these lines", and now I have to figure out how to tell bash to leave the newline in and combine... fun.

siddht4 commented 7 years ago

@keybounce https://github.com/siddht4/youtube_dl_embed/ is the link. Anyway, since you have read the official embedding documentation too, you have got the idea.

\n is meant to be a newline character, as in most programming languages; the command is doing a sort of replace. As per your previous query you had to insert a newline, and now you need to remove it. So I can safely say you can use sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' file. Necessary documentation here: https://stackoverflow.com/questions/1251999/how-can-i-replace-a-newline-n-using-sed
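
As a small illustration of what that command does (it joins lines, it does not split them), here is a sketch that wraps it in a function. The name join_lines is hypothetical; the sed expressions are the ones quoted above.

```shell
#!/bin/bash

# Hypothetical wrapper: read all of stdin into sed's pattern space, then
# replace every embedded newline with a space, producing one long line.
#   :a     define label "a"
#   N      append the next input line to the pattern space
#   $!ba   if this is not the last line, branch back to "a"
#   s/\n/ /g  replace the collected newlines with spaces
join_lines() {
    sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g'
}
```

For example, printf 'one\ntwo\nthree\n' | join_lines collapses the three input lines into a single space-separated line.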

Reading this topic again made me realize we have diverged too much, so if you succeed, do the necessary regex and open a pull request.

keybounce commented 7 years ago

Ahh. No, I need to insert a (many) newline -- I need to take one super long line, and break it up into multiple short lines. That was what I did not know how to do in sed before, but it looks like the answer is to substitute a \n in the middle.

Of course, the docs explaining that are split into four different locations in my sed man page (sed functions: " To embed a newline in the text, precede it with a backslash. ". Sed regular expressions: " You cannot, however, use a literal newline character in an address or in the substitute command." The "s" command: "A line can be split by substituting a newline character into it. To specify a newline character in the replacement string, precede it with a backslash." And the "y" command: "a backslash followed by an ``n'' is replaced by a newline character.").
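
The man page wording quoted above (a backslash followed by a literal newline in the replacement) can be tried out with a sketch like the one below. The function name split_fields is hypothetical, and the "," separator is the one used in the vim commands earlier in this thread. Inside single quotes bash passes the backslash and the newline through to sed untouched, so no extra escaping should be needed.

```shell
#!/bin/bash

# Hypothetical helper: break one long JSON-like line into several lines by
# inserting a newline inside every "," separator. The replacement text is
#   ",  <backslash> <literal newline>  "
# so the second line of the sed script must start in column 0; any leading
# whitespace there would become part of the output.
split_fields() {
    sed -e 's/","/",\
"/g'
}
```

Feeding it printf '{"a":"1","b":"2","c":"3"}\n' gives three lines, each holding one key/value pair.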

keybounce commented 7 years ago

So here is my working cartoon network playlist extraction.

#!/bin/bash
# Take a (single) cartoon network url as argument. Output a list of episodes
# Does not output the clips, if any, only the episodes.

url="$@"        # Should only be $1, but in case they change url formats ...

# echo :"$url": >&2

# Embedded newline

## This version only gets the episodes
# curl -s -S "$url" | sed -n -e '/getFullEpisodes/,/return/ s/","/",\
# "/gp  ' | grep seoFriendly | sed -e 's,^.*/vid,http://www.cartoonnetwork.com/vid,' -e 's/"}.*//'

## This version gets the episodes and the clips
curl -s -S "$url" | grep seoFriendly | sed -n -e 's/","/",\
"/gp  ' | grep seoFriendly | sed -e 's,^.*/vid,http://www.cartoonnetwork.com/vid,' -e 's/"}.*//'
siddht4 commented 7 years ago

Okay, the script looks fine to me, though it may need some changes. Cheers.
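
One possible change, offered only as a sketch: separating the parsing from the download lets the sed pipeline be checked against a saved page without touching the network. The function names cn_extract_urls and cn_playlist are hypothetical, and the sample JSON shape used to try it (a "seoFriendly" key whose value starts with /video/) is an assumption based on the description in this thread, not a verified copy of the site's markup.

```shell
#!/bin/bash

# Hypothetical helper: read page HTML on stdin, print one episode/clip URL
# per line. Same pipeline as the "episodes and clips" version of the script:
#   1. keep only lines mentioning seoFriendly
#   2. split those lines at every "," (backslash + literal newline)
#   3. keep only the pieces mentioning seoFriendly
#   4. replace everything up to /vid with the site prefix, drop trailing junk
cn_extract_urls() {
    grep seoFriendly | sed -n -e 's/","/",\
"/gp' | grep seoFriendly |
        sed -e 's,^.*/vid,http://www.cartoonnetwork.com/vid,' -e 's/"}.*//'
}

# Hypothetical wrapper: fetch a single show page and list its URLs.
cn_playlist() {
    curl -s -S "$1" | cn_extract_urls
}
```

With a saved copy of a show page, cn_extract_urls < saved-page.html should print the same list that the script prints for the live URL, assuming the page still has the layout described in this issue.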