soraxas / echo360

Commandline tool for automated downloads of echo360 videos hosted by university
https://cs.tinyiu.com/echo360
MIT License
276 stars, 52 forks

Option to download via RTMP #18

Open RenWal opened 4 years ago

RenWal commented 4 years ago

Hi,

I've just stumbled upon the fact that (at least at my university) echo360 has an RTMP stream with much higher quality available (720p instead of the 360p I get using this downloader). Its URL can be derived from the video source URL that is extracted here:

https://github.com/soraxas/echo360/blob/8d19dae60edc5646b56b1fdfd82c11562ea21fb7/echo360/videos.py#L83

On our instance, the URL looks like this:

http://[...]/echo/_definst_/1815/5/35518f02-d94c-451a-8b0f-b8c84bdb0db7/mp4:audio-video-streamable.m4v/playlist.m3u8

The RTMP stream is derived like so:

rtmp://[...]/echo/_definst_/1815/5/35518f02-d94c-451a-8b0f-b8c84bdb0db7/audio-video.flv
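The derivation above could be sketched in Python roughly as follows. This is only a guess generalised from the single URL pair shown here: it assumes the last two path components are always `mp4:<something>/playlist.m3u8`, that the RTMP file is always named `audio-video.flv`, and that host and port stay the same. The function name `derive_rtmp_url` and the host `media.example.invalid` are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def derive_rtmp_url(m3u8_url: str) -> str:
    """Map an HLS playlist URL to the RTMP URL seen on one echo360 instance.

    Assumption: the path ends in "mp4:<name>/playlist.m3u8" and the RTMP
    stream lives next to it as "audio-video.flv". Other instances may differ.
    """
    parts = urlsplit(m3u8_url)
    segments = parts.path.split("/")
    if (
        len(segments) < 3
        or segments[-1] != "playlist.m3u8"
        or not segments[-2].startswith("mp4:")
    ):
        raise ValueError(f"unexpected playlist URL layout: {m3u8_url}")
    # Drop the two trailing components and point at the FLV stream instead.
    new_path = "/".join(segments[:-2] + ["audio-video.flv"])
    return urlunsplit(("rtmp", parts.netloc, new_path, "", ""))
```

If the layout is different, the function raises instead of guessing, so the caller can keep using the m3u8 URL.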

Using that stream, the video can be downloaded e.g. with ffmpeg, which the downloader is using anyway. The downside is that, since it's RTMP, the download takes exactly as long as the video runs. One can work around this a little by downloading all selected lectures simultaneously.
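A minimal sketch of that workaround, assuming ffmpeg is on the PATH and was built with RTMP support. The helper names (`build_ffmpeg_command`, `download_all`) and the `.flv` output choice are illustrative and not part of the existing downloader. `-c copy` saves the stream without re-encoding, and `-n` makes ffmpeg refuse to overwrite an existing file.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_ffmpeg_command(rtmp_url, output_path):
    """Return the ffmpeg argument list that copies one RTMP stream to disk."""
    return ["ffmpeg", "-n", "-i", rtmp_url, "-c", "copy", output_path]


def download_all(jobs, max_parallel=4):
    """Run one ffmpeg process per (rtmp_url, output_path) pair in parallel.

    Each download still runs at roughly 1x playback speed; running several
    at once only shortens the total wall-clock time for a batch of lectures.
    Returns the ffmpeg exit codes in the same order as `jobs`.
    """
    def run(job):
        url, out = job
        return subprocess.run(build_ffmpeg_command(url, out)).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run, jobs))
```

Threads are enough here because each worker only waits on an external ffmpeg process.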

Do you think this is a feature worth adding?

soraxas commented 4 years ago

Hi @RenWal

YES that sounds like a very beneficial thing to have---to retrieve the higher quality video.

Are there different tags associated with the different streaming qualities? Normally RTSP provides different m3u8 files for the client to choose between qualities. I think a quick addition would be to always try to retrieve the higher-quality m3u8 first and fall back to the normal one if that fails.
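If the server does hand out a master playlist with one `#EXT-X-STREAM-INF` entry per quality (not confirmed anywhere in this thread), the "try the best one, otherwise fall back" idea could look roughly like this. `pick_best_variant` is a hypothetical helper, and returning `None` stands for "keep using the URL we already have".

```python
import re
from urllib.parse import urljoin


def pick_best_variant(master_text, base_url):
    """Return the URL of the highest-BANDWIDTH variant, or None if none found."""
    lines = [line.strip() for line in master_text.splitlines() if line.strip()]
    best = None  # (bandwidth, absolute_url)
    for i, line in enumerate(lines):
        if not line.startswith("#EXT-X-STREAM-INF:"):
            continue
        attrs = line.split(":", 1)[1]
        # Anchor on start-of-string or comma so AVERAGE-BANDWIDTH is ignored.
        match = re.search(r"(?:^|,)BANDWIDTH=(\d+)", attrs)
        has_uri = i + 1 < len(lines) and not lines[i + 1].startswith("#")
        if not match or not has_uri:
            continue
        bandwidth = int(match.group(1))
        if best is None or bandwidth > best[0]:
            best = (bandwidth, urljoin(base_url, lines[i + 1]))
    return best[1] if best else None
```

A plain media playlist (a chunklist with no variants) yields `None`, which is the fallback case.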

Of course, an option for the user to choose the quality would be nice; however, it's probably not necessary, as most people wouldn't prefer a lower stream rate when downloading a video.

Would you be able to send in a PR for that?

RenWal commented 4 years ago

Good point about the different m3u8 playlists. I did figure out today that you can force the video server (which is the Wowza Streaming Engine) to give you a chunklist to flv files instead of mp4, but that chunklist then contains no files.

The Flash version somehow manages to get a seekable version of the stream, whereas I have been unsuccessful trying to jump to other timestamps in the RTMP stream using VLC. Seeking would be nice because then we could at least download one lecture in many segments in parallel. Of course, I can't inspect the Flash player in the browser, so I'll see if I can tcpdump some of its network interactions to find out where it gets the files from. Maybe that will help find an approach that works without the RTMP method and still gets the high-quality files.

RenWal commented 4 years ago

Alright, the Flash version does seem to just use the RTMP stream instead of the m3u8 playlist based chunk method. The stream can properly be seeked though, it's only VLC that doesn't like the stream. Using ffplay this works flawlessly.

Some codec stats from ffplay:

Input #0, flv
encoder    : Lavf55.12.100
Stream #0:0: Data: none
Stream #0:1: Video: h264 (High), yuv420p(progressive), 1280x720, 1998 kb/s, 25 fps, 25 tbr, 1k tbn, 50 tbc
Stream #0:2: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s

The fastest I could make the server send the video using ffmpeg is about 1.4x and that only for several seconds before settling at 1.0x, so to download the file quickly one would need to:

  1. open multiple RTMP streams to the same video
  2. seek them to different positions, e.g. each 10% further
  3. download all of those "chunks" in parallel using multiple invocations of ffmpeg
  4. concatenate the chunks
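Steps 1 to 3 could be planned like this. It is a sketch only: it assumes the total duration is already known (e.g. from ffprobe or the playlist), and it assumes that putting `-ss` before `-i` makes ffmpeg seek within the RTMP stream the way ffplay does, which I have not verified. `plan_segments` and `build_segment_command` are made-up names.

```python
def plan_segments(total_seconds, parts=10):
    """Split [0, total_seconds) into `parts` equal (start, length) pairs."""
    length = total_seconds / parts
    return [(i * length, length) for i in range(parts)]


def build_segment_command(rtmp_url, start, length, output_path):
    """ffmpeg arguments that seek to `start` and copy `length` seconds."""
    return [
        "ffmpeg", "-n",
        "-ss", f"{start:.3f}",   # input option: seek before reading the stream
        "-i", rtmp_url,
        "-t", f"{length:.3f}",   # output option: stop after this many seconds
        "-c", "copy",            # no re-encoding
        output_path,
    ]


def build_all_commands(rtmp_url, total_seconds, parts=10):
    """One command per chunk, writing chunk_00.flv, chunk_01.flv, ..."""
    return [
        build_segment_command(rtmp_url, start, length, f"chunk_{i:02d}.flv")
        for i, (start, length) in enumerate(plan_segments(total_seconds, parts))
    ]
```

The resulting commands are independent of each other, so they can be launched as parallel ffmpeg processes. With stream copy, cuts can only land on keyframes, so neighbouring chunks may overlap or leave small gaps at the boundaries.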

That concatenation would require a full demux-remux cycle using the ffmpeg concat demuxer. I haven't (yet) found a way to simply concatenate the chunks at the file level (ffplay doesn't like the result). However, on any halfway decent CPU the demux-remux pass is limited almost entirely by HDD speed. It does need some disk space though, because for a short time both the chunks and the concatenated version exist.
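For reference, the concat demuxer reads a small text file with one `file '...'` line per chunk. A sketch of generating that list and the matching ffmpeg call (helper names are illustrative; `-safe 0` is there so that absolute paths in the list are accepted):

```python
def build_concat_list(chunk_paths):
    """Return the text of a concat-demuxer list file for the given chunks."""
    lines = []
    for path in chunk_paths:
        # Inside single quotes, a literal ' has to be written as '\''.
        escaped = str(path).replace("'", "'\\''")
        lines.append(f"file '{escaped}'\n")
    return "".join(lines)


def build_concat_command(list_path, output_path):
    """ffmpeg arguments for the demux-remux pass (stream copy, no re-encode)."""
    return [
        "ffmpeg", "-n",
        "-f", "concat", "-safe", "0",
        "-i", list_path,
        "-c", "copy",
        output_path,
    ]
```

Write the returned text to e.g. `chunks.txt`, run the command, and delete the chunks once the output has been checked; that is the moment where both copies take up disk space.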

soraxas commented 4 years ago

That sounds very interesting. In its current state, this module uses the iPad user-agent string in the selenium webdriver. This forces the webserver to send an m3u8 playlist of the videos, because back when I was writing the script it was much easier to deal with downloading m3u8-based video files.

Regarding ffmpeg, this module currently only uses it for converting the .ts file into an .mp4 file, and if the user does not have ffmpeg installed on their system, this step is skipped. The concatenation part simply uses the very raw method of appending each segment's contents to a master file.
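In rough terms, that flow looks like the sketch below. It is a simplified illustration with made-up function names, not the module's actual code, and the exact ffmpeg arguments the module uses may differ.

```python
import shutil
import subprocess


def append_segments(segment_paths, master_path):
    """Concatenate .ts segments byte-for-byte into one master file."""
    with open(master_path, "wb") as master:
        for segment in segment_paths:
            with open(segment, "rb") as fh:
                shutil.copyfileobj(fh, master)


def convert_if_possible(ts_path, mp4_path):
    """Remux .ts to .mp4 when ffmpeg is installed; otherwise skip the step.

    Returns True if the conversion ran, False if ffmpeg was not found.
    """
    if shutil.which("ffmpeg") is None:
        return False
    subprocess.run(["ffmpeg", "-n", "-i", ts_path, "-c", "copy", mp4_path], check=True)
    return True
```

Raw appending works for MPEG-TS because the format is designed to be cut and joined at packet boundaries, which is not the case for the FLV chunks discussed above.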

I myself am not too familiar with RTMP or with how to combine multiple chunks together. I had a look at other existing RTMP modules and found this, which seems to have an update_buffer parameter that tries to buffer the whole stream. I don't currently have access to any active echo360 course that I can test it on. Do you want to try whether it helps (in terms of download time)?