zmwangx / caterpillar

Hardened HLS merger
MIT License

Caterpillar

caterpillar is a hardened HLS merger. It takes an HTTP Live Streaming VOD URL (typically an .m3u8 URL), downloads the video segments, and attempts to merge them into a single, coherent file. It is specially designed to combat timestamp discontinuities (symptom: a naive FFmpeg run spews tons of "Non-monotonous DTS in output stream" warning messages and ends up with a useless file with completely broken timestamps).

caterpillar supports up to version 3 of the HTTP Live Streaming protocol (VOD only; non-VOD playlists are treated as VOD, and may result in unexpected consequences).

Dependencies

A recent version of FFmpeg. FFmpeg 3.3.4 is known to work with caterpillar; FFmpeg 3.2.4 is known to NOT work.
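
A minimal sketch of checking the installed FFmpeg against the versions named above. The helper names are hypothetical and not part of caterpillar. Treating every release at or above 3.3.4 as acceptable is an assumption; the note above only vouches for 3.3.4 itself.

```python
import re
import subprocess

# Version taken from the note above; "anything newer also works" is an assumption.
KNOWN_GOOD = (3, 3, 4)


def parse_ffmpeg_version(banner):
    """Extract (major, minor, patch) from the banner printed by `ffmpeg -version`.

    Returns None when the banner carries no dotted release number
    (for example, git snapshot builds).
    """
    match = re.search(r"ffmpeg version n?(\d+)\.(\d+)(?:\.(\d+))?", banner)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def is_known_good(version):
    """True if the parsed version is at least the known-good release."""
    return version is not None and version >= KNOWN_GOOD


def installed_ffmpeg_version():
    """Run `ffmpeg -version` and parse its output (requires ffmpeg on PATH)."""
    result = subprocess.run(
        ["ffmpeg", "-version"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_ffmpeg_version(result.stdout)
```

Calling is_known_good(installed_ffmpeg_version()) before a long download gives an early warning instead of a broken merge at the end.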

Installation

Python 3.6 or later is required.

If in doubt, check out the detailed "Installation Guide for Novices".

For end users

To install,

pip install caterpillar-hls

To upgrade to the latest version,

pip install -U caterpillar-hls

For developers and beta testers

To install from the master branch,

git clone https://github.com/zmwangx/caterpillar.git
cd caterpillar
python setup.py develop
caterpillar -h

To update to the latest master,

cd /path/to/caterpillar
git pull origin master

For application developers

Short of calling caterpillar.caterpillar.main with sys.argv set appropriately, you can access caterpillar's functionality through caterpillar.caterpillar.process_entry and caterpillar.caterpillar.process_batch. Warning: there is no stability guarantee for these interfaces, although I won't break compatibility without a very compelling reason.

process_entry and process_batch additionally support event hooks (a feature not exposed to end users). See caterpillar.caterpillar.events for types of events emitted and associated data attributes.
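
A minimal sketch of the "call main with sys.argv set" approach. The helpers patched_argv and run_cli are hypothetical and not part of caterpillar. The sketch assumes that the entry point takes no parameters and reads sys.argv, as the description above suggests; whether main returns a status or exits via SystemExit is not documented here, so both cases are handled.

```python
import sys
from contextlib import contextmanager


@contextmanager
def patched_argv(argv):
    """Temporarily replace sys.argv, restoring the original afterwards."""
    saved = sys.argv
    sys.argv = list(argv)
    try:
        yield
    finally:
        sys.argv = saved


def run_cli(entry_point, args, prog="caterpillar"):
    """Call a CLI entry point with the given arguments.

    Returns whatever the entry point returns, or the SystemExit code if
    it exits (argparse-based CLIs do so for -h or invalid usage).
    """
    with patched_argv([prog] + list(args)):
        try:
            return entry_point()
        except SystemExit as exc:
            return exc.code


# With caterpillar installed, usage might look like this:
#     from caterpillar.caterpillar import main
#     run_cli(main, ["-f", "https://example.com/hls/1.m3u8", "1.mp4"])
```

Restoring sys.argv in a finally clause keeps the host application's own arguments intact even when the call fails.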

Usage

$ caterpillar -h
usage: caterpillar [-h] [-b] [-e] [-f] [-j JOBS] [-k]
                   [-m {concat_demuxer,concat_protocol,0,1}] [-r RETRIES]
                   [--remove-manifest-on-success] [--workdir WORKDIR]
                   [--workroot WORKROOT] [--wipe] [-v] [--progress]
                   [--no-progress] [-q] [--debug] [-V]
                   m3u8_url [output]

positional arguments:
  m3u8_url              the VOD URL, or the batch mode manifest file
  output                path to the final output file (default is a .ts file
                        in the current directory with the basename of the VOD
                        URL)

optional arguments:
  -h, --help            show this help message and exit
  -b, --batch           run in batch mode (see the "Batch Mode" section in
                        docs)
  -e, --exist-ok        skip existing targets (only works in batch mode)
  -f, --force           overwrite the output file if it already exists
  -j JOBS, --jobs JOBS  maximum number of concurrent downloads (default is
                        twice the number of CPU cores, including virtual
                        cores)
  -k, --keep            keep intermediate files even after a successful merge
  -m {concat_demuxer,concat_protocol,0,1}, --concat-method {concat_demuxer,concat_protocol,0,1}
                        method for concatenating intermediate files (default
                        is 'concat_demuxer'); see
                        https://github.com/zmwangx/caterpillar/#notes-and-limitations
                        for details
  -r RETRIES, --retries RETRIES
                        number of times to retry when a possibly recoverable
                        error (e.g. download issue) occurs; default is 2, and
                        0 turns off retries
  --remove-manifest-on-success
                        remove manifest file if all downloads are successful
                        (only works in batch mode)
  --workdir WORKDIR     working directory to store downloaded segments and
                        other intermediate files (default is automatically
                        determined based on URL and output file)
  --workroot WORKROOT   if nonempty, this path is used as the root directory
                        for all processing, under which both the working
                        directory and final destination are mapped; after
                        merging is done, the artifact is eventually moved to
                        the destination (use cases: destination on a slow HDD
                        with workroot on a fast SSD; destination on a
                        networked drive with workroot on a local drive)
  --wipe                wipe all downloaded files (if any) and start over
  -v, --verbose         increase logging verbosity (can be specified multiple
                        times)
  --progress            show download progress bar regardless of verbosity
                        level
  --no-progress         suppress download progress bar regardless of verbosity
                        level
  -q, --quiet           decrease logging verbosity (can be specified multiple
                        times)
  --debug               output debugging information (also implies highest
                        verbosity)
  -V, --version         show program's version number and exit

environment variables:
  CATERPILLAR_USER_CONFIG_DIR
                        custom directory for caterpillar.conf
  CATERPILLAR_USER_DATA_DIR
                        custom directory for certain data cached by
                        caterpillar
  CATERPILLAR_NO_USER_CONFIG
                        when set to a non-empty value, do not load
                        options from user config file
  CATERPILLAR_NO_CACHE  when set to a non-empty value, do not read or
                        write caterpillar's cache

configuration file:
  <an operating system and user-dependent path>

See the wiki page for usage examples.
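
For scripts that drive the command-line tool, the following sketch assembles an argument list from a few of the options documented in the help text above. The build_command helper is hypothetical; it only covers --force, --jobs, and --concat-method, and it accepts only the named concat methods rather than the numeric aliases.

```python
import shlex


def build_command(m3u8_url, output=None, jobs=None, concat_method=None, force=False):
    """Assemble a caterpillar command line as a list of arguments."""
    argv = ["caterpillar"]
    if force:
        argv.append("--force")
    if jobs is not None:
        argv += ["--jobs", str(jobs)]
    if concat_method is not None:
        if concat_method not in ("concat_demuxer", "concat_protocol"):
            raise ValueError(f"unknown concat method: {concat_method!r}")
        argv += ["--concat-method", concat_method]
    # Positional arguments come last: the VOD URL, then the optional output path.
    argv.append(m3u8_url)
    if output is not None:
        argv.append(output)
    return argv


def shell_quoted(argv):
    """Render an argument list as a copy-pasteable shell command."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

The resulting list can be passed to subprocess.run(argv, check=True); passing a list rather than a shell string avoids quoting problems with URLs that contain & or ?.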

Batch mode

In normal mode, caterpillar deals with only one stream. There is also a batch mode for downloading multiple streams at once. In this mode, you specify a manifest file on the command line in place of the VOD URL. Each line of the manifest file contains a VOD URL and a filename (or path) separated by a tab, e.g., caterpillar manifest.txt, where manifest.txt contains

https://example.com/hls/1.m3u8  1.mp4
https://example.com/hls/2.m3u8  2.mp4
https://example.com/hls/3.m3u8  3.mp4

The filenames (or paths) are relative to the parent directory of the manifest file. The tab character is not allowed in the filenames (or paths).

Comments that start with # are allowed in the manifest file.

Most options for normal mode are also allowed in the batch mode, as are options set in the configuration file.
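
To illustrate the manifest format described in this section, here is a minimal parsing sketch. It is not caterpillar's own parser. The parse_manifest helper is hypothetical, and it assumes that comments occupy whole lines and that blank lines are skipped, neither of which is spelled out above.

```python
from pathlib import Path


def parse_manifest(text, manifest_dir):
    """Parse batch manifest text into (url, output_path) pairs.

    Output paths are resolved relative to manifest_dir, the parent
    directory of the manifest file.
    """
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Assumption: blank lines and whole-line comments are ignored.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            # Exactly one tab per line, since tabs are not allowed in paths.
            raise ValueError(f"line {lineno}: expected URL<TAB>path, got {line!r}")
        url, target = fields
        entries.append((url.strip(), Path(manifest_dir) / target.strip()))
    return entries
```

For a manifest at /downloads/manifest.txt, the second argument would be its parent directory, /downloads.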

Configuration

To save some retyping, caterpillar supports the configuration of default options in an operating system and user-dependent configuration file. The path is usually ~/Library/Application Support/caterpillar/caterpillar.conf on macOS, %AppData%\org.zhimingwang\caterpillar\caterpillar.conf on Windows, and ~/.config/caterpillar/caterpillar.conf on Linux. Run caterpillar -h to view the actual path.
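
The usual locations listed above can be sketched as a small lookup. This is an illustration only, not how caterpillar resolves the path: the usual_config_path helper is hypothetical, and the assumption that CATERPILLAR_USER_CONFIG_DIR takes precedence over the per-OS default is inferred from the help text. The output of caterpillar -h remains the authority.

```python
from pathlib import PurePosixPath, PureWindowsPath

CONF_NAME = "caterpillar.conf"


def usual_config_path(platform, env):
    """Return the usual caterpillar.conf location.

    platform: a sys.platform-style string ("darwin", "win32", "linux", ...).
    env: a mapping of environment variables, e.g. os.environ.
    """
    path_cls = PureWindowsPath if platform == "win32" else PurePosixPath
    custom_dir = env.get("CATERPILLAR_USER_CONFIG_DIR")
    if custom_dir:
        # Assumed precedence: the custom directory wins over the default.
        return path_cls(custom_dir) / CONF_NAME
    if platform == "win32":
        return PureWindowsPath(env["APPDATA"]) / "org.zhimingwang" / "caterpillar" / CONF_NAME
    home = PurePosixPath(env["HOME"])
    if platform == "darwin":
        return home / "Library" / "Application Support" / "caterpillar" / CONF_NAME
    # Everything else is treated like Linux in this sketch.
    return home / ".config" / "caterpillar" / CONF_NAME
```

On a live system the call would be usual_config_path(sys.platform, os.environ).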

The syntax of the configuration file is documented in the template (automatically created for you if possible), duplicated below:

# You may configure default options here so that you don't need to
# specify the same options on the command line every time.
#
# Each option, along with its argument (if any), should be on a separate
# line; unlike on the command line, you don't need to quote or escape
# whitespace or other special characters in an argument, e.g., a line
#
#     --workdir Temporary Directory
#
# is interpreted as two command line arguments "--workdir" and
# "Temporary Directory".
#
# Positional arguments are not allowed, i.e., option lines must begin
# with -.
#
# Blank lines and lines starting with a pound (#) are ignored.
#
# You can always override the default options here on the command line.
#
# Examples:
#
#     --jobs 32
#     --concat-method concat_protocol
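
The line syntax described in the template can be illustrated with a short sketch that turns configuration text into command-line arguments. It is not caterpillar's own loader. The config_to_argv helper is hypothetical, and it assumes that the option name ends at the first run of whitespace and that the rest of the line is a single argument.

```python
def config_to_argv(text):
    """Convert configuration file text into a list of command-line arguments."""
    argv = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Blank lines and lines starting with # are ignored.
        if not line or line.startswith("#"):
            continue
        # Positional arguments are not allowed: option lines must begin with -.
        if not line.startswith("-"):
            raise ValueError(f"line {lineno}: option lines must begin with '-': {line!r}")
        # Split off the option name; the remainder is kept as one argument,
        # so embedded whitespace needs no quoting.
        parts = line.split(None, 1)
        argv.extend(parts)
    return argv
```

For instance, the line --workdir Temporary Directory yields the two arguments "--workdir" and "Temporary Directory", matching the behavior the template describes.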

Notes and limitations

Etymology

The word "caterpillar" starts with cat(1), and the body of a caterpillar is segmented.

Copyright

Copyright © 2017 Zhiming Wang

This project is licensed under the MIT license. See COPYING for details.