nathom / streamrip

A scriptable music downloader for Qobuz, Tidal, SoundCloud, and Deezer
GNU General Public License v3.0

[FEATURE] set MD5 checksum of Qobuz sourced FLAC files post-rip #705

Open 999wqe9q9ewq9 opened 2 weeks ago

999wqe9q9ewq9 commented 2 weeks ago

Is the feature request related to a problem? Please describe it.

Qobuz-sourced FLAC files do not have their MD5 signature set in STREAMINFO. This is not a fault of streamrip but of Qobuz.

When FLAC files sourced from Qobuz are tested with flac -t, flac reports "WARNING, cannot check MD5 signature since it was unset in the STREAMINFO".

This does not affect playback. The only thing missing is the MD5, which can be computed from the decoded raw audio data or set by re-encoding the file.
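For illustration, the unset signature can also be detected without flac or mutagen. This is a minimal sketch that assumes a native FLAC file starting with the fLaC marker (no ID3v2 tag prepended); the function names are hypothetical. It relies on the FLAC format's rule that STREAMINFO is the first metadata block, is 34 bytes long, and ends with the 16-byte MD5, which is all zeros when unset.

```python
FLAC_MARKER = b"fLaC"
STREAMINFO_TYPE = 0
STREAMINFO_LENGTH = 34  # fixed by the FLAC format
HEADER_SIZE = 4 + 4 + STREAMINFO_LENGTH  # marker + block header + STREAMINFO


def streaminfo_md5(header: bytes) -> bytes:
    """Return the 16-byte MD5 stored in STREAMINFO.

    `header` must hold at least the first 42 bytes of the file.
    """
    if len(header) < HEADER_SIZE or header[:4] != FLAC_MARKER:
        raise ValueError("not a native FLAC stream (missing fLaC marker)")
    block_type = header[4] & 0x7F  # low 7 bits; the top bit is the last-block flag
    block_length = int.from_bytes(header[5:8], "big")
    if block_type != STREAMINFO_TYPE or block_length != STREAMINFO_LENGTH:
        raise ValueError("first metadata block is not STREAMINFO")
    return header[26:42]  # the MD5 is the last 16 bytes of the block


def md5_is_unset(path: str) -> bool:
    """True if the file's STREAMINFO MD5 is all zeros (what flac -t warns about)."""
    with open(path, "rb") as fh:
        return streaminfo_md5(fh.read(HEADER_SIZE)) == bytes(16)
```

Calling md5_is_unset on a freshly ripped file should return True for the files described above, but that is an expectation rather than something verified here.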

Describe the solution you would like.

Streamrip would either re-encode each FLAC file from Qobuz post-rip, using a command such as flac -f8 *.flac, which sets the MD5 checksum, or only decode each file, calculate the MD5 signature from the raw audio, and set it.
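The re-encode variant could look roughly like the sketch below. It is not streamrip code: the function names are hypothetical, and it assumes the flac binary is on PATH. It runs the equivalent of flac -f8 *.flac on one folder, relying on the encoder computing and storing the MD5 as part of a normal encode.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List


def build_reencode_cmd(files: Iterable[Path], flac_prog: str = "flac") -> List[str]:
    """Build the command line for an in-place re-encode.

    -f overwrites the input files, -8 selects the highest compression level,
    and --silent suppresses progress output.
    """
    return [flac_prog, "-f", "-8", "--silent", *map(str, files)]


def reencode_folder(folder: Path, flac_prog: str = "flac") -> int:
    """Re-encode every .flac file directly inside `folder`; return the file count."""
    files = sorted(folder.glob("*.flac"))
    if not files:
        return 0
    # check=True raises CalledProcessError if flac exits with an error.
    subprocess.run(build_reencode_cmd(files, flac_prog), check=True)
    return len(files)
```

A post-rip hook would call reencode_folder(Path(album_dir)) once the download finishes, where album_dir stands for whatever folder the rip wrote to.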

An existing Python script, which requires flac and mutagen, can serve as a reference for decoding the raw audio and setting the MD5 without re-encoding. This approach is slightly faster than re-encoding the files in parallel, but either way works as long as the MD5 checksum is set after a Qobuz rip finishes.

import sys
import os
import logging
import subprocess as sp
import argparse
from multiprocessing import pool
from hashlib import md5
from mutagen import flac

# edit this:
FLAC_PROG = "P:\\Path\\to\\flac.exe"  # every backslash must be doubled; a lone "\t" would be a tab
# --------------------

logger = logging.getLogger(__name__)
CHUNK_SIZE = 512 * 1024

def scantree(path: str, recursive=False):
    for entry in os.scandir(path):
        if entry.is_dir():
            if recursive:
                yield from scantree(entry.path, recursive)
        else:
            yield entry

def get_flac(path: str):
    try:
        return flac.FLAC(path)
    except flac.FLACNoHeaderError:  # file is not flac
        return
    except flac.error as e:  # file < 4 bytes
        if str(e).startswith('file said 4 bytes'):
            return
        else:
            raise e

def get_flacs_no_md5(path: str, recursive=False):
    for entry in scantree(path, recursive):
        flac_thing = get_flac(entry.path)
        if flac_thing is not None and flac_thing.info.md5_signature == 0:
            yield flac_thing

def get_md5(flac_path: str) -> str:
    md_five = md5()
    with sp.Popen(
            [FLAC_PROG, '-ds', '--stdout', '--force-raw-format', '--endian=little', '--sign=signed', flac_path],
            stdout=sp.PIPE,
            stderr=sp.DEVNULL) as decoding:
        for chunk in iter(lambda: decoding.stdout.read(CHUNK_SIZE), b''):
            md_five.update(chunk)

    return md_five.hexdigest()

def set_md5(flac_thing: flac.FLAC):
    md5_hex = get_md5(flac_thing.filename)
    flac_thing.info.md5_signature = int(md5_hex, 16)
    if flac_thing.tags is not None:  # files without a Vorbis comment block have no vendor string
        flac_thing.tags.vendor = 'MD5 added'
    flac_thing.save()
    return flac_thing

def main(path: str, recursive=False, check_only=False):
    found = False
    if check_only:
        for flac_thing in get_flacs_no_md5(path, recursive=recursive):
            logger.info(flac_thing.filename)
            found = True
    else:
        with pool.ThreadPool() as tpool:
            for flac_thing in tpool.imap(set_md5, get_flacs_no_md5(path, recursive=recursive)):
                logger.info(f'MD5 added: {flac_thing.filename}')
                found = True
    if not found:
        logger.info('No flacs without MD5 found')

def parse_args():
    parser = argparse.ArgumentParser(prog='Add MD5')
    parser.add_argument('dirpath')
    parser.add_argument('-r', '--recursive', help='Include subdirs', action='store_true')
    parser.add_argument('-c', '--check_only', help='don\'t add MD5s, just print the flacs that don\'t have them.',
                        action='store_true')
    args = parser.parse_args()

    return args.dirpath, args.recursive, args.check_only

if __name__ == '__main__':
    logger.setLevel(logging.DEBUG)
    logger.addHandler(logging.StreamHandler(stream=sys.stdout))
    main(*parse_args())

usage: script_name [-h] [-r] [-c] dirpath

positional arguments:
  dirpath

options:
  -h, --help        show this help message and exit
  -r, --recursive   Include subdirs
  -c, --check_only  don't add MD5s, just print the flacs that don't have them.

Describe alternatives you've considered.

You can write a wrapper to do this instead, but having it as a flag, or perhaps as the default behavior when sourcing from Qobuz, may be beneficial to others.