whipper-team / whipper

Python CD-DA ripper preferring accuracy over speed
GNU General Public License v3.0

Add option to replace forbidden characters with unicode equivalent #616

Open hadess opened 5 months ago

hadess commented 5 months ago

I use whipper to rip CDs on a Linux system, which I then consume from my NAS over SMB, so directory and file names created shouldn't have any Unix or FAT32 forbidden characters.

It would be great if there was an option to replace forbidden characters with look-alike Unicode characters.

For example: / (U+002F) with ∕ (U+2215, division slash) or ⁄ (U+2044, fraction slash), : (U+003A) with ∶ (U+2236, ratio), etc.

I believe those 2 would cover most of my own rips.
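A minimal sketch of what that replacement could look like in Python, assuming a fixed two-entry table. `LOOKALIKES` and `replace_forbidden` are hypothetical names for illustration and are not part of whipper.

```python
# Hypothetical look-alike table; not part of whipper's code base.
LOOKALIKES = str.maketrans({
    "/": "\u2215",  # DIVISION SLASH
    ":": "\u2236",  # RATIO
})


def replace_forbidden(name: str) -> str:
    """Swap forbidden ASCII characters for their Unicode look-alikes."""
    return name.translate(LOOKALIKES)


print(replace_forbidden("AC/DC: Live"))
```

`str.translate` works one character at a time, so a multi-character sequence such as `...` would need a separate `str.replace` step.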

hadess commented 5 months ago

A bit of a stretch: ? with ⹔

  1. Offspring - What Happened to You?.flac
  2. Offspring - What Happened to You⹔.flac

hadess commented 5 months ago

Also: ... with … (U+2026, horizontal ellipsis), since FAT32/SMB doesn't like file or directory names that end in dots when there is no extension.

MerlijnWajer commented 5 months ago

I suppose we could add arguments to the getPath function in whipper/common/program.py so that it performs certain substitutions (passed by argument) at the end. That might be enough.
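To make the idea concrete, here is a rough sketch of a final substitution step driven by a caller-supplied mapping. It does not reproduce getPath or its real signature; `join_with_substitutions` and its parameters are assumptions made up for this example. Each path component is cleaned before joining, because once a title containing `/` is part of a full path it can no longer be told apart from a real separator.

```python
import os


def join_with_substitutions(components, substitutions=None):
    """Join path components after applying caller-supplied substitutions.

    Hypothetical helper, not whipper's API. `substitutions` maps each
    forbidden string to its replacement and is applied to every component
    separately, so genuine path separators are never touched.
    """
    substitutions = substitutions or {}
    cleaned = []
    for component in components:
        for old, new in substitutions.items():
            component = component.replace(old, new)
        cleaned.append(component)
    return os.path.join(*cleaned)


# Example: artist directory plus track title, using the look-alikes
# suggested earlier in this thread.
print(join_with_substitutions(
    ["AC/DC", "Who Made Who?"],
    {"/": "\u2215", "?": "\u2e54"},
))
```

Passing the mapping in as an argument keeps the default behaviour unchanged when no substitutions are requested.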

texneus commented 1 month ago

This is part of a bash script I made to process rips that, among other things, was intended to deal with this plus a few things that just annoy me. I targeted CIFS illegal characters (not FAT32) since my files are stored on Linux and shared with Samba. I assume that, since whipper runs on Linux, illegal Linux file names are already dealt with. Any specifics for FAT would need to be added.

You are more than welcome to adapt it to your use case. Script assumes MUSIC/ARTIST/ALBUM directory structure and is run from the MUSIC level (i.e. the directory containing the ARTIST folders). Directories are renamed first, then files.

CAUTION: This script is potentially dangerous as it will modify your files! Use only if you understand what it is doing and you agree to what it does! If executed as-is, this will only show what files will be changed. Remove the 'echo' command from the final if block to actually have it work.

#!/bin/bash

echo "Renaming files to avoid illegal CIFS names."
# Pass 1 handles directories, pass 2 handles files; find is re-run for each pass.
for I in d f; do
    while IFS= read -r FULLPATH; do
        FILEORG=$(basename "$FULLPATH")
        FILEDIR=$(dirname "$FULLPATH")

        # Remove without replacement "Control" characters 0x00-0x1F and 0x7f
        # (printf with a quoted variable avoids the word splitting and globbing of a bare echo)
        FILENEW=$(printf '%s\n' "$FILEORG" | sed -e 's/[\x00-\x1F\x7f]//g')

        # Remove without replacement specifically not allowed characters: */<>?\|
        # NOTE: All characters are referred to by hex codes since several interfere with SED or BASH scripts
        FILENEW=$(printf '%s\n' "$FILENEW" | sed -e 's/[\x2a\x2f\x3c\x3e\x3f\x5c\x7c]//g')

        # Remove without replacement all forms of double quotation marks
        # NOTE: Double quotes are not allowed so the smart equivalents cannot be converted, so just get rid of them all.
        FILENEW=$(printf '%s\n' "$FILENEW" | sed -e 's/["“”]//g')

        # Substitute odd forms of single quotes with actual single quotes
        # (the opening bracket must not be escaped, or the bracket expression is never formed)
        FILENEW=$(printf '%s\n' "$FILENEW" | sed -e 's/[‘’`]/\x27/g')

        # Substitute : with ' -' (space dash)
        FILENEW=$(printf '%s\n' "$FILENEW" | sed -e 's/:/ -/g')

        # Eliminate leading/trailing spaces and periods
        FILENEW=$(printf '%s\n' "$FILENEW" | sed -e 's/^[. ]*//; s/[ .]*$//')

        # Skip names that would end up empty after cleaning
        if [[ -n "$FILENEW" && "$FILEORG" != "$FILENEW" ]]; then
            # Remove 'echo' ONLY AFTER you are certain this script does what you want!
            echo mv "$FULLPATH" "$FILEDIR/$FILENEW"
        fi
    # -depth lists children before their parents, so renaming a directory
    # never invalidates a path that is still waiting to be processed
    done < <(find . -depth -type "$I" -not -path ".")
done
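
For anyone who wants to try the character rules without touching any files, the same steps can be sketched as a pure Python function. This is an approximation of the script's rules written for illustration, not a tested drop-in replacement; `cifs_safe` is a hypothetical name.

```python
import re


def cifs_safe(name: str) -> str:
    """Apply the same cleaning steps as the shell script to a single name."""
    # Remove control characters 0x00-0x1F and 0x7F
    name = re.sub(r"[\x00-\x1f\x7f]", "", name)
    # Remove the characters * / < > ? \ |
    name = re.sub(r"[*/<>?\\|]", "", name)
    # Remove straight and "smart" double quotes
    name = re.sub('["“”]', "", name)
    # Turn curly single quotes and backticks into a plain apostrophe
    name = re.sub("[‘’`]", "'", name)
    # Replace ':' with ' -'
    name = name.replace(":", " -")
    # Drop leading/trailing spaces and periods
    return name.strip(" .")


print(cifs_safe("What Happened to You?.flac"))
print(cifs_safe("Title: “Live”..."))
```

Because the function only transforms strings, it is easy to check against a list of awkward album titles before running the renaming script for real.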