rakuri255 / UltraSinger

AI-based tool that extracts vocals, lyrics, and pitch from music to auto-generate UltraStar Deluxe, MIDI, and note files. It automates tapping, adding text, and pitching vocals, and creates karaoke files.
MIT License
230 stars, 19 forks

UltraSinger uppercase letter sentencing and batch GUI's #117

Open 1Larsolof opened 5 months ago

1Larsolof commented 5 months ago

I've been getting odd sentence breaks from UltraSinger: when many words run together, it puts them all into one sentence.

I've "made" a really simple Python script, using tkinter for the GUI, that checks whether a line contains a capital letter and whether a "-" break is already placed a couple of lines above. If there is one, it won't add a break; if there isn't, it will. There is also an option to delete every line starting with "-" so you can start fresh. You can input multiple files at once, and each modified file is saved as "Modified_name.txt". (EDIT: Forgot to mention that it gets the break timing from the previous note's start time + duration.)

The batch GUI simply runs the command "src/UltraSinger.py" -i -o for each input file and waits for it to finish. Make sure to place the script directly above the src folder, i.e. in the folder that contains src.

Cap and Batch.zip

CAP2:

#Cap V2
import tkinter as tk
from tkinter import filedialog, messagebox
from pathlib import Path

def calculate_time_variable(line):
    _, time_info = line.split(" ", 1)
    time_parts = time_info.split()
    if len(time_parts) >= 2:
        start_time, duration = map(int, time_parts[:2])
        return start_time + duration
    return 0  # Return 0 if there are not enough values

def process_file(file_path, delete_lines_flag):
    with open(file_path, 'r') as file:
        lines = file.readlines()

    modified_lines = []
    check_next_line = False
    last_hyphen_index = -3  # Initialize to a value that won't interfere with the first check

    for i, line in enumerate(lines):
        if line.startswith('-'):
            if delete_lines_flag:
                continue  # Skip lines starting with "-" if the checkbox is selected
            last_hyphen_index = i  # Count existing breaks too, so none is added right after one

        # A capitalized line after a note starts a new sentence; the closing "E" line is not a lyric
        if check_next_line and '-' not in line and line.strip() != 'E' and i - last_hyphen_index >= 3:
            if any(char.isupper() for char in line.strip()):
                modified_lines.append('- ' + str(calculate_time_variable(modified_lines[-1])) + "\n")
                last_hyphen_index = i

        modified_lines.append(line)
        check_next_line = False

        if line.startswith(':'):
            check_next_line = True

    output_path = Path(file_path).parent / ('Modified_' + Path(file_path).name)
    with open(output_path, 'w') as modified_file:
        modified_file.write(''.join(modified_lines))

    return output_path

def browse_files():
    file_paths = filedialog.askopenfilenames(filetypes=[("Text files", "*.txt")])
    for file_path in file_paths:
        files_listbox.insert(tk.END, file_path)

def show_notification(output_path):
    messagebox.showinfo("Processing Completed", f"Processing completed. Output file: {output_path}")

def process_files():
    selected_files = files_listbox.get(0, tk.END)
    delete_lines_flag = delete_lines_var.get()  # Get the state of the checkbox

    for file_path in selected_files:
        output_path = process_file(file_path, delete_lines_flag)
        show_notification(output_path)

# GUI setup
root = tk.Tk()
root.title("Karaoke Song Processor")

# Listbox to display selected files
files_listbox = tk.Listbox(root, selectmode=tk.MULTIPLE, width=50)
files_listbox.pack(pady=10)

# Checkbox to enable or disable deletion of lines starting with "-"
delete_lines_var = tk.BooleanVar()
delete_lines_checkbox = tk.Checkbutton(root, text="Delete lines starting with '-'", variable=delete_lines_var)
delete_lines_checkbox.pack()

# Browse button to select files
browse_button = tk.Button(root, text="Browse Files", command=browse_files)
browse_button.pack()

# Process button to modify files
process_button = tk.Button(root, text="Process Files", command=process_files)
process_button.pack(pady=10)

# Start the Tkinter event loop
root.mainloop()

image

Batch:

#LiteGUIbatch V3
import tkinter as tk
from tkinter import filedialog, ttk, messagebox
import os
import subprocess
import threading

def run_ultra_singer():
    def process_files():
        # Ignore blank lines; the Text widget always ends with an extra newline
        input_files = [f for f in input_file_text.get("1.0", tk.END).splitlines() if f.strip()]
        output_folder = output_folder_entry.get()

        # Configure the determinate progress bar
        progress_bar["value"] = 0
        progress_bar["maximum"] = len(input_files)

        # Configure the indeterminate progress bar
        indeterminate_progress_bar.start()

        for index, input_file in enumerate(input_files, start=1):
            command = [
                "python3",
                "src/UltraSinger.py",
                "-i",
                input_file,
                "-o",
                output_folder
            ]

            process = subprocess.Popen(command, shell=False)
            process.wait()

            # Remove the processed file from the text box
            input_file_text.delete("1.0", "2.0")

            # Update the determinate progress bar (step() would wrap back to 0 at the maximum)
            progress_bar["value"] = index

            # Update the percentage label
            percentage_text.set(f"{(index / len(input_files)) * 100:.2f}%")

        # All files processed, stop both progress bars
        indeterminate_progress_bar.stop()
        progress_bar.stop()

        # Reset the percentage label
        percentage_text.set("")

        # Show notification
        show_notification("Processing Complete", "All files have been processed.")

    # Create a separate thread for file processing
    processing_thread = threading.Thread(target=process_files)
    processing_thread.start()

def show_notification(title, message):
    messagebox.showinfo(title, message)

def browse_input_file():
    file_paths = filedialog.askopenfilenames(
        initialdir=os.getcwd(),
        title="Select Input Files",
        filetypes=(("Audio Files", ".mp3 .wav .ogg"),)
    )

    for file_path in file_paths:
        input_file_text.insert(tk.END, file_path + "\n")

def browse_output_folder():
    folder_path = filedialog.askdirectory(
        initialdir=os.getcwd(),
        title="Select Output Folder"
    )
    output_folder_entry.delete(0, tk.END)
    output_folder_entry.insert(0, folder_path)

root = tk.Tk()
root.title("UltraSinger GUI Lite Batch")

# Set the window size with a 16:10 aspect ratio
window_width = 800
window_height = int(window_width * 10 / 16)
root.geometry(f"{window_width}x{window_height}")

# Determinate Progress Bar
progress_bar = ttk.Progressbar(root, mode="determinate", length=780)
progress_bar.grid(row=0, column=0, padx=10, pady=10, columnspan=3, sticky="w")  # Adjusted row

# Percentage Label
Percentageadd_label = tk.Label(root, 
    text="              Finished",
    font=("Helvetica", 16)
)
Percentageadd_label.grid(row=1, column=1, padx=0, pady=10, sticky="w")
percentage_text = tk.StringVar()
percentage_text.set("0.00%")
percentage_label = tk.Label(
    root,
    textvariable=percentage_text,
    font=("Helvetica", 16),  # Change the font size and family as needed
    width=5  # Adjust width to center the label
)
percentage_label.grid(row=1, column=1, padx=10, pady=10, sticky="w")  # Adjusted properties

# Input File
input_file_label = tk.Label(root, text="Input Files/URLs:")
input_file_label.grid(row=2, column=0, padx=10, pady=5, sticky="nw")
input_file_text = tk.Text(root, wrap="none", height=20, width=59)
input_file_text.grid(row=2, column=1, padx=0, pady=0, sticky="w")
browse_input_button = tk.Button(root, text="Browse", command=browse_input_file, width=15, height=4)
browse_input_button.grid(row=2, column=2, padx=30, pady=0, sticky="w")

# Output Folder
output_folder_label = tk.Label(root, text="Output Folder:")
output_folder_label.grid(row=3, column=0, padx=10, pady=5, sticky="w")
output_folder_entry = tk.Entry(root, width=45)
output_folder_entry.grid(row=3, column=1, padx=0, pady=0, sticky="w")
browse_output_button = tk.Button(root, text="Browse", command=browse_output_folder, width=15, height=4)
browse_output_button.grid(row=3, column=2, padx=30, pady=20, sticky="w")

# Indeterminate Progress Bar
indeterminate_progress_bar = ttk.Progressbar(root, mode="indeterminate", length=500)
indeterminate_progress_bar.grid(row=4, column=0, padx=0, pady=0, columnspan=2, sticky="ns")  # Adjusted row

# Run Button
run_button = tk.Button(root, text="Run UltraSinger", command=run_ultra_singer, width=20, height=2)
run_button.grid(row=4, column=2, padx=30, pady=0, columnspan=2, sticky="e")  # Adjusted row

# Start the Tkinter main loop
root.mainloop()

image

rakuri255 commented 5 months ago

Thanks for the GUI and your work!

Can you give an example so I can better understand the problem? For example, you could add a snippet from the plot. Just use --plot True

1Larsolof commented 5 months ago

> Thanks for the GUI and your work!
>
> Can you give an example so I can better understand the problem? For example, you could add a snippet from the plot. Just use --plot True

01 - Lace It.txt plot: https://drive.google.com/file/d/163ZLa46EB27Rfss4Q7PDYlT8_Pve1EFm/view?usp=sharing

I'm using an M1 Mac, by the way.

1Larsolof commented 5 months ago

Also here's the output from the console:

(ultrasinger) albinandreasson@albins-MBP UltraSinger % python3 src/ultrasinger.py -i input/"01 - Lace It.mp3" --plot True
[UltraSinger] 0:00:05.131 - Initialized...
/Users/albinandreasson/anaconda3/envs/ultrasinger/lib/python3.10/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
torchaudio.set_audio_backend("soundfile")
/Users/albinandreasson/anaconda3/envs/ultrasinger/lib/python3.10/site-packages/torch_audiomentations/utils/io.py:27: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
torchaudio.set_audio_backend("soundfile")
torchvision is not available - cannot save figures

[UltraSinger]
[UltraSinger] UltraSinger Version: 0.0.3
[UltraSinger]
[UltraSinger] Checking GPU support for tensorflow and pytorch.
[UltraSinger] tensorflow - there are no cuda devices available -> Using cpu.
[UltraSinger] pytorch - there are no cuda devices available -> Using cpu.
[UltraSinger] full automatic mode
[UltraSinger] Searching song in musicbrainz
[UltraSinger] cant find title lace in 01 it
[UltraSinger] No match found
[UltraSinger] Creating output folder. -> input/output/01 - Lace It
[UltraSinger] Creating output folder. -> input/output/01 - Lace It/cache
[UltraSinger] Separating vocals from audio with demucs and cpu as worker.
Important: the default model was recently changed to htdemucs the latest Hybrid Transformer Demucs model. In some cases, this model can actually perform worse than previous models. To get back the old default model use -n mdx_extra_q.
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /Users/albinandreasson/Documents/Melody Mania Songs/UltraSinger/separated/htdemucs
Separating track input/output/01 - Lace It/01 - Lace It.mp3
100%|██████████████████████████████████████████████| 222.29999999999998/222.29999999999998 [01:48<00:00, 2.05seconds/s]
[UltraSinger] Converting audio for AI
[UltraSinger] Reduce noise from vocal audio with ffmpeg.
[UltraSinger] Loading whisper with model large-v2 and cpu as worker
No language specified, language will be first be detected for each audio file (increases inference time).
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.1.2. To apply the upgrade to your files permanently, run python -m pytorch_lightning.utilities.upgrade_checkpoint ../../../.cache/torch/whisperx-vad-segmentation.bin
Model was trained with pyannote.audio 0.0.1, yours is 3.1.0. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.1.1. Bad things might happen unless you revert torch to 1.x.
[UltraSinger] Transcribing input/output/01 - Lace It/cache/01 - Lace It_denoised.wav
Detected language: en (0.94) in first 30s of audio...
[UltraSinger] Removing silent start and ending, from transcription data
[UltraSinger] Hyphenate using language code: en_IL
662it [00:00, 536132.31it/s]
[UltraSinger] Pitching with crepe and model full and cpu as worker
2024-01-13 11:52:31.766392: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
680/680 [==============================] - 143s 211ms/step
[UltraSinger] Creating midi notes from pitched data
[UltraSinger] Creating Ultrastar notes from midi data
[UltraSinger] Creating plot: Spectrogram
[UltraSinger] Creating plot
[UltraSinger] BPM is 107.67
[UltraSinger] Creating input/output/01 - Lace It/01 - Lace It.txt from transcription.
[UltraSinger] Converting wav to mp3
[UltraSinger] Creating input/output/01 - Lace It/01 - Lace It [Karaoke].txt from transcription.
[UltraSinger] Parse ultrastar txt -> input/output/01 - Lace It/01 - Lace It.txt
[UltraSinger] Calculating Ultrastar Points
[UltraSinger] Simple (octave high ignored) points
[UltraSinger] Total: 7911, notes: 7157, line bonus: 754, golden notes: 0
[UltraSinger] Accurate (octave high matches) points:
[UltraSinger] Total: 7884, notes: 7134, line bonus: 750, golden notes: 0
[UltraSinger] Creating Midi with pretty_midi
[UltraSinger] Creating midi instrument from Ultrastar txt
[UltraSinger] Creating midi file -> input/output/01 - Lace It/01 - Lace It.mid

[UltraSinger] Do you like UltraSinger? Want it to be even better? Then help with your support!
[UltraSinger] See project page -> https://github.com/rakuri255/UltraSinger
[UltraSinger] This will help a lot to keep this project alive and improved.
[UltraSinger] 0:27:56.069 - End Program
(ultrasinger) albinandreasson@albins-MBP UltraSinger %

rakuri255 commented 5 months ago

You are using an old version. You are on 0.0.3 while the source is on 0.0.8. Please update your sources.

rakuri255 commented 5 months ago

We already have an issue regarding too many words: #45. The problem is a little more complicated here. Yes, you could look for capitalization in English, but what about Asian languages that have no capital letters?
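
A quick check illustrates the limitation. In Python, `str.isupper()` is only true for cased characters, so text written in a script without letter case can never trigger a capital-letter rule. The helper name `has_capital` and the sample strings below are illustrative assumptions, not part of UltraSinger.

```python
def has_capital(text: str) -> bool:
    """Return True if any character in text is an uppercase letter."""
    return any(ch.isupper() for ch in text)


print(has_capital("Hello world"))    # True
print(has_capital("hello world"))    # False
print(has_capital("こんにちは世界"))   # False: kana and kanji have no letter case
```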

It might be better to check the total duration of the sentence. In other words, specify a fixed maximum duration after which a separation should take place. You would then also have to check the word times, e.g. Rap has a shorter duration per word than Pop, in order to scale the number of words.
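
One possible reading of this suggestion, as a minimal sketch: group notes into lines whose total span stays under a fixed limit. The `Note` type, the `split_by_duration` function, and the use of beats as the time unit are assumptions made for illustration; this is not how UltraSinger currently splits sentences.

```python
from typing import List, NamedTuple


class Note(NamedTuple):
    start: int     # start beat
    duration: int  # length in beats
    text: str      # syllable or word


def split_by_duration(notes: List[Note], max_beats: int) -> List[List[Note]]:
    """Group notes into lines whose span never exceeds max_beats.

    A single note longer than max_beats still gets a line of its own.
    """
    lines: List[List[Note]] = []
    current: List[Note] = []
    for note in notes:
        if current:
            # Span from the first note of the line to the end of this note
            span = note.start + note.duration - current[0].start
            if span > max_beats:
                lines.append(current)
                current = []
        current.append(note)
    if current:
        lines.append(current)
    return lines


notes = [
    Note(0, 4, "one "), Note(4, 4, "two "), Note(8, 4, "three "),
    Note(12, 4, "four "), Note(16, 4, "five "),
]
for line in split_by_duration(notes, max_beats=8):
    print("".join(n.text for n in line).strip())
```

Because the limit is a time span rather than a word count, fast passages fit more words per line than slow ones, which covers part of the Rap versus Pop concern. A real implementation would probably still need a per-genre or per-song limit.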

1Larsolof commented 5 months ago

> You are using an old version. You are on 0.0.3 while the source is on 0.0.8. Please update your sources.

Done, sorry. I just started looking at this again recently.

> We already have an issue regarding too many words: #45. The problem is a little more complicated here. Yes, you could look for capitalization in English, but what about Asian languages that have no capital letters?
>
> It might be better to check the total duration of the sentence. In other words, specify a fixed maximum duration after which a separation should take place. You would then also have to check the word times, e.g. Rap has a shorter duration per word than Pop, in order to scale the number of words.

Yes, I get that. This was just a quick and dirty fix to make editing the song in other software a bit easier for me. In fact, my first attempt was to check each line, count to 10 or so, and add a break unless one was already placed ahead, but sadly that requires a lot of tinkering for each song/language.
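
For reference, that counting approach could look roughly like the sketch below. It assumes the usual UltraStar txt layout, where note lines look like `: start duration pitch text` and break lines start with `-`. The function name `add_breaks_by_count` and the `max_notes` and `lookahead` parameters are hypothetical, and their defaults are exactly the values that need tuning per song.

```python
def add_breaks_by_count(lines, max_notes=10, lookahead=3):
    """Insert a '- <beat>' break once max_notes notes have passed without one.

    lines: UltraStar txt lines without trailing newlines.
    No break is inserted if an existing one appears within the current line
    and the following lookahead - 1 lines.
    """
    out = []
    count = 0  # notes seen since the last break
    for i, line in enumerate(lines):
        if line.startswith("-"):
            count = 0
        elif line.startswith((":", "*")):
            if count >= max_notes:
                upcoming = lines[i:i + lookahead]
                if not any(nxt.startswith("-") for nxt in upcoming):
                    # Break at the end of the previous note: start + duration
                    _, start, duration = out[-1].split()[:3]
                    out.append(f"- {int(start) + int(duration)}")
                    count = 0
            count += 1
        out.append(line)
    return out


song = [": 0 2 0 a", ": 2 2 0 b", ": 4 2 0 c", ": 6 2 0 d", "- 9", ": 10 2 0 e", "E"]
for line in add_breaks_by_count(song, max_notes=2, lookahead=2):
    print(line)
```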