rakuri255 / UltraSinger

AI-based tool that extracts vocals, lyrics, and pitch from music to automatically generate UltraStar Deluxe, MIDI, and note files. It automates tapping, adds text, pitches vocals, and creates karaoke files.
MIT License
290 stars 26 forks

IndexError: list index out of range #155

Open Patuu25 opened 5 months ago

Patuu25 commented 5 months ago

Hi, I'm getting this error and I can't figure it out:

```
[UltraSinger] Creating C:\song\JENNIE - You & Me\JENNIE - You & Me.txt from transcription.
[UltraSinger] Calculating silence parts for linebreaks.
Traceback (most recent call last):
  File "C:\Ultrasinger\UltraSinger-main\src\UltraSinger.py", line 994, in <module>
    main(sys.argv[1:])
  File "C:\Ultrasinger\UltraSinger-main\src\UltraSinger.py", line 889, in main
    run()
  File "C:\Ultrasinger\UltraSinger-main\src\UltraSinger.py", line 436, in run
    real_bpm, ultrastar_file_output = create_ultrastar_txt_from_automation(
  File "C:\Ultrasinger\UltraSinger-main\src\UltraSinger.py", line 656, in create_ultrastar_txt_from_automation
    ultrastar_writer.create_ultrastar_txt_from_automation(
  File "C:\Ultrasinger\UltraSinger-main\src\modules\Ultrastar\ultrastar_writer.py", line 88, in create_ultrastar_txt_from_automation
    gap = transcribed_data[0].start
IndexError: list index out of range
```

rakuri255 commented 4 months ago

Hi,

it seems that your transcribed data is empty. That's the output from Whisper.

Did other songs work? Or is it just this song?

Patuu25 commented 4 months ago

Hi

I tried a couple of songs. I'm using the YouTube function of giving it a link, and all of them gave the same error.

rakuri255 commented 4 months ago

Can you post your hardware (CPU, RAM, GPU, VRAM)? Please also add the full log.

Anktratten commented 1 month ago

I'd like to add that I am receiving the same issue, specifically when trying to use a Swedish language model from Hugging Face. I have tried several different models and receive the same error; however, it works just fine when I use the standard model for English songs.

Hardware: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz, 16 GB RAM, NVIDIA GeForce RTX 2060 with 4 GB VRAM

Here is my log:

```
(.venv) C:\Users\rasmu\Downloads\UltraSinger-0.0.11\src>py UltraSinger.py -i "https://www.youtube.com/watch?v=k9tHJPpew5s" --whisper "small" --whisper_align_model "ID2223/whisper-small-swedish"
C:\Users\rasmu\AppData\Local\Programs\Python\Python310\lib\inspect.py:869: UserWarning: Module 'speechbrain.pretrained' was deprecated, redirecting to 'speechbrain.inference'. Please update your script. This is a change from SpeechBrain 1.0. See: https://github.com/speechbrain/speechbrain/releases/tag/v1.0.0
  if ismodule(module) and hasattr(module, '__file__'):
C:\Users\rasmu\Downloads\UltraSinger-0.0.11\.venv\lib\site-packages\pyannote\audio\pipelines\speaker_verification.py:45: UserWarning: Module 'speechbrain.pretrained' was deprecated, redirecting to 'speechbrain.inference'. Please update your script. This is a change from SpeechBrain 1.0. See: https://github.com/speechbrain/speechbrain/releases/tag/v1.0.0
  from speechbrain.pretrained import (
[UltraSinger] UltraSinger Version: 0.0.11
[UltraSinger] Checking GPU support for tensorflow and pytorch.
[UltraSinger] tensorflow - using cuda gpu.
[UltraSinger] pytorch - using cuda gpu.
[UltraSinger] full automatic mode
[youtube] Extracting URL: https://www.youtube.com/watch?v=k9tHJPpew5s
[youtube] k9tHJPpew5s: Downloading webpage
[youtube] k9tHJPpew5s: Downloading ios player API JSON
[youtube] k9tHJPpew5s: Downloading player 96d06116
WARNING: [youtube] k9tHJPpew5s: nsig extraction failed: You may experience throttling for some formats
         n = 9W9wEwjDq1y3LVSqoN ; player = https://www.youtube.com/s/player/96d06116/player_ias.vflset/en_US/base.js
WARNING: [youtube] k9tHJPpew5s: nsig extraction failed: You may experience throttling for some formats
         n = M867bc4Y2tOGCtWcki ; player = https://www.youtube.com/s/player/96d06116/player_ias.vflset/en_US/base.js
[youtube] k9tHJPpew5s: Downloading m3u8 information
[UltraSinger] Searching song in musicbrainz
[UltraSinger] cant find title guld och gröna skogar in hasse andersson guld och gröna skogar official audio
[UltraSinger] No match found
[UltraSinger] Creating output folder. -> C:\Users\rasmu\Downloads\UltraSinger-0.0.11\src\output\HasseAnderssonGuldochgronaskogarOfficialAudio (5)
[UltraSinger] Downloading Audio
[youtube] Extracting URL: https://www.youtube.com/watch?v=k9tHJPpew5s
[youtube] k9tHJPpew5s: Downloading webpage
[youtube] k9tHJPpew5s: Downloading ios player API JSON
[youtube] k9tHJPpew5s: Downloading player 96d06116
WARNING: [youtube] k9tHJPpew5s: nsig extraction failed: You may experience throttling for some formats
         n = Zc8bJDZIJEZEzi5-Ha ; player = https://www.youtube.com/s/player/96d06116/player_ias.vflset/en_US/base.js
WARNING: [youtube] k9tHJPpew5s: nsig extraction failed: You may experience throttling for some formats
         n = 7ZGwfURn1dkQOpZPzo ; player = https://www.youtube.com/s/player/96d06116/player_ias.vflset/en_US/base.js
[youtube] k9tHJPpew5s: Downloading m3u8 information
[info] k9tHJPpew5s: Downloading 1 format(s): 140
[download] Destination: C:\Users\rasmu\Downloads\UltraSinger-0.0.11\src\output\HasseAnderssonGuldochgronaskogarOfficialAudio (5)\HasseAnderssonGuldochgronaskogarOfficialAudio
[download] 100% of 2.84MiB in 00:00:00 at 21.53MiB/s
[FixupM4a] Correcting container of "C:\Users\rasmu\Downloads\UltraSinger-0.0.11\src\output\HasseAnderssonGuldochgronaskogarOfficialAudio (5)\HasseAnderssonGuldochgronaskogarOfficialAudio"
[ExtractAudio] Destination: C:\Users\rasmu\Downloads\UltraSinger-0.0.11\src\output\HasseAnderssonGuldochgronaskogarOfficialAudio (5)\HasseAnderssonGuldochgronaskogarOfficialAudio.mp3
Deleting original file C:\Users\rasmu\Downloads\UltraSinger-0.0.11\src\output\HasseAnderssonGuldochgronaskogarOfficialAudio (5)\HasseAnderssonGuldochgronaskogarOfficialAudio (pass -k to keep)
[UltraSinger] Downloading Video
[youtube] Extracting URL: https://www.youtube.com/watch?v=k9tHJPpew5s
[youtube] k9tHJPpew5s: Downloading webpage
[youtube] k9tHJPpew5s: Downloading ios player API JSON
[youtube] k9tHJPpew5s: Downloading player 96d06116
WARNING: [youtube] k9tHJPpew5s: nsig extraction failed: You may experience throttling for some formats
         n = ZPIk-YLNW3-b4IQns8 ; player = https://www.youtube.com/s/player/96d06116/player_ias.vflset/en_US/base.js
WARNING: [youtube] k9tHJPpew5s: nsig extraction failed: You may experience throttling for some formats
         n = WRHsSJCw_VwjN0-Pxe ; player = https://www.youtube.com/s/player/96d06116/player_ias.vflset/en_US/base.js
[youtube] k9tHJPpew5s: Downloading m3u8 information
[info] k9tHJPpew5s: Downloading 1 format(s): 616
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 33
[download] Destination: C:\Users\rasmu\Downloads\UltraSinger-0.0.11\src\output\HasseAnderssonGuldochgronaskogarOfficialAudio (5)\HasseAnderssonGuldochgronaskogarOfficialAudio.mp4
[download] 100% of 14.81MiB in 00:00:01 at 9.87MiB/s
[UltraSinger] Downloading thumbnail
[youtube] Extracting URL: https://www.youtube.com/watch?v=k9tHJPpew5s
[youtube] k9tHJPpew5s: Downloading webpage
[youtube] k9tHJPpew5s: Downloading ios player API JSON
[youtube] k9tHJPpew5s: Downloading player 96d06116
WARNING: [youtube] k9tHJPpew5s: nsig extraction failed: You may experience throttling for some formats
         n = CTI5mp6aN_WX4fcb3b ; player = https://www.youtube.com/s/player/96d06116/player_ias.vflset/en_US/base.js
WARNING: [youtube] k9tHJPpew5s: nsig extraction failed: You may experience throttling for some formats
         n = P3T6uxnKgrHX8HZ3y8 ; player = https://www.youtube.com/s/player/96d06116/player_ias.vflset/en_US/base.js
[youtube] k9tHJPpew5s: Downloading m3u8 information
[UltraSinger] Creating output folder. -> C:\Users\rasmu\Downloads\UltraSinger-0.0.11\src\output\HasseAnderssonGuldochgronaskogarOfficialAudio (5)\cache
[UltraSinger] Separating vocals from audio with demucs and cuda as worker.
Important: the default model was recently changed to htdemucs the latest Hybrid Transformer Demucs model.
In some cases, this model can actually perform worse than previous models.
To get back the old default model use -n mdx_extra_q.
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in C:\Users\rasmu\Downloads\UltraSinger-0.0.11\src\separated\htdemucs
Separating track C:\Users\rasmu\Downloads\UltraSinger-0.0.11\src\output\HasseAnderssonGuldochgronaskogarOfficialAudio (5)\HasseAnderssonGuldochgronaskogarOfficialAudio.mp3
100%|██████████| 187.2/187.2 [00:09<00:00, 19.44seconds/s]
[UltraSinger] Converting wav to mp3
[UltraSinger] Reduce noise from vocal audio with ffmpeg.
[UltraSinger] Converting audio for AI
[UltraSinger] Mute audio parts with no singing
[UltraSinger] Loading whisper with model small and cuda as worker
[UltraSinger] using alignment model ID2223/whisper-small-swedish
config.json: 100%|██████████| 2.37k/2.37k [00:00<?, ?B/s]
C:\Users\rasmu\Downloads\UltraSinger-0.0.11\.venv\lib\site-packages\huggingface_hub\file_download.py:147: UserWarning: huggingface_hub cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\rasmu\.cache\huggingface\hub\models--Systran--faster-whisper-small. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the HF_HUB_DISABLE_SYMLINKS_WARNING environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations. To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
tokenizer.json: 100%|██████████| 2.20M/2.20M [00:00<00:00, 3.59MB/s]
vocabulary.txt: 100%|██████████| 460k/460k [00:00<00:00, 1.31MB/s]
model.bin: 100%|██████████| 484M/484M [00:05<00:00, 93.3MB/s]
No language specified, language will be first be detected for each audio file (increases inference time).
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.3.3. To apply the upgrade to your files permanently, run python -m pytorch_lightning.utilities.upgrade_checkpoint C:\Users\rasmu\.cache\torch\whisperx-vad-segmentation.bin
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.0.1+cu117. Bad things might happen unless you revert torch to 1.x.
[UltraSinger] Transcribing C:\Users\rasmu\Downloads\UltraSinger-0.0.11\src\output\HasseAnderssonGuldochgronaskogarOfficialAudio (5)\cache\HasseAnderssonGuldochgronaskogarOfficialAudio_mute.wav
C:\Users\rasmu\Downloads\UltraSinger-0.0.11\.venv\lib\site-packages\pyannote\audio\utils\reproducibility.py:74: ReproducibilityWarning: TensorFloat-32 (TF32) has been disabled as it might lead to reproducibility issues and lower accuracy. It can be re-enabled by calling
   >>> import torch
   >>> torch.backends.cuda.matmul.allow_tf32 = True
   >>> torch.backends.cudnn.allow_tf32 = True
See https://github.com/pyannote/pyannote-audio/issues/1370 for more details.
  warnings.warn(
Detected language: sv (0.85) in first 30s of audio...
C:\Users\rasmu\Downloads\UltraSinger-0.0.11\.venv\lib\site-packages\huggingface_hub\file_download.py:1142: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
  warnings.warn(
preprocessor_config.json: 100%|██████████| 339/339 [00:00<?, ?B/s]
C:\Users\rasmu\Downloads\UltraSinger-0.0.11\.venv\lib\site-packages\huggingface_hub\file_download.py:147: UserWarning: huggingface_hub cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\rasmu\.cache\huggingface\hub\models--ID2223--whisper-small-swedish. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the HF_HUB_DISABLE_SYMLINKS_WARNING environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations. To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
tokenizer_config.json: 100%|██████████| 283k/283k [00:00<00:00, 839kB/s]
vocab.json: 100%|██████████| 1.04M/1.04M [00:00<00:00, 2.25MB/s]
merges.txt: 100%|██████████| 494k/494k [00:00<00:00, 1.44MB/s]
normalizer.json: 100%|██████████| 52.7k/52.7k [00:00<00:00, 52.7MB/s]
added_tokens.json: 100%|██████████| 34.6k/34.6k [00:00<00:00, 30.9MB/s]
special_tokens_map.json: 100%|██████████| 2.19k/2.19k [00:00<?, ?B/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
config.json: 100%|██████████| 1.32k/1.32k [00:00<?, ?B/s]
You are using a model of type whisper to instantiate a model of type wav2vec2. This is not supported for all configurations of models and can yield errors.
model.safetensors: 100%|██████████| 967M/967M [00:28<00:00, 33.8MB/s]
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at ID2223/whisper-small-swedish and are newly initialized: ['encoder.layer_norm.bias', 'encoder.layer_norm.weight', <long list of encoder.layers.0-11, feature_extractor, and feature_projection weight names truncated>, 'lm_head.bias', 'lm_head.weight', 'masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Failed to align segment (" Det var dans och hålligång, upp på logen att en lång, det var sommar, det var guld och gröldna skugar. Det är en stilla kväll, men jag får ingen ruv. Solen dröjer kvar vid hovets rann."): backtrack failed, resorting to original...
Failed to align segment (" Jag börjar tänka på en gång för länge sen. En annan kväll i Pilevallars land."): backtrack failed, resorting to original...
Failed to align segment (" Spelar männen, spelar det som älven vid urlös. Hågorna från röbo var bilda och dom svingade var sin tös. Det var damms och hålligång och på luggen nackten lång. Det var sommar, väva guld och gröna skuga. Jag var hund och du var grann som vi älskade varann. Det var sommar, väva guld och gröna skuga."): backtrack failed, resorting to original...
Failed to align segment (" Under äkendervion såg du mig om jag var din i alla våra dag."): backtrack failed, resorting to original...
Failed to align segment (" Och solen sänkte sig när fjolen stämde upp. Jag lovade att det alltid stannar kvar. Ja, det var dan så hålligång, upphållig ugen att den lång. Det var sommarväver, guld och gröna skugar. Jag var ung och du var grann, som vi älskade varann. Och jag lovade dig, guld och gröna skugar. Yeah!"): backtrack failed, resorting to original...
Failed to align segment (" Ja."): backtrack failed, resorting to original...
Failed to align segment (" Spelemännen spelade som elden vore lös. Peugorna från skölbo var vilda och dom smingade var sin tös. Det var dan så hålligång, upp hålligogen att den lång. Det var sommar, det var guld och gröna skugar. Jag var ung och du var grann, som vi älskade varann. Och i alla ovar det där i guld och gröna skugar."): backtrack failed, resorting to original...
Failed to align segment (" Jag var ung och du var grann Som vi älskade varann Det var sommer, det var guld och grön och skuga Så jag är mig sommer, jag är mig guld och grön och skuga"): backtrack failed, resorting to original...
[UltraSinger] Hyphenate using language code: sv_FI
0it [00:00, ?it/s]
[UltraSinger] Removing silent parts from transcription data
[UltraSinger] Pitching with crepe and model full and cuda as worker
2024-10-05 23:31:30.083735: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-05 23:31:30.376476: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1340 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5
2024-10-05 23:31:36.046610: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8500
2024-10-05 23:31:36.537257: W tensorflow/core/common_runtime/bfc_allocator.cc:290] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.56GiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
2024-10-05 23:31:36.537609: W tensorflow/core/common_runtime/bfc_allocator.cc:290] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.56GiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
2024-10-05 23:31:36.558795: W tensorflow/core/common_runtime/bfc_allocator.cc:290] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.56GiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
2024-10-05 23:31:36.558971: W tensorflow/core/common_runtime/bfc_allocator.cc:290] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.56GiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
573/576 [============================>.] - ETA: 0s
2024-10-05 23:31:46.366613: W tensorflow/core/common_runtime/bfc_allocator.cc:290] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.46GiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
2024-10-05 23:31:46.366807: W tensorflow/core/common_runtime/bfc_allocator.cc:290] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.46GiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
2024-10-05 23:31:46.382316: W tensorflow/core/common_runtime/bfc_allocator.cc:290] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.46GiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
2024-10-05 23:31:46.382504: W tensorflow/core/common_runtime/bfc_allocator.cc:290] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.46GiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
576/576 [==============================] - 11s 17ms/step
[UltraSinger] Creating midi notes from pitched data
[UltraSinger] Creating Ultrastar notes from midi data
[UltraSinger] BPM is 103.36
[UltraSinger] Creating C:\Users\rasmu\Downloads\UltraSinger-0.0.11\src\output\HasseAnderssonGuldochgronaskogarOfficialAudio (5)\HasseAnderssonGuldochgronaskogarOfficialAudio.txt from transcription.
[UltraSinger] Calculating silence parts for linebreaks.
Traceback (most recent call last):
  File "C:\Users\rasmu\Downloads\UltraSinger-0.0.11\src\UltraSinger.py", line 988, in <module>
    main(sys.argv[1:])
  File "C:\Users\rasmu\Downloads\UltraSinger-0.0.11\src\UltraSinger.py", line 880, in main
    run()
  File "C:\Users\rasmu\Downloads\UltraSinger-0.0.11\src\UltraSinger.py", line 422, in run
    real_bpm, ultrastar_file_output = create_ultrastar_txt_from_automation(
  File "C:\Users\rasmu\Downloads\UltraSinger-0.0.11\src\UltraSinger.py", line 645, in create_ultrastar_txt_from_automation
    ultrastar_writer.create_ultrastar_txt_from_automation(
  File "C:\Users\rasmu\Downloads\UltraSinger-0.0.11\src\modules\Ultrastar\ultrastar_writer.py", line 88, in create_ultrastar_txt_from_automation
    gap = transcribed_data[0].start
IndexError: list index out of range
```
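A note on what this log suggests: the warning "You are using a model of type whisper to instantiate a model of type wav2vec2" points at the likely cause. whisperx performs word alignment with wav2vec2-style CTC checkpoints, so passing a Whisper fine-tune such as ID2223/whisper-small-swedish as `--whisper_align_model` leaves every segment unaligned ("Failed to align segment ... backtrack failed"), and the writer ends up with no word timings. A quick sanity check one could run against a candidate model's config.json (a sketch; the helper name is made up, and the `model_type` heuristic is an assumption based on how whisperx loads Hugging Face alignment models):

```python
def is_usable_align_model(config: dict) -> bool:
    """Heuristic check of a Hugging Face config.json dict: whisperx word
    alignment expects a wav2vec2 CTC model, not a Whisper seq2seq model."""
    return config.get("model_type") == "wav2vec2"


# The config.json of ID2223/whisper-small-swedish declares "model_type": "whisper":
print(is_usable_align_model({"model_type": "whisper"}))   # False -> alignment will fail
# A dedicated wav2vec2 CTC checkpoint for Swedish would declare:
print(is_usable_align_model({"model_type": "wav2vec2"}))  # True
```

If that check fails, try running without `--whisper_align_model` so whisperx falls back to its built-in per-language alignment model.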