Open asigalov61 opened 1 year ago
This should now be fixed in the latest version.
@haoheliu Thanks. It works on some files now but I still get an error on the following file:
Loading AudioSR: basic
Loading model on cuda:0
/usr/local/lib/python3.10/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/usr/local/lib/python3.10/dist-packages/torchaudio/transforms/_transforms.py:611: UserWarning: Argument 'onesided' has been deprecated and has no influence on the behavior of this module.
warnings.warn(
DiffusionWrapper has 258.20 M params.
/usr/local/lib/python3.10/dist-packages/audiosr/latent_diffusion/models/ddpm.py:237: RuntimeWarning: divide by zero encountered in divide
"sqrt_recip_alphas_cumprod", to_torch(np.sqrt(1.0 / alphas_cumprod))
/usr/local/lib/python3.10/dist-packages/audiosr/latent_diffusion/models/ddpm.py:240: RuntimeWarning: divide by zero encountered in divide
"sqrt_recipm1_alphas_cumprod", to_torch(np.sqrt(1.0 / alphas_cumprod - 1))
Warning: audio is longer than 10.24 seconds, may degrade the model performance. It's recommand to truncate your audio to 5.12 seconds before input to AudioSR to get the best performance.
Traceback (most recent call last):
File "/usr/local/bin/audiosr", line 107, in <module>
waveform = super_resolution(
File "/usr/local/lib/python3.10/dist-packages/audiosr/pipeline.py", line 164, in super_resolution
batch, duration = make_batch_for_super_resolution(input_file, waveform=waveform)
File "/usr/local/lib/python3.10/dist-packages/audiosr/pipeline.py", line 83, in make_batch_for_super_resolution
log_mel_spec, stft, waveform, duration, target_frame = read_audio_file(input_file)
File "/usr/local/lib/python3.10/dist-packages/audiosr/utils.py", line 208, in read_audio_file
waveform, target_frame, duration = read_wav_file(filename)
File "/usr/local/lib/python3.10/dist-packages/audiosr/utils.py", line 204, in read_wav_file
waveform = pad_wav(waveform, target_length=int(48000 * pad_duration))
File "/usr/local/lib/python3.10/dist-packages/audiosr/utils.py", line 63, in pad_wav
temp_wav[:, rand_start : rand_start + waveform_length] = waveform
ValueError: could not broadcast input array from shape (1,7127040) into shape (1,7127039)
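The final line of the traceback shows a waveform of 7127040 samples being assigned into a buffer of 7127039 samples, so the target length computed by `int(48000 * pad_duration)` came out one sample shorter than the input. The following is a minimal sketch of that failure mode at a small scale, together with a defensive padding helper. `pad_wav_safe` is a hypothetical name, not a function in audiosr, and trimming the extra sample is an assumption about what an acceptable workaround would be.

```python
import numpy as np


def pad_wav_safe(waveform, target_length):
    """Zero-pad a (channels, samples) array to target_length.

    Hypothetical helper: if the input is already as long as or longer than
    target_length, it is trimmed instead of raising a broadcast error.
    """
    n = waveform.shape[-1]
    if n >= target_length:
        return waveform[:, :target_length]
    padded = np.zeros((waveform.shape[0], target_length), dtype=waveform.dtype)
    padded[:, :n] = waveform
    return padded


# Reproduce the mismatch with small arrays: a 10-sample input cannot be
# assigned into a 9-sample buffer, because the slice is clipped to 9 columns.
wave_in = np.ones((1, 10), dtype=np.float32)
buffer = np.zeros((1, 9), dtype=np.float32)
try:
    buffer[:, 0:10] = wave_in
    error_message = None
except ValueError as exc:
    error_message = str(exc)

trimmed = pad_wav_safe(wave_in, 9)   # input longer than target: trimmed
padded = pad_wav_safe(wave_in, 12)   # input shorter than target: zero-padded
```

Dropping a single sample at 48 kHz is unlikely to be audible, but whether the real fix should trim the input or round the target length up is a decision for the maintainers.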
@haoheliu @RetroCirce Any update on this bug? I tried it today and it still does not work. Thank you.
Sorry for the error. Will look into that later this week. @asigalov61
@haoheliu Thanks. Please let me know if you will be able to fix it and I will try it again.
I got this error today trying to upscale my 300-second audiocraft WAVs, so I asked Gemini if it could help. This is what it gave me:
Absolutely! I can help you code a splitter/stitcher for your long audiocraft fork files. Here's a Python script that you can use:
import os
from pydub import AudioSegment

# Set these variables
INPUT_DIR = "path/to/your/input/folder"  # Folder containing your long audio files
OUTPUT_DIR = "path/to/your/output/folder"  # Folder to save split files
MAX_DURATION_SECONDS = 10  # Maximum duration per split file (adjust as needed)

# Create output directory if it doesn't exist
os.makedirs(OUTPUT_DIR, exist_ok=True)

chunk_ms = MAX_DURATION_SECONDS * 1000  # pydub slices by milliseconds

for filename in os.listdir(INPUT_DIR):
    # Get full path and check if it's an audio file
    filepath = os.path.join(INPUT_DIR, filename)
    if not filename.lower().endswith((".wav", ".mp3", ".flac")):
        continue

    # Load audio segment
    audio = AudioSegment.from_file(filepath)
    stem, ext = os.path.splitext(filename)  # Safe for names containing extra dots
    fmt = ext.lstrip(".").lower()

    # Calculate number of splits needed (ceiling division avoids an empty last segment)
    num_splits = -(-len(audio) // chunk_ms)

    # Split and save audio segments
    for i in range(num_splits):
        segment = audio[i * chunk_ms : (i + 1) * chunk_ms]  # Slicing clamps at the end of the audio
        output_filename = os.path.join(OUTPUT_DIR, f"{stem}_{i}.{fmt}")
        segment.export(output_filename, format=fmt)

    print(f"Split and saved {num_splits} segments from {filename}!")
Explanation:
pydub is used for audio manipulation.
Set INPUT_DIR, OUTPUT_DIR, and MAX_DURATION_SECONDS as needed.
Each file is split into segments no longer than MAX_DURATION_SECONDS.
Important Notes:
This script requires pydub, so install it using pip install pydub.
Adjust MAX_DURATION_SECONDS based on your desired split size.
Remember to install pydub and adjust the variables before running the script. Let me know if you have any questions or need further assistance!
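The script above only covers the splitting half of the "splitter/stitcher". A minimal sketch of the stitching half follows, under several assumptions: the upscaled chunks are PCM WAV files, they all share one channel count, sample width, and sample rate, and they keep the `_<i>` suffix the splitter adds. It uses Python's standard `wave` module rather than pydub, and `stitch_wavs` and `chunk_index` are hypothetical names.

```python
import re
import wave
from pathlib import Path


def chunk_index(path):
    """Sort key: the trailing _<i> index that the splitter appends to each stem."""
    match = re.search(r"_(\d+)$", Path(path).stem)
    return int(match.group(1)) if match else -1


def stitch_wavs(chunk_paths, output_path):
    """Concatenate WAV chunks, in the order given, into a single WAV file."""
    chunk_paths = list(chunk_paths)
    if not chunk_paths:
        raise ValueError("no chunks to stitch")

    expected = None
    with wave.open(str(output_path), "wb") as out:
        for path in chunk_paths:
            with wave.open(str(path), "rb") as chunk:
                fmt = (chunk.getnchannels(), chunk.getsampwidth(), chunk.getframerate())
                if expected is None:
                    # The first chunk defines the output format.
                    expected = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != expected:
                    raise ValueError(f"{path} has format {fmt}, expected {expected}")
                out.writeframes(chunk.readframes(chunk.getnframes()))
```

A call might look like `stitch_wavs(sorted(paths, key=chunk_index), "joined.wav")`. Sorting by the numeric index matters because a plain alphabetical sort would place `_10` before `_2`. Chunks that were upscaled independently may not line up perfectly at the boundaries, so a hard concatenation like this can leave audible seams.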
@WyrmSpear Thanks, I will try it out.
@RetroCirce @haoheliu
Hello, guys !!! :)
Thank you for publishing this work. It looks very promising and the samples are very good too.
I need your audiosr for my music WAVs but it does not work in Google Colab.
Please see attached WAV that produces the following traceback on A100 40GB:
Very Nauty Violin.zip