haoheliu / versatile_audio_super_resolution

Versatile audio super resolution (any -> 48kHz) with AudioSR.
MIT License
1.07k stars, 106 forks

Does not work #17

Open KindSpidey opened 11 months ago

KindSpidey commented 11 months ago

Hey, everyone! Can you tell me, please, how to make this package work?

I followed the instructions:

Then I tried to apply the library to my audio file:

D:\SONGer> audiosr -i track.mp3

and got error C:\Users\my_name\AppData\Local\Programs\Python\Python311\python.exe: No module named audiosr.__main__; 'audiosr' is a package and cannot be directly executed
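This is the message Python gives when it is asked to run a package as a script (for example with `python -m audiosr ...`) and the package has no `__main__` module. Here is a minimal, standard-library-only sketch that checks whether an installed package can be run that way. The helper name `is_runnable_package` is hypothetical and is not part of AudioSR.

```python
import importlib.util


def is_runnable_package(name: str) -> bool:
    """Return True if `python -m <name>` would find a __main__ module.

    find_spec imports the parent package first, so a missing package (or a
    plain module that is not a package) raises ModuleNotFoundError, which is
    treated here as "not runnable".
    """
    try:
        return importlib.util.find_spec(f"{name}.__main__") is not None
    except ModuleNotFoundError:
        return False


# Standard-library examples: venv ships a __main__.py, email does not.
print(is_runnable_package("venv"))   # True
print(is_runnable_package("email"))  # False
```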

Well, then I tried to read the project's .py files and use it as a library in code, like this:

import audiosr
from audiosr.latent_diffusion.models.ddpm import DDPM, LatentDiffusion

input_file = "track.mp3"

audiosr.super_resolution(input_file=input_file, latent_diffusion=LatentDiffusion)

but this gave me the error TypeError: LatentDiffusion.generate_batch() missing 1 required positional argument: 'batch'. If I do this instead, audiosr.super_resolution(input_file=input_file, latent_diffusion=LatentDiffusion(DDPM)), then I get this:

assert self.num_timesteps_cond <= kwargs["timesteps"]
                                      ~~~~~~^^^^^^^^^^^^^
KeyError: 'timesteps'
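The first TypeError is most likely a class-versus-instance mix-up: LatentDiffusion was passed as the class itself, so when the library calls generate_batch(...) on it, the first argument is consumed as self and batch ends up missing. The self-contained illustration below uses a hypothetical stand-in class and function (FakeDiffusion, super_resolution_like), not AudioSR's real code.

```python
class FakeDiffusion:
    """Hypothetical stand-in for a model class with an instance method."""

    def generate_batch(self, batch, ddim_steps=50):
        return f"generated {batch} in {ddim_steps} steps"


def super_resolution_like(latent_diffusion, batch):
    # Mimics a library function that expects a *model instance*.
    return latent_diffusion.generate_batch(batch, ddim_steps=50)


# Passing the class: "clip" is bound to `self`, so `batch` is missing.
error_message = ""
try:
    super_resolution_like(FakeDiffusion, "clip")
except TypeError as err:
    error_message = str(err)
print(error_message)

# Passing an instance works as intended.
print(super_resolution_like(FakeDiffusion(), "clip"))  # generated clip in 50 steps
```

In the later comments, the model object is created with build_model(...) and passed as the first argument to super_resolution, which avoids constructing LatentDiffusion by hand.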

How to make it work locally?

falseywinchnet commented 11 months ago

send boobs

import os
import torch
import logging
from audiosr import super_resolution, build_model, save_wave, get_time, read_list
import argparse

os.environ["TOKENIZERS_PARALLELISM"] = "true"
matplotlib_logger = logging.getLogger('matplotlib')
matplotlib_logger.setLevel(logging.WARNING)
torch.set_float32_matmul_precision("high")
audiosr = build_model(model_name="speech", device="auto")

waveform = super_resolution(
    audiosr,
    r"C:\pathtofile\./filename.wav",  # use a sampling rate less than 42! recommend 6k mono. use a file less than 10s long!
    seed=42,
    guidance_scale=3.5,
    ddim_steps=50,
    latent_t_per_second=12.8,
)

save_wave(waveform, r"C:\pathtofile\./", name="test", samplerate=48000)  # note: do not put .wav at end of file name, it will do this automatically and grumble if you do it
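The comments in the snippet above recommend a short, low-sample-rate mono input. Here is a standard-library sketch that checks a WAV file's sample rate, channel count, and duration before it is passed to super_resolution. The 10-second limit comes from that comment. The helper names describe_wav and check_input are hypothetical. The sketch only reads uncompressed WAV files, so an MP3 such as track.mp3 would need converting first.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_s) for an uncompressed WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


def check_input(path, max_seconds=10.0):
    """Print a warning for inputs the thread advises against (long or non-mono)."""
    rate, channels, duration = describe_wav(path)
    if duration >= max_seconds:
        print(f"{path}: {duration:.1f} s is not shorter than {max_seconds} s")
    if channels != 1:
        print(f"{path}: {channels} channels, consider converting to mono")
    return rate, channels, duration


# Demo: write 2 s of silence at 6 kHz mono to a temporary file and inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 16-bit samples
    out.setframerate(6000)
    out.writeframes(b"\x00\x00" * 6000 * 2)

print(check_input(demo_path))  # (6000, 1, 2.0)
```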

ThomasFan1945 commented 11 months ago


Hey, my dude. Can you please show us in video form? I have a better understanding by watching videos, thanks.

falseywinchnet commented 11 months ago

You can just use your nominal sampling rate, and it will work fine as long as you low-pass filter your data.


ThomasFan1945 commented 11 months ago


This is confusing. Please do a video tutorial.

KindSpidey commented 11 months ago


Your code starts downloading the 6 GB "speech" model. I guess that's not cool.
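Because build_model downloads a checkpoint of roughly 6 GB, according to the comments in this thread, it can help to check free disk space first. This is a minimal sketch using shutil.disk_usage. The helper name has_free_space and the 6 GB figure are assumptions taken from this discussion. Where the download is actually stored depends on AudioSR's and your environment's cache settings, so point the path at that drive.

```python
import shutil


def has_free_space(path=".", required_gb=6.0):
    """Return True if the filesystem holding `path` has at least `required_gb` GiB free."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    print(f"{free_gb:.1f} GiB free at {path!r}, need about {required_gb} GiB")
    return free_gb >= required_gb


has_free_space(".", required_gb=6.0)
```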

falseywinchnet commented 11 months ago

> This is confusing. Please do a video tutorial.

pay me?

madwurmz commented 11 months ago

@falseywinchnet I will literally pay somebody to get this working! 🗣️ It feels so promising and highly valuable, but I can't get this to work. I tried a lot of things and got many different errors!

falseywinchnet commented 11 months ago

I would be glad to take your money. Here is my recommendation: use Miniforge Python, not another distribution (not Anaconda, etc.).

pip install audiosr

In a Python IDE or IPython notebook:

import os
import torch
import logging
from audiosr import super_resolution, build_model, save_wave, get_time, read_list
import argparse

os.environ["TOKENIZERS_PARALLELISM"] = "true"
matplotlib_logger = logging.getLogger('matplotlib')
matplotlib_logger.setLevel(logging.WARNING)
torch.set_float32_matmul_precision("high")

audiosr = build_model(model_name="speech", device="auto")  # this task will download a 6 GB file. Be sure you have the hard drive space, and enough RAM.
# If you are using CUDA, you will need at least 8 GB VRAM!

waveform = super_resolution(
    audiosr,
    r"C:\pathtofile./filename.wav",
    seed=42,
    guidance_scale=3.5,
    ddim_steps=50,
    latent_t_per_second=12.8,
)  # note: just use your ordinary audio and low-pass filter it before applying the network, just like inpainting a picture!

save_wave(waveform, r"C:\pathtofile./", name="test", samplerate=48000)  # note: do not put .wav at the end of the file name; audiosr adds it automatically and does not like it if you do
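The note about low-pass filtering the input before applying the network can be sketched with a simple FFT "brick-wall" filter in NumPy. This is an illustrative assumption, not necessarily the filter AudioSR itself uses. A Butterworth filter (for example from SciPy) is a common alternative with a smoother roll-off. The helper name lowpass_fft and the 4 kHz cutoff in the demo are hypothetical.

```python
import numpy as np


def lowpass_fft(signal, sample_rate, cutoff_hz):
    """Zero every frequency bin above cutoff_hz (brick-wall low-pass filter)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


# Demo: one second of a 1 kHz tone plus a quieter 10 kHz tone at 48 kHz.
sr = 48000
t = np.arange(sr) / sr
low = np.sin(2 * np.pi * 1000 * t)
mixed = low + 0.5 * np.sin(2 * np.pi * 10000 * t)

filtered = lowpass_fft(mixed, sr, cutoff_hz=4000)
print(np.max(np.abs(filtered - low)))  # close to 0: the 10 kHz tone is removed
```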

madwurmz commented 11 months ago

@falseywinchnet Thanks! That sounds helpful! Do you mean I necessarily need to do things like import os because the bare repo handles that insufficiently?

We're in the wrong channel to chat, but initially I had it sort of installed with Anaconda. I had to search for where it was installed, since it was not in my preferred directory: this repo installs files and then hides them deep in the Windows 10 folders. I found it in some Scripts folder. A few attempts gave errors related to permissions and filenames. Then I tried to convert a batch: it downloaded the 6 GB model, filled 10 GB of VRAM, and then froze. I have a lot of samples to convert and am curious about the quality. I could also imagine this tool working inside Bark and other AI generators 🎵

falseywinchnet commented 11 months ago

you're not paying me enough


madwurmz commented 11 months ago

@falseywinchnet Ohlala 🛩️ I have thousands of samples; if this tool works, it is really worth something! I can give anybody who can get this working a Ko-fi, or I can pay you with a compliment, pay respect, or pay with a personalized rap song 💯

falseywinchnet commented 11 months ago

All you have to do is learn a little bit of Python.


madwurmz commented 11 months ago

@falseywinchnet The command you gave is wrong; surely no dot is needed in the directory?? And it says you mix Windows- and Unix-style formatting when you suggest this: C:\pathtofile./filename.wav

Can you suggest how to rewrite it so it just runs any batch?

I'm at the part where it says import os, so I'm very serious! But I have Anaconda and Python 3.10, and that has always worked fine. Are you sponsored by Miniforge?

I will pay real money to get this working! Don't worry about that! 🥇 Or I'll write you a diss rap song if you keep on being so sarcastic! 🍡

Edit: I'm reading other reports saying this needs 16 GB of VRAM?! That's why it won't even work for me with 11 GB... 😞
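On the path question: in Python, a Windows path is easiest to write as a raw string (r"C:\...") or to build with pathlib, and the stray "./" is not needed. For a whole folder, here is a minimal sketch that loops over WAV files one at a time and passes each file's name without the .wav suffix, as the save_wave note in this thread asks. process_file is a hypothetical placeholder. In practice it would wrap the super_resolution and save_wave calls shown earlier, which are not called here.

```python
import tempfile
from pathlib import Path, PureWindowsPath


def batch_outputs(input_dir, process_file):
    """Call process_file(path, name) for each .wav in input_dir, one file at a time.

    `name` is the file stem (no .wav suffix), because the thread notes that
    save_wave adds the extension itself. Returns the names that were processed.
    """
    names = []
    for wav_path in sorted(Path(input_dir).glob("*.wav")):
        name = wav_path.stem
        process_file(wav_path, name)
        names.append(name)
    return names


# A Windows path written safely: raw string, joined with pathlib.
win_file = PureWindowsPath(r"C:\pathtofile") / "filename.wav"
print(win_file)  # C:\pathtofile\filename.wav

# Demo with empty placeholder files; the lambda stands in for the real work.
demo_dir = Path(tempfile.mkdtemp())
for stem in ("b_clip", "a_clip"):
    (demo_dir / f"{stem}.wav").touch()

processed = batch_outputs(
    demo_dir, lambda path, name: print("would process", path.name, "->", name)
)
print(processed)  # ['a_clip', 'b_clip']
```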
