Hi! You are correct, the model works with a 32 kHz sampling rate. If you need your output to be in 44.1 kHz, you have to resample it -- please see the evaluate.py script, where we do exactly that (lines 59-64). Let me know if that solves your problem :relaxed:
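For anyone who wants the one-line version of that fix: the resampling step is an ordinary librosa call. A minimal sketch of the idea (not the exact evaluate.py code; the dummy stem and the 32000/44100 rates are my assumptions):

```python
import numpy as np
import librosa
import soundfile

# Dummy stand-in for one separated stem at the model's native 32 kHz rate;
# in practice this array comes out of the separation network.
stem = np.random.randn(32000 * 10).astype(np.float32)  # 10 seconds at 32 kHz

# Upsample back to the original 44.1 kHz rate, as evaluate.py does around lines 59-64.
stem_44k = librosa.resample(stem, orig_sr=32000, target_sr=44100)

# Write the file at the rate the samples actually have, so the duration is preserved.
soundfile.write("stem_44k.wav", stem_44k, 44100)
```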
Fix it for all of us, please!
The evaluate.py script seems to work with stereo files, which is fine, but because of that it uses different indexing and transposes than the script from the Colab demo. Can you just show us how you would use that stereo evaluate.py script to separate a 44.1 kHz input file and output four stems at 44.1 kHz with the same length as the input, using your own demo script? It's easier to match your method than to go back and forth on what is going on. Thank you!
Hi, I'm sorry it took me so long... Here's sample code that separates a stereo file and resamples it back to the original sampling rate :) https://gist.github.com/davda54/aa555c011866392c32c4906f8a709682
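In case the gist is unavailable, the approach it describes is roughly the following (my paraphrase, and a sketch only: the per-channel loop and the `separate_sample(audio, rate)` helper from the demo, which I assume takes a (1, samples) mono array and returns a dict of stems, are assumptions on my part):

```python
import numpy as np
import librosa
import soundfile

audio, rate = soundfile.read("test.wav")  # stereo input, shape (samples, 2), e.g. 44.1 kHz

stems = {}
for ch in range(audio.shape[1]):
    mono = np.ascontiguousarray(audio[:, ch])  # one channel as a mono signal
    estimates = separate_sample(np.expand_dims(mono, 0), rate)  # demo helper, assumed API
    for name, stem in estimates.items():
        # resample each mono stem from the model's 32 kHz back to the input rate
        stems.setdefault(name, []).append(
            librosa.resample(np.squeeze(stem), orig_sr=32000, target_sr=rate)
        )

# stack the per-channel results back into (samples, 2) stereo files
for name, channels in stems.items():
    soundfile.write(f"test_{name}.wav", np.stack(channels, axis=1), rate)
```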
Everything works perfectly now! Thank you very much! :)
I can't separate files, please post a tutorial.
If the input is 44.1 kHz, the separated outputs are 32 kHz; they should be equal to the original 44.1 kHz input. But the problem is not just the sampling rate itself: the output stems are shorter than the input mixture by the same ratio as the sampling frequencies. In other words, 44100/32000 equals mixtureLengthInSeconds/anyOutputStemLengthInSeconds (see the quick arithmetic after the script below). It should not truncate the output like that; it should produce stems of the same length as the input that sum to the original mixture. This is the script I used to test; it is from your Colab demo:
```python
import torch
from model.tasnet import MultiTasNet
import soundfile
import librosa
import numpy as np

state = torch.load("best_model.pt")  # load checkpoint
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # optionally use the GPU
device = torch.device("cpu")  # use only the CPU, if the GPU gives problems

network = MultiTasNet(state["args"]).to(device)  # initialize the model
network.load_state_dict(state['state_dict'])  # load weights from the checkpoint

def separate_sample(audio, rate: int):
    ...  # body as in the Colab demo; it was lost when pasting the script here

audio, rate = soundfile.read("test.wav")
audio = librosa.core.to_mono(audio.transpose())  # soundfile gives (samples, channels); librosa wants (channels, samples)
print(audio.shape, rate)
audio = np.expand_dims(audio, 0)  # add a batch dimension

print()
print("separating... ", end='')
estimates = separate_sample(audio, rate)
print("done")

print("saving audio files to folder...")
print(list(estimates.keys()))  # estimates is a dict of stems, so it has no .shape

drums = estimates["drums"]
print(drums.shape)
bass = estimates["bass"]
print(bass.shape)
other = estimates["other"]
print(other.shape)
vocals = estimates["vocals"]
print(vocals.shape)

# Note: the stems come out at the model's 32 kHz rate, but `rate` is still 44100 here,
# which is why the written files play back shorter than the input.
soundfile.write("test_drums.wav", drums, rate)
soundfile.write("test_bass.wav", bass, rate)
soundfile.write("test_other.wav", other, rate)
soundfile.write("test_vocals.wav", vocals, rate)
```