Hey!

When running the dance model for inference, I found that I can't use my own wav audio files, because the script that processes a wav file into a pkl file isn't provided. I know that the pkl file stores audio features, in particular the three features used during dance-model inference: chroma, spectral_flux, and beat_activations. I tried the following script:
"""
from madmom.audio.signal import Signal
from madmom.audio.spectrogram import Spectrogram
from madmom.features.onsets import spectral_flux
import librosa
import numpy as np
import pickle as pkl
from madmom.features.downbeats import RNNDownBeatProcessor
wav_path='kthstreet_gLO_sFM_cAll_d02_mLO_ch01_arethafranklinrocksteady_002.wav'
pkl_path='kthstreet_gLO_sFM_cAll_d02_mLO_ch01_arethafranklinrocksteady_002_00.audio29_30fps.pkl'
with open(pkl_path, 'rb') as f:
ctrl = pkl.load(f)
signal = Signal(wav_path, sample_rate=48000)
y, sr = librosa.load(wav_path, sr=None)
print(ctrl)
print(chroma.shape, chroma)
print(spec_flux0.shape, spec_flux0)
print(beatactivations.shape, beatactivations)
"""
The wav file and pkl file are from the folder: \ListenDenoiseAction\data\motorica_dance.
I tried to extract the chroma, spectral_flux, and beat_activations features, but the results are inconsistent with those stored in the pkl file, since I couldn't work out the exact processing methods and parameters. Could you provide the script that processes a wav file into a pkl file?
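One mismatch I suspect, though I'm only guessing: madmom's RNNDownBeatProcessor produces its activations at a fixed 100 fps (as far as I can tell, the fps argument I passed above is not honoured), while the pkl filename says 30 fps, so some resampling step is probably involved. Below is a sketch of the kind of alignment I mean; the linear interpolation is purely my assumption, not the repo's actual method:

"""
import numpy as np
from madmom.features.downbeats import RNNDownBeatProcessor

wav_path = 'kthstreet_gLO_sFM_cAll_d02_mLO_ch01_arethafranklinrocksteady_002.wav'

# madmom's downbeat RNN emits (num_frames, 2) beat/downbeat activations
# at 100 fps.
acts_100 = RNNDownBeatProcessor()(wav_path)

# Assumption: resample to 30 fps by linear interpolation over frame times.
# The repo may instead use striding, averaging, or other frame parameters.
src_t = np.arange(len(acts_100)) / 100.0
dst_t = np.arange(int(len(acts_100) * 30 / 100)) / 30.0
acts_30 = np.stack([np.interp(dst_t, src_t, acts_100[:, c])
                    for c in range(acts_100.shape[1])], axis=1)
print(acts_30.shape)
"""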
Thanks!