librosa / librosa

Python library for audio and music analysis
https://librosa.org/
ISC License

beat_track does not work on amplified audio but works after save and reload #1811

Closed meltyMap closed 8 months ago

meltyMap commented 8 months ago

Describe the bug

1. I have some quiet audio files with rhythmic content.
2. When I run beat_track on them, it returns 0. If I amplify the files in an audio editor and run beat_track again, it successfully returns beat information.
3. If I amplify the audio directly with librosa and run beat_track on the amplified numpy.ndarray, it still returns 0.
4. However, when I save the amplified numpy.ndarray to a file, load it back in, and run beat_track, it successfully returns beats again.

This confuses me; I don't know whether I'm making a mistake or something is wrong.

To Reproduce

Audio file: 2.zip

Example:

import librosa
import sys
import numpy as np
import soundfile as sf

now_path = sys.argv[1]

def process_snd(path):
    print(f"Processing: {path}")
    # Load audio file
    audio, sr = librosa.load(path)

    # Get tempo (beats per minute)
    tempo, beats = librosa.beat.beat_track(y=audio, sr=sr, start_bpm=70)
    print(f"tempo is: {tempo}, beats: {beats}")

    if tempo == 0 or len(beats) < 2:
        # Amplify the audio and try again
        rms = np.mean(librosa.feature.rms(y=audio)**2)
        audio_amped = ((0.1 / rms)**0.5) * audio
        tempo1, beats1 = librosa.beat.beat_track(y=audio_amped, sr=sr, start_bpm=70)
        print('after amplification:')
        print(tempo1, beats1)
        # still returns 0

        # Save and reload
        sf.write(path + '.amped.wav', audio_amped, sr)
        audio2, sr2 = librosa.load(path + '.amped.wav')
        tempo2, beats2 = librosa.beat.beat_track(y=audio2, sr=sr2, start_bpm=70)
        print('after reload:')
        print(sr2, tempo2, beats2)
        # returns the correct tempo

process_snd(now_path)

Expected behavior: beat_track should work on the amplified array directly, not only after saving it to disk and reloading it.

Screenshots: (image attached)

Software versions

Windows-10-10.0.22631-SP0
Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep  5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)]
NumPy 1.24.3
SciPy 1.12.0
librosa 0.10.1
INSTALLED VERSIONS
------------------
audioread: 3.0.1
sklearn: 1.4.1.post1
joblib: 1.3.2
decorator: 5.1.1
numba: 0.59.0
soundfile: 0.12.1
pooch: v1.8.0
soxr: 0.3.7
typing_extensions: installed, no version number available
lazy_loader: installed, no version number available
msgpack: 1.0.7

numpydoc: None
sphinx: None
sphinx_rtd_theme: None
matplotlib: 3.7.1
sphinx_multiversion: None
sphinx_gallery: None
mir_eval: None
ipython: None
sphinxcontrib.rsvgconverter: None
pytest: None
pytest_mpl: None
pytest_cov: None
samplerate: None
resampy: None
presets: None
packaging: 23.1
bmcfee commented 8 months ago

The problem here is not to do with the amplitude of your signal, but rather with the default parameters of the onset envelope calculation not being a good fit for your particular signal.

The beat tracker expects musical signals by default, and the onset extraction algorithm is tuned accordingly. In this particular case, it's failing because the signal has no high-frequency content (signal appears to be shelved at 2K), and the onset extractor works by computing a median across (mel) frequency bands of spectral flux. Since most of the frequencies in question are above your cutoff, the resulting frequency aggregate is dominated by silence and you get no onset envelope.
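
The effect described above can be sketched with plain NumPy (toy numbers, not librosa's actual onset pipeline): when most frequency bands are silent, a median aggregate across bands is zero everywhere, while a mean aggregate still registers the onsets.

```python
import numpy as np

# Toy "spectral flux" matrix: 128 mel bands x 8 frames.
# Only the lowest 16 bands carry energy (signal cut off around 2 kHz);
# the remaining bands contribute zero flux.
flux = np.zeros((128, 8))
flux[:16, [2, 5]] = 1.0  # onsets in the low bands at frames 2 and 5

median_env = np.median(flux, axis=0)  # median-style aggregate across bands
mean_env = np.mean(flux, axis=0)      # mean aggregate, robust to silent bands

print(median_env)  # all zeros: the 112 silent bands dominate the median
print(mean_env)    # nonzero at frames 2 and 5
```

This is why the onset envelope comes out empty regardless of how much the signal is amplified: scaling the input scales the silent bands and the active bands alike, and the median stays pinned at zero.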

There are two ways you could go about working around this:

  1. Use a mel spectrogram with a cutoff at 2 kHz to match your signal, e.g. something like:
    M = librosa.power_to_db(librosa.feature.melspectrogram(y=audio, sr=sr, fmax=2000))
    oenv = librosa.onset.onset_strength(M=M, sr=sr)
    tempo, beats = librosa.beat.beat_track(onset_envelope=oenv, sr=sr)

    or

  2. Change the frequency aggregation so that it won't be thrown by large bands of silence:
    M = librosa.power_to_db(librosa.feature.melspectrogram(y=audio, sr=sr))
    oenv = librosa.onset.onset_strength(M=M, sr=sr, aggregate=np.mean)
    tempo, beats = librosa.beat.beat_track(onset_envelope=oenv, sr=sr)

Or you could mix and match the two strategies. Either should work in your case, and combining both shouldn't hurt.

meltyMap commented 8 months ago

Thank you for the help.

I have to admit I did not fully understand those concepts. I tried your code, but oenv returned [0. 0.].

If I understood correctly, oenv = librosa.onset.onset_strength(M=M, sr=sr) should be (y=M, sr=sr). Or is there another parameter that corresponds to M?

However, I tried using pitch_shift to directly increase the pitch of the original audio, and it worked! But I'm not sure how much to shift is appropriate. Is shifting above 2000 Hz enough?

Specifically what I did:

        M = librosa.power_to_db(librosa.feature.melspectrogram(y=audio, sr=sr, fmax=1000))
        oenv = librosa.onset.onset_strength(y=M, sr=sr)
        tempo1, beats1 = librosa.beat.beat_track(y=M, onset_envelope=oenv, sr=sr)
        print(oenv)
        print(tempo1, beats1)
        # returns all 0

        M = librosa.power_to_db(librosa.feature.melspectrogram(y=audio, sr=sr))
        oenv = librosa.onset.onset_strength(y=M, sr=sr, aggregate=np.mean)
        tempo2, beats2 = librosa.beat.beat_track(onset_envelope=oenv, sr=sr)
        print('-----------------------')
        print(oenv)
        print(tempo2, beats2)
        # returns all 0

        audio_shifted = librosa.effects.pitch_shift(y=audio, sr=sr, n_steps=10)
        tempo3, beats3 = librosa.beat.beat_track(y=audio_shifted, sr=sr, start_bpm=70)
        print(tempo3, beats3)
        # it worked!

By the way, this audio is actually the sound of blood flowing through vessels, so there is indeed very little high-frequency content. I was hoping beat_track could recover the patient's heart rate.
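
For the heart-rate use case, once beat_track returns beat positions, the rate in BPM can be estimated from the spacing of the beat times. A minimal NumPy sketch, using made-up beat times in seconds (the kind of values librosa.frames_to_time(beats, sr=sr) would produce):

```python
import numpy as np

# Hypothetical beat times in seconds, spaced ~0.83 s apart (~72 BPM)
beat_times = np.array([0.00, 0.83, 1.67, 2.50, 3.33, 4.17])

intervals = np.diff(beat_times)    # seconds between consecutive beats
bpm = 60.0 / np.median(intervals)  # median is robust to a missed/extra beat

print(round(bpm))  # 72
```

Using the median interval rather than the overall tempo estimate can be a bit more forgiving if the tracker occasionally drops a beat.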

bmcfee commented 8 months ago

If I understood correctly, oenv = librosa.onset.onset_strength(M=M, sr=sr) should be (y=M,sr=sr)

Sorry, this should be S=M (you're now providing a spectrogram input, not a time-domain signal).

However, I tried using pitch_shift to directly increase the pitch of the original audio, and it worked! But I'm not sure how much to increase is appropriate, is increasing to above 2000hz enough?

I think you'd be much better off limiting the frequency range of the spectral analysis as I described, rather than pitch-shifting your input signal. The latter might work in this case, but it's far more complicated than it needs to be, and likely introduces some artifacts (particularly around transients, which is what you're ultimately depending on for beat tracking) that could be avoided.

meltyMap commented 8 months ago

Yeah, it works! Now I understand almost everything. Thank you so much :D