liamappelbe / fftea

A simple and efficient FFT implementation for Dart
Apache License 2.0

Found Key & BPM from Audio byte #36

Closed. Mamena2020 closed this issue 1 month ago.

Mamena2020 commented 1 year ago

Hi. I'm trying to get information, like the key and BPM, from audio bytes. Can I use this package to detect an audio clip's key and BPM?

My audio is 16-bit PCM with a sample rate of 16000, but it's stored in a Uint8List, so I convert it to an Int16List:

    List<double> pcmDoubleData = audioBytes.buffer
        .asInt16List()
        .map((value) => value / 32768.0)
        .toList();

But I don't know how to use this FFT for the next step.

liamappelbe commented 1 year ago

These are pretty big questions with multiple answers, so I can't give a simple answer. I'll say some high level stuff, and then you can make a decision about what approach you want to take, and if you need help with the details we can drill down.

By key I assume you mean this definition? That seems pretty difficult. In fact it's not even a well-defined problem. A melody in C major could just as easily be in A minor (specifically, A natural minor), depending on how the composer writes it. So sometimes the key can be subjective.

If you ignore that problem and just try to detect which notes are in the melody (ie deduce the key signature), then the problem goes from undefined to merely difficult. You could FFT or STFT the music and then look at the loudest frequencies. If you try that, you'll run into two more problems:

  1. The frequency peaks won't land on a single frequency; they'll be spread out over several bins, so you'll need a clever algorithm to extract the peaks. This is tricky, but doable. STFT might make this easier because you can probably just extract the single loudest frequency from each chunk (see the sketch after this list), but then you'll have to worry about the time accuracy vs frequency accuracy trade-off. Another approach would be to FFT the entire song, collect all the frequencies into buckets based on which note of the chromatic scale is closest, and find the loudest buckets (this seems like the most robust approach to me).
  2. Musical notes played by real instruments are not pure tones. Every note will have harmonics that will also appear in the FFT. I'm not sure how much of a problem this is, since harmonics are usually a lot quieter than the main note, and probably fall mostly in the same key anyway. You'll just have to experiment with this to see if it's actually a problem.
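
Here's a rough, untested sketch of the STFT idea from point 1: grabbing the single loudest frequency from each chunk. The chunk size here is just a placeholder you'd need to tune (bigger chunks give better frequency resolution but worse time resolution):

import 'dart:typed_data';
import 'package:fftea/fftea.dart';

// Untested sketch: report the loudest frequency (in Hz) of each STFT chunk.
List<double> loudestFreqPerChunk(List<double> audio, double sampleRate) {
  const chunkSize = 4096;  // Placeholder; tune for your use case.
  final stft = STFT(chunkSize, Window.hanning(chunkSize));
  final peaks = <double>[];
  stft.run(audio, (Float64x2List chunk) {
    final mags = chunk.discardConjugates().magnitudes();
    // Skip bin 0, which is the DC offset, not a musical frequency.
    int loudest = 1;
    for (int i = 2; i < mags.length; i++) {
      if (mags[i] > mags[loudest]) loudest = i;
    }
    peaks.add(loudest * sampleRate / chunkSize);  // Bin index -> Hz.
  });
  return peaks;
}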

BPM detection is much easier, but I probably wouldn't use FFT. The algorithm would look something like this:

  1. Use a low pass filter to remove all the high frequencies. You can do this with FFT (FFT, zero out the high freqs, IFFT), but that's overkill/inefficient.
  2. Iterate over the samples and keep track of the loudness (square of the amplitude). Smooth out this loudness value using a very low frequency low pass filter (the frequencies it allows should be on the order of a few Hz). The peaks of this signal are the beats.
  3. Debounce those beats. I'd expect a 60 bpm song to mostly get 1 Hz beats, but occasionally also 2 Hz, 4 Hz etc, and sometimes 2 beats in quick succession due to noise in the beat signal. Actually, now that I think about it, maybe the easiest thing would be to FFT the beat signal from step 2 and pick the loudest frequency. I'm not sure if that would work, but it's worth a try; there's a rough sketch of it below.
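
Putting those steps together, a minimal, untested sketch might look like this. The filter cutoff and the BPM search range are assumptions to experiment with, not tested values:

import 'dart:math';
import 'dart:typed_data';
import 'package:fftea/fftea.dart';

// Untested sketch of the BPM estimation described above.
double estimateBpm(List<double> samples, double sampleRate) {
  // Steps 1 and 2 in one pass: square each sample, then smooth the result
  // with a one-pole low pass filter (cutoff of a few Hz). The peaks of this
  // loudness envelope are the beats.
  const cutoffHz = 5.0;  // Assumed cutoff; experiment with this.
  final alpha = 1 - exp(-2 * pi * cutoffHz / sampleRate);
  final envelope = Float64List(samples.length);
  double y = 0;
  for (int i = 0; i < samples.length; i++) {
    y += alpha * (samples[i] * samples[i] - y);
    envelope[i] = y;
  }

  // Step 3: FFT the beat signal and pick the strongest frequency in a
  // plausible tempo range (30 to 240 bpm, ie 0.5 to 4 Hz).
  final fft = FFT(envelope.length);
  final mags = fft.realFft(envelope).discardConjugates().magnitudes();
  double bestMag = 0, bestFreqHz = 0;
  for (int i = 1; i < mags.length; i++) {
    final freqHz = fft.frequency(i, sampleRate);
    if (freqHz < 0.5 || freqHz > 4) continue;
    if (mags[i] > bestMag) {
      bestMag = mags[i];
      bestFreqHz = freqHz;
    }
  }
  return bestFreqHz * 60;  // Hz -> beats per minute.
}
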
Mamena2020 commented 1 year ago

Thanks for your answers.

What do you think of this code?

    void detectPitch(Uint8List audioBytes) {
      // Convert the Int16 PCM data to doubles for the FFT.
      final pcmDoubleData = audioBytes.buffer
          .asInt16List()
          .map((value) => value / 32768.0)
          .toList();

      final fftx = fftea.FFT(pcmDoubleData.length);
      Float64List freqlist =
          fftx.realFft(pcmDoubleData).discardConjugates().magnitudes();

      // Sample rate.
      double sampleRate = 16000;

      // Width of each FFT bin in Hz.
      double binWidth = sampleRate / fftx.size;

      // Collect frequencies into "buckets".
      Map<String, double> frequencyBuckets = {};

      for (int i = 0; i < freqlist.length; i++) {
        double magnitude = freqlist[i];

        // Convert frequency to the closest note on the chromatic scale.
        int closestNote = (i * binWidth / (binWidth * 2)).round();
        double noteFrequency = closestNote * binWidth * 2;

        // Add the magnitude to the bucket for the closest note.
        frequencyBuckets[noteFrequency.toString()] = magnitude;
      }

      // Find the loudest frequency bucket.
      double maxMagnitude = 0;
      String loudestNote = '';
      frequencyBuckets.forEach((note, magnitude) {
        if (magnitude > maxMagnitude) {
          maxMagnitude = magnitude;
          loudestNote = note;
        }
      });
      print('Loudest Note: $loudestNote');
    }

In some cases I get valid results (checking against the list of keys and frequencies at https://pages.mtu.edu/~suits/notefreqs.html), but in other cases I get invalid results, especially for minor keys, when I test the same audio on https://tunebat.com/Analyzer

liamappelbe commented 1 year ago

The first thing I would check is that you're decoding audioBytes correctly. If you're building a long chain of processing, it's always good to verify that each step works before moving on to the next. I had a problem when I was first writing the FFT spectrogram example app, where the results were really noisy. After a while I figured out that the FFT was fine, but my wav file reading code was broken. Eventually I cleaned up the wav reader and moved it into its own package. So I'd recommend saving your pcmDoubleData to a wav file using package:wav, then listening to it to make sure it sounds correct:

await Wav([Float64List.fromList(pcmDoubleData)], sampleRate).writeFile('test.wav');

Next, I think the frequency calculations and bucketing logic in the second section of your code are wrong. For starters, the buckets should be different sizes rather than having a fixed binWidth, because musical scales are geometric (eg, the 4th octave spans twice as many Hz as the 3rd). So somewhere in your math you'll need a pow or a log2 to convert between linear frequencies and geometric pitches.

Also, in your map, you should be adding the magnitudes to the bucket (+=, not =), since multiple frequencies will land in each bucket, and you're interested in the overall loudness. And as a general note, you don't need to key your map by String: Dart lets you key a map by any type, so converting to String is inefficient.

Anyway, you can rewrite it to be more straightforward using FFT.frequency:

import 'dart:math';

const int kNotesPerOctave = 12;
const double kA4Freq = 440;
const int kC2Index = 0;
const int kA4Index = 2 * kNotesPerOctave + 9;
const int kC8Index = 6 * kNotesPerOctave;

// dart:math only has a natural log, so define log2 ourselves.
double log2(double x) => log(x) / ln2;

// Maps frequencies (Hz) to note indices (C2 is 0, C#2 is 1, C3 is 12, etc).
int frequencyToNoteIndex(double freqHz) {
  return (kNotesPerOctave * log2(freqHz / kA4Freq) + kA4Index).round();
}

// Inside detectPitch:
// Note that we're only storing notes from C2 to C8. Frequencies outside that
// range are probably not musical notes, but harmonics (above C8) or rhythms
// (below C2).
final freqBuckets = List<double>.filled(kC8Index, 0);

// Start at 1 to skip bin 0: its frequency is 0 Hz, which would blow up log2.
for (int i = 1; i < freqlist.length; i++) {
  final mag = freqlist[i];
  final freq = fftx.frequency(i, sampleRate);
  final idx = frequencyToNoteIndex(freq);
  if (idx >= 0 && idx < freqBuckets.length) {
    // Squared magnitude is a bit better for perceived loudness imo, but you
    // can experiment.
    freqBuckets[idx] += mag * mag;
  }
}

I haven't tested that code, so you might need to debug it, but you get the idea.

Now, for the last section, the code you've written will find the single loudest note, not the loudest notes. Are you hoping that the loudest note will just be the key? Like the loudest note in a C major song will be a C? I don't think that will be true in general. Also, using that approach, you wouldn't be able to tell the difference between C major and C minor.

I think to figure out what you want to do here, you'll need to be more specific about what you're trying to accomplish. What exactly do you mean by the key? What are you wanting to use this information for?

For example, I'm one of the developers on this website: https://onlinesequencer.net/. We have logic to try to detect what key the user is writing their song in. We use that info to show them a key guide, which highlights the notes in that key, so it's easier to write the melody. So for that use case, we don't care about the key per se, just the key signature.

If I was going to implement key signature detection using freqBuckets, I'd wrap all the octaves around and sum all the notes. The next step is a bit tricky and depends on what you're trying to accomplish. For Online Sequencer, we're trying to match the melody to a hard-coded list of key signatures. So for that I'd probably generate a score for each key signature based on the note loudness, and return the key sig with the highest score:

enum KeySignature {
  CMajor,
  ...
}

const kKeySignatures = <KeySignature, List<bool>> {
  KeySignature.CMajor: [true, false, true, false, true, true, false, true, false, true, false, true],
  ...
};

// Inside detectPitch:
double bestScore = 0;
KeySignature? bestKey;
kKeySignatures.forEach((KeySignature key, List<bool> notes) {
  double score = 0;
  for (int i = 0; i < freqBuckets.length; ++i) {
    final loudness = freqBuckets[i];
    if (notes[i % kNotesPerOctave]) {
      // If the note is in the key, increase the score.
      score += loudness;
    } else {
      // Otherwise decrease the score.
      score -= loudness;
    }
  }
  if (bestKey == null || score > bestScore) {
    bestScore = score;
    bestKey = key;
  }
});

return bestKey!;