liamappelbe / fftea

A simple and efficient FFT implementation for Dart
Apache License 2.0
63 stars 1 forks source link

Gaussian Window #42

Closed codakkk closed 2 weeks ago

codakkk commented 9 months ago

Hello, I'm looking for Gaussian Window implementation. Is there any future plan?

codakkk commented 9 months ago

I wrote this one, maybe can be useful, I'm still not sure it's correct.

Float64List _gaussian(
  int bufferSize, {
  double alpha = 0.25,
}) {
  final res = Float64List(bufferSize);
  for (int i = 0; i < bufferSize; i++) {
    res[i] = math
        .pow(
          math.e,
          -0.5 *
              math.pow(
                (i - (bufferSize - 1) / 2) / ((alpha * (bufferSize - 1)) / 2),
                2,
              ),
        )
        .toDouble();
  }
  return res;
}
codakkk commented 9 months ago

I'm trying to achieve a Spectro temporal modulation Spectrogram of the sound, like the one found in Praat. Do you have some advices?

Screenshot 2023-10-10 at 12 41 21
liamappelbe commented 9 months ago

Hello, I'm looking for Gaussian Window implementation. Is there any future plan?

I hadn't heard of the gaussian window. Looks like it's a function in matlab/octave though, so you could use that to verify your implementation (ie, check that your function produces the same window values as matlab). In fact this is how a lot of the unit tests are written: I just have a python script that generates sample inputs and outputs using a reference implementation, and then the test checks that my Dart version produces the same output.

A default alpha of 0.25 seems really low tho. My copy of octave seems to use a default of 2.5.

I'm trying to achieve a Spectro temporal modulation Spectrogram of the sound, like the one found in Praat. Do you have some advices?

I don't know exactly what you mean by "spectro temporal modulation spectrogram", but the easiest way to generate a spectrogram is to use the STFT class. If you need something more specific, you can copy and modify that class (here's the file, and here's the version before the streaming API made things complicated). Maybe you could explain how a spectro temporal modulation spectrogram is different to an ordinary spectrogram?

codakkk commented 9 months ago

First of all, thank you for you answer! Very appreciated

I hadn't heard of the gaussian window. Looks like it's a function in matlab/octave though, so you could use that to verify your implementation (ie, check that your function produces the same window values as matlab). In fact this is how a lot of the unit tests are written: I just have a python script that generates sample inputs and outputs using a reference implementation, and then the test checks that my Dart version produces the same output.

I will write some tests, thank you for the advice :)

A default alpha of 0.25 seems really low tho. My copy of octave seems to use a default of 2.5.

I found in some documentation (including Wikipedia and primarly on Google Scholar) seems that alpha must be less or equal to 0.5.

I don't know exactly what you mean by "spectro temporal modulation spectrogram", but the easiest way to generate a spectrogram is to use the STFT class

I'm already using STFT, and I'm confident that the process is correct. However, I may be missing something in the rendering process.

Maybe you could explain how a spectro temporal modulation spectrogram is different to an ordinary spectrogram?

Well, as I mentioned in the previous sentence, I didn't found many resources about the subject. All I found is this Praat documentation and this from a MatLab library.

Thank you for your help, anyway. In case the test are successful, is the Gaussian Window an appreciated PR?

liamappelbe commented 9 months ago

In case the test are successful, is the Gaussian Window an appreciated PR?

Yeah, definitely.

codakkk commented 9 months ago

@liamappelbe Hi, as I mentioned in the previous messages I need to generate a Spectrogram. Took the example and made some modifications (ot many, just added data to a list instead of printing it), this is my code:

Future<List<double>> calculateSpectrogram(
  STFT stft,
  int chunkSize,
  List<double> samples,
) async {
  const buckets = 120;
  Uint64List? logItr;

  List<double> logBinnedData = [];

  stft.run(
    samples,
    (Float64x2List chunk) {
      final amp = chunk.discardConjugates().magnitudes();
      logItr ??= linSpace(amp.length, buckets);

      // Calculate Logarithm binning (am I right?)
      int i0 = 0;
      for (final i1 in logItr!) {
        double power = 0;
        if (i1 != i0) {
          for (int i = i0; i < i1; ++i) {
            power += amp[i];
          }
          power /= i1 - i0;

          logBinnedData.add(math.log(power));
        }
        i0 = i1;
      }
    },
    chunkSize ~/ 2,
  );

  return logBinnedData;
}

Is that correct?

My second question is related to painting the spectrogram using CustomPaint. Here is my current code:

@override
  void paint(Canvas canvas, Size size) {
    final width = size.width;
    final height = size.height;

    final cellWidth = width / numTimeBins;
    final cellHeight = height / numFrequencyBins;

    for (int t = 0; t < numTimeBins; t++) {
      for (int f = 0; f < numFrequencyBins.toInt(); f++) {
        final index = t * numFrequencyBins.toInt() + f;
        final logPower = data[index];
        final color = mapIntensityToColor(logPower);

        final rect = Rect.fromPoints(
          Offset(t * cellWidth, height - f * cellHeight),
          Offset((t + 1) * cellWidth, height - (f + 1) * cellHeight),
        );

        final paint = Paint()
          ..color = color
          ..style = PaintingStyle.fill;

        canvas.drawRect(rect, paint);
      }
    }
  }

but this doesn't seems to work properly. This is a screen from the current result: image

Can you give me some advices? Sorry if something is heavily wrong, but I'm new in this field. What I did expect is to have a matrix composed like this data[time][frequency]

liamappelbe commented 9 months ago

Yeah, that spectrogram doesn't look right.

Your calculateSpectrogram function looks correct to me, but it would probably be easier to deal with the result if you made it a List<List<double>>, rather than a List<double>. Then you wouldn't have to do that error prone index math in the rendering code.

I'm not familiar with this CustomPaint library, but I don't see anything obviously wrong with your code. My first guess would be that the bug is in your mapIntensityToColor function. The power values range from 0 to 1, but you're plotting the powers, which range from -infinity to 0. It can be tricky to map that range to a color. Maybe try the ordinary non-log power first?

Otherwise, my general debugging advice would be to test each stage of the pipeline individually:

Those are 3 components that all need to be working for your spectrogram to draw correctly, so test each of them in isolation to find where the bug is.

codakkk commented 9 months ago

Yeah, that spectrogram doesn't look right.

Your calculateSpectrogram function looks correct to me, but it would probably be easier to deal with the result if you made it a List<List<double>>, rather than a List<double>. Then you wouldn't have to do that error prone index math in the rendering code.

Was thinking about making it a List<List<double>> with first index pointing to time and second index pointing to frequency, but tbh I'm not sure how to achieve that.

I'm not familiar with this CustomPaint library, but I don't see anything obviously wrong with your code. My first guess would be that the bug is in your mapIntensityToColor function. The power values range from 0 to 1, but you're plotting the powers, which range from -infinity to 0. It can be tricky to map that range to a color. Maybe try the ordinary non-log power first?

You mean that I should remove the math.log in logBinnedData, right? If so, I'll give it a try, thank you! Just to give you context, this is the mapIntensityToColor:


Color mapIntensityToColor(double intensity) {
// Map intensity to grayscale color (black to white)
final grayValue = !intensity.isFinite ? 0 : (intensity * 255).toInt();
return Color.fromARGB(255, grayValue, grayValue, grayValue);

}



> Otherwise, my general debugging advice would be to test each stage of the pipeline individually:
> 
> * Print the matrix that your `calculateSpectrogram` function generates, and try visualizing it in Matlab. Does it look right?
> * Test your painting code by replacing `color = mapIntensityToColor(logPower)` with an expression to, say, draw a checkerboard pattern.
> * Print a bunch of input output pairs for the `mapIntensityToColor` function and make sure they look right.
> 
> Those are 3 components that all need to be working for your spectrogram to draw correctly, so test each of them in isolation to find where the bug is.
>
I'm totally new to MatLab, what should be the right way to visualize the data? All the functions I found are about plotting a spectrogram from a Vector and not a matrix (spectrogram function computes STFT by itself).
Thank you!
liamappelbe commented 9 months ago

Was thinking about making it a List<List<double>> with first index pointing to time and second index pointing to frequency, but tbh I'm not sure how to achieve that.

// Btw, I noticed you're not using await in this function, so it doesn't need to be async.
List<List<double>> calculateSpectrogram(
  STFT stft,
  int chunkSize,
  List<double> samples,
) {
  const buckets = 120;
  Uint64List? logItr;

  // Declare the List<List<double>>
  List<List<double>> logBinnedData = [];

  stft.run(
    samples,
    (Float64x2List chunk) {
      // Each time you get a new chunk, add a row to your matrix.
      List<double> chunkPowers = [];
      logBinnedData.add(chunkPowers);

      final amp = chunk.discardConjugates().magnitudes();
      logItr ??= linSpace(amp.length, buckets);

      int i0 = 0;
      for (final i1 in logItr!) {
        double power = 0;
        if (i1 != i0) {
          for (int i = i0; i < i1; ++i) {
            power += amp[i];
          }
          power /= i1 - i0;

          // Add data to the row. In Dart, Lists (like most things) are pass by reference.
          // So chunkPowers refers to the same List that is sitting in the last row of
          // logBinnedData. Adding to the list adds to the last row of logBinnedData.
          chunkPowers.add(math.log(power));
        }
        i0 = i1;
      }
    },
    chunkSize ~/ 2,
  );

  return logBinnedData;
}

Just to give you context, this is the mapIntensityToColor

Yeah, that function looks like it's built to handle intensity values between 0 and 1. You're passing in values between -infinity and 0, so you're getting a whole lot of large negative values. From the documentation of Color.fromARGB, values outside 0 to 255 are wrapped around (modulo'd) into that range. So you're essentially visualizing the fractional part of a large negative number.

I'm totally new to MatLab, what should be the right way to visualize the data?

I just meant play with the values and see if they make sense. There are all sorts of ways of visualizing a matrix. Probably the simplest in this case is imagesc (I'm not sure how well that'll work for values in the range -infinity to 0 though).

If you're not very experienced with matlab though it's probably not worth the effort. I'm 90% sure that removing the log, or updating your mapIntensityToColor function to handle loggy data, will fix your issue.

codakkk commented 9 months ago

Was thinking about making it a List<List<double>> with first index pointing to time and second index pointing to frequency, but tbh I'm not sure how to achieve that.

// Btw, I noticed you're not using await in this function, so it doesn't need to be async.
List<List<double>> calculateSpectrogram(
  STFT stft,
  int chunkSize,
  List<double> samples,
) {
  const buckets = 120;
  Uint64List? logItr;

  // Declare the List<List<double>>
  List<List<double>> logBinnedData = [];

  stft.run(
    samples,
    (Float64x2List chunk) {
      // Each time you get a new chunk, add a row to your matrix.
      List<double> chunkPowers = [];
      logBinnedData.add(chunkPowers);

      final amp = chunk.discardConjugates().magnitudes();
      logItr ??= linSpace(amp.length, buckets);

      int i0 = 0;
      for (final i1 in logItr!) {
        double power = 0;
        if (i1 != i0) {
          for (int i = i0; i < i1; ++i) {
            power += amp[i];
          }
          power /= i1 - i0;

          // Add data to the row. In Dart, Lists (like most things) are pass by reference.
          // So chunkPowers refers to the same List that is sitting in the last row of
          // logBinnedData. Adding to the list adds to the last row of logBinnedData.
          chunkPowers.add(math.log(power));
        }
        i0 = i1;
      }
    },
    chunkSize ~/ 2,
  );

  return logBinnedData;
}

Just to give you context, this is the mapIntensityToColor

Yeah, that function looks like it's built to handle intensity values between 0 and 1. You're passing in values between -infinity and 0, so you're getting a whole lot of large negative values. From the documentation of Color.fromARGB, values outside 0 to 255 are wrapped around (modulo'd) into that range. So you're essentially visualizing the fractional part of a large negative number.

I'm totally new to MatLab, what should be the right way to visualize the data?

I just meant play with the values and see if they make sense. There are all sorts of ways of visualizing a matrix. Probably the simplest in this case is imagesc (I'm not sure how well that'll work for values in the range -infinity to 0 though).

If you're not very experienced with matlab though it's probably not worth the effort. I'm 90% sure that removing the log, or updating your mapIntensityToColor function to handle loggy data, will fix your issue.

Thank you for all your advices. I've been able to make it! (Or seems, so lol) This is my result: spectrogram_tests_DZrivmTSyv

and this is the same sound plotted with matplotlib in Python:

WhatsApp Image 2023-10-20 at 18 34 48_a6241962

And to me, they both look similiar! Do you have some advices about making the colours smoother? mapIntensityToColour is the same as before, I just removed math.log

codakkk commented 9 months ago

Ok, seems I have been able to reproduce the same spectrogram using math.log. For anyone reading in the future, I've used this code:

Color getColour(double value) {
    int len = colours.length;

    assert(len > 1);
    assert(max >= min);

    if (value >= max) {
      return colours.last;
    }

    if (value <= min) {
      return colours.first;
    }

    // Get the scaled values and indexes to lookup the colour
    final m = ((len - 1).toDouble()) / (max - min); // TODO: Precalc this value
    final scaledValue = (value - min) * m;
    final idxValue = scaledValue.floor();
    final ratio = scaledValue - idxValue.toDouble();
    final (i, j) = (idxValue, idxValue + 1);

    // Prevent over indexing after index computation
    if (j >= len) {
      return colours.last;
    }

    // Get the colour band
    final first = colours[i];
    final second = colours[j];

    return Color.fromARGB(
      interpolate(first.alpha, second.alpha, ratio),
      interpolate(first.red, second.red, ratio),
      interpolate(first.green, second.green, ratio),
      interpolate(first.blue, second.blue, ratio),
    );
  }

  int interpolate(int start, int finish, double ratio) {
    return ((finish.toDouble() - start.toDouble()) * ratio + start.toDouble())
        .round();
  }

where min/max is the minimum and maximum frequency :)

codakkk commented 9 months ago

Hi @liamappelbe, I know I'm disturbing you a lot, but you're the only person I can ask about! Feel free to not answer me :) As I said some days ago, I'm trying to achieve a Spectrogram like the one in Praat.

Screenshot 2023-10-23 at 12 23 36

The one you see on top is mine. The other is Praat's spectrogram. As you can see in the image, mine looks blurry and pixelated, the other one is more "sophisticated". Do you have some advices? Just to give you more context, here's my Spectrogram implementation Thank you!

liamappelbe commented 9 months ago

Hi @liamappelbe, I know I'm disturbing you a lot, but you're the only person I can ask about!

You're fine. Half the bugs in this repo are people asking general signal processing questions that don't really have anything to do with fftea 😛

As you can see in the image, mine looks blurry and pixelated, the other one is more "sophisticated".

STFT has a trade off between time resolution and frequency resolution. Fun fact: this is directly related to the uncertainty principle in quantum mechanics. If you shrink your chunk size to increase your time resolution, that will also decrease your frequency resolution.

I don't think their spectrogram is better than yours, they've just made a different time/frequency trade off. Their columns are thinner, but vertically the frequencies are a lot blurrier than yours. I also notice some banding in their spectrogram that looks like a bug to me. See how it alternates between light and dark? I can't think of a legitimate reason a spectrogram would look like that.

image

As I said, their columns are thinner, so the first thing you could do is decrease your chunk size a bit. I notice in your implementation, around line 126, you're doubling your chunk size until it's big enough to capture all the frequencies you want. This will overestimate the chunk size needed (which also explains why you've got a blank patch at the top of your spectrogram). The chunk size doesn't need to be a power of 2. Fftea can do FFTs of any size. Power of 2 sized FFTs are faster, but unless you're actually running into performance issues that doesn't matter.

The other factor I haven't mentioned yet is chunk stride. Chunk stride is what actually determines the column width, and in your code you have it set to half the chunk size, but the stride can be smaller if you want. In the image you sent, it looks like their vertical resolution is really high, but the frequencies are blurry. The height of one of those vertically blurred dark patches is quite a lot bigger than the pixel size in your spectrogram, so this probably isn't a good trade off for them. I think this is a place where yours is better. But if you really want yours to look more like theirs, my guess is they're using a larger chunk size but a smaller chunk stride. The striding and windowing you need to get the behavior you want is more of an art than a science, so this is really just a matter of trying different things and seeing what works best for your use case. For the gaussian window, remember to also try different alpha values.

But don't worry about trying to get your picture to look like theirs. Just keep in mind what your use case is. Figure out what you need this spectrogram for, and tailor your algorithm for that purpose.

codakkk commented 9 months ago

Hi @liamappelbe, I know I'm disturbing you a lot, but you're the only person I can ask about!

You're fine. Half the bugs in this repo are people asking general signal processing questions that don't really have anything to do with fftea 😛

Yes, I noticed that's what made me ask questions! You're very helpful

As you can see in the image, mine looks blurry and pixelated, the other one is more "sophisticated".

STFT has a trade off between time resolution and frequency resolution. Fun fact: this is directly related to the uncertainty principle in quantum mechanics. If you shrink your chunk size to increase your time resolution, that will also decrease your frequency resolution.

I don't think their spectrogram is better than yours, they've just made a different time/frequency trade off. Their columns are thinner, but vertically the frequencies are a lot blurrier than yours. I also notice some banding in their spectrogram that looks like a bug to me. See how it alternates between light and dark? I can't think of a legitimate reason a spectrogram would look like that.

image

As I said, their columns are thinner, so the first thing you could do is decrease your chunk size a bit. I notice in your implementation, around line 126, you're doubling your chunk size until it's big enough to capture all the frequencies you want. This will overestimate the chunk size needed (which also explains why you've got a blank patch at the top of your spectrogram). The chunk size doesn't need to be a power of 2. Fftea can do FFTs of any size. Power of 2 sized FFTs are faster, but unless you're actually running into performance issues that doesn't matter.

Yes, you're completely right. The code you looked at is similiar to Praat's code. The idea is to capture frequencies from 0 to N Hz where N is usually 5000Hz (Here's a screenshot with Praat's spectrogram showing frequency "limits", 0 Hz - 5000 Hz).

Praat_FPf02DBx72

The striding and windowing you need to get the behavior you want is more of an art than a science, so this is really just a matter of trying different things and seeing what works best for your use case. For the gaussian window, remember to also try different alpha values.

I already spent some weeks on trying to achieve a similar Spectrogram, but I totally failed 😂 I'll keep grinding 🗡️

But don't worry about trying to get your picture to look like theirs. Just keep in mind what your use case is. Figure out what you need this spectrogram for, and tailor your algorithm for that purpose.

Yes, I understand your point and... well, my thesis is about re-create some parts of Praat on Mobile, and Spectrogram is one of them, that's why I'm trying to achieve a similar Spectrogram

Maybe one of the "problems" is that I'm using STFT and Praat's using FFT? I don't know, tbh Sometime I'm not even sure if it's an algorithm problem or rendering problem I'm a little bit discouraged right now 😅

codakkk commented 9 months ago

Ok, I'm happy right now. After some experiments with Praat, I've been able to reproduce my Spectrogram result with Praat. Praat_VEyVysvZfL This is the same sound but with 20000Hz on frequency. And It looks the same as mine! It seems that with lower frequency (eg: 5000 Hz), Praat just cuts the Spectrogram rendering! In fact, if I try to take the same portion manually from my Spectrogram, this is my result:

image and this is Praat's output: Praat_zGyPW2uXox And to me, they both look the same!

Do you have some advices on how to achieve this "cut" behaviour when rendering?

liamappelbe commented 9 months ago

Maybe one of the "problems" is that I'm using STFT and Praat's using FFT?

STFT is a thin wrapper around FFT. It just cuts the input into chunks, applies a window, and runs FFT on the chunk. So you're both using STFT and also FFT.

Do you have some advices on how to achieve this "cut" behaviour when rendering?

If you don't care about those frequencies, the best thing to do would be to use a smaller chunk size (not a power of 2), because that will give you more time resolution than just cutting off the top of the image.

But if you really want emulate praat closely, including the sub-optimal parts, then just change your rendering code to not draw those frequencies. In spectrogram_widget_painter.dart, change your final frequenciesBin = spectrogram.numberOfFreqs; variable to only draw the frequencies you actually want to see.

codakkk commented 9 months ago

Maybe one of the "problems" is that I'm using STFT and Praat's using FFT?

STFT is a thin wrapper around FFT. It just cuts the input into chunks, applies a window, and runs FFT on the chunk. So you're both using STFT and also FFT.

I knew that but you know, I just tried asking to be sure 😂

But if you really want emulate praat closely, including the sub-optimal parts, then just change your rendering code to not draw those frequencies. In spectrogram_widget_painter.dart, change your final frequenciesBin = spectrogram.numberOfFreqs; variable to only draw the frequencies you actually want to see.

Okay, that's how I implemented that and works fine. But for example how can I skip frequencies greater than 5000 Hz? Because my logBinnedData contains arrays of 116 items. Maybe am I missing something?

liamappelbe commented 9 months ago

But for example how can I skip frequencies greater than 5000 Hz? Because my logBinnedData contains arrays of 116 items. Maybe am I missing something?

You'll need to figure out which bin the 5000 Hz element ends up in. The indexOfFrequency function will tell you which element of the STFT chunk is closest to 5000 Hz, then you'll need to figure out which bucket it lands in after you put all the frequencies in logarithmic buckets.

codakkk commented 9 months ago

But for example how can I skip frequencies greater than 5000 Hz? Because my logBinnedData contains arrays of 116 items. Maybe am I missing something?

You'll need to figure out which bin the 5000 Hz element ends up in. The indexOfFrequency function will tell you which element of the STFT chunk is closest to 5000 Hz, then you'll need to figure out which bucket it lands in after you put all the frequencies in logarithmic buckets.

Ok thank you, never find out about indexOfFrequency :) I'll try soon. Do you have some advices about zooming in/out? (should only affect time axis)

liamappelbe commented 9 months ago

Do you have some advices about zooming in/out? (should only affect time axis)

Engineering first principals: break the problem down. There's 2 parts, when to zoom, and how to zoom. So first set up some sort of UI interaction that lets the user zoom in. Once you've done that, zooming in the time axis of the spectrogram just means changing visibleTimeStart and visibleTimeEnd in your painter class.

codakkk commented 9 months ago

Do you have some advices about zooming in/out? (should only affect time axis)

Engineering first principals: break the problem down. There's 2 parts, when to zoom, and how to zoom. So first set up some sort of UI interaction that lets the user zoom in. Once you've done that, zooming in the time axis of the spectrogram just means changing visibleTimeStart and visibleTimeEnd in your painter class.

Well, this whole part is already implemented, the problem is the spectrogram looks blurry and pixelated :( spectrogram_tests_ndFPgjDFS0

This is how I'm calculating visible bins:

    // ....
    final frequenciesBin = spectrogram.numberOfFreqs;
    // ....
    int zoomedTimeBin = (timeBin / zoom).floor();
    // Calculate the center point
    int centerTimeBin = timeBin ~/ 2;
    // ....
    // Calculate the starting and ending indices for the visible time bins
    int visibleTimeStart = centerTimeBin - zoomedTimeBin ~/ 2;
    int visibleTimeEnd = centerTimeBin + (zoomedTimeBin + 1) ~/ 2;
    // ....

This should be all correct and the problem must be something with rendering

liamappelbe commented 9 months ago

Well, this whole part is already implemented, the problem is the spectrogram looks blurry and pixelated :(

Wdym? Zooming in is just going to draw the chunks wider. It's not going to draw additional chunks in between the existing ones. Even in your zoomed out image the individual chunks are clearly visible. What else were you expecting?

codakkk commented 9 months ago

Well, this whole part is already implemented, the problem is the spectrogram looks blurry and pixelated :(

Wdym? Zooming in is just going to draw the chunks wider. It's not going to draw additional chunks in between the existing ones. Even in your zoomed out image the individual chunks are clearly visible. What else were you expecting?

I'm trying to achieve a zoom similar to this:

Screenshot 2023-10-26 at 14 59 45

Same spectrogram zoomed in Praat

codakkk commented 9 months ago

After some more tests, I have been able to reproduce the same spectrogram, using a lower chunk stride (around 16). Now I have to figure out how to calculate it based on zoom. I'll keep working on it :) WhatsApp Image 2023-10-26 at 21 12 27_cd55e6ac