tooll3 / t3

Tooll 3 is open-source software for creating real-time motion graphics.

Raw Waveform display/processing (notes + code of failed attempt) #398

Open teadrinker opened 6 months ago

teadrinker commented 6 months ago

I thought it would be nice to have access to the waveform, both for processing (to generate sync events) and for display/feeding waveform data into 3D points and other fun stuff... However, I failed. (This was around Sept/Oct 2023.)

I suspect the Bass library might not be able to do this properly. You can pull waveform data, but you cannot align it relative to the data you pulled previously.

But it might also just be that I totally messed up somewhere...

I will dump the code here in case it might be useful:

Core/Audio/AudioAnalysis.cs

    public static readonly int WaveBufferLength = 44100 * 2 * 4; // 4 seconds (if format is 44100 stereo)
    /// <summary>
    /// circular buffer containing stereo waveform
    /// </summary>
    public static readonly float[] WaveBuffer = new float[WaveBufferLength]; 
    public static int WaveBufferPos = 0; // circular buffer end (current write position)

    public static int WaveSourceReadFloatCount = 8192 * 2;     // if framerate drops below ~5 fps (for 44100 stereo), WaveBuffer will not be correct
    public static long WaveSourcePos = -1;
    //public static double WaveSourcePosSeconds = -1.0; 
    public static readonly float[] WaveTmpReadBuffer = new float[WaveSourceReadFloatCount];
    public static System.Collections.Generic.List<string> debugSizes = new();

In Core/Audio/AudioEngine.cs, I created UpdateWaveBuffer, which is called right after UpdateFftBuffer and takes the same arguments:

private static void UpdateWaveBuffer(int soundStreamHandle, Playback playback)
{
    if (playback.Settings != null && playback.Settings.AudioSource == PlaybackSettings.AudioSources.ProjectSoundTrack)
    {
        Bass.ChannelGetInfo(soundStreamHandle, out ChannelInfo info);
        var channels = info.Channels;
        var monoIsZeroMultiChannelIsOne = channels > 1 ? 1 : 0;

        long posInBytes = Bass.ChannelGetPosition(soundStreamHandle, PositionFlags.Bytes);
        long newPosSamples = (long) (Bass.ChannelBytes2Seconds(soundStreamHandle, posInBytes) * (double)info.Frequency);
        int diffInSamples = (int)(newPosSamples - AudioAnalysis.WaveSourcePos);
        bool hadValidSourcePos = AudioAnalysis.WaveSourcePos != -1;
        bool validDiff = hadValidSourcePos && diffInSamples > 0 && diffInSamples < AudioAnalysis.WaveSourceReadFloatCount / channels;
        AudioAnalysis.WaveSourcePos = newPosSamples;

        if (!validDiff)
            diffInSamples = 0;

        // Update circular buffer position
        AudioAnalysis.WaveBufferPos += diffInSamples * 2; // * 2 for stereo
        AudioAnalysis.WaveBufferPos %= AudioAnalysis.WaveBufferLength; // keep within bound

        // Bass.ChannelGetData reads data forward in time.
        // Since we don't know how long the next frame will take,
        // we always need to request enough bytes for the worst case to avoid gaps.
        var byteCountToRead = AudioAnalysis.WaveSourceReadFloatCount * sizeof(float);
        var actualBytesRead = Bass.ChannelGetData(soundStreamHandle, AudioAnalysis.WaveTmpReadBuffer, (int)(DataFlags.Float) | byteCountToRead);

        var samplesRead = actualBytesRead / (sizeof(float) * channels);

#if DEBUG_WAVEFORM
        AudioAnalysis.debugSizes.Add("\n" + (byteCountToRead - actualBytesRead) + " " + byteCountToRead + " " + actualBytesRead + " samplesRead:" + samplesRead + " diffInSamples:" + diffInSamples + " channels:" + channels + " freq:" + info.Frequency);
#endif                
        var wpos = AudioAnalysis.WaveBufferPos;

        // todo: avoid overwriting stuff we saved from last frame
        //int start_i = validDiff && byteCountToRead == actualBytesRead ? samplesRead - diffInSamples : 0;
        int start_i = 0;
        for (int i = start_i; i < samplesRead; i++)
        {
            AudioAnalysis.WaveBuffer[wpos    ] = AudioAnalysis.WaveTmpReadBuffer[i * channels];
            AudioAnalysis.WaveBuffer[wpos + 1] = AudioAnalysis.WaveTmpReadBuffer[i * channels + monoIsZeroMultiChannelIsOne];
            wpos += 2;
            if (wpos >= AudioAnalysis.WaveBufferLength)
            {
                wpos = 0;
#if DEBUG_WAVEFORM
                var tmp = new List<string>();
                for(var j = 0; j < 3*44100; j++)
                    tmp.Add("" + AudioAnalysis.WaveBuffer[j*2]);
                File.WriteAllText("C:\\_UnityProj\\tmpData.txt", string.Join(",", tmp ));
                File.WriteAllText("C:\\_UnityProj\\tmpInfo.txt", string.Join(",", AudioAnalysis.debugSizes));
#endif
            }
        }
    }
}
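
For anyone picking this up: reading the most recent window back out of the circular buffer (e.g. for display) could look roughly like this. A minimal sketch assuming the AudioAnalysis fields above; GetRecentSamples is a hypothetical helper, not something that exists in the codebase:

    // Hypothetical helper: copies the most recent frameCount stereo frames
    // (ending at WaveBufferPos) out of the circular WaveBuffer into a linear array.
    public static float[] GetRecentSamples(int frameCount)
    {
        var result = new float[frameCount * 2]; // interleaved stereo
        // Start frameCount frames behind the write position, wrapping if needed
        var pos = AudioAnalysis.WaveBufferPos - frameCount * 2;
        pos = ((pos % AudioAnalysis.WaveBufferLength) + AudioAnalysis.WaveBufferLength)
              % AudioAnalysis.WaveBufferLength;
        for (var i = 0; i < result.Length; i++)
        {
            result[i] = AudioAnalysis.WaveBuffer[pos];
            pos = (pos + 1) % AudioAnalysis.WaveBufferLength;
        }
        return result;
    }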
teadrinker commented 6 months ago

A simple solution would be to just pull all samples from the Project Sound Track and keep them globally. Downsides:
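
For reference, pulling everything up front could be done with a Bass decode stream, reading until the end. A rough sketch assuming ManagedBass; the file path and variable names are placeholders:

    using System.Collections.Generic;
    using ManagedBass;

    // Decode the entire soundtrack into memory once (no playback, just decoding).
    // Sketch only: error handling omitted, path is a placeholder.
    var decodeHandle = Bass.CreateStream("soundtrack.mp3", 0, 0,
                                         BassFlags.Decode | BassFlags.Float);
    var allSamples = new List<float>();
    var chunk = new float[8192];
    while (true)
    {
        var bytesRead = Bass.ChannelGetData(decodeHandle, chunk, chunk.Length * sizeof(float));
        if (bytesRead <= 0)
            break; // end of stream or error
        for (var i = 0; i < bytesRead / sizeof(float); i++)
            allSamples.Add(chunk[i]);
    }
    Bass.StreamFree(decodeHandle);

One obvious trade-off is memory: interleaved stereo floats at 44.1 kHz come to roughly 350 KB per second of audio.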

pixtur commented 5 months ago

Interesting. I was using a similar approach by serializing the result of the FFT as JSON. I'm honestly not sure whether processing the waveform directly on the fly would be fast enough in C#. But I'm not that deep into audio. Maybe @HolgerFoerterer has an idea how to do this.
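
A minimal sketch of what caching per-frame FFT results as JSON might look like with System.Text.Json (file name and types are hypothetical, not necessarily what was actually used):

    using System.Collections.Generic;
    using System.IO;
    using System.Text.Json;

    // Hypothetical: one float[] of FFT magnitudes per rendered frame.
    var fftFrames = new List<float[]>();
    // ... fill fftFrames while stepping through the track ...

    // Serialize once; reload on later runs instead of re-analyzing the audio.
    File.WriteAllText("fft-cache.json", JsonSerializer.Serialize(fftFrames));
    var restored = JsonSerializer.Deserialize<List<float[]>>(
        File.ReadAllText("fft-cache.json"));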

HolgerFoerterer commented 5 months ago

To get sample-precise output for video rendering, I spent a lot of time trying to convince Bass to switch from its real-time-based approach to another mode where I could get access to buffered data in a consistent way. To be honest, I failed too. Whatever I did... whenever I repositioned the playback in any way... things screwed up. So at the moment, I position the playback at the very beginning of the recording and avoid repositioning during the render.

So yes, you should theoretically be able to obtain buffered data using a comparable approach. At least for FFT data, there is a flag to fill the FFT without consuming new data.
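
If I read that right, the flag meant here may be BASS_DATA_NOREMOVE (DataFlags.NoRemove in ManagedBass); whether it behaves this way for playback channels is an assumption I haven't verified:

    // Assumption: DataFlags.NoRemove peeks at the buffered data for the FFT
    // instead of consuming it, so repeated reads see the same window.
    var fft = new float[1024]; // FFT2048 yields 1024 magnitude values
    Bass.ChannelGetData(soundStreamHandle,
                        fft,
                        (int)(DataFlags.FFT2048 | DataFlags.NoRemove));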

But in a real-time scenario, don't expect buffered data to be exactly the same every time. Bass will apparently play back faster/slower and even skip as it sees fit to keep sync. And I don't know how to align that data then. When we get new data, it's obviously current, but that seems to be all we know.

And to answer the question by @pixtur: C# should be able to handle manual processing of stereo samples at 44.1-48 kHz easily.
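
For scale, stereo at 44.1-48 kHz is under 100,000 samples per second. A minimal sketch of typical per-sample processing (per-channel RMS over an interleaved buffer); the method name is just for illustration:

    // Per-channel RMS over an interleaved stereo buffer
    // (~88,200 floats for one second at 44.1 kHz stereo).
    static (float Left, float Right) StereoRms(float[] interleaved)
    {
        double sumL = 0, sumR = 0;
        var frames = interleaved.Length / 2;
        for (var i = 0; i < frames; i++)
        {
            var l = interleaved[i * 2];
            var r = interleaved[i * 2 + 1];
            sumL += l * l;
            sumR += r * r;
        }
        return ((float)System.Math.Sqrt(sumL / frames),
                (float)System.Math.Sqrt(sumR / frames));
    }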

teadrinker commented 5 months ago

> whenever I repositioned the playback in any way... things screwed up

I suspect it might not be possible with the current API, due to the nature of sound running in another thread. You'd need a function that gives you the data AND the position in the same API call... otherwise there is no guarantee they would be in sync.
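
In other words, the missing primitive would be something like the following (purely hypothetical, not an existing Bass API):

    // Hypothetical API: data and its exact source position returned atomically,
    // so consecutive reads can be stitched together without gaps or overlap.
    interface ISampleSource
    {
        // Fills buffer and reports the sample position of buffer[0]
        // as a single, internally synchronized operation.
        int ReadSamples(float[] buffer, out long startSamplePosition);
    }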