Open kevenwyld opened 1 month ago
As far as I can tell this seems to be caused by harsh transitions between each note because the waveforms intersect abruptly rather than decaying and overlapping with the next note.
Hello, thanks for the report and kind words!
> `ffmpeg -i bass.wav -lavfi showspectrumpic=s=1920x1080:mode=separate spectrogram.png`
This is quite cool, I didn't know ffmpeg could do that!
> It's possible this is unique to my machine. I have only tested on one device.
Would love to hear it and compare it with my results if you can share it :)
> As far as I can tell this seems to be caused by harsh transitions between each note because the waveforms intersect abruptly rather than decaying and overlapping with the next note.
Yup, that sounds correct. Something is going on in the `wav.zig` module...
> Would love to hear it and compare it with my results if you can share it :)
Here's a file I generated. There was some post-processing on this one to resample and remove the DC offset; the original is included in the zip too:
Post-processing steps:

```sh
ffmpeg -i bass.wav -ac 1 -ar 44100 bass_44100.wav
wavegain -y bass_44100.wav
```
The tick is in the original too, but none of my equipment plays nicely with the DC offset, so I have to post-process the files to play them.
My friend sent me an interesting video about this issue and about creating a window function to deal with it. I was messing around with it, but I don't quite have the understanding to implement anything yet. https://youtu.be/PjKlMXhxtTM?si=JQPNJWQybmlZVTY5&t=742
Yeah, I see. That's an issue that I was aware of but haven't been able to fix so far :/ It's probably related to the encoding of the samples...
> My friend sent me an interesting video about this issue and creating a window function to deal with it. I was messing around with it but I don't quite have the understanding to implement anything yet.
That's a good reference - your friend has a big brain!
I tried implementing the Hann window function in #38; it makes things a bit better, but I think it can be improved. Can you take a look? 🙂
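For anyone following along, here's a minimal sketch of what a Hann window does to a note, written in Python rather than the project's Zig just for quick experimentation (the function names are mine, not taken from the PR):

```python
import math

def hann(n: int) -> list[float]:
    """Hann window: w[k] = 0.5 * (1 - cos(2*pi*k / (n - 1))), zero at both ends."""
    if n == 1:
        return [1.0]
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * k / (n - 1))) for k in range(n)]

def apply_window(samples: list[float]) -> list[float]:
    """Fade a note in and out so adjacent notes meet at zero (no click)."""
    return [s * w for s, w in zip(samples, hann(len(samples)))]
```

Because every windowed note starts and ends at exactly zero, the abrupt waveform intersections can't happen at the boundaries, at the cost of an audible amplitude dip between notes.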
Thanks! I built that PR and tried it out. It does get rid of the tick by fading down to zero between each note. Unfortunately, this has two impacts on the audio that may not be desirable:
Here they are not normalized
These have both been normalized using Tenacity's built-in normalizing filter with DC offset removal selected.
It may be that in order to implement a window function the offset will need to be removed first. Sorry that this ended up being so complicated.
I tried playing with some of the constants in the function but couldn't find anything that improves the situation.
Thanks for sharing your findings. I 100% agree that the window function that I applied changes the vibe a lot. It definitely needs some tweaking and playing around.
> It may be that in order to implement a window function the offset will need to be removed first. Sorry that this ended up being so complicated.
I'm not sure if I understood what you mean about DC offset fully. I'm not sure how that should be possible 🤔
> I'm not sure if I understood what you mean about DC offset fully. I'm not sure how that should be possible 🤔
DC offset may be something requiring another GitHub issue, as it's not really related to this ticking sound; I just wasn't sure until now that it impacts the solution to the ticking. I will try, with my limited understanding, to explain what I think is causing this and what its impacts are:
DC offset, or DC bias, in audio means that the mean value of the waveform is either above or below zero. Since this term comes from analog circuits, and we are talking about sound over a speaker, we can think of the speaker at "rest" as being at zero. When a correctly DC-balanced sine wave (mean 0) is reproduced, the speaker moves outwards from the zero "rest" position to the maximum positive peak, and then inwards to the (negative) minimum trough of the sine wave, creating the pressure changes we perceive as sound.
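To make that concrete, here is a tiny numeric illustration (Python, purely for exposition): a zero-mean sine averages to roughly zero, while the same wave shifted entirely above zero has a positive mean of about 0.5:

```python
import math

N = 1000      # samples
CYCLES = 10   # whole number of cycles, so the mean cancels

zero_mean = [math.sin(2.0 * math.pi * CYCLES * k / N) for k in range(N)]
biased = [0.5 + 0.5 * s for s in zero_mean]  # same shape, but never below zero

def mean(xs):
    return sum(xs) / len(xs)
```

The `biased` wave is exactly the "speaker held outwards" situation: its minimum never goes below the rest position, and its average sits at half the wave height.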
The audio produced by this program has an entirely positive DC bias, so the whole waveform (both the peak and the trough of the sine wave) is above zero. When played over a speaker, the speaker is therefore forced outwards at all times, effectively moving the zero position to half the wave height and requiring the amplifier to hold the speaker in an outward position at all times, never passing the rest position.
The reason I think this is an issue here is that the Hann function is attempting to taper the wave to zero, but because zero is actually the trough of the wave, the mean still ends up positive. So when one tries to apply a zero-mean correction (a normalize filter with DC offset correction), the whole waveform shifts down so that the mean is zero; but because each note is shaped like a positive hump, or hill, that type of simple correction doesn't work correctly, and part of the wave is still offset for each note.
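If that diagnosis is right, the order of operations can be sketched like this (an illustrative Python sketch; I don't know exactly how the program stores its samples): subtract each note's mean first, then apply the taper, so the window really fades to silence rather than to the trough of the wave:

```python
import math

def remove_dc(samples: list[float]) -> list[float]:
    """Shift a note so its mean is zero."""
    m = sum(samples) / len(samples)
    return [s - m for s in samples]

def hann_taper(samples: list[float]) -> list[float]:
    """Multiply by a Hann window so the note starts and ends at zero."""
    n = len(samples)
    return [s * 0.5 * (1.0 - math.cos(2.0 * math.pi * k / (n - 1)))
            for k, s in enumerate(samples)]

# A note with a heavy positive bias, like the program's output:
note = [0.6 + 0.4 * math.sin(2.0 * math.pi * 5.0 * k / 256.0) for k in range(256)]
fixed = hann_taper(remove_dc(note))  # centered first, then tapered
```

Doing it in the other order (taper, then subtract the mean) is exactly the failure mode described above: the hump-shaped note gets shifted down as a whole, and the boundaries no longer sit at zero.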
Here I've taken a screenshot showing the output with the Hann function on top, the output of the stable version of the program in the middle, and a waveform with no DC offset on the bottom. You can see from the scale on the left that the rest position is offset by the wave height for the first two.
Now, I'm pretty sure this is happening because, when you write out each note, you somehow end up with only positive numbers in the variable that stores the waveform. I have stared at the gen.zig Generator function for longer than I care to admit and I can't figure it out, but this type of programming, as well as this language specifically, is not my area of expertise.
I hope you don't mind that this got a little long. I had a lot of fun researching all this, and I'd really like to keep trying to understand how to improve it. Thanks for reading! =] And no worries if you have other priorities and don't want to continue diving into this.
EDIT: To clarify, the speaker example is oversimplified. I think most amplifiers don't reproduce a DC offset like this; it gets filtered out somewhere in the signal path. The effectiveness of that filtering, and how things sound after it, depends on the amplifier. It turns out that the DAC+amp I use for headphones on my desk is horrifically bad at this and will even shut off if I play these files too loud.
> So when played over a speaker the speaker is forced outwards at all times, effectively moving the zero position to half the wave height, and requiring the amplifier to hold the speaker in an outward position at all times, never passing the rest position.
That's super interesting. I always thought there is something wrong with the generated file when I play it on a speaker but was not able to pinpoint it. Maybe I felt that happening somehow 🤔
> Now, I'm pretty sure that this is happening because when you write out each note you somehow only end up with positive numbers for the variable that stores the waveform. I have stared at the gen.zig Generator function for longer than I care to admit and I can't figure it out, but this type of programming, as well as this language specifically, is not my area of expertise.
Yes, IIRC we always end up with positive values and encode them as notes using equal temperament. The `generate` function in `gen.zig` definitely needs more explanation, and I'm happy to walk you through the code if you have any specific questions. I like the rabbit hole you're digging there :)
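For context, the standard 12-tone equal-temperament mapping looks like this (a generic Python sketch of the textbook formula, not necessarily the exact code in gen.zig):

```python
def equal_temperament_freq(note: int, base: float = 440.0) -> float:
    """Frequency of a MIDI note number in 12-TET, with A4 (note 69) = base Hz."""
    return base * 2.0 ** ((note - 69) / 12.0)
```

Each semitone multiplies the frequency by the twelfth root of 2, so twelve semitones up doubles it: note 81 comes out an octave above A4.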
> I hope you don't mind that this got a little long. I had a lot of fun researching all this, and I'd really like to keep trying to understand how to improve it. Thanks for reading! =] And no worries if you have other priorities and don't want to continue diving into this.
No worries at all! I hope you don't mind my disappearing at odd intervals, though.
Btw do you have a blog? I would love to read more about it if you go ahead and put up a deep dive article there. Maybe something in this format.
As for fixing this issue, I think we need to dive a bit more into the `generate` function and figure out what's happening in there.
Don't mind the professional artwork.
If the program knew slightly ahead of time where a note transition would fall, a short low-pass filter with a high cutoff frequency (filtering only the tick) could soften the transition. Ideally, the cutoff frequency should fall (or the attenuation rise), reach a minimum at the note transition (attenuation at its maximum), and then rise again (attenuation falling) until the original waveform is restored, all within a few milliseconds.
I have no idea in a bloody spiraling hell how one would implement this, however... and it feels more like mathematical perversion than problem solving. But! The computational cost shouldn't be high (don't quote me on that), since there is digital audio workstation software that can process several low-pass filters in near real time.
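A cheaper cousin of that idea, if the program could look one note ahead, is a short linear crossfade at each boundary. This is only a Python sketch of the general technique, not something linuxwave currently does:

```python
def crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Blend the last `overlap` samples of note a into the first `overlap` of note b."""
    out = a[:len(a) - overlap]
    for k in range(overlap):
        t = k / overlap  # 0.0 -> all of a, approaching 1.0 -> all of b
        out.append(a[len(a) - overlap + k] * (1.0 - t) + b[k] * t)
    out.extend(b[overlap:])
    return out
```

With a few milliseconds of overlap (say 64-128 samples at 44.1 kHz), the step discontinuity at the boundary, and hence the tick, gets smeared out instead of happening all at once.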
Describe the bug
Definitely not a big deal, but I figured I'd report it since I noticed it. An audible "tick" is produced at the beginning (or maybe the end?) of each note. You can see it on a spectrogram and also hear it, depending on the pitch, distortion, etc.
To reproduce
```sh
linuxwave -n 27 -o bass.wav
ffmpeg -i bass.wav -lavfi showspectrumpic=s=1920x1080:mode=separate spectrogram.png
```
Expected behavior
A smooth transition between tones or notes
Screenshots / Logs
Software information
extra
Additional context
This is a totally awesome project! Thank you for making it!
It's possible this is unique to my machine. I have only tested on one device.