librespot-org / librespot

Open Source Spotify client library
MIT License
4.52k stars 545 forks source link

Add Resampling Support #1180

Open JasonLG1979 opened 1 year ago

JasonLG1979 commented 1 year ago

This PR adds resampling to 48kHz, 88.2kHz, and 96kHz with Windowed Sinc Interpolation.

It also moves everything except decoding out of player and a few other misc improvements like improving thread creation in player, using #[default] in the player config enums where we can, and includes the sample pipeline latency in calculating an accurate position.

JasonLG1979 commented 1 year ago

@eladyn or @roderickvd or anyone else really, this could use a review. I know it's a lot of code, I tried my best to split it up into a few commits to make it comprehensible.

roderickvd commented 1 year ago

Appreciate the work! I donโ€™t have a lot of time and as you know I donโ€™t use this anymore myself, so I hope others will take a look before I get to it.

JasonLG1979 commented 1 year ago

@roderickvd, who else is active and has commit rights?

JasonLG1979 commented 1 year ago

For testing purposes the new options are:

        --interpolation-quality QUALITY
                        Interpolation Quality to use if Resampling
                        {Low|Medium|High}. Defaults to Low.
        --sample-rate SAMPLERATE
                        Sample Rate to Resample to
                        {44.1kHz|48kHz|88.2kHz|96kHz}. Defaults to 44.1kHz
                        meaning no resampling.

Where --interpolation-quality Low uses Linear Interpolation, which is fast but not great as far as quality, and Medium|High use Windowed Sinc Interpolation which is much better then Linear but it cost more CPU cycles.

A sample rate of 44.1kHz (the default) bypasses resampling.

JasonLG1979 commented 1 year ago

I finally found a bit of time to have a look at this, but I'll probably need a bit more to thoroughly go through all files (especially things like alsa or the resampler. I hope, I will get to it this weekend, but not sure yet.

I commented on some minor things that I noticed, but these are mainly technical things. On the audio side I'm not that experienced so I won't have much to add, but I'll give the various new options a try, when I get to it.

ALSA is my baby. I wouldn't do her wrong,lol!!!

The resampler is implemented to basically mimic a simple FIR filter in operation for the Windowed Sinc version, in that it uses temporal convolution to interpolate. It's non-oversampling to keep the resource usage to a minimum and dual threaded so that right and left channels are processed at the same time.

Good-enough quality and efficiency was the target. Are there better quality resamplers? Sure. Are there other resamplers that are this balanced? I'm not sure. You can resample to 96kHz at the highest setting on a Pi Zero with this reampler and use less than 20-ish % system resources.

JasonLG1979 commented 1 year ago

As far as latency is concerned by my math, depending on sample rate and interpolation quality, it maxes out at any where from 1.5ms to 8.4 ms. But in normal operation the latency is variable because it tries to process the largest chunk of samples it can.

JasonLG1979 commented 1 year ago

Here is a screenshot of htop resampling to 48kHz High on a Pi Zero v2.

What you have to keep in mind is the CPU %'s next to the threads represent % of 1 core.

The Avg bar is the actual total system load.

As you can see, headroom for days.

Screenshot from 2023-06-26 19-27-30

Forgot to mention, aside from installing a Raspotify package based on this branch it's a fresh install of headless Raspberry Pi OS with no tweaks.

JasonLG1979 commented 1 year ago

I'm still getting:

[2023-06-27T02:41:33Z ERROR librespot_playback::player] Unable to load audio item: Error { kind: Unavailable, error: Response(MercuryResponse { uri: "hm://keymaster/token/authenticated?scope=playlist-read&client_id=65b708073fc0480ea92a077233ca87bd&device_id=179116ac794f5870d6c936120d87ce0370052b84", status_code: 403, payload: [[123, 34, 99, 111, 100, 101, 34, 58, 52, 44, 34, 101, 114, 114, 111, 114, 68, 101, 115, 99, 114, 105, 112, 116, 105, 111, 110, 34, 58, 34, 73, 110, 118, 97, 108, 105, 100, 32, 114, 101, 113, 117, 101, 115, 116, 34, 125]] }) }

When I try to use discovery. Librespot works fine if I provide my creds. I wouldn't mind fixing this also as it's the blocker that prevents me from switching to dev in Raspotify. Otherwise dev works rather well.

roderickvd commented 1 year ago

The portaudio, jackaudio and rodio backends need tested and probably fixed. I'm not really at all familiar with them. The ALSA and PulseAudio backends are solid. Gstreamer was super straightforward as far as a non-const sample rate and for pipe and subprocess sample rate doesn't really matter at all.

When introducing common functionality I recommend to make sure that it works with all backends. Fortunately PortAudio and Rodio are really easy to test. By the way, Rodio does its own interpolation so it should not even care.

Jack is also not so difficult to test when you find one of those binaries that come complete with a GUI to quickly launch a server and graphically construct a pipeline. Like QjackCtl.

I'm not sure if librespot should have resampling, but since I don't use it anymore, let's follow the voice of the people. And at this point you're the only two people responding, so ๐Ÿ˜†

I'm still getting:

[2023-06-27T02:41:33Z ERROR librespot_playback::player] Unable to load audio item: Error { kind: Unavailable, error: Response(MercuryResponse { uri: "hm://keymaster/token/authenticated?scope=playlist-read&client_id=65b708073fc0480ea92a077233ca87bd&device_id=179116ac794f5870d6c936120d87ce0370052b84", status_code: 403, payload: [[123, 34, 99, 111, 100, 101, 34, 58, 52, 44, 34, 101, 114, 114, 111, 114, 68, 101, 115, 99, 114, 105, 112, 116, 105, 111, 110, 34, 58, 34, 73, 110, 118, 97, 108, 105, 100, 32, 114, 101, 113, 117, 101, 115, 116, 34, 125]] }) }

When I try to use discovery. Librespot works fine if I provide my creds. I wouldn't mind fixing this also as it's the blocker that prevents me from switching to dev in Raspotify. Otherwise dev works rather well.

You can read the error message by converting the payload array from decimal to ASCII. It then reads:

{"code":4,"errorDescription":"Invalid request"}

As I commented with the issues, see if it helps if you simplify that keymaster request by taking out some of the parameters like client_id and/or device_id. Maybe when you use discovery, either of those IDs are not properly initialised yet.

JasonLG1979 commented 1 year ago

When introducing common functionality I recommend to make sure that it works with all backends. Fortunately PortAudio and Rodio are really easy to test. By the way, Rodio does its own interpolation so it should not even care.

Everything should work now.

Jack is also not so difficult to test when you find one of those binaries that come complete with a GUI to quickly launch a server and graphically construct a pipeline. Like QjackCtl.

Jack conflicts with pipewire. It's basically impossible to test without completely borking the audio on your system. But it doesn't take a sample rate in it's initialization anyway so I assumed it's up to the user to make sure the sample rates jive?

I'm not sure if librespot should have resampling, but since I don't use it anymore, let's follow the voice of the people. And at this point you're the only two people responding, so laughing

It's a pretty common request as basically users expect all audio players these days to be able to do at least some sort of basic resampling.

JasonLG1979 commented 1 year ago

You can read the error message by converting the payload array from decimal to ASCII. It then reads:

{"code":4,"errorDescription":"Invalid request"}

That is a super useless error message. Way to go Spotify,lol!!!

Maybe we can convert that in librespot to present the actual error message (useless as it is anyway)?

roderickvd commented 1 year ago

Maybe we can convert that in librespot to present the actual error message (useless as it is anyway)?

Sure do it. One tip, forget about trying to deserialize the JSON. I tried once but there are so many variants that Spotify puts out, that I basically reverted many hours of work simply because I was running into walls everywhere. Maybe I got it totally wrong then but it was a pain in the butt.

JasonLG1979 commented 1 year ago

Maybe we can convert that in librespot to present the actual error message (useless as it is anyway)?

Sure do it. One tip, forget about trying to deserialize the JSON. I tried once but there are so many variants that Spotify puts out, that I basically reverted many hours of work simply because I was running into walls everywhere. Maybe I got it totally wrong then but it was a pain in the butt.

I was more thinking just run converting the response to ASCII when it's an error so we get an error message instead of an array of numbers. The error messages don't seem to be that useful so far but at least if they're converted they won't fill up my terminal.

JasonLG1979 commented 1 year ago

Ok @eladyn and @roderickvd, hit me with another round or reviews.

roderickvd commented 1 year ago

Does your resampler have an anti-aliasing filter?

JasonLG1979 commented 1 year ago

Does your resampler have an anti-aliasing filter?

It only upsamples and it doesn't oversample. No new high freq content is created, and everything from the source fits into the target. Anti-aliasing is not needed. And in the case of windowed sinc interpolation the sinc window is a natural anti-aliasing filter anyway.

That's one of those "givens" I was talking about. I don't have to deal with arbitrary input. I know that my input is 44.1kHz. I know I will only be upsampling.

Oversampling could potentially improve the quality of the interpolation at the cost of increased code complexity and A LOT more CPU usage. And in that case an additional anti-aliasing step would be required for linear interpolation but still not for windowed sinc though as I mentioned the sic window is a natural anti-aliasing filter.

Being non-oversampling was a design choice. My goal was good enough quality and low-ish CPU usage. I was looking for the best "bang for the buck".

JasonLG1979 commented 1 year ago

I played with a version that incorporated oversampling but to my non-golden ears it made no difference besides eating twice as much CPU.

JasonLG1979 commented 1 year ago

The 1st rule of resampling is; don't unless you have to. Under no circumstances will it improve sound quality over the original. There are only 2 real reasons to resample.

  1. Your device doesn't support 44.1kHz.
  2. You're doing some DSP that requires it.

This resampler is designed for case 1. For case 2 the user should let the DSP handle resampling since it should know best how to handle it. For example, my advice for a PulseAudio Sink user would be to set the format to F32 and let PulseAudio handle resampling.

This resampler best serves backends that either don't have resampling or that have resampling that is too CPU intensive and thus can be bypassed.

JasonLG1979 commented 1 year ago

For users of the ALSA backend for example this resampler will absolutely smoke libsamplerate. I can't objectively measure the sound quality difference but subjectively I can't hear a difference.

As far as resource utilization is concerned it's no contest. libsamplerate is single threaded and horribly inefficient. And the way it's designed it would take a rewrite to make it any better in that regard. It will cripple a lower power device. (I did a deep dive in the source code when writing this resampler.) It for example calculates the interpolation coefficients on the fly.

roderickvd commented 1 year ago

I have not studied your resampling algorithm. Could you clarify a few points?

No new high freq content is created, and everything from the source fits into the target. Anti-aliasing is not needed.

Are you sure? Fitting 44.1 kHz into 192 kHz with just the DAC filtering at around 96 kHz will allow all 22.05 - 44.1 kHz mirror images through.

And in the case of windowed sinc interpolation the sinc window is a natural anti-aliasing filter anyway.

I agree that a windowed sinc function should take care of most of it. Couple of questions:

JasonLG1979 commented 1 year ago

Are you sure? Fitting 44.1 kHz into 192 kHz with just the DAC filtering at around 96 kHz will allow all 22.05 - 44.1 kHz mirror images through.

Working under the assumption that the source audio is anti-aliased properly there is nothing above 22.05 in the resampled output. Like I said no new high freq content is created. If the source is not properly anti-aliased then we should be FIR filtering ALL SAMPLE RATES even 44.1kHz.

I agree that a windowed sinc function should take care of most of it. Couple of questions:

Because you say "in the case of..." are there any other cases that do not use a windowed sinc function?

Linear Interpolation doesn't use a windowed sinc function.

What is the cut-off frequency? Is it hardcoded or does it depend on the input (nowadays only 44.1 kHz, just asking)

The sinc window cuts off at the Nyquist freq of the output sample rate naturally.

Since the output sample rate is higher then the input sample rate it doesn't cut off any of the input.

What is the kernel length? (to assess roll-off, attenuation and CPU load relative to libresample)

High is 257 Med is 129 Low is 0 (it doesn't use a Kernel, It's just plain old Linear).

There is no roll-off. There is no explicit cut off freq. To do that I would have to convolute FIR Filter coefficients into the interpolation coefficients. It's essentially a brick-wall at the Nyquist freq of the output sample rate. It's either there or it's not. But again it doesn't matter since even in the resample output there is nothing above 22.05.

Since no oversampling is done, no additional high freq content is created. If it were to oversimple then filtering would be required because oversampling shifts the Nyquist freq of the source before Interpolation.

JasonLG1979 commented 1 year ago

If it helps you sleep at night I can add a FIR filter that filters out everything about 22.05.

roderickvd commented 1 year ago

Should be fine then, thanks. You are right that the input should contain nothing above Fs/2.

Maybe talking out of my butt when I ask ... what would happen if you set the center frequency to 22.05 kHz just to make sure? Would that make any sense or even work? Or more fundamentally, why set Fc = Fs and not Fs/2?

You see I'm not a resampling specialist, just heard a thing or two ๐Ÿ˜„

Edit: I think windowed sinc rolls off by definition, depending on the number of coefficients in the kernel?

JasonLG1979 commented 1 year ago

It wouldn't hurt to add a FIR filter in to the interpolation coefficients just in case I am wrong. Calculating the interpolation coefficients is a one time cost so mixing in a FIR filter doesn't cost anything at runtime.

Edit: I think windowed sinc rolls off by definition, depending on the number of coefficients in the kernel?

Assuming you add FIR filter coefficients. Otherwise it's a brickwall. Well no matter what it's a brickwall you can just make it roll-off before it hits the brickwall if you want.

JasonLG1979 commented 1 year ago

Traditionally you'd roll-off at slightly below the Nyquist freq of the input so for 44.1kHz 20kHz.

JasonLG1979 commented 1 year ago

I've the best luck with a FIR filter with 3 taps using the Blackman-Harris window. (more taps steeper roll-off but more distortion with tight transition bandwidths like 48kHz where the transition bandwidth is only 4000Hz)

JasonLG1979 commented 1 year ago

I already wrote a stand-alone FIR Filter:

https://gist.github.com/JasonLG1979/c95b035ed271bfcbbe10a8047cf9e580

I'll just adapt it for our needs.

JasonLG1979 commented 1 year ago

@roderickvd, There I capped the output bandwidth to 92% which gives us anti-alias filtering without chopping off any of the source bandwidth since even at 48kHz 92% is just over the Nyquist freq of 44.1kHz.

JasonLG1979 commented 1 year ago

Calculate the actual roll-off angle and annotation is some pretty advanced math that I haven't cared to do (LMFAO!!!) but given that it's a Blackman Window the attenuation is going to be at least -58dB and given filter lengths of 129 and 257 the roll-off is going to be steep as Hell.

JasonLG1979 commented 1 year ago

As far as comparing to libsamplerate I want to say they use a window size of 300-something for their highest sinc setting? I found that 257 was about as high as I felt sounded good. (ofc that is subjective) the thing about window size is that it's a trade off between time resolution and freq resolution. Longer is not always better. Longer windows tend to "smear" the sound. You lose transient resolution.

Don't get me wrong libsamplerate is fine. The main reason this resampler is more efficient is that that there are so many known givens. What makes libsamplerate inefficient for the most part besides the single-threaded-ness and calculating coefficients on the fly (which are pretty big hits) is that it can't take anything for granted as a given. It's a matter of being able to optimize for our exact needs.

JasonLG1979 commented 1 year ago

Completely had a duh moment and realized that the resample_factor_reciprocal var that is passed into the function that creates the interpolation coefficients anyway also represents the min output bandwidth needed to fully express the input. So I'm just using that.

JasonLG1979 commented 1 year ago

@roderickvd Now Linear Interpolation is also Anti-aliased, not to near the degree as Windowed Sinc because with a bare FIR filter that many taps distorts, but a 5th order filter is more then enough I think.

roderickvd commented 1 year ago

Traditionally you'd roll-off at slightly below the Nyquist freq of the input so for 44.1kHz 20kHz.

Venturing in an area I am somewhat more versed, I think common values are: Passband at 0.4535 Fs Stopband at 0.5465 Fs

AKM devices typically have a slightly different transition band: 0.4583 - 0.5417 * Fs.

JasonLG1979 commented 1 year ago

Venturing in an area I am somewhat more versed, I think common values are: Passband at 0.4535 Fs Stopband at 0.5465 Fs

AKM devices typically have a slightly different transition band: 0.4583 - 0.5417 * Fs.

It will vary depending on the output sample rate. The bandwidth will be 44.1/output sample rate. So if you want to go by Nyquist freq as your "0.5" it's:

48kHz = 0.459375 88.2kHz = 0.25 96kHz = 0.2296875

If you multiply those back by the sample rates the cutoff is always 22050Hz so we never lose any of the input.

JasonLG1979 commented 1 year ago

That would translate to a transition band of:

48kHz = 0.040625, 0.459375 - 0.5-ish (22050Hz - 24000Hz) 88.2kHz = 0.25, 0.25 - 0.5-ish (22050Hz - 44100Hz) 96kHz = 0.2703125, 0.2296875 - 0.5-ish (22050Hz - 48000Hz)

As you can see 48kHz is pretty tight but the rest of the sample rates have plenty of room, but that's the nature of low-pass filters. The higher the sampling rate the easier the filtering.

JasonLG1979 commented 1 year ago

The consequence/bonus of my last commit is that the code is a little more clear and organized. I broke out the calculations of the window function and fir filter it should be easier to figure out what's going on.

JasonLG1979 commented 1 year ago

Ok @roderickvd I've managed to generate anti-alias filters that are @ -0.5dB at 20kHz and @ -96dB at 22.05kHz using https://github.com/chipmuenk/pyfda

image

The only catch is that there would end up being a 1200 line filter_coefficients.rs that contains all of coefficients as arrays.

It does sound damn good though.

JasonLG1979 commented 1 year ago

Oh, I'm using a Kaiser Window with a beta of 8.6 so the lobe approximates a Blackman Window.

JasonLG1979 commented 1 year ago

I changed the filters a bit. I moved the fc up so that it won't cutoff any of the source.

That will gives us -86dB before 23kHz.

roderickvd commented 1 year ago

Do you have a graph? I think your previous one is more SOTA. Slight roll-off just around or before 20 kHz is very common in industry as you can see by the transition band figures I posted earlier.

JasonLG1979 commented 1 year ago

How's this? The fc (-6dB point) is 20.2 - 20.5kHz, depending on the sample rate to get roughly the same curve. It's flat to about 19.8kHz, -0.5 at 20kHz, and hits -87dB well before 22kHz. That's about the best I can get and still maintain a lobe width that's close to a Blackman Window, still get damn close to 20kHz and be at max attenuation well before 22.05kHz. neg1 neg6 neg100

JasonLG1979 commented 1 year ago

neg1 neg6 neg100 Well scratch that. I've been messing around with different fc's and if you set them below 22.05kHz it's sounds "crunchy". If I had to guess it's probability because the filter is so steep that that it's distorting the very, very high end. I've found the best so far is a fc of 22.5kHz. That still gives you 0.46875 at 48kHz, 0.255102041 at 88.2kHz, and 0.234375 at 96kHz and -86dB well before 23kHz so it should provide acquitted anti-alias filtering for all sampling rates considering most of the time the target is -60dB.

"Crunchy" sounding has often been by complaint for some resamplers. Makes me wonder now if it's from overzealous anti-aliasing?

roderickvd commented 1 year ago

So is that back to the first one you posted? Like you said then, you thought it sounded great, and objectively I think it looks like the best of them.

JasonLG1979 commented 1 year ago

So is that back to the first one you posted? Like you said then, you thought it sounded great, and objectively I think it looks like the best of them.

No I have a better one. This one still hits your targets but it hits -135dB at around 23kHz. I also lengthened the window and narrowed the main lobe width to improve interpolation quality. It cost a tiny bit more CPU cycles by I think it's worth it. Sorry this is taking a lot of iterations to get right but it's really trial and error.

neg150 neg6 neg1

JasonLG1979 commented 1 year ago

I think -135dB is good enough. If I don't stop myself somewhere I'll be tweaking on this forever,lol!!!

JasonLG1979 commented 1 year ago

@roderickvd what I just pushed will get us flat until just over 20kHz and about -195dB before 22kHz. And since -195dB is below the noise floor of even S32 any aliasing artifacts should be buried in the noise floor/quantization noise. neg220

roderickvd commented 1 year ago

And it sounds good too? ๐Ÿ˜„ Let me me know if you are good to merge.

JasonLG1979 commented 1 year ago

I'm still fine tuning. Since I don't have any way to objectively test anything and I can't really can't find any solid advice online specifically for interpolation kernels I'm left to do it by ear. I can find things specific to FIR filters but that advice does not necessary apply. I've been doing my testing in another branch so as not to spam the CI here. I'll let you know when I'm sure it's ready and even then I'd like at least one other person to listen to it to get their impression before it's merged.

roderickvd commented 11 months ago

Any updates?

roderickvd commented 8 months ago

bump any updates?