mpv-player / mpv

🎥 Command line video player
https://mpv.io

Scaletempo2, the new default for adjusting audio playback speed, sounds noticeably worse in some situations #8705

Open varenc opened 3 years ago

varenc commented 3 years ago

Important Information

Provide following Information:

This is relevant because scaletempo2 was changed to the default from scaletempo in #8376

Reproduction steps

Try listening to some 5.1 audio at 0.95x speed using the now default scaletempo2 filter.
$ mpv --speed=0.95 --af=scaletempo2 some_audio
Listen for the poor quality in some situations.

Now add the old default, scaletempo, to the af filter chain and listen for the better quality.
$ mpv --speed=0.95 --af=scaletempo some_audio

Also listen to the recorded sample files below. You can use the original_source.mkv file included to reproduce the samples I recorded.

Expected behavior

The default should not make things worse.

Actual behavior

scaletempo2 is worse for minor speed changes in the 0.85x - 1.2x range. It's MUCH better for the big speed changes though, and I really appreciate it for that.

While I do appreciate scaletempo2 for big adjustments, I usually only make minor speed changes, so for me it's not a good default. I suspect that playback speed adjustments in the 0.9-1.2x range are much more common amongst users. This comment is where another user seems to have been caught up in this default change.

Sample files

Hrxn commented 3 years ago

I suggest voting with +1 and -1 on the original post to vote on changing the default. +1 to vote for changing the default back to scaletempo (unless there is some fix) -1 to vote for keeping the current default.

Edit: yes, can reproduce

CounterPillow commented 3 years ago

Suggestion: scaletempo3, which uses scaletempo for speeds between 0.80 to 1.3 and scaletempo2 for speeds outside of that range.
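
The dispatch suggested here can be sketched in a few lines of C. This is only an illustration: pick_tempo_filter is a hypothetical helper, not an mpv function, and the 0.80 to 1.3 boundaries are taken from the suggestion above and treated as inclusive.

```c
/* Hypothetical helper, not part of mpv: returns the name of the filter
 * that the proposed "scaletempo3" would delegate to for a given speed. */
static const char *pick_tempo_filter(double speed)
{
    /* Range from the suggestion above; inclusive bounds are an assumption. */
    const double lo = 0.80;
    const double hi = 1.3;

    /* Minor speed changes go to scaletempo, everything else to scaletempo2. */
    return (speed >= lo && speed <= hi) ? "scaletempo" : "scaletempo2";
}
```

With these bounds, pick_tempo_filter(0.95) selects scaletempo while pick_tempo_filter(2.0) selects scaletempo2.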

TiGR commented 3 years ago

Or maybe have it configurable separately as we have it with scale algorithms.

varenc commented 3 years ago

The best solution would just be to make scaletempo2 work better even at minor speed changes! Chrome's own audio scaling, which scaletempo2 is a port of, seems to work fine with minor adjustments.

@DorianRudolph, perhaps you might have an idea of why scaletempo2 performs worse than Chrome does at 0.95x speed? Is there any hope for just tweaking it to handle this use case? That would of course be the ideal solution!

My thinking is that if scaletempo is going to be restored as the default, that should happen soon to avoid further confusion for people. Also I'm basing this on the assumption that minor speed changes in the 0.85x - 1.2x range are far more common amongst MPV users, like they are for me, though I'm not sure if that's true. No matter the outcome, I'll also submit a PR for a docs update which adds a section explaining to users how to easily change the default to another audio scaler.

(@TiGR I do think how mpv lets you choose your "audio scaling" filter is a bit idiosyncratic and hard to discover, but I think that's for a different discussion!)

DanOscarsson commented 3 years ago

If it works fine in Chrome it may be because some versions of their code switched to resampling between speed 0.95 - 1.06. Personally I prefer to use resampling when close to normal speed, like when playing a 25 Hz movie on a 24 Hz display. And mpv can do sync to vsync with resampling and be configured to do that so 25 Hz movies are automatically resampled to 24 Hz. My only need for preserving pitch is when playing at a fast speed like > 1.5. And that is what I would have expected most users need scaletempo2/scaletempo for. But apparently that may not be true but cannot be determined without asking a lot of users.

As I have started working on some fixes to scaletempo2 (not related to speed near 1) it would be good to quickly decide which scaletempo version to use (there is one more atempo in ffmpeg) as maintaining several WSOLA implementations will just be confusing for users and additional work for maintainers. But may be needed if one cannot solve all users needs.

realnc commented 3 years ago

When I built mpv from git, the first thing I noticed were rather severe audio glitches when listening to audiobooks at speeds 0.9 and 1.1 (depending on whether the narrator is too fast or too slow.) It sounds like a scratched CD where the CD player is skipping.

scaletempo produces perfect results at these playback speeds. You can't even tell the sound is slowed down or sped up. It really sounds like the narrator is just reading slower or faster.

There doesn't seem to be an option to tell mpv which filter to use, so I had to put af-add=scaletempo in my config. Unfortunately, this disabled mpv's automatic filter removal when the filter is not needed. The filter is always active and shows up in the OSD all the time.

Something like an --audio-speed-filter option would be very nice to have instead of hardcoding scaletempo2 in the mpv source code.

garoto commented 3 years ago

[    no-osd af add "@tempo:scaletempo" ; no-osd add speed "-0.1"
]    no-osd af add "@tempo:scaletempo" ; no-osd add speed "+0.1"
BS   no-osd af remove @tempo ; no-osd set speed 1.0

realnc commented 3 years ago

[    no-osd af add "@tempo:scaletempo" ; no-osd add speed "-0.1"
]    no-osd af add "@tempo:scaletempo" ; no-osd add speed "+0.1"
BS   no-osd af remove @tempo ; no-osd set speed 1.0

I can't see what speed I'm setting.

avih commented 3 years ago

@realnc please file a new issue, with logs and everything else which the template requests

If you can bisect it to find the exact first commit where the issue happens, that would be great info to add.

avih commented 3 years ago

It sounds like a scratched CD where the CD player is skipping

@realnc could you please open a new issue for this? All the reports we have so far are about subjective quality, but what you're describing is new, and could very well be an actual bug - which none of us is able to reproduce.

So please file a new issue, with logs, preferably sample files, bisect if you can, etc. It would help us identify a yet-unknown bug.

kevin-stuart commented 2 years ago

I won't be too helpful commenting here, but I just want to confirm this report.

I upgraded to 0.34 and wondered why voices sound robotic at speed 1.1 until I figured out that apparently the default was changed to scaletempo2. I added af=scaletempo as an option in mpv 0.34 and apparently things went back to normal.

Unfortunately, I can't offer any samples and it may be subjective, but to me it was clear as day that something had changed and voices sounded very robotic with a lot of videos (but not all!). There seem to be some exceptions, but for me, scaletempo2 is way worse.

At least please don't remove scaletempo, for me scaletempo2 is very hard to bear for many files. I can try to see if I notice some kind of regularity such as audio codecs, but for me, there is something very wrong with scaletempo2.

richardpl commented 2 years ago

Use atempo instead.

kevin-stuart commented 2 years ago

I tried atempo. It sounds similar to scaletempo2 to me (i.e. robotic). It is also not documented in the mpv manual, so I did not get the idea to use this ffmpeg filter. For me scaletempo sounds best. Is it possible that there is some kind of bug in mpv that makes scaletempo2 or atempo sound much worse for only some people?

varenc commented 2 years ago

@kevin-stuart I don't think there's any reason why the exact same media played with the exact same version of MPV would result in any difference in sound between people. That said, I opened this issue because I observed that 6 channel audio with scaletempo2 seemed to give worse results than scaletempo when there's a very minor speed adjustment. But the issue went away with most stereo audio. I suspect you're experiencing the same issue. If you can post a small sample that'll help people confirm.

Also I agree that atempo also performs well, but atempo isn't fully supported by mpv and it will eventually lead to an out of sync audio and video. But if you're just playing audio you might not care. I described the atempo issue and some very janky workarounds here: https://github.com/mpv-player/mpv/issues/4418#issuecomment-643099263 For me, scaletempo2 removes my need for atempo.

Given how long scaletempo2 has been the default at this point, unless a lot more people find this issue and concur, I think leaving it the default will be the least disruptive for the most folks. In the meantime just making it easy to switch back to scaletempo is an easy solution. Maybe adding that to the default input.conf could help (though it's tough to decide on the key).

(I use $ af toggle scaletempo in my input.conf to make the $ key toggle it)

kevin-stuart commented 2 years ago

You are right, I observed my problems with scaletempo2 with 6 channel audio. I mainly use 1.1 as speedup and scaletempo2 and atempo sound bad for me with this setup. I have set scaletempo in my config. I just hope that scaletempo2 is improved in the future and that scaletempo is not removed until then. For me, scaletempo2 became the new default only very recently when I upgraded to 0.34

dardoor commented 2 years ago

I also noticed occasionally very bad sound with scaletempo2. Here's an example from a movie with 2 channel audio, comparing scaletempo and scaletempo2 at 1.1x and 1.21x speeds: scaletempo mpv test.zip

christoph-heinrich commented 2 years ago

You might want to try out --af=scaletempo2=search-interval=50:window-size=40. I've tried the example from @dardoor (original (1x).opus) and it sounds great at various speeds (>1).

realnc commented 2 years ago

You might want to try out --af=scaletempo2=search-interval=50:window-size=40. I've tried the example from @dardoor (original (1x).opus) and it sounds great at various speeds.

It sounds horrible to me with speech at a speed of 0.94. Some words sound robotic, metallic and choppy. As a quick test, I was listening to this podcast:

https://www.youtube.com/watch?v=cnFubyqJ3Ro

A prime example is at the very beginning (0:00:45) where he says "that the community left for us". If you set the speed to 0.94, scaletempo2 is atrocious. scaletempo is perfect.

Whether I use your parameters or not doesn't change anything for me in this regard.

christoph-heinrich commented 2 years ago

mpv --no-config --start=44 --speed=0.94 --af=<filter> 'https://www.youtube.com/watch?v=cnFubyqJ3Ro'

I don't hear a problem with scaletempo2, but maybe I'm so used to it that I don't even notice it anymore. test.zip

Admittedly I never actually listen to anything at <1 speed, so maybe I would have noticed something at some point if I did. (videos are always >=1.25 speed for me, but I also tested with smaller values >=1)

dardoor commented 1 year ago

scaletempo2=search-interval=50:window-size=40 does sound good on the sample I posted, at 1.1 and 1.2 speeds, even a bit better than scaletempo, I think.

But it sounds bad on that last sample at 0.94, at least the "basically we" part. scaletempo2 with no parameters sounds better, and scaletempo even better.

(I also mostly play media at faster speeds and I would guess that's true for most people too.)

mars4science commented 1 year ago

Interestingly: after I changed scaletempo2 to scaletempo in f_auto_filters.c:

    p->sub.filter = mp_create_user_filter(f, MP_OUTPUT_CHAIN_AUDIO, "scaletempo", NULL);

With --af=scaletempo=speed=<value>, the values none, both and tempo sound about the same, like I expect tempo to sound. af=scaletempo=speed=pitch works as expected. But when I commented out that line, sound was played at 1x speed regardless of video speed. It seems the none and both values of the speed option do not work as described in the man page:

    both
        Scale both tempo and pitch.
    none
        Ignore speed changes.

llyyr commented 1 year ago

Is this issue still valid on builds from current master? Also please try rubberband from #12479 build

StrangePeanut commented 11 months ago

Still relevant for current builds. af=scaletempo considerably improves audio at 1.2x.

christoph-heinrich commented 11 months ago

Still relevant for current builds. af=scaletempo considerably improves audio at 1.2x.

Do you have an example?

raziel711 commented 6 months ago

Do you have an example?

@christoph-heinrich, I think I have a sample where the audio is distorted at 1.1x.

sample.zip

At least on my computer, I can hear noticeable distortions with the audio where the voices sound robotic, particularly at 00:22 with the line "... if you have to die to get it..." as well as at 00:40 with the line "...are going to attack..."

Increasing the speed to 1.2x makes the distortion less severe, but I can still hear it. When the speed is > 1.3x, the distortion seems to no longer be present.

When using af=scaletempo, there is no issue at any speed above 1.

christoph-heinrich commented 6 months ago

@raziel711 You're right, scaletempo sounds much better than scaletempo2 at 1.1x speed on that sample. However scaletempo2 sounds better than scaletempo at 2x speed (both aren't perfect though).

Because they don't use the same metric for finding a suitable overlap position, there will always be edge cases where one works better than the other. However, I've done a lot of testing comparing both with the same parameters (for https://github.com/mpv-player/mpv/pull/12487) and scaletempo2 is generally significantly better than scaletempo.

You can try playing with the parameters of each to see if you find ones that better suit your needs. Keep in mind that what scaletempo2 calls window-size is stride * overlap for scaletempo, in case you want to compare them. The reasoning behind the current defaults of scaletempo2 can be found in https://github.com/mpv-player/mpv/pull/12580

christoph-heinrich commented 6 months ago

I ran into a file that had bad results with scaletempo2 (worse than the sample above) and noticed it had 6 audio channels, the same as the sample from @raziel711. Then I used channelmap=map=2-FL|2-FR to get the voices only and that sounded great, both on that sample and on my file. Finally I tried using scaletempo=overlap=0.5:search=40:stride=24 with my changes from #12487 and that also sounded good.

Looks to me like scaletempo2 has a problem with 6 channels for some reason (or probably anything >2), which sounds like a bug to me. I won't be able to have a look at the code this weekend, but maybe @ferreum has any ideas about what might be the cause?

Edit: I had a few minutes and didn't notice anything obvious in the code, but I reverted all changes to af_scaletempo2_internals.c that were made since it was introduced, and the problem still exists there.

christoph-heinrich commented 6 months ago

Replacing the similarity measure with what I'm using in #12487 sounds a lot better, which suggests that somewhere in that calculation the channels aren't handled correctly, but I couldn't find the mistake so far.

Replacement diff

```diff
diff --git a/audio/filter/af_scaletempo2_internals.c b/audio/filter/af_scaletempo2_internals.c
index 534f4f672a..a41a71828f 100644
--- a/audio/filter/af_scaletempo2_internals.c
+++ b/audio/filter/af_scaletempo2_internals.c
@@ -93,17 +93,19 @@ static void multi_channel_moving_block_energies(
 }
 
 static float multi_channel_similarity_measure(
-    const float* dot_prod_a_b,
-    const float* energy_a, const float* energy_b,
-    int channels)
-{
-    const float epsilon = 1e-12f;
-    float similarity_measure = 0.0f;
-    for (int n = 0; n < channels; ++n) {
-        similarity_measure += dot_prod_a_b[n]
-            / sqrtf(energy_a[n] * energy_b[n] + epsilon);
+    float **a, int frame_offset_a,
+    float **b, int frame_offset_b,
+    int channels,
+    int num_frames)
+{
+    float distance = 0;
+    for (int c = 0; c < channels ; c++) {
+        float *source = b[c];
+        float *target = a[c];
+        for (int i = 0; i < num_frames; i++)
+            distance += fabs(target[i + frame_offset_a] - source[frame_offset_b + i]);
     }
-    return similarity_measure;
+    return -distance;
 }
 
 #if HAVE_VECTOR
@@ -229,18 +231,14 @@ static int decimated_search(
     const float *energy_target_block, const float *energy_candidate_blocks)
 {
     int num_candidate_blocks = search_segment_frames - (target_block_frames - 1);
-    float dot_prod [MP_NUM_CHANNELS];
     float similarity[3];  // Three elements for cubic interpolation.
 
     int n = 0;
-    multi_channel_dot_product(
+    similarity[0] = multi_channel_similarity_measure(
         target_block, 0,
         search_segment, n,
         channels,
-        target_block_frames, dot_prod);
-    similarity[0] = multi_channel_similarity_measure(
-        dot_prod, energy_target_block,
-        &energy_candidate_blocks[n * channels], channels);
+        target_block_frames);
 
     // Set the starting point as optimal point.
     float best_similarity = similarity[0];
@@ -251,14 +249,11 @@ static int decimated_search(
         return 0;
     }
 
-    multi_channel_dot_product(
+    similarity[1] = multi_channel_similarity_measure(
         target_block, 0,
         search_segment, n,
         channels,
-        target_block_frames, dot_prod);
-    similarity[1] = multi_channel_similarity_measure(
-        dot_prod, energy_target_block,
-        &energy_candidate_blocks[n * channels], channels);
+        target_block_frames);
 
     n += decimation;
     if (n >= num_candidate_blocks) {
@@ -268,15 +263,11 @@ static int decimated_search(
     }
 
     for (; n < num_candidate_blocks; n += decimation) {
-        multi_channel_dot_product(
+        similarity[2] = multi_channel_similarity_measure(
             target_block, 0,
             search_segment, n,
             channels,
-            target_block_frames, dot_prod);
-
-        similarity[2] = multi_channel_similarity_measure(
-            dot_prod, energy_target_block,
-            &energy_candidate_blocks[n * channels], channels);
+            target_block_frames);
 
         if ((similarity[1] > similarity[0] && similarity[1] >= similarity[2]) ||
             (similarity[1] >= similarity[0] && similarity[1] > similarity[2]))
@@ -323,7 +314,6 @@ static int full_search(
     const float* energy_candidate_blocks)
 {
     // int block_size = target_block->frames;
-    float dot_prod [sizeof(float) * MP_NUM_CHANNELS];
 
     float best_similarity = -FLT_MAX;//FLT_MIN;
     int optimal_index = 0;
@@ -332,12 +322,10 @@ static int full_search(
         if (in_interval(n, exclude_interval)) {
             continue;
         }
-        multi_channel_dot_product(target_block, 0, search_block, n, channels,
-            target_block_frames, dot_prod);
 
         float similarity = multi_channel_similarity_measure(
-            dot_prod, energy_target_block,
-            &energy_candidate_blocks[n * channels], channels);
+            target_block, 0, search_block, n, channels,
+            target_block_frames);
 
         if (similarity > best_similarity) {
             best_similarity = similarity;
```

christoph-heinrich commented 6 months ago

I think the problem is the stuff with energies. I don't know why they screw things up, and I wasn't able to find any mistakes in their calculation, but simply removing them makes things sound way better.

diff --git a/audio/filter/af_scaletempo2_internals.c b/audio/filter/af_scaletempo2_internals.c
index 534f4f672a..ee78940ba1 100644
--- a/audio/filter/af_scaletempo2_internals.c
+++ b/audio/filter/af_scaletempo2_internals.c
@@ -100,8 +100,7 @@ static float multi_channel_similarity_measure(
     const float epsilon = 1e-12f;
     float similarity_measure = 0.0f;
     for (int n = 0; n < channels; ++n) {
-        similarity_measure += dot_prod_a_b[n]
-            / sqrtf(energy_a[n] * energy_b[n] + epsilon);
+        similarity_measure += dot_prod_a_b[n];
     }
     return similarity_measure;
 }

However there is no way the chromium devs went through the effort of doing that energy stuff if it didn't create better results, but I've been looking for hours and can't find any mistakes.

Test it and if enough people agree it's better without energy, then we can remove that.

CounterPillow commented 6 months ago

However there is no way the chromium devs went through the effort of doing that energy stuff if it didn't create better results, but I've been looking for hours and can't find any mistakes.

Removing the energy calculation seemingly bypasses a large part of the algorithm (see decimated_search), namely energy_target_block and &energy_candidate_blocks[n * channels] passed to the similarity measure become pointless. At that point I'd be more inclined to believe something is doing a whoopsie with those two values than that Chromium devs wrote a whole lot of complicated code for negative benefit.

I don't think it's a numerical precision issue, when I looked at the values in that term (and what the standard says about fsqrt's precision) it seemed fine.

I've printf'd some values along the way and they're not obscenely huge, but plotting the difference between the similarity_measure result with energy and the one without yields something I guess:

This seems not super out of whack and not biased to specifically one side so if this is a bug in the implementation rather than a bad design of the algorithm then it'll be a pain to find. Maybe I should repeat this with each channel isolated (though adding an --audio-channels=mono doesn't seem to affect the robotic-ness at all).

EDIT: And here's the absolute difference for the entire sample. It'd probably be more meaningful as like a fraction of either of the values, though by having seen quite a few numbers in my time I can tell it is actually fairly big:

CounterPillow commented 6 months ago

Looks like the energies for channel number 3 are the culprit:

similarity_measure += dot_prod_a_b[n] / sqrtf((n == 2 ? 0.0f : energy_a[n]) * (n == 2 ? 0.0f : energy_b[n]) + epsilon);

Ignoring it in the similarity measure (by setting it to 0, lol) fixes the sample. Obviously this is not a fix, so the question becomes: why are channel 3's energies in both a and b so out of whack?

EDIT: Nevermind that's the dialog channel, of course disabling the energies for that fixes it. D'oh.

ferreum commented 6 months ago

First thing coming to mind here is a problem when there are many channels that are very quiet. With stereo the channels are usually very correlated, so the problem is probably rare with 2 channels.

My thought is that quiet channels may look very damn similar ~to each other~ at any two points, thus resulting in a bad overlap position to be selected, even though audible parts are in different channels. I wonder how the energies currently come into play with this (that's where my understanding of the algorithm ends).

It looks to me like there is bad weighting for multiple channels--louder channels should have the strongest impact on overlap position.

Edit: Correction: the channels look similar at any two points where they are compared, not to each other. Channels aren't compared to each other, I think.

ion1 commented 6 months ago

FWIW, this MIT-licensed library seems to implement a quite fancy time stretching algorithm: https://github.com/Signalsmith-Audio/signalsmith-stretch

They have a demo in here: https://signalsmith-audio.co.uk/code/stretch/demo/

If I have understood correctly, scaletempo2 implements algorithm 1 in "Four Ways To Write A Pitch-Shifter" by Geraint Luff, Rubberband implements something close to algorithm 3 and Signalsmith Stretch implements algorithm 4.

Comparing their output subjectively at an extreme slowdown, Rubberband's "Finer (R3)" engine retains sharper transitions but sounds more comb filtery, while Signalsmith Stretch blurs transitions more while sounding less comb filtery.

I have not compared their CPU consumption.

richardpl commented 6 months ago

all algorithms/designs/implementations here are bad quality for polyphonic audio. atempo from ffmpeg is little better, but only for strictly speech sources.

ion1 commented 6 months ago

As far as I can tell, both scaletempo2 and atempo implement WSOLA.

christoph-heinrich commented 6 months ago

all algorithms/designs/implementations here are bad quality for polyphonic audio.

scaletempo with the changes from https://github.com/mpv-player/mpv/pull/12487 and scaletempo=overlap=1:search=40:stride=12 sounds pretty good to me, even for files with 6 channels.

richardpl commented 6 months ago

For what stretching factors does it sound good? Artifacts are less prominent with factors near 1.0

christoph-heinrich commented 6 months ago

For what stretching factors does it sound good?

speeds of 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, 2.5, 3

The low rumble wobbles around in pitch (particularly at low speeds), but that's probably unavoidable with an overlap & add algorithm. Speech sounds good, I don't think it's possible to do any better. Maybe some other kind of algorithm could do a little better, but unless someone is willing to implement that, it's hard to say. Rubberband sounds worse imo (and uses a ton of CPU).

richardpl commented 6 months ago

I was more interested in other direction, slowing it instead of speeding up...., for speeding up it is much easier.

christoph-heinrich commented 6 months ago

I've never managed to get good results for slowing down with scaletempo(2), which is why I have

[rubberband]
profile-cond=speed < 1
profile-restore=copy-equal
af-remove=scaletempo2
af-add=rubberband

for the rare case where I reduce speed to < 1.

mesvam commented 6 months ago

I'm not convinced scaletempo2 is actually better than the original scaletempo at any speed. The problem is that mpv's default parameters for scaletempo gives suboptimal results, so when comparing each filter at default settings, scaletempo2 comes out ahead. But properly configured, scaletempo still beats scaletempo2. I have scaletempo=stride=15:overlap=1:search=15 and it gives nearly perfect playback quality from speeds 1 to 4, and I've never heard any artifacts on a variety of audio. CPU usage may be a bit higher with these settings, but at reasonable speeds on reasonably recent hardware, the load is negligible, especially compared to video decoding.

Meanwhile, for scaletempo2, no combination of parameters can guarantee artifact-free audio at any speed. And the artifacts can actually be quite severe. scaletempo2 has audible pitch shifting of as much as a semitone on drone notes in the background music, which sounds like wrong notes being played, which is really distracting. The subjective quality improvement at higher speeds is simply due to the artifacts being harder to hear since they go by so quickly, but they're still there.

For speeds < 1, scaletempo2 sounds similar to scaletempo, but WSOLA-type algorithms are all a bit of a crapshoot. FFT methods are better for that IMO.

christoph-heinrich commented 6 months ago

@mesvam here is a little excerpt from a song with your scaletempo parameters: 1.12x speed.webm, 1x speed.webm

mesvam commented 6 months ago

@christoph-heinrich ok I stand corrected. That timbre was worse than I expected.

I will say though, that even in that worst case, it's still better than when scaletempo2 goes wonky. Here is an example of background music going crazy with scaletempo2: excerpt.webm, excerpt-scaletempo2-1.06.webm

What's worse is that the artifacts in your excerpt are mainly due to the bass frequencies, which can be fixed by increasing stride/search to 30 or higher (e.g. scaletempo=stride=30:overlap=1:search=30), with some sacrifices when it comes to other content. I could not find any settings for scaletempo2 that would make my audio listenable, and there aren't even heavy bass frequencies in there!

Dudemanguy commented 5 months ago

Well #13748 improved this but I don't think it's necessarily fixed judging by the comments so reopening.

fideliochan commented 3 months ago

Is there any way to fix the desync of atempo? Because it's still the best one imo.

richardpl commented 3 months ago

not really, atempo filter changes timestamps and that causes desync, workaround is adding some hack which would rescale those timestamps back to original values that mpv expects.

fideliochan commented 3 months ago

What do you mean by hacks? Something like this? -vf setpts='PTS/1.15' -af atempo=1.15

richardpl commented 3 months ago

Yes, something like that hardcoded to keep A/V sync but that breaks seeking to right spot...

richardpl commented 2 months ago

I have developed a prototype filter that can stretch audio by a factor of 2x, using autocorrelation by RDFT to find similar periods, plus interpolating the found periods with an equal-power cross-fade that makes use of the normalized cross-correlation factor between two periods. The output is much better than scaletempo(2) or atempo. I need to do something similar for a 1/2 factor for a 2x speed gain.

richardpl commented 2 months ago

Got 0.5 and 2.0 ratios working well and fast. Maybe I will add support for arbitrary ratios. If anybody is interested in taking a look at it, I can push the filter into librempeg.