micsthepick / JSVocalRedIso

Implementation of a fft center/vocals isolation plugin based on an audacity plugin
16 stars 1 forks source link

try to fix the nyquist band not getting processed #5

Closed micsthepick closed 2 years ago

micsthepick commented 2 years ago

The Nyquist band is not processed, and so does not get set to 0 when both sliders are set to 0.

micsthepick commented 2 years ago

Tested, and it seems to work correctly with the following plugin:

desc:generate a nyquist signal with panning options

slider1:gain_db_l=0<-150,0,1>L gain (dB)
slider2:gain_db_r=0<-150,0,1>R gain (dB)

in_pin:left input
in_pin:right input
out_pin:left output
out_pin:right output

@init
last_gain_l=10^(gain_db/20);
last_gain_r=10^(gain_db/20);

@slider
next_gain_l=10^(gain_db_l/20);
next_gain_r=10^(gain_db_r/20);

@block
d_gain_l = (next_gain_l - last_gain_l)/samplesblock;
d_gain_r = (next_gain_r - last_gain_r)/samplesblock;

@sample
spl0 = (osc = 1 - osc) * last_gain_l;
spl1 = osc * last_gain_r;
last_gain_l += d_gain_l;
last_gain_r += d_gain_r;
micsthepick commented 2 years ago

closes #4

FelipeZanabria commented 2 years ago

I suspect the problem may be with the window. I am visually impaired. To find the inaudible noise I did the following: 1- Create a track. 2- Add the VocalRediso effect, and set the low cut from 80 to 1. 3- Increase the volume of the track to 321.6, which will leave you with a peak of 0 dB in the master. 4- Give play and listen. It is possible to hear the transitions of the fft. If you set the size of fft to 512, you may notice a sawtooth sound. Perhaps the window has some calculation error or has small borders.

micsthepick commented 2 years ago

@FelipeZanabria have you pulled in the latest changes?

micsthepick commented 2 years ago

ok, I can hear it, only when it's pushed so high that it clips - except I'm feeding it a DC offset. I'm not sure that there's still any real issue?

micsthepick commented 2 years ago

I have a theory as to how the unfolding is supposed to work here: for band 0, there is some code to handle the case differently, I think it needs to also be handled differently by the algorithm (and possibly the coefficients get calculated the same way, or more work needs to be done to identify the DC and nyquist bands respectively for L and R.

FelipeZanabria commented 1 year ago

I see that the problem continues, and after having forgotten about the plugin as not needing to use it, I found a way to solve the Nyauist problem. It may not be the best way to code, but it worked for me to set band 0 and 1 to 0, and then increment the positions from 1 to size minus 1. I did this before defining the main function, but there may be a more elegant way to do this. encode. I found this in the noise reduction tutorial that this plugin is based on. vocalrediso.txt

micsthepick commented 1 year ago

hmm, it seems as if all that needs to happen here is to set overlaps to 2, use the cosine (Hahn) window and then I believe that everything else should be fine? I wouldn't set fftbuffer[0] to 0 in the init section, as it only runs once, and only setting fftbuffer[1] = 0 would potentially have some sort of filtering effect on the output? Let me know if that works for you. I can't test nor push to Github until I get home though. I'll attach the file here.

On Thu, 14 Sept 2023, 08:29 FelipeZanabria, @.***> wrote:

I see that the problem continues, and after having forgotten about the plugin as not needing to use it, I found a way to solve the Nyauist problem. It may not be the best way to code, but it worked for me to set band 0 and 1 to 0, and then increment the positions from 1 to size minus 1. I did this before defining the main function, but there may be a more elegant way to do this. encode. I found this in the noise reduction tutorial that this plugin is based on. vocalrediso.txt https://github.com/micsthepick/JSVocalRedIso/files/12602295/vocalrediso.txt

— Reply to this email directly, view it on GitHub https://github.com/micsthepick/JSVocalRedIso/pull/5#issuecomment-1718396982, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACX6RZLI6JSIQE7WZHUISFTX2IXWHANCNFSM5XWWTTMQ . You are receiving this because you modified the open/close state.Message ID: @.***>

// Based on Vocalrediso.ny, a nyquist filter for audacity, currently released under the GNU GPL V3 // see https://www.gnu.org/licenses/gpl-3.0.en.html, https://www.audacityteam.org/ // credits to Neil Bickford for his denoiser, https://github.com/Nbickford/REAPERDenoiser, // which was used as a starting point for this script // which now uses the STFT template code by geraintluff: https://forum.cockos.com/showthread.php?t=225955

desc: vocal removal/isolation

//slider1:fft_size_index=2<0,7,1{256,512,1024,2048,4096,8192,16384,32768}>FFT size slider1:0<-100,100,0.1>dry mix slider2:0<-100,100,0.1>C mix (Vocals) slider3:0<-5,5,0.001>strength at Low Cut slider4:0<-5,5,0.001>strength at High Cut slider5:80<0,24000,10>Low Cut (Vocals) slider6:24000<0,24000,10>High Cut (Vocals) slider7:0<-90,90,0.1>Phase (Degrees) slider8:90<1,180,0.1>Phase width at Low Cut (Degrees) slider9:90<1,180,0.1>Phase width at High Cut (Degrees) slider10:1<0,1,0.05>Attenuate if different volume slider11:1<0,1,1{No,Yes}>undo input corrections slider12:0<-180,180,0.05>Phase2 (Degrees)

@init

function process_stft_segment(fft_buffer, fft_size) local(fft_bin, left_real, left_imag, right_real, right_imag) ( fft_bin = 0; // FFT bin number loop(fft_size/2+1, fft_bin2 = fft_bin ? (fft_size - fft_bin) : 0;

// Unfold complex spectrum into two real spectra
left_real = fft_buffer[2*fft_bin] + fft_buffer[2*fft_bin2];
left_imag = fft_buffer[2*fft_bin + 1] - fft_buffer[2*fft_bin2 + 1];
right_real = fft_buffer[2*fft_bin + 1] + fft_buffer[2*fft_bin2 + 1];
right_imag = -fft_buffer[2*fft_bin] + fft_buffer[2*fft_bin2];

// input corrections
// first, change the phase of L based on phase2:
l_real_twisted = left_real*cosine2 + left_imag*sine2;
l_imag_twisted = left_imag*cosine2 - left_real*sine2;

// now mix L&R together based on phase
r_real_rotated = right_real*cosine + l_real_twisted*sine;
r_imag_rotated = right_imag*cosine + l_imag_twisted*sine;
l_real_rotated = l_real_twisted*cosine - right_real*sine;
l_imag_rotated = l_imag_twisted*cosine - right_imag*sine;

//////////////////////// Main STFT block
// The 'meat' of the algorithm, the code in this block will most resemble the code from vocalrediso.ny

// first, apply  vocal reduction algorithm only in the right bands
fft_bin >= low_bin && fft_bin < high_bin ? (
  // get constants for this bin
  strength = strength_buffer[fft_bin];
  phase_width = phase_width_buffer[fft_bin];

  // cacluate energy for this bin
  norm_left = sqrt(sqr(l_real_rotated) + sqr(l_imag_rotated));
  norm_right = sqrt(sqr(r_real_rotated) + sqr(r_imag_rotated));

  // calculate phase difference between left & right, divide by phase_width
  w1 = acos((l_real_rotated * r_real_rotated + l_imag_rotated * r_imag_rotated) / (norm_left * norm_right)) / phase_width;
  // calculate weight: truncate w1 to [0, 1] and apply strength, then take 1 - the result, and multiply
  // by 1 - the square of the difference between the two norm values divided by the sum of the two, moderated by strength * slider10
  weight = (1 - min(1, max(0, w1)) ^ strength) * (
            1 - (sqr(norm_left - norm_right)/sqr(norm_left + norm_right))
        ) ^ (strength * slider10) / 2;

  // find the c channel, the sum of the two complex numbers
  c_real = l_real_rotated + r_real_rotated;
  c_imag = l_imag_rotated + r_imag_rotated;

  // apply weight to c channel
  c_real *= weight;
  c_imag *= weight;
) :
(
  c_real = 0;
  c_imag = 0;
);
//////////////////////// END MAIN STFT block

// apply wet dry mix
out_l_real = l_real_rotated * dry_mix + c_real * wet_mix;
out_l_imag = l_imag_rotated * dry_mix + c_imag * wet_mix;
out_r_real = r_real_rotated * dry_mix + c_real * wet_mix;
out_r_imag = r_imag_rotated * dry_mix + c_imag * wet_mix;

// output corrections
slider11 > 0.5 ? (
  // if requested, undo input corrections

  // unmix by phase
  l_real_out_twisted = out_l_real*cosine - out_r_real*sine;
  l_imag_out_twisted = out_l_imag*cosine - out_r_imag*sine;
  right_real = out_r_real*cosine + out_l_real*sine;
  right_imag = out_r_imag*cosine + out_l_imag*sine;

  left_real = l_real_out_twisted * cosine2 - l_imag_out_twisted * sine2;
  left_imag = l_imag_out_twisted * cosine2 + l_real_out_twisted * sine2;
) :
(
  // else, just copy the values
  left_real = out_l_real;
  left_imag = out_l_imag;
  right_real = out_r_real;
  right_imag = out_r_imag;
);

// Re-fold back into complex spectrum
fft_buffer[2*fft_bin] = (left_real - right_imag)*0.5;
fft_buffer[2*fft_bin + 1] = (left_imag + right_real)*0.5;
fft_buffer[2*fft_bin2] = (left_real + right_imag)*0.5;
fft_buffer[2*fft_bin2 + 1] = (-left_imag + right_real)*0.5;

fft_bin += 1;

); );

function setup_stft_state(fft_size, first_time) ( //////////////////////// Setup block // This is called when playback starts, or when the FFT size is changed 0; //////////////////////// );

MAX_FFT_SIZE = 32768; fft_size = 8192;

freemem = 0; freemem = (fft_buffer = freemem) + MAX_FFT_SIZE*2; freemem = (window_buffer = freemem) + MAX_FFT_SIZE;

buffer_length = srate; buffer_index = 0; freemem = (input_buffer = freemem) + buffer_length2; freemem = (output_buffer = freemem) + buffer_length2;

freemem = (strength_buffer = freemem) + buffer_length2; freemem = (phase_width_buffer = freemem) + buffer_length2;

function window(r) local(s, s2, gaussian_width, x) ( // When squared, the Hann window adds up perfectly for overlap >= 4, so it's suitable for perfect reconstruction (0.5 - 0.5cos(r2$pi))/sqrt(0.375); // the MLT sine window also appears to add up correctly, with sigma = sqrt(2). //sin(r$pi)*sqrt(2); );

overlap_factor = 2; fft_interval = fft_size/overlap_factor; fft_scaling_factor = 1/overlap_factor/fft_size;

fft_size != prev_fft_size ? ( setup_stft_state(fft_size, prev_fft_size == 0); prev_fft_size = fft_size; // Fill window buffer i = 0; loop(fft_size, r = i/fft_size; window_buffer[i] = window(r); i += 1; ); );

pdc_delay = fft_size; pdc_bot_ch = 0; pdc_top_ch = 2;

freembuf(freemem);

@sample

input_buffer[buffer_index2] = spl0; input_buffer[buffer_index2 + 1] = spl1;

fft_counter += 1; fft_counter >= fft_interval ? ( fft_counter = 0;

// Copy input to buffer bi = buffer_index - fft_size + 1; i = 0; loop(fft_size, i2 = bi + i; i2 < 0 ? i2 += buffer_length;

fft_buffer[2*i] = input_buffer[2*i2]*window_buffer[i];
fft_buffer[2*i + 1] = input_buffer[2*i2 + 1]*window_buffer[i];

i += 1;

);

// Process buffer fft(fft_buffer, fft_size); fft_permute(fft_buffer, fft_size);

process_stft_segment(fft_buffer, fft_size);

fft_ipermute(fft_buffer, fft_size); ifft(fft_buffer, fft_size);

// Add to output bi = buffer_index - fft_size + 1; i = 0; loop(fft_size, i2 = bi + i; (i2 < 0) ? i2 += buffer_length;

output_buffer[2*i2] += fft_buffer[2*i]*fft_scaling_factor*window_buffer[i];
output_buffer[2*i2 + 1] += fft_buffer[2*i + 1]*fft_scaling_factor*window_buffer[i];

i += 1;

); );

output_index = buffer_index - fft_size; output_index < 0 ? output_index += buffer_length; spl0 = output_buffer[output_index2]; spl1 = output_buffer[output_index2 + 1]; output_buffer[output_index2] = 0; // clear the sample we just read output_buffer[output_index2 + 1] = 0;

buffer_index = (buffer_index + 1)%buffer_length;

@slider // convert low cut and high cut to bins every time a slider is changed low_bin = min(slider5, slider6) / srate fft_size; high_bin = max(slider6, slider5) / srate fft_size; // convert to radians rotation = slider7$pi/180; // convert percentage to raw scale factor dry_mix = slider1/100; wet_mix = slider2/100; low_strength = slider3; high_strength = slider4; phase_width_low = slider8$pi/180; phase_width_high = slider9$pi/180; cosine = cos(rotation); sine = sin(rotation); cosine2 = cos(slider12$pi/180); sine2 = sin(slider12$pi/180); // fill strength_buffer and phase_width_buffer band_index = 0; loop(fft_size, band_index >= low_bin && band_index < high_bin ? ( // only set values for the appropriate frequency range frac = (band_index - low_bin)/(high_bin - low_bin - 1); frac = max(0, min(1, frac)); // fraction of progress through range [low_bin, high_bin) strength = low_strength (1 - frac) + high_strength frac; strength_buffer[band_index] = 10^strength; // precaculate strength (actual value should be positive, so it makes // sense to take the power of ten, but only after the // linear mapping over the spectrum is done. // precalculate phase width phase_width_buffer[band_index] = phase_width_low (1 - frac) + phase_width_high * frac; );

band_index += 1; // next index );

@serialize serial_version = 1.00; file_var(0, serial_version); // nothing serialized yet, but keep track of the serial_version // for the preset state of the original plugin, serial_version would now be euqal to 0.

micsthepick commented 1 year ago

actually, try this one (jamesdsp is for linux / android)

On Thu, 14 Sept 2023, 08:29 FelipeZanabria, @.***> wrote:

I see that the problem continues, and after having forgotten about the plugin as not needing to use it, I found a way to solve the Nyauist problem. It may not be the best way to code, but it worked for me to set band 0 and 1 to 0, and then increment the positions from 1 to size minus 1. I did this before defining the main function, but there may be a more elegant way to do this. encode. I found this in the noise reduction tutorial that this plugin is based on. vocalrediso.txt https://github.com/micsthepick/JSVocalRedIso/files/12602295/vocalrediso.txt

— Reply to this email directly, view it on GitHub https://github.com/micsthepick/JSVocalRedIso/pull/5#issuecomment-1718396982, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACX6RZLI6JSIQE7WZHUISFTX2IXWHANCNFSM5XWWTTMQ . You are receiving this because you modified the open/close state.Message ID: @.***>

// Based on Vocalrediso.ny, a nyquist filter for audacity, currently released under the GNU GPL V3 // see https://www.gnu.org/licenses/gpl-3.0.en.html, https://www.audacityteam.org/ // credits to Neil Bickford for his denoiser, https://github.com/Nbickford/REAPERDenoiser, // which was used as a starting point for this script // which now uses the STFT template code by geraintluff: https://forum.cockos.com/showthread.php?t=225955

desc: vocal removal/isolation

//slider1:fft_size_index=2<0,7,1{256,512,1024,2048,4096,8192,16384,32768}>FFT size slider1:0<-100,100,0.1>dry mix slider2:0<-100,100,0.1>C mix (Vocals) slider3:0<-5,5,0.001>strength at Low Cut slider4:0<-5,5,0.001>strength at High Cut slider5:80<0,24000,10>Low Cut (Vocals) slider6:24000<0,24000,10>High Cut (Vocals) slider7:0<-90,90,0.1>Phase (Degrees) slider8:90<1,180,0.1>Phase width at Low Cut (Degrees) slider9:90<1,180,0.1>Phase width at High Cut (Degrees) slider10:1<0,1,0.05>Attenuate if different volume slider11:1<0,1,1{No,Yes}>undo input corrections slider12:0<-180,180,0.05>Phase2 (Degrees)

@init

function process_stft_segment(fft_buffer, fft_size) local(fft_bin, left_real, left_imag, right_real, right_imag) ( fft_bin = 0; // FFT bin number loop(fft_size/2+1, fft_bin2 = fft_bin ? (fft_size - fft_bin) : 0;

// Unfold complex spectrum into two real spectra
left_real = fft_buffer[2*fft_bin] + fft_buffer[2*fft_bin2];
left_imag = fft_buffer[2*fft_bin + 1] - fft_buffer[2*fft_bin2 + 1];
right_real = fft_buffer[2*fft_bin + 1] + fft_buffer[2*fft_bin2 + 1];
right_imag = -fft_buffer[2*fft_bin] + fft_buffer[2*fft_bin2];

// input corrections
// first, change the phase of L based on phase2:
l_real_twisted = left_real*cosine2 + left_imag*sine2;
l_imag_twisted = left_imag*cosine2 - left_real*sine2;

// now mix L&R together based on phase
r_real_rotated = right_real*cosine + l_real_twisted*sine;
r_imag_rotated = right_imag*cosine + l_imag_twisted*sine;
l_real_rotated = l_real_twisted*cosine - right_real*sine;
l_imag_rotated = l_imag_twisted*cosine - right_imag*sine;

//////////////////////// Main STFT block
// The 'meat' of the algorithm, the code in this block will most resemble the code from vocalrediso.ny

// first, apply  vocal reduction algorithm only in the right bands
fft_bin >= low_bin && fft_bin < high_bin ? (
  // get constants for this bin
  strength = strength_buffer[fft_bin];
  phase_width = phase_width_buffer[fft_bin];

  // cacluate energy for this bin
  norm_left = sqrt(sqr(l_real_rotated) + sqr(l_imag_rotated));
  norm_right = sqrt(sqr(r_real_rotated) + sqr(r_imag_rotated));

  // calculate phase difference between left & right, divide by phase_width
  w1 = acos((l_real_rotated * r_real_rotated + l_imag_rotated * r_imag_rotated) / (norm_left * norm_right)) / phase_width;
  // calculate weight: truncate w1 to [0, 1] and apply strength, then take 1 - the result, and multiply
  // by 1 - the square of the difference between the two norm values divided by the sum of the two, moderated by strength * slider10
  weight = (1 - min(1, max(0, w1)) ^ strength) * (
            1 - (sqr(norm_left - norm_right)/sqr(norm_left + norm_right))
        ) ^ (slider10 / strength) / 2;

  // find the c channel, the sum of the two complex numbers
  c_real = l_real_rotated + r_real_rotated;
  c_imag = l_imag_rotated + r_imag_rotated;

  // apply weight to c channel
  c_real *= weight;
  c_imag *= weight;
) :
(
  c_real = 0;
  c_imag = 0;
);
//////////////////////// END MAIN STFT block

// apply wet dry mix
out_l_real = l_real_rotated * dry_mix + c_real * wet_mix;
out_l_imag = l_imag_rotated * dry_mix + c_imag * wet_mix;
out_r_real = r_real_rotated * dry_mix + c_real * wet_mix;
out_r_imag = r_imag_rotated * dry_mix + c_imag * wet_mix;

// output corrections
slider11 > 0.5 ? (
  // if requested, undo input corrections

  // unmix by phase
  l_real_out_twisted = out_l_real*cosine - out_r_real*sine;
  l_imag_out_twisted = out_l_imag*cosine - out_r_imag*sine;
  right_real = out_r_real*cosine + out_l_real*sine;
  right_imag = out_r_imag*cosine + out_l_imag*sine;

  left_real = l_real_out_twisted * cosine2 - l_imag_out_twisted * sine2;
  left_imag = l_imag_out_twisted * cosine2 + l_real_out_twisted * sine2;
) :
(
  // else, just copy the values
  left_real = out_l_real;
  left_imag = out_l_imag;
  right_real = out_r_real;
  right_imag = out_r_imag;
);

// Re-fold back into complex spectrum
fft_buffer[2*fft_bin] = (left_real - right_imag)*0.5;
fft_buffer[2*fft_bin + 1] = (left_imag + right_real)*0.5;
fft_buffer[2*fft_bin2] = (left_real + right_imag)*0.5;
fft_buffer[2*fft_bin2 + 1] = (-left_imag + right_real)*0.5;

fft_bin += 1;

); );

function setup_stft_state(fft_size, first_time) ( //////////////////////// Setup block // This is called when playback starts, or when the FFT size is changed 0; //////////////////////// );

MAX_FFT_SIZE = 32768; fft_size = 8192;

freemem = 0; freemem = (fft_buffer = freemem) + MAX_FFT_SIZE*2; freemem = (window_buffer = freemem) + MAX_FFT_SIZE;

buffer_length = srate; buffer_index = 0; freemem = (input_buffer = freemem) + buffer_length2; freemem = (output_buffer = freemem) + buffer_length2;

freemem = (strength_buffer = freemem) + buffer_length2; freemem = (phase_width_buffer = freemem) + buffer_length2;

function window(r) local(s, s2, gaussian_width, x) ( // When squared, the Hann window adds up perfectly for overlap >= 4, so it's suitable for perfect reconstruction (0.5 - 0.5cos(r2$pi))/sqrt(0.375); // the MLT sine window also appears to add up correctly, with sigma = sqrt(2). //sin(r$pi)*sqrt(2); );

overlap_factor = 4; fft_interval = fft_size/overlap_factor; fft_scaling_factor = 1/overlap_factor/fft_size;

fft_size != prev_fft_size ? ( setup_stft_state(fft_size, prev_fft_size == 0); prev_fft_size = fft_size; // Fill window buffer i = 0; loop(fft_size, r = i/fft_size; window_buffer[i] = window(r); i += 1; ); );

pdc_delay = fft_size; pdc_bot_ch = 0; pdc_top_ch = 2;

freembuf(freemem);

@sample

input_buffer[buffer_index2] = spl0; input_buffer[buffer_index2 + 1] = spl1;

fft_counter += 1; fft_counter >= fft_interval ? ( fft_counter = 0;

// Copy input to buffer bi = buffer_index - fft_size + 1; i = 0; loop(fft_size, i2 = bi + i; i2 < 0 ? i2 += buffer_length;

fft_buffer[2*i] = input_buffer[2*i2]*window_buffer[i];
fft_buffer[2*i + 1] = input_buffer[2*i2 + 1]*window_buffer[i];

i += 1;

);

// Process buffer fft(fft_buffer, fft_size); fft_permute(fft_buffer, fft_size);

process_stft_segment(fft_buffer, fft_size);

fft_ipermute(fft_buffer, fft_size); ifft(fft_buffer, fft_size);

// Add to output bi = buffer_index - fft_size + 1; i = 0; loop(fft_size, i2 = bi + i; (i2 < 0) ? i2 += buffer_length;

output_buffer[2*i2] += fft_buffer[2*i]*fft_scaling_factor*window_buffer[i];
output_buffer[2*i2 + 1] += fft_buffer[2*i + 1]*fft_scaling_factor*window_buffer[i];

i += 1;

); );

output_index = buffer_index - fft_size; output_index < 0 ? output_index += buffer_length; spl0 = output_buffer[output_index2]; spl1 = output_buffer[output_index2 + 1]; output_buffer[output_index2] = 0; // clear the sample we just read output_buffer[output_index2 + 1] = 0;

buffer_index = (buffer_index + 1)%buffer_length;

@slider // convert low cut and high cut to bins every time a slider is changed low_bin = min(slider5, slider6) / srate fft_size; high_bin = max(slider6, slider5) / srate fft_size; // convert to radians rotation = slider7$pi/180; // convert percentage to raw scale factor dry_mix = slider1/100; wet_mix = slider2/100; low_strength = slider3; high_strength = slider4; phase_width_low = slider8$pi/180; phase_width_high = slider9$pi/180; cosine = cos(rotation); sine = sin(rotation); cosine2 = cos(slider12$pi/180); sine2 = sin(slider12$pi/180); // fill strength_buffer and phase_width_buffer band_index = 0; loop(fft_size, band_index >= low_bin && band_index < high_bin ? ( // only set values for the appropriate frequency range frac = (band_index - low_bin)/(high_bin - low_bin - 1); frac = max(0, min(1, frac)); // fraction of progress through range [low_bin, high_bin) strength = low_strength (1 - frac) + high_strength frac; strength_buffer[band_index] = 10^strength; // precaculate strength (actual value should be positive, so it makes // sense to take the power of ten, but only after the // linear mapping over the spectrum is done. // precalculate phase width phase_width_buffer[band_index] = phase_width_low (1 - frac) + phase_width_high * frac; );

band_index += 1; // next index );

@serialize serial_version = 1.00; file_var(0, serial_version); // nothing serialized yet, but keep track of the serial_version // for the preset state of the original plugin, serial_version would now be euqal to 0.

// Based on Vocalrediso.ny, a nyquist filter for audacity, currently released under the GNU GPL V3 // see https://www.gnu.org/licenses/gpl-3.0.en.html, https://www.audacityteam.org/ // credits to Neil Bickford for his denoiser, https://github.com/Nbickford/REAPERDenoiser, // which was used as a starting point for this script // which now uses the STFT template code by geraintluff: https://forum.cockos.com/showthread.php?t=225955

desc: vocal removal/isolation

//slider1:fft_size_index=2<0,7,1{256,512,1024,2048,4096,8192,16384,32768}>FFT size slider1:0<-100,100,0.1>dry mix slider2:0<-100,100,0.1>C mix (Vocals) slider3:0<-5,5,0.001>strength at Low Cut slider4:0<-5,5,0.001>strength at High Cut slider5:80<0,24000,10>Low Cut (Vocals) slider6:24000<0,24000,10>High Cut (Vocals) slider7:0<-90,90,0.1>Phase (Degrees) slider8:90<1,180,0.1>Phase width at Low Cut (Degrees) slider9:90<1,180,0.1>Phase width at High Cut (Degrees) slider10:1<0,1,0.05>Attenuate if different volume slider11:1<0,1,1{No,Yes}>undo input corrections slider12:0<-180,180,0.05>Phase2 (Degrees)

@init slider1=0; slider2=0; slider3=0; slider4=0; slider5=80; slider6=24000; slider7=0; slider8=90; slider9=90; slider10=1; slider11=1; slider12=0;

function process_stft_segment(fft_buffer, fft_size) local(fft_bin, left_real, left_imag, right_real, right_imag) ( fft_bin = 0; // FFT bin number loop(fft_size/2+1, fft_bin2 = fft_bin ? (fft_size - fft_bin) : 0;

// Unfold complex spectrum into two real spectra
left_real = fft_buffer[2*fft_bin] + fft_buffer[2*fft_bin2];
left_imag = fft_buffer[2*fft_bin + 1] - fft_buffer[2*fft_bin2 + 1];
right_real = fft_buffer[2*fft_bin + 1] + fft_buffer[2*fft_bin2 + 1];
right_imag = -fft_buffer[2*fft_bin] + fft_buffer[2*fft_bin2];

// input corrections
// first, change the phase of L based on phase2:
l_real_twisted = left_real*cosine2 + left_imag*sine2;
l_imag_twisted = left_imag*cosine2 - left_real*sine2;

// now mix L&R together based on phase
r_real_rotated = right_real*cosine + l_real_twisted*sine;
r_imag_rotated = right_imag*cosine + l_imag_twisted*sine;
l_real_rotated = l_real_twisted*cosine - right_real*sine;
l_imag_rotated = l_imag_twisted*cosine - right_imag*sine;

//////////////////////// Main STFT block
// The 'meat' of the algorithm, the code in this block will most resemble the code from vocalrediso.ny

// first, apply  vocal reduction algorithm only in the right bands
fft_bin >= low_bin && fft_bin < high_bin ? (
  // get constants for this bin
  strength = strength_buffer[fft_bin];
  phase_width = phase_width_buffer[fft_bin];

  // cacluate energy for this bin
  norm_left = sqrt(sqr(l_real_rotated) + sqr(l_imag_rotated));
  norm_right = sqrt(sqr(r_real_rotated) + sqr(r_imag_rotated));

  // calculate phase difference between left & right, divide by phase_width
  w1 = acos((l_real_rotated * r_real_rotated + l_imag_rotated * r_imag_rotated) / (norm_left * norm_right)) / phase_width;
  // calculate weight: truncate w1 to [0, 1] and apply strength, then take 1 - the result, and multiply
  // by 1 - the square of the difference between the two norm values divided by the sum of the two, moderated by strength * slider10
  weight = (1 - min(1, max(0, w1)) ^ strength) * (
            1 - (sqr(norm_left - norm_right)/sqr(norm_left + norm_right))
        ) ^ (slider10 / strength) / 2;

  // find the c channel, the sum of the two complex numbers
  c_real = l_real_rotated + r_real_rotated;
  c_imag = l_imag_rotated + r_imag_rotated;

  // apply weight to c channel
  c_real *= weight;
  c_imag *= weight;
) :
(
  c_real = 0;
  c_imag = 0;
);
//////////////////////// END MAIN STFT block

// apply wet dry mix
out_l_real = l_real_rotated * dry_mix + c_real * wet_mix;
out_l_imag = l_imag_rotated * dry_mix + c_imag * wet_mix;
out_r_real = r_real_rotated * dry_mix + c_real * wet_mix;
out_r_imag = r_imag_rotated * dry_mix + c_imag * wet_mix;

// output corrections
slider11 > 0.5 ? (
  // if requested, undo input corrections

  // unmix by phase
  l_real_out_twisted = out_l_real*cosine - out_r_real*sine;
  l_imag_out_twisted = out_l_imag*cosine - out_r_imag*sine;
  right_real = out_r_real*cosine + out_l_real*sine;
  right_imag = out_r_imag*cosine + out_l_imag*sine;

  left_real = l_real_out_twisted * cosine2 - l_imag_out_twisted * sine2;
  left_imag = l_imag_out_twisted * cosine2 + l_real_out_twisted * sine2;
) :
(
  // else, just copy the values
  left_real = out_l_real;
  left_imag = out_l_imag;
  right_real = out_r_real;
  right_imag = out_r_imag;
);

// Re-fold back into complex spectrum
fft_buffer[2*fft_bin] = (left_real - right_imag)*0.5;
fft_buffer[2*fft_bin + 1] = (left_imag + right_real)*0.5;
fft_buffer[2*fft_bin2] = (left_real + right_imag)*0.5;
fft_buffer[2*fft_bin2 + 1] = (-left_imag + right_real)*0.5;

fft_bin += 1;

); );

function setup_stft_state(fft_size, first_time) ( //////////////////////// Setup block // This is called when playback starts, or when the FFT size is changed 0; //////////////////////// );

MAX_FFT_SIZE = 32768; fft_size = 8192;

freemem = 0; freemem = (fft_buffer = freemem) + MAX_FFT_SIZE*2; freemem = (window_buffer = freemem) + MAX_FFT_SIZE;

buffer_length = srate; buffer_index = 0; freemem = (input_buffer = freemem) + buffer_length2; freemem = (output_buffer = freemem) + buffer_length2;

freemem = (strength_buffer = freemem) + buffer_length2; freemem = (phase_width_buffer = freemem) + buffer_length2;

function window(r) local(s, s2, gaussian_width, x) ( // When squared, the Hann window adds up perfectly for overlap >= 4, so it's suitable for perfect reconstruction (0.5 - 0.5cos(r2$pi))/sqrt(0.375); // the MLT sine window also appears to add up correctly, with sigma = sqrt(2). //sin(r$pi)*sqrt(2); );

overlap_factor = 4; fft_interval = fft_size/overlap_factor; fft_scaling_factor = 1/overlap_factor/fft_size;

fft_size != prev_fft_size ? ( setup_stft_state(fft_size, prev_fft_size == 0); prev_fft_size = fft_size; // Fill window buffer i = 0; loop(fft_size, r = i/fft_size; window_buffer[i] = window(r); i += 1; ); );

pdc_delay = fft_size; pdc_bot_ch = 0; pdc_top_ch = 2;

// convert low cut and high cut to bins every time a slider is changed low_bin = min(slider5, slider6) / srate fft_size; high_bin = max(slider6, slider5) / srate fft_size; // convert to radians rotation = slider7$pi/180; // convert percentage to raw scale factor dry_mix = slider1/100; wet_mix = slider2/100; low_strength = slider3; high_strength = slider4; phase_width_low = slider8$pi/180; phase_width_high = slider9$pi/180; cosine = cos(rotation); sine = sin(rotation); cosine2 = cos(slider12$pi/180); sine2 = sin(slider12$pi/180); // fill strength_buffer and phase_width_buffer band_index = 0; loop(fft_size, band_index >= low_bin && band_index < high_bin ? ( // only set values for the appropriate frequency range frac = (band_index - low_bin)/(high_bin - low_bin - 1); frac = max(0, min(1, frac)); // fraction of progress through range [low_bin, high_bin) strength = low_strength (1 - frac) + high_strength frac; strength_buffer[band_index] = 10^strength; // precaculate strength (actual value should be positive, so it makes // sense to take the power of ten, but only after the // linear mapping over the spectrum is done. // precalculate phase width phase_width_buffer[band_index] = phase_width_low (1 - frac) + phase_width_high * frac; );

band_index += 1; // next index );

@sample

input_buffer[buffer_index2] = spl0; input_buffer[buffer_index2 + 1] = spl1;

fft_counter += 1; fft_counter >= fft_interval ? ( fft_counter = 0;

// Copy input to buffer bi = buffer_index - fft_size + 1; i = 0; loop(fft_size, i2 = bi + i; i2 < 0 ? i2 += buffer_length;

fft_buffer[2*i] = input_buffer[2*i2]*window_buffer[i];
fft_buffer[2*i + 1] = input_buffer[2*i2 + 1]*window_buffer[i];

i += 1;

);

// Process buffer fft(fft_buffer, fft_size); fft_permute(fft_buffer, fft_size);

process_stft_segment(fft_buffer, fft_size);

fft_ipermute(fft_buffer, fft_size); ifft(fft_buffer, fft_size);

// Add to output bi = buffer_index - fft_size + 1; i = 0; loop(fft_size, i2 = bi + i; (i2 < 0) ? i2 += buffer_length;

output_buffer[2*i2] += fft_buffer[2*i]*fft_scaling_factor*window_buffer[i];
output_buffer[2*i2 + 1] += fft_buffer[2*i + 1]*fft_scaling_factor*window_buffer[i];

i += 1;

); );

output_index = buffer_index - fft_size; output_index < 0 ? output_index += buffer_length; spl0 = output_buffer[output_index2]; spl1 = output_buffer[output_index2 + 1]; output_buffer[output_index2] = 0; // clear the sample we just read output_buffer[output_index2 + 1] = 0;

buffer_index = (buffer_index + 1)%buffer_length;

FelipeZanabria commented 1 year ago

Sorry, I set the overlap earlier to see if I could overlay correctly using another square root number. The main thing I did was set buffer bins 0 and 1 to 0. The window should be sine mlt and the overlap 4.

FelipeZanabria commented 1 year ago

This works well. Silence si no hay sonido o si todo está filtrado.

micsthepick commented 1 year ago

will also push a fix for glitches when seeking.