pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch
https://pytorch.org/audio
BSD 2-Clause "Simplified" License

Torchaudio for aarch64? #606

Closed: praesc closed this issue 1 year ago

praesc commented 4 years ago

🚀 Feature

Is there a release for aarch64?

Motivation

It'd be great to be able to deploy a PyTorch module on an RPi for speech recognition.


vincentqb commented 4 years ago

Since pytorch doesn't officially support aarch64 (see here), we don't officially either. That being said, if you do succeed in making it work, please feel free to detail the steps here :)

praesc commented 4 years ago

Thanks for the quick reply, Vincent. I tried to build the latest version from source and got some issues while compiling on the Jetson Nano. However, version 0.3.0 worked smoothly out-of-the-box. Nvidia provides some pre-compiled wheels for their platforms, so it's straightforward to install PyTorch on them. Hence, it would be nice to have torchaudio as well.

ark626 commented 4 years ago

Would also love to see it running on aarch64, since torch already runs there.

In #657 I've also already commented; it seems like the installation of the prerequisites is the issue?

Installing via `python3 setup.py install` ran through fine.

Processing dependencies for torchaudio==0.6.0a0+313f4f5
Searching for torch==1.5.0
Best match: torch 1.5.0
Adding torch 1.5.0 to easy-install.pth file
Installing convert-caffe2-to-onnx script to /usr/local/bin
Installing convert-onnx-to-caffe2 script to /usr/local/bin
Using /home/ark626/.local/lib/python3.6/site-packages

Searching for future==0.17.1
Best match: future 0.17.1
Adding future 0.17.1 to easy-install.pth file
Installing futurize script to /usr/local/bin
Installing pasteurize script to /usr/local/bin
Using /usr/local/lib/python3.6/dist-packages

Searching for numpy==1.18.4
Best match: numpy 1.18.4
Adding numpy 1.18.4 to easy-install.pth file
Installing f2py script to /usr/local/bin
Installing f2py3 script to /usr/local/bin
Installing f2py3.6 script to /usr/local/bin
Using /home/ark626/.local/lib/python3.6/site-packages

Finished processing dependencies for torchaudio==0.6.0a0+313f4f5

But running ./packaging/build_from_source.sh $PW runs into issues.

ark626 commented 4 years ago

Okay, I got it installed by working around the issues.

=> Run the script once until it fails. Then alter the parts like this (so libmad and lame are not overwritten):

#!/bin/bash

set -ex

# Arguments: PREFIX, specifying where to install dependencies into

PREFIX="$1"

#rm -rf /tmp/torchaudio-deps
#mkdir /tmp/torchaudio-deps
pushd /tmp/torchaudio-deps

# Curl Settings
CURL_OPTS="-L --retry 10 --connect-timeout 5 --max-time 180"

curl $CURL_OPTS -o sox-14.4.2.tar.bz2 "https://downloads.sourceforge.net/project/sox/sox/14.4.2/sox-14.4.2.tar.bz2"
#curl $CURL_OPTS -o lame-3.99.5.tar.gz "http://ftp.us.debian.org/debian/pool/main/l/lame/lame_3.99.5+repack1-9+b2_arm64.deb"
#curl $CURL_OPTS -o lame-3.99.5.tar.gz "https://downloads.sourceforge.net/project/lame/lame/3.99/lame-3.99.5.tar.gz"
curl $CURL_OPTS -o flac-1.3.2.tar.xz "https://downloads.sourceforge.net/project/flac/flac-src/flac-1.3.2.tar.xz"
#curl $CURL_OPTS -o libmad-0.15.1b.tar.gz  "https://launchpad.net/ubuntu/+archive/primary/+sourcefiles/libmad/0.15.1b-9ubuntu16.$
#"https://downloads.sourceforge.net/project/mad/libmad/0.15.1b/libmad-0.15.1b.tar.gz"

echo CurlDone
# unpack the dependencies
tar xfp sox-14.4.2.tar.bz2
#tar xfp lame-3.99.5.tar.gz
tar xfp flac-1.3.2.tar.xz
#tar xfp libmad-0.15.1b.tar.gz

Then replace those two config.guess files with this version: => https://svn.osgeo.org/grass/grass/tags/release_20150712_grass_6_4_5/config.guess

/tmp/torchaudio-deps/lame-3.99.5/config.guess
/tmp/torchaudio-deps/libmad-0.15.1b/config.guess

Now run the script again, and it should run through fine.

The issue is that the config.guess shipped with those two libraries is from 2003 and very outdated, so it doesn't recognize aarch64 as a machine type.
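If you want to script that replacement instead of copying the files by hand, a small Python sketch along these lines should work (this is not part of the original steps; it assumes the /tmp/torchaudio-deps layout of the modified script above and uses the config.guess URL already linked):

```python
# Sketch only: download the newer config.guess (URL from above) and copy it over
# the 2003-era copies shipped with lame and libmad, so aarch64 is recognized.
import shutil
import urllib.request

NEW_CONFIG_GUESS = ("https://svn.osgeo.org/grass/grass/tags/"
                    "release_20150712_grass_6_4_5/config.guess")
TARGETS = [
    "/tmp/torchaudio-deps/lame-3.99.5/config.guess",
    "/tmp/torchaudio-deps/libmad-0.15.1b/config.guess",
]

tmp_path, _ = urllib.request.urlretrieve(NEW_CONFIG_GUESS)
for target in TARGETS:
    shutil.copyfile(tmp_path, target)  # overwrite the outdated config.guess
    print("replaced", target)
```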

moih commented 3 years ago

Hi @ark626, I recently saw you are trying to run MelGAN-VC on an aarch64 device (Jetson Nano). I am trying as well; I successfully compiled torchaudio but still can't manage to perform inference with said model. I'm wondering if you had any success with this and if you are willing to share your experience. Thanks in advance.

ark626 commented 3 years ago

Yes, I got it running. I will answer later in more detail. In the meantime you could check the guide I've written:

JetsonXavierAGX

ark626 commented 3 years ago

Okay, so in general the guide here refers to some of the useful things I've used: https://github.com/ark626/JetsonXavierAGX

If I recall correctly, to run MelGAN-VC you needed to install torch and a specific version of tensorflow (https://drive.google.com/drive/folders/1Ee9S9Ab892n_rONX4zqQdbjjt5rTnHwV?usp=sharing). Afterwards I needed to keep the following things in mind:

Sometimes when TensorFlow is used with PyTorch, it can try to use all the memory. This is of course bad news, because on the Jetson the RAM is shared between GPU and CPU. To prevent an overallocation for CUDA, one can limit the RAM CUDA uses. For TF2 this looks like this:

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=CUDARAMINMB)])

CUDARAMINMB is the CUDA RAM limit in MB. The two lines above can be placed right after the TensorFlow import.

Also, when using TensorFlow together with PyTorch, there is often a weird error message about a block issue. This happens when TensorFlow is imported first. Make sure to import TensorFlow AFTER PyTorch; then it will work.
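Putting both points together, a minimal sketch of the top of a script might look like this (the memory_limit of 2048 MB is just an example value, not something from the original setup):

```python
# Sketch: import order and GPU memory cap for PyTorch + TensorFlow on a Jetson.
import torch              # import PyTorch first to avoid the "block" error mentioned above
import tensorflow as tf   # TensorFlow must be imported after torch

# Cap TensorFlow's GPU memory so it doesn't grab the whole shared CPU/GPU RAM.
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)])  # MB, example value
```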

I didn't compile torchaudio; as far as I recall I just used the torch installer, because the compilation failed every time.

ark626 commented 3 years ago

I used:

torch @ file:///media/ext/dataSets/MelGANVC/torch-1.6.0rc2-cp36-cp36m-linux_aarch64.whl

Editable install with no version control (torchaudio==0.7.0a0+102174e)

-e /usr/local/lib/python3.6/dist-packages/torchaudio-0.7.0a0+102174e-py3.6-linux-aarch64.egg

The torch wheel is linked in my guide (https://drive.google.com/drive/folders/1Ee9S9Ab892n_rONX4zqQdbjjt5rTnHwV?usp=sharing).

For torchaudio I sadly don't recall exactly how I installed it, but I will append the path where it is stored together with an archive of the folder.

The path is something like \SHODAN\pihome\usr\local\lib\python3.6\dist-packages. The file which should be extracted there is: https://www.mediafire.com/file/9llme9a0ijtbu08/torchaudio-0.7.0a0+102174e-py3.6-linux-aarch64.egg.rar/file

=> I can also tell you that MelGAN-VC doesn't use all of torchaudio, so in general even a partial installation will work, because only the algorithm that converts mel scales back to audio uses torchaudio. The rest is in TensorFlow.
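As a quick sanity check of such a partial build, it should be enough that the two transforms MelGAN-VC actually uses import and run; something like this (a sketch with an arbitrary hop value, not taken from the model config):

```python
# Smoke test: MelGAN-VC only needs Spectrogram and MelScale from torchaudio,
# so a partial aarch64 install can be verified with just these transforms.
import torch
from torchaudio.transforms import MelScale, Spectrogram

hop = 192  # arbitrary value for this test only
spec = Spectrogram(n_fft=6 * hop, win_length=6 * hop, hop_length=hop,
                   pad=0, power=2, normalized=True)
mel = MelScale(n_mels=hop, sample_rate=16000, f_min=0., n_stft=6 * hop // 2 + 1)

wav = torch.randn(1, 6 * hop * 4)   # dummy mono waveform
print(mel(spec(wav)).shape)          # expect a (1, hop, time) shaped tensor
```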

My imports look like this:

import matplotlib.pyplot as plt
import collections
from PIL import Image
from skimage.transform import resize
import imageio
import librosa
import librosa.display
from librosa.feature import melspectrogram
import os
import time
#import IPython
#import tensorflow as tf
#os.environ['LIBROSA_CACHE_DIR'] = 'C:/Users/ark-6/tmp'
os.environ['LIBROSA_CACHE_LEVEL'] = '50'
import wave
from glob import glob
import numpy as np
from pathlib import Path

import torch
import torch.nn as nn
import torch.nn.functional as F
from tqdm import tqdm
from functools import partial
import math
import heapq
from torchaudio.transforms import MelScale, Spectrogram

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Reshape, Flatten, Concatenate, Conv2D, Conv2DTranspose, GlobalAveragePooling2D, UpSampling2D, LeakyReLU, ReLU, Add, Multiply, Lambda, Dot, BatchNormalization, Activation, ZeroPadding2D, Cropping2D, Cropping1D
from tensorflow.keras.models import Sequential, Model, load_model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.initializers import TruncatedNormal, he_normal
import tensorflow.keras.backend as K

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=10024)])

If I can help you any further, don't hesitate to ask; I remember it was quite a hassle to get this working for my thesis.

moih commented 3 years ago

Hi @ark626 ,

I've managed to get the generator part of MelGAN-VC working, thanks to your help. I'm running on a Jetson Nano Developer Kit 2GB, and I'm limiting the memory with memory_limit=256.

But now I am receiving this error, which seems to be related to TensorFlow... Here is my log:

Built networks
(196096,)
(7, 512, 64, 1)
Generating...
2021-02-06 16:32:03.058473: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Not found: ./bin/ptxas not found
Relying on driver to perform ptx compilation. This message will be only logged once.
2021-02-06 16:32:12.095268: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2,14GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-02-06 16:32:18.477486: E tensorflow/stream_executor/cuda/cuda_driver.cc:952] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.497364: E tensorflow/stream_executor/gpu/gpu_timer.cc:55] Internal: Error destroying CUDA event: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.497405: E tensorflow/stream_executor/gpu/gpu_timer.cc:60] Internal: Error destroying CUDA event: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.497597: I tensorflow/stream_executor/cuda/cuda_driver.cc:805] failed to allocate 8B (8 bytes) from device: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.497648: E tensorflow/stream_executor/stream.cc:5479] Internal: Failed to enqueue async memset operation: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.497757: E tensorflow/stream_executor/cuda/cuda_driver.cc:617] failed to load PTX text as a module: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.497799: E tensorflow/stream_executor/cuda/cuda_driver.cc:622] error log buffer (1024 bytes):
2021-02-06 16:32:18.506196: W tensorflow/core/kernels/gpu_utils.cc:69] Failed to check cudnn convolutions for out-of-bounds reads and writes with an error message: 'Failed to load PTX text as a module: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated'; skipping this check. This only means that we won't check cudnn for out-of-bounds reads and writes. This message will only be printed once.
2021-02-06 16:32:18.506314: I tensorflow/stream_executor/cuda/cuda_driver.cc:805] failed to allocate 8B (8 bytes) from device: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.545824: I tensorflow/stream_executor/stream.cc:4963] [stream=0x9418a1e0,impl=0x94189be0] did not memzero GPU location; source: 0x7ff39e55b8
2021-02-06 16:32:18.546030: E tensorflow/stream_executor/cuda/cuda_driver.cc:617] failed to load PTX text as a module: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.546073: E tensorflow/stream_executor/cuda/cuda_driver.cc:622] error log buffer (1024 bytes):
Traceback (most recent call last):
  File "melgan_generator.py", line 740, in <module>
    abwv = towave(speca, name=output_name, path=output_directory) #Convert and save wav
  File "melgan_generator.py", line 704, in towave
    ab = gen(a, training=False)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py", line 822, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py", line 717, in call
    convert_kwargs_to_constants=base_layer_utils.call_context().saving)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py", line 891, in _run_internal_graph
    output_tensors = layer(computed_tensors, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py", line 822, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "melgan_generator.py", line 404, in call
    dilation_rate=self.dilation_rate)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/backend.py", line 4954, in conv2d_transpose
    data_format=tf_data_format)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py", line 2246, in conv2d_transpose
    name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py", line 2317, in conv2d_transpose_v2
    name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_nn_ops.py", line 1253, in conv2d_backprop_input
    _ops.raise_from_not_ok_status(e, name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 6606, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: DNN Backward Data function launch failure : input shape([7,512,64,1]) filter shape([512,1,1,512]) [Op:Conv2DBackpropInput] name: G/conv_s_n2d_transpose/conv2d_transpose/

Any idea what this might be? Thanks so much in advance.

ark626 commented 3 years ago

I think the issue here was the TF version. Try installing tensorflow==1.15.4+nv20.11. Also, it seems that the memory is a little bit too small: it ran out of memory trying to allocate 2.14 GiB.

You can try to make the model smaller by decreasing its size parameters. (You can also try decreasing the samples to make the allocation smaller.) You can do several runs with different models if you want to split the work up and reload the previously saved model.

moih commented 3 years ago

Thanks @ark626, I will try that TF installation. Also worth noting that I'm running JetPack 4.5, whereas you are running version 4.4.x; would that make a difference?

One last thing is that I'm running the MelGAN-VC model with these hyperparams:


hop=512               #hop size (window size = 6*hop)
sr=44100              #sampling rate
min_level_db=-100     #reference values to normalize data
ref_level_db=20

shape=64              #length of time axis of split spectrograms to feed to generator            
vec_len=128           #length of vector generated by siamese network
bs = 4               #batch size
delta = 2.            #constant for siamese loss

At which sample rate and with what batch size did you manage to perform inference? Do you think running the model at a 16 kHz sample rate would help in fitting it into memory?

Thanks again for your help!

ark626 commented 3 years ago

To reduce the RAM usage you can reduce vec_len, e.g. to 64 or 32. The sampling rate sr should match that of the samples you use for training. Also, I extracted shape as a parameter, which can be reduced to 32 or 16. See the code below.

The sample rate will not help you in terms of RAM; it should fit your samples, so verify that all your samples have the same sr. The filters can be decreased down to 256, and the batch size can be decreased to 2-3, which will save RAM.

```python from __future__ import print_function, division from glob import glob import scipy import soundfile as sf import matplotlib.pyplot as plt #from IPython.display import clear_output import datetime import numpy as np import random import matplotlib.pyplot as plt import collections from PIL import Image from skimage.transform import resize import imageio import librosa import librosa.display from librosa.feature import melspectrogram import os import time #import IPython #import tensorflow as tf #os.environ['LIBROSA_CACHE_DIR'] = 'C:/Users/ark-6/tmp' os.environ['LIBROSA_CACHE_LEVEL'] = '50' import wave from glob import glob import numpy as np from pathlib import Path import torch import torch.nn as nn import torch.nn.functional as F from tqdm import tqdm from functools import partial import math import heapq from torchaudio.transforms import MelScale, Spectrogram import tensorflow as tf from tensorflow.keras.layers import Input, Dense, Reshape, Flatten, Concatenate, Conv2D, Conv2DTranspose, GlobalAveragePooling2D, UpSampling2D, LeakyReLU, ReLU, Add, Multiply, Lambda, Dot, BatchNormalization, Activation, ZeroPadding2D, Cropping2D, Cropping1D from tensorflow.keras.models import Sequential, Model, load_model from tensorflow.keras.optimizers import Adam from tensorflow.keras.initializers import TruncatedNormal, he_normal import tensorflow.keras.backend as K gpus = tf.config.experimental.list_physical_devices('GPU') tf.config.experimental.set_virtual_device_configuration(gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=10024)]) def ensure_dir(path): Path(path).mkdir(parents=True, exist_ok=True) def load_array(path,typ): ls = sorted(glob(f'{path}/'+str(typ))) adata = [] for i in range(len(ls)): #print(ls) x = ls[i] #print(x) adata.append(x) return np.array(adata) def convertWaves(loadPath,savePath,saveName): infiles = load_array(loadPath,'*.wav')#["sound_1.wav", "sound_2.wav"] outfile = saveName data= [] x=0 for infile in infiles: x = x+1 print(x) w = wave.open(infile, 'rb') data.append( [w.getparams(), w.readframes(w.getnframes())] ) w.close() output = wave.open(str(savePath+'/'+outfile), 'wb') output.setparams(data[0][0]) for i in range(x): output.writeframes(data[i][1]) output.close() #Hyperparameters hop=256 #hop size (window size = 6*hop) sr=16000 #sampling rate min_level_db=-100 #reference values to normalize data ref_level_db=20 filters = 1024 shape=48 #length of time axis of split specrograms to feed to generator vec_len=128 #length of vector generated by siamese vector bs = 16 #batch size delta = 2. 
#constant for siamese loss epochs =5000 #Epochs to train sampleFile = './dome/MelGANVC_2/Gruppe_A'#'./TestwavALL/dome'#'./devil.wav' loadModel = False # Checks weather the model should load from a save or start new loadModelPath = './ai/load' #If loadModel is True this is the path where to load from finalResultPath = './ai/test' #Where should the final Generated Result of the Sample be saved n_saves = 2 # Save and Generate Test Sample after how many epochs aPath = './dome/MelGANVC_2/Gruppe_A' # Source Domain wav Folder Path bPath = './dome/MelGANVC_2/Gruppe_B' #Target Domain wav Folder Path #There seems to be a problem with Tensorflow STFT, so we'll be using pytorch to handle offline mel-spectrogram generation and waveform reconstruction #For waveform reconstruction, a gradient-based method is used: #ORIGINAL CODE FROM https://github.com/yoyololicon/spectrogram-inversion torch.set_default_tensor_type('torch.cuda.FloatTensor') torch.backends.cudnn.deterministic = False torch.backends.cudnn.benchmark = False specobj = Spectrogram(n_fft=6*hop, win_length=6*hop, hop_length=hop, pad=0, power=2, normalized=True) #specobj = librosa.feature.melspectrogram(y=None, sr=22050, S=None, n_fft=6*hop, hop_length=hop, power=2.0, **kwargs) specfunc = specobj.forward melobj = MelScale(n_mels=hop, sample_rate=sr, f_min=0.) #melobj = librosa.filters.mel(sr=sr, n_mels=hop,fmin=0.) melfunc = melobj.forward def melspecfunc(waveform): specgram = specfunc(waveform) mel_specgram = melfunc(specgram) return mel_specgram def spectral_convergence(input, target): return 20 * ((input - target).norm().log10() - target.norm().log10()) def GRAD(spec, transform_fn, samples=None, init_x0=None, maxiter=1000, tol=1e-6, verbose=1, evaiter=10, lr=0.003): spec = torch.Tensor(spec) samples = (spec.shape[-1]*hop)-hop if init_x0 is None: init_x0 = spec.new_empty((1,samples)).normal_(std=1e-6) x = nn.Parameter(init_x0) T = spec criterion = nn.L1Loss() optimizer = torch.optim.Adam([x], lr=lr) bar_dict = {} metric_func = spectral_convergence bar_dict['spectral_convergence'] = 0 metric = 'spectral_convergence' init_loss = None with tqdm(total=maxiter, disable=not verbose) as pbar: for i in range(maxiter): optimizer.zero_grad() V = transform_fn(x) loss = criterion(V, T) loss.backward() optimizer.step() lr = lr*0.9999 for param_group in optimizer.param_groups: param_group['lr'] = lr if i % evaiter == evaiter - 1: with torch.no_grad(): V = transform_fn(x) bar_dict[metric] = metric_func(V, spec).item() l2_loss = criterion(V, spec).item() pbar.set_postfix(**bar_dict, loss=l2_loss) pbar.update(evaiter) return x.detach().view(-1).cpu() def normalize(S): return np.clip((((S - min_level_db) / -min_level_db)*2.)-1., -1, 1) def denormalize(S): return (((np.clip(S, -1, 1)+1.)/2.) 
* -min_level_db) + min_level_db def prep(wv,hop=192): S = np.array(torch.squeeze(melspecfunc(torch.Tensor(wv).view(1,-1))).detach().cpu()) S = librosa.power_to_db(S)-ref_level_db return normalize(S) def deprep(S): S = denormalize(S)+ref_level_db S = librosa.db_to_power(S) wv = GRAD(np.expand_dims(S,0), melspecfunc, maxiter=2000, evaiter=10, tol=1e-8) return np.array(np.squeeze(wv)) #Helper functions #Generate spectrograms from waveform array def tospec(data): specs=np.empty(data.shape[0], dtype=object) for i in range(data.shape[0]): x = data[i] S=prep(x) S = np.array(S, dtype=np.float32) specs[i]=np.expand_dims(S, -1) print(specs.shape) return specs #Generate multiple spectrograms with a determined length from single wav file def tospeclong(path, length=4*16000): x, sr = librosa.load(path,sr=16000) x,_ = librosa.effects.trim(x) loudls = librosa.effects.split(x, top_db=50) xls = np.array([]) for interv in loudls: xls = np.concatenate((xls,x[interv[0]:interv[1]])) x = xls num = x.shape[0]//length specs=np.empty(num, dtype=object) for i in range(num-1): a = x[i*length:(i+1)*length] S = prep(a) S = np.array(S, dtype=np.float32) try: sh = S.shape specs[i]=S except AttributeError: print('spectrogram failed') print(specs.shape) return specs #Waveform array from path of folder containing wav files def audio_array(path): ls = glob(f'{path}/*.wav') adata = [] for i in range(len(ls)): #print(ls) x, sr = tf.audio.decode_wav(tf.io.read_file(ls[i]), 1) #print(x) x = np.array(x).astype(dtype=np.float32) adata.append(x) return np.array(adata) #Concatenate spectrograms in array along the time axis def testass(a): but=False con = np.array([]) nim = a.shape[0] for i in range(nim): im = a[i] im = np.squeeze(im) if not but: con=im but=True else: con = np.concatenate((con,im), axis=1) return np.squeeze(con) #Split spectrograms in chunks with equal size def splitcut(data): ls = [] mini = 0 minifinal = 10*shape #max spectrogram length for i in range(data.shape[0]-1): if data[i].shape[1]<=data[i+1].shape[1]: mini = data[i].shape[1] else: mini = data[i+1].shape[1] if mini>=3*shape and mini=3*shape: for n in range(x.shape[1]//minifinal): ls.append(x[:,n*minifinal:n*minifinal+minifinal,:]) ls.append(x[:,-minifinal:,:]) return np.array(ls) #Generating Mel-Spectrogram dataset (Uncomment where needed) #adata: source spectrograms #bdata: target spectrograms #MALE1 #awv = audio_array('../content/cmu_us_clb_arctic/wav') #get waveform array from folder containing wav files #aspec = tospec(awv) #get spectrogram array #adata = splitcut(aspec) #split spectrogams to fixed length #FEMALE1 #bwv = audio_array('../content/cmu_us_bdl_arctic/wav') #bspec = tospec(bwv) #bdata = splitcut(bspec) # #MALE2 # awv = audio_array('../content/cmu_us_rms_arctic/wav') # aspec = tospec(awv) # adata = splitcut(aspec) # #FEMALE2 # bwv = audio_array('../content/cmu_us_slt_arctic/wav') # bspec = tospec(bwv) # bdata = splitcut(bspec) #JAZZ MUSIC awv = audio_array(aPath) aspec = tospec(awv) adata = splitcut(aspec) #CLASSICAL MUSIC bwv = audio_array(bPath) bspec = tospec(bwv) bdata = splitcut(bspec) #Creating Tensorflow Datasets def proc(x): return tf.image.random_crop(x, size=[hop, 3*shape, 1]) dsa = tf.data.Dataset.from_tensor_slices(adata).repeat(50).map(proc, num_parallel_calls=tf.data.experimental.AUTOTUNE).shuffle(10000).batch(bs, drop_remainder=True) dsb = tf.data.Dataset.from_tensor_slices(bdata).repeat(50).map(proc, num_parallel_calls=tf.data.experimental.AUTOTUNE).shuffle(10000).batch(bs, drop_remainder=True) #Adding Spectral Normalization to 
convolutional layers from tensorflow.python.keras.utils import conv_utils from tensorflow.python.ops import array_ops from tensorflow.python.ops import math_ops from tensorflow.python.ops import sparse_ops from tensorflow.python.ops import gen_math_ops from tensorflow.python.ops import standard_ops from tensorflow.python.eager import context from tensorflow.python.framework import tensor_shape def l2normalize(v, eps=1e-12): return v / (tf.norm(v) + eps) class ConvSN2D(tf.keras.layers.Conv2D): def __init__(self, filters, kernel_size, power_iterations=1, **kwargs): super(ConvSN2D, self).__init__(filters, kernel_size, **kwargs) self.power_iterations = power_iterations def build(self, input_shape): super(ConvSN2D, self).build(input_shape) if self.data_format == 'channels_first': channel_axis = 1 else: channel_axis = -1 self.u = self.add_weight(self.name + '_u', shape=tuple([1, self.kernel.shape.as_list()[-1]]), initializer=tf.initializers.RandomNormal(0, 1), trainable=False ) def compute_spectral_norm(self, W, new_u, W_shape): for _ in range(self.power_iterations): new_v = l2normalize(tf.matmul(new_u, tf.transpose(W))) new_u = l2normalize(tf.matmul(new_v, W)) sigma = tf.matmul(tf.matmul(new_v, W), tf.transpose(new_u)) W_bar = W/sigma with tf.control_dependencies([self.u.assign(new_u)]): W_bar = tf.reshape(W_bar, W_shape) return W_bar def call(self, inputs): W_shape = self.kernel.shape.as_list() W_reshaped = tf.reshape(self.kernel, (-1, W_shape[-1])) new_kernel = self.compute_spectral_norm(W_reshaped, self.u, W_shape) outputs = self._convolution_op(inputs, new_kernel) if self.use_bias: if self.data_format == 'channels_first': outputs = tf.nn.bias_add(outputs, self.bias, data_format='NCHW') else: outputs = tf.nn.bias_add(outputs, self.bias, data_format='NHWC') if self.activation is not None: return self.activation(outputs) return outputs class ConvSN2DTranspose(tf.keras.layers.Conv2DTranspose): def __init__(self, filters, kernel_size, power_iterations=1, **kwargs): super(ConvSN2DTranspose, self).__init__(filters, kernel_size, **kwargs) self.power_iterations = power_iterations def build(self, input_shape): super(ConvSN2DTranspose, self).build(input_shape) if self.data_format == 'channels_first': channel_axis = 1 else: channel_axis = -1 self.u = self.add_weight(self.name + '_u', shape=tuple([1, self.kernel.shape.as_list()[-1]]), initializer=tf.initializers.RandomNormal(0, 1), trainable=False ) def compute_spectral_norm(self, W, new_u, W_shape): for _ in range(self.power_iterations): new_v = l2normalize(tf.matmul(new_u, tf.transpose(W))) new_u = l2normalize(tf.matmul(new_v, W)) sigma = tf.matmul(tf.matmul(new_v, W), tf.transpose(new_u)) W_bar = W/sigma with tf.control_dependencies([self.u.assign(new_u)]): W_bar = tf.reshape(W_bar, W_shape) return W_bar def call(self, inputs): W_shape = self.kernel.shape.as_list() W_reshaped = tf.reshape(self.kernel, (-1, W_shape[-1])) new_kernel = self.compute_spectral_norm(W_reshaped, self.u, W_shape) inputs_shape = array_ops.shape(inputs) batch_size = inputs_shape[0] if self.data_format == 'channels_first': h_axis, w_axis = 2, 3 else: h_axis, w_axis = 1, 2 height, width = inputs_shape[h_axis], inputs_shape[w_axis] kernel_h, kernel_w = self.kernel_size stride_h, stride_w = self.strides if self.output_padding is None: out_pad_h = out_pad_w = None else: out_pad_h, out_pad_w = self.output_padding out_height = conv_utils.deconv_output_length(height, kernel_h, padding=self.padding, output_padding=out_pad_h, stride=stride_h, dilation=self.dilation_rate[0]) out_width = 
conv_utils.deconv_output_length(width, kernel_w, padding=self.padding, output_padding=out_pad_w, stride=stride_w, dilation=self.dilation_rate[1]) if self.data_format == 'channels_first': output_shape = (batch_size, self.filters, out_height, out_width) else: output_shape = (batch_size, out_height, out_width, self.filters) output_shape_tensor = array_ops.stack(output_shape) outputs = K.conv2d_transpose( inputs, new_kernel, output_shape_tensor, strides=self.strides, padding=self.padding, data_format=self.data_format, dilation_rate=self.dilation_rate) if not context.executing_eagerly(): out_shape = self.compute_output_shape(inputs.shape) outputs.set_shape(out_shape) if self.use_bias: outputs = tf.nn.bias_add( outputs, self.bias, data_format=conv_utils.convert_data_format(self.data_format, ndim=4)) if self.activation is not None: return self.activation(outputs) return outputs class DenseSN(Dense): def build(self, input_shape): super(DenseSN, self).build(input_shape) self.u = self.add_weight(self.name + '_u', shape=tuple([1, self.kernel.shape.as_list()[-1]]), initializer=tf.initializers.RandomNormal(0, 1), trainable=False) def compute_spectral_norm(self, W, new_u, W_shape): new_v = l2normalize(tf.matmul(new_u, tf.transpose(W))) new_u = l2normalize(tf.matmul(new_v, W)) sigma = tf.matmul(tf.matmul(new_v, W), tf.transpose(new_u)) W_bar = W/sigma with tf.control_dependencies([self.u.assign(new_u)]): W_bar = tf.reshape(W_bar, W_shape) return W_bar def call(self, inputs): W_shape = self.kernel.shape.as_list() W_reshaped = tf.reshape(self.kernel, (-1, W_shape[-1])) new_kernel = self.compute_spectral_norm(W_reshaped, self.u, W_shape) rank = len(inputs.shape) if rank > 2: outputs = standard_ops.tensordot(inputs, new_kernel, [[rank - 1], [0]]) if not context.executing_eagerly(): shape = inputs.shape.as_list() output_shape = shape[:-1] + [self.units] outputs.set_shape(output_shape) else: inputs = math_ops.cast(inputs, self._compute_dtype) if K.is_sparse(inputs): outputs = sparse_ops.sparse_tensor_dense_matmul(inputs, new_kernel) else: outputs = gen_math_ops.mat_mul(inputs, new_kernel) if self.use_bias: outputs = tf.nn.bias_add(outputs, self.bias) if self.activation is not None: return self.activation(outputs) return outputs #Networks Architecture init = tf.keras.initializers.he_uniform() def conv2d(layer_input, filters, kernel_size=4, strides=2, padding='same', leaky=True, bnorm=True, sn=True): if leaky: Activ = LeakyReLU(alpha=0.2) else: Activ = ReLU() if sn: d = ConvSN2D(filters, kernel_size=kernel_size, strides=strides, padding=padding, kernel_initializer=init, use_bias=False)(layer_input) else: d = Conv2D(filters, kernel_size=kernel_size, strides=strides, padding=padding, kernel_initializer=init, use_bias=False)(layer_input) if bnorm: d = BatchNormalization()(d) d = Activ(d) return d def deconv2d(layer_input, layer_res, filters, kernel_size=4, conc=True, scalev=False, bnorm=True, up=True, padding='same', strides=2): if up: u = UpSampling2D((1,2))(layer_input) u = ConvSN2D(filters, kernel_size, strides=(1,1), kernel_initializer=init, use_bias=False, padding=padding)(u) else: u = ConvSN2DTranspose(filters, kernel_size, strides=strides, kernel_initializer=init, use_bias=False, padding=padding)(layer_input) if bnorm: u = BatchNormalization()(u) u = LeakyReLU(alpha=0.2)(u) if conc: u = Concatenate()([u,layer_res]) return u #Extract function: splitting spectrograms def extract_image(im): im1 = Cropping2D(((0,0), (0, 2*(im.shape[2]//3))))(im) im2 = Cropping2D(((0,0), (im.shape[2]//3,im.shape[2]//3)))(im) im3 
= Cropping2D(((0,0), (2*(im.shape[2]//3), 0)))(im) return im1,im2,im3 #Assemble function: concatenating spectrograms def assemble_image(lsim): im1,im2,im3 = lsim imh = Concatenate(2)([im1,im2,im3]) return imh #U-NET style architecture def build_generator(input_shape): h,w,c = input_shape inp = Input(shape=input_shape) #downscaling g0 = tf.keras.layers.ZeroPadding2D((0,1))(inp) g1 = conv2d(g0, filters, kernel_size=(h,3), strides=1, padding='valid') g2 = conv2d(g1, filters, kernel_size=(1,9), strides=(1,2)) g3 = conv2d(g2, filters, kernel_size=(1,7), strides=(1,2)) #g4 = conv2d(g3, 128, kernel_size=(1,2), strides=(1,2)) #upscaling #g5 = deconv2d(g4,g3, 128, kernel_size=(1,2), strides=(1,2)) g4 = deconv2d(g3,g2, filters, kernel_size=(1,7), strides=(1,2)) g5 = deconv2d(g4,g1, filters, kernel_size=(1,9), strides=(1,2), bnorm=False) g6 = ConvSN2DTranspose(1, kernel_size=(h,1), strides=(1,1), kernel_initializer=init, padding='valid', activation='tanh')(g5) return Model(inp,g6, name='G') #Siamese Network def build_siamese(input_shape): h,w,c = input_shape inp = Input(shape=input_shape) g1 = conv2d(inp, filters, kernel_size=(h,3), strides=1, padding='valid', sn=False) g2 = conv2d(g1, filters, kernel_size=(1,9), strides=(1,2), sn=False) g3 = conv2d(g2, filters, kernel_size=(1,7), strides=(1,2), sn=False) #g4 = conv2d(g3, 128, kernel_size=(1,2), strides=(1,2), sn=False) g4 = Flatten()(g3) g5 = Dense(vec_len)(g4) return Model(inp, g5, name='S') #Discriminator (Critic) Network def build_critic(input_shape): h,w,c = input_shape inp = Input(shape=input_shape) g1 = conv2d(inp, filters*2, kernel_size=(h,3), strides=1, padding='valid', bnorm=False) g2 = conv2d(g1, filters*2, kernel_size=(1,9), strides=(1,2), bnorm=False) g3 = conv2d(g2, filters*2, kernel_size=(1,7), strides=(1,2), bnorm=False) #g4 = conv2d(g3, 128, kernel_size=(1,2), strides=(1,2), bnorm=False) g5 = Flatten()(g3) g6 = DenseSN(1, kernel_initializer=init)(g5) return Model(inp, g6, name='C') #Load past models from path to resume training or test def load(path): gen = build_generator((hop,shape,1)) siam = build_siamese((hop,shape,1)) critic = build_critic((hop,3*shape,1)) gen.load_weights(path+'/gen.h5') critic.load_weights(path+'/critic.h5') siam.load_weights(path+'/siam.h5') return gen,critic,siam #Build models def build(): gen = build_generator((hop,shape,1)) siam = build_siamese((hop,shape,1)) critic = build_critic((hop,3*shape,1)) #the discriminator accepts as input spectrograms of triple the width of those generated by the generator return gen,critic,siam #Generate a random batch to display current training results def testgena(): sw = True while sw: a = np.random.choice(aspec) if a.shape[1]//shape!=1: sw=False dsa = [] if a.shape[1]//shape>6: num=6 else: num=a.shape[1]//shape rn = np.random.randint(a.shape[1]-(num*shape)) for i in range(num): im = a[:,rn+(i*shape):rn+(i*shape)+shape] im = np.reshape(im, (im.shape[0],im.shape[1],1)) dsa.append(im) return np.array(dsa, dtype=np.float32) #Show results mid-training return abwv def save_test_image_full(path,sample1=sampleFile): #speca = tospeclong(sample1) #specb = tospeclong(sample2) awv = audio_array(sample1) aspec = tospec(awv) adata = splitcut(aspec) i = 0 for x in adata: print('Generating Samples'+str(i)) i = i+1 towave2(x, i,name=str('File1'),path=path) name = str('File1') loadPath1 = f'{path}/{name}' convertWaves(loadPath1,path,str('File1.wav')) #plt.show() def save_test_image_fullOriginal(path): a = testgena() print(a.shape) ab = gen(a, training=False) ab = testass(ab) a = testass(a) 
abwv = deprep(ab) awv = deprep(a) sf.write(path+'/new_file.wav', abwv, sr) # IPython.display.display(IPython.display.Audio(np.squeeze(abwv), rate=sr)) # IPython.display.display(IPython.display.Audio(np.squeeze(awv), rate=sr)) fig, axs = plt.subplots(ncols=2) axs[0].imshow(np.flip(a, -2), cmap=None) axs[0].axis('off') axs[0].set_title('Source') axs[1].imshow(np.flip(ab, -2), cmap=None) axs[1].axis('off') axs[1].set_title('Generated') #plt.show() #Save in training loop def save_end(epoch,gloss,closs,mloss,n_save=3,save_path='./ai/'): #use custom save_path (i.e. Drive '../content/drive/My Drive/') if epoch % n_save == 0: print('Saving...') path = f'{save_path}/MELGANVC-{str(epoch)}-{str(gloss)[:9]}-{str(closs)[:9]}-{str(mloss)[:9]}' ensure_dir(path) gen.save_weights(path+'/gen.h5') critic.save_weights(path+'/critic.h5') siam.save_weights(path+'/siam.h5') save_test_image_full(path) #Losses def mae(x,y): return tf.reduce_mean(tf.abs(x-y)) def mse(x,y): return tf.reduce_mean((x-y)**2) def loss_travel(sa,sab,sa1,sab1): l1 = tf.reduce_mean(((sa-sa1) - (sab-sab1))**2) l2 = tf.reduce_mean(tf.reduce_sum(-(tf.nn.l2_normalize(sa-sa1, axis=[-1]) * tf.nn.l2_normalize(sab-sab1, axis=[-1])), axis=-1)) return l1+l2 def loss_siamese(sa,sa1): logits = tf.sqrt(tf.reduce_sum((sa-sa1)**2, axis=-1, keepdims=True)) return tf.reduce_mean(tf.square(tf.maximum((delta - logits), 0))) def d_loss_f(fake): return tf.reduce_mean(tf.maximum(1 + fake, 0)) def d_loss_r(real): return tf.reduce_mean(tf.maximum(1 - real, 0)) def g_loss_f(fake): return tf.reduce_mean(- fake) #Get models and optimizers def get_networks(shape, load_model=False, path=None): if not load_model: gen,critic,siam = build() else: gen,critic,siam = load(path) print('Built networks') opt_gen = Adam(0.0001, 0.5) opt_disc = Adam(0.0001, 0.5) return gen,critic,siam, [opt_gen,opt_disc] #Set learning rate def update_lr(lr): opt_gen.learning_rate = lr opt_disc.learning_rate = lr #Training Functions #Train Generator, Siamese and Critic def train_all(a,b): #splitting spectrogram in 3 parts aa,aa2,aa3 = extract_image(a) bb,bb2,bb3 = extract_image(b) with tf.GradientTape() as tape_gen, tf.GradientTape() as tape_disc: #translating A to B fab = gen(aa, training=True) fab2 = gen(aa2, training=True) fab3 = gen(aa3, training=True) #identity mapping B to B COMMENT THESE 3 LINES IF THE IDENTITY LOSS TERM IS NOT NEEDED fid = gen(bb, training=True) fid2 = gen(bb2, training=True) fid3 = gen(bb3, training=True) #concatenate/assemble converted spectrograms fabtot = assemble_image([fab,fab2,fab3]) #feed concatenated spectrograms to critic cab = critic(fabtot, training=True) cb = critic(b, training=True) #feed 2 pairs (A,G(A)) extracted spectrograms to Siamese sab = siam(fab, training=True) sab2 = siam(fab3, training=True) sa = siam(aa, training=True) sa2 = siam(aa3, training=True) #identity mapping loss loss_id = (mae(bb,fid)+mae(bb2,fid2)+mae(bb3,fid3))/3. #loss_id = 0. IF THE IDENTITY LOSS TERM IS NOT NEEDED #travel loss loss_m = loss_travel(sa,sab,sa2,sab2)+loss_siamese(sa,sa2) #generator and critic losses loss_g = g_loss_f(cab) loss_dr = d_loss_r(cb) loss_df = d_loss_f(cab) loss_d = (loss_dr+loss_df)/2. 
#generator+siamese total loss lossgtot = loss_g+10.*loss_m+0.5*loss_id #CHANGE LOSS WEIGHTS HERE (COMMENT OUT +w*loss_id IF THE IDENTITY LOSS TERM IS NOT NEEDED) #computing and applying gradients grad_gen = tape_gen.gradient(lossgtot, gen.trainable_variables+siam.trainable_variables) opt_gen.apply_gradients(zip(grad_gen, gen.trainable_variables+siam.trainable_variables)) grad_disc = tape_disc.gradient(loss_d, critic.trainable_variables) opt_disc.apply_gradients(zip(grad_disc, critic.trainable_variables)) return loss_dr,loss_df,loss_g,loss_id #Train Critic only def train_d(a,b): aa,aa2,aa3 = extract_image(a) with tf.GradientTape() as tape_disc: fab = gen(aa, training=True) fab2 = gen(aa2, training=True) fab3 = gen(aa3, training=True) fabtot = assemble_image([fab,fab2,fab3]) cab = critic(fabtot, training=True) cb = critic(b, training=True) loss_dr = d_loss_r(cb) loss_df = d_loss_f(cab) loss_d = (loss_dr+loss_df)/2. grad_disc = tape_disc.gradient(loss_d, critic.trainable_variables) opt_disc.apply_gradients(zip(grad_disc, critic.trainable_variables)) return loss_dr,loss_df #Assembling generated Spectrogram chunks into final Spectrogram def specass(a,spec): but=False con = np.array([]) nim = a.shape[0] for i in range(nim-1): im = a[i] im = np.squeeze(im) if not but: con=im but=True else: con = np.concatenate((con,im), axis=1) diff = spec.shape[1]-(nim*shape) a = np.squeeze(a) con = np.concatenate((con,a[-1,:,-diff:]), axis=1) return np.squeeze(con) #Splitting input spectrogram into different chunks to feed to the generator def chopspec(spec): dsa=[] for i in range(spec.shape[1]//shape): im = spec[:,i*shape:i*shape+shape] im = np.reshape(im, (im.shape[0],im.shape[1],1)) dsa.append(im) imlast = spec[:,-shape:] imlast = np.reshape(imlast, (imlast.shape[0],imlast.shape[1],1)) dsa.append(imlast) return np.array(dsa, dtype=np.float32) #Converting from source Spectrogram to target Spectrogram def towave(spec, name, path='./ai/', show=False): specarr = chopspec(spec) print("ToWav") print(specarr.shape) a = specarr print('Generating...') ab = gen(a, training=False) print('Assembling and Converting...') a = specass(a,spec) ab = specass(ab,spec) awv = deprep(a) abwv = deprep(ab) print('Saving...') pathfin = f'{path}/{name}' ensure_dir(pathfin) sf.write(pathfin+'/AB.wav', abwv, sr) sf.write(pathfin+'/A.wav', awv, sr) print('Saved WAV!') #IPython.display.display(IPython.display.Audio(np.squeeze(abwv), rate=sr)) #IPython.display.display(IPython.display.Audio(np.squeeze(awv), rate=sr)) if show: fig, axs = plt.subplots(ncols=2) axs[0].imshow(np.flip(a, -2), cmap=None) axs[0].axis('off') axs[0].set_title('Source') axs[1].imshow(np.flip(ab, -2), cmap=None) axs[1].axis('off') axs[1].set_title('Generated') #plt.show() return abwv # Fix for long files def towave2(spec,i, name, path='./ai/',show=False): if spec is None: return specarr = chopspec(spec) print("ToWav") print(specarr.shape) a = specarr print('Generating...') ab = gen(a, training=False) print('Assembling and Converting...') #a = specass(a,spec) ab = specass(ab,spec) #awv = deprep(a) abwv = deprep(ab) print('Saving to Path '+str(path)+ ' ...') pathfin = f'{path}/{name}' # print(pathfin) ensure_dir(pathfin) if(i<100): if(i<10): sf.write(pathfin+'/AB00'+str(i)+'.wav', abwv, sr) else: sf.write(pathfin+'/AB0'+str(i)+'.wav', abwv, sr) #sf.write(pathfin+'/A'+str(i)+'.wav', awv, sr) else: sf.write(pathfin+'/AB'+str(i)+'.wav',abwv,sr) print('Saved WAV!') #IPython.display.display(IPython.display.Audio(np.squeeze(abwv), rate=sr)) 
#IPython.display.display(IPython.display.Audio(np.squeeze(awv), rate=sr)) if show: fig, axs = plt.subplots(ncols=2) axs[0].imshow(np.flip(a, -2), cmap=None) axs[0].axis('off') axs[0].set_title('Source') axs[1].imshow(np.flip(ab, -2), cmap=None) axs[1].axis('off') axs[1].set_title('Generated') #plt.show() return abwv #Training Loop def train(epochs, batch_size=16, lr=0.0001, n_save=6, gupt=5): update_lr(lr) df_list = [] dr_list = [] g_list = [] id_list = [] c = 0 g = 0 # save_test_image_full('./') for epoch in range(1,epochs,1): bef = time.time() for batchi,(a,b) in enumerate(zip(dsa,dsb)): if batchi%gupt==0: dloss_t,dloss_f,gloss,idloss = train_all(a,b) else: dloss_t,dloss_f = train_d(a,b) df_list.append(dloss_f) dr_list.append(dloss_t) g_list.append(gloss) id_list.append(idloss) c += 1 g += 1 if batchi%600==0: print(f'[Epoch {epoch}/{epochs}] [Batch {batchi}] [D loss f: {np.mean(df_list[-g:], axis=0)} ', end='') print(f'r: {np.mean(dr_list[-g:], axis=0)}] ', end='') print(f'[G loss: {np.mean(g_list[-g:], axis=0)}] ', end='') print(f'[ID loss: {np.mean(id_list[-g:])}] ', end='') print(f'[LR: {lr}]') g = 0 nbatch=batchi print(f'Time/Batch {(time.time()-bef)/nbatch}') save_end(epoch,np.mean(g_list[-n_save*c:], axis=0),np.mean(df_list[-n_save*c:], axis=0),np.mean(id_list[-n_save*c:], axis=0),n_save=n_save) print(f'Mean D loss: {np.mean(df_list[-c:], axis=0)} Mean G loss: {np.mean(g_list[-c:], axis=0)} Mean ID loss: {np.mean(id_list[-c:], axis=0)}') c = 0 #Build models and initialize optimizers #If load_model=True, specify the path where the models are saved gen,critic,siam, [opt_gen,opt_disc] = get_networks(shape, load_model=loadModel, path=loadModelPath) #Training #n_save = how many epochs between each saving and displaying of results #gupt = how many discriminator updates for generator+siamese update train(epochs, batch_size=bs, lr=0.0002, n_save=n_saves, gupt=3) #After Training, use these functions to convert data with the generator and save the results #Wav to wav conversion #librosa.util.example_audio_file() #wv, sr = librosa.load('./devil.wav', sr=16000) #Load waveform #print("Librosa Loaded Sample with Shape: "+str(wv.shape)) #Waveform to Spectrogram #plt.figure(figsize=(50,1)) #Show Spectrogram #plt.imshow(np.flip(speca, axis=0), cmap=None) #plt.axis('off') #plt.show() save_test_image_full(finalResultPath,sampleFile) ```

For the sample rate, you can install SoX, add it to your PATH, and create a .bat file with content like the one below. You can alter the sample rate; the -c 1 is important so the output is mono.

for /R %%A in (*.wav) do if /i "%%~XA"==".wav" ( 
    sox "%%A" -c 1 -r 20050 "%%A"DownSampled.wav
)
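
If you'd rather do the resampling in Python (librosa and soundfile are already part of the setup above), a rough equivalent of that batch file could look like this; the 20050 Hz target and the mono conversion mirror the SoX call, and the folder path is just a placeholder:

```python
# Sketch: resample every .wav in a folder to mono at the same target rate as
# the SoX batch file above, writing "<original>DownSampled.wav" next to it.
from glob import glob
import librosa
import soundfile as sf

TARGET_SR = 20050          # same rate as in the .bat; adjust as needed
FOLDER = "./wavs"          # placeholder path

for path in glob(f"{FOLDER}/**/*.wav", recursive=True):
    y, _ = librosa.load(path, sr=TARGET_SR, mono=True)   # resample + downmix to mono
    sf.write(path + "DownSampled.wav", y, TARGET_SR)
    print("wrote", path + "DownSampled.wav")
```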