Closed praesc closed 1 year ago
Since PyTorch doesn't officially support aarch64 (see here), we don't officially support it either. That being said, if you do succeed in making it work, please feel free to detail the steps here :)
Thanks for the quick reply, Vincent. I tried to build the latest version from source and ran into some issues while compiling on the Jetson Nano. However, version 0.3.0 worked smoothly out of the box. Nvidia provides pre-compiled wheels for their platforms, so it's straightforward to install PyTorch on them. Hence, it would be nice to have torchaudio as well.
Would also love to see it running on aarch64 since torch is now running there already.
Installing via python3 setup.py install ran through fine:
Processing dependencies for torchaudio==0.6.0a0+313f4f5
Searching for torch==1.5.0
Best match: torch 1.5.0
Adding torch 1.5.0 to easy-install.pth file
Installing convert-caffe2-to-onnx script to /usr/local/bin
Installing convert-onnx-to-caffe2 script to /usr/local/bin

Using /home/ark626/.local/lib/python3.6/site-packages
Searching for future==0.17.1
Best match: future 0.17.1
Adding future 0.17.1 to easy-install.pth file
Installing futurize script to /usr/local/bin
Installing pasteurize script to /usr/local/bin

Using /usr/local/lib/python3.6/dist-packages
Searching for numpy==1.18.4
Best match: numpy 1.18.4
Adding numpy 1.18.4 to easy-install.pth file
Installing f2py script to /usr/local/bin
Installing f2py3 script to /usr/local/bin
Installing f2py3.6 script to /usr/local/bin

Using /home/ark626/.local/lib/python3.6/site-packages
Finished processing dependencies for torchaudio==0.6.0a0+313f4f5
But running ./packaging/build_from_source.sh $PW runs into issues.
Okay, I got it installed with a workaround.
=> Run the script once until it fails. Then alter these parts as follows (so that libmad and lame are not overwritten):
#!/bin/bash
set -ex
# Arguments: PREFIX, specifying where to install dependencies into
PREFIX="$1"
#rm -rf /tmp/torchaudio-deps
#mkdir /tmp/torchaudio-deps
pushd /tmp/torchaudio-deps
# Curl Settings
CURL_OPTS="-L --retry 10 --connect-timeout 5 --max-time 180"
curl $CURL_OPTS -o sox-14.4.2.tar.bz2 "https://downloads.sourceforge.net/project/sox/sox/14.4.2/sox-14.4.2.tar.bz2"
#curl $CURL_OPTS -o lame-3.99.5.tar.gz "http://ftp.us.debian.org/debian/pool/main/l/lame/lame_3.99.5+repack1-9+b2_arm64.deb"
#curl $CURL_OPTS -o lame-3.99.5.tar.gz "https://downloads.sourceforge.net/project/lame/lame/3.99/lame-3.99.5.tar.gz"
curl $CURL_OPTS -o flac-1.3.2.tar.xz "https://downloads.sourceforge.net/project/flac/flac-src/flac-1.3.2.tar.xz"
#curl $CURL_OPTS -o libmad-0.15.1b.tar.gz "https://launchpad.net/ubuntu/+archive/primary/+sourcefiles/libmad/0.15.1b-9ubuntu16.$
#"https://downloads.sourceforge.net/project/mad/libmad/0.15.1b/libmad-0.15.1b.tar.gz"
echo CurlDone
# unpack the dependencies
tar xfp sox-14.4.2.tar.bz2
#tar xfp lame-3.99.5.tar.gz
tar xfp flac-1.3.2.tar.xz
#tar xfp libmad-0.15.1b.tar.gz
Then replace these two config.guess files with this version: => https://svn.osgeo.org/grass/grass/tags/release_20150712_grass_6_4_5/config.guess
/tmp/torchaudio-deps/lame-3.99.5/config.guess
/tmp/torchaudio-deps/libmad-0.15.1b/config.guess
Now run the script again, and it should run through fine.
The issue is that the config.guess shipped with these two libraries dates from 2003, so it doesn't recognize aarch64 as a machine type.
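The manual replacement described above could also be scripted. The following is only a sketch under assumptions: the helper names are made up for illustration, and "stale" is approximated as "the file never mentions the string aarch64", which is a heuristic rather than a proper version check. It assumes the newer config.guess has already been downloaded to a local file.

```python
import shutil
from pathlib import Path


def stale_config_guess_files(deps_dir, marker="aarch64"):
    """Return config.guess files under deps_dir that never mention `marker`."""
    stale = []
    for path in sorted(Path(deps_dir).rglob("config.guess")):
        text = path.read_text(errors="ignore")
        if marker not in text:
            stale.append(path)
    return stale


def replace_config_guess(newer_file, deps_dir, marker="aarch64"):
    """Back up each stale config.guess, then overwrite it with newer_file."""
    replaced = []
    for path in stale_config_guess_files(deps_dir, marker):
        # Keep the old script next to the new one as config.guess.orig.
        shutil.copy2(path, path.with_name("config.guess.orig"))
        # copyfile replaces only the contents, so the existing file mode
        # (for example the executable bit) of the target is left alone.
        shutil.copyfile(newer_file, path)
        replaced.append(path)
    return replaced
```

With the paths from this thread, a call would look like replace_config_guess("config.guess", "/tmp/torchaudio-deps"), run after the first failed build and before the second attempt.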
Hi @ark626, I recently saw you are trying to run MelGAN-VC on an aarch64 device (Jetson Nano). I am trying as well. I successfully compiled torchaudio but still can't manage to perform inference with said model. I'm wondering if you had any success with this and if you are willing to share your experience. Thanks in advance.
Yes, I got it running. I will answer later in more detail. In the meantime, you could check the guide I've written:
JetsonXavierAGX
Okay, so in general the guide here covers some of the useful things I've used: https://github.com/ark626/JetsonXavierAGX
If I recall correctly, to run MelGAN-VC you need to install torch and a specific version of TensorFlow (https://drive.google.com/drive/folders/1Ee9S9Ab892n_rONX4zqQdbjjt5rTnHwV?usp=sharing). After that, I had to keep the following things in mind:
Sometimes, when TensorFlow is used together with PyTorch, it can try to use all of the memory. This is bad news on the Jetson, because the RAM is shared between GPU and CPU. To prevent overallocation for CUDA, you can limit the RAM that CUDA uses. For TF2 this looks like this:
gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=CUDARAMINMB)])
CUDARAMINMB is the CUDA RAM limit in MB. The two lines above can be placed right after the import of TensorFlow.
Also, when using TensorFlow with PyTorch, there is often a weird error message about a block issue. This happens when TensorFlow is imported first. Make sure to import TensorFlow AFTER PyTorch, then it will work.
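The two points above (memory limit and import order) can be combined into one small helper. This is a sketch, not the code used in this thread: the function name limit_gpu_memory is made up, and it takes the already imported tensorflow module as an argument so that the caller controls the import order. It only adds a guard for the case where no GPU is visible, where gpus[0] would otherwise raise an IndexError. As far as I know, TensorFlow expects this configuration to happen before the GPU is first used.

```python
def limit_gpu_memory(tf, limit_mb):
    """Cap the first GPU's memory at limit_mb; return False if no GPU is visible."""
    gpus = tf.config.experimental.list_physical_devices("GPU")
    if not gpus:
        return False
    virtual = tf.config.experimental.VirtualDeviceConfiguration(memory_limit=limit_mb)
    tf.config.experimental.set_virtual_device_configuration(gpus[0], [virtual])
    return True


# Intended use, following the import order described above:
#   import torch                 # PyTorch first
#   import tensorflow as tf      # TensorFlow second
#   limit_gpu_memory(tf, 2048)   # 2048 MB is an arbitrary example value
```

Passing the module in also makes the helper easy to try out without a GPU, since it simply returns False in that case.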
I didn't compile torchaudio. As far as I recall, I just used the torch installer, because the compiler failed every time.
I used:
torch @ file:///media/ext/dataSets/MelGANVC/torch-1.6.0rc2-cp36-cp36m-linux_aarch64.whl
-e /usr/local/lib/python3.6/dist-packages/torchaudio-0.7.0a0+102174e-py3.6-linux-aarch64.egg
The torch wheel is linked in my guide (https://drive.google.com/drive/folders/1Ee9S9Ab892n_rONX4zqQdbjjt5rTnHwV?usp=sharing).
For torchaudio, I sadly don't recall exactly how I installed it. But I will append the path where it is stored, together with an archive of the folder.
The path is something like \SHODAN\pihome\usr\local\lib\python3.6\dist-packages. The file which should be extracted there is: https://www.mediafire.com/file/9llme9a0ijtbu08/torchaudio-0.7.0a0+102174e-py3.6-linux-aarch64.egg.rar/file
=> I can also tell you that MelGAN-VC doesn't use all of torchaudio, so in general even a partial installation will work, because only the algorithm that converts mel scales back to audio is implemented with torchaudio. The rest is in TensorFlow.
My imports look like this:
import matplotlib.pyplot as plt
import collections
from PIL import Image
from skimage.transform import resize
import imageio
import librosa
import librosa.display
from librosa.feature import melspectrogram
import os
import time
#import IPython
#import tensorflow as tf
#os.environ['LIBROSA_CACHE_DIR'] = 'C:/Users/ark-6/tmp'
os.environ['LIBROSA_CACHE_LEVEL'] = '50'
import wave
from glob import glob
import numpy as np
from pathlib import Path
import torch
import torch.nn as nn
import torch.nn.functional as F
from tqdm import tqdm
from functools import partial
import math
import heapq
from torchaudio.transforms import MelScale, Spectrogram
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Reshape, Flatten, Concatenate, Conv2D, Conv2DTranspose, GlobalAveragePooling2D, UpSampling2D, LeakyReLU, ReLU, Add, Multiply, Lambda, Dot, BatchNormalization, Activation, ZeroPadding2D, Cropping2D, Cropping1D
from tensorflow.keras.models import Sequential, Model, load_model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.initializers import TruncatedNormal, he_normal
import tensorflow.keras.backend as K
gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=10024)])
If I can help you any further, don't hesitate to ask, because I remember that for my thesis it was quite a hassle to get this working.
Hi @ark626 ,
I've managed to get the generator part of MelGAN-VC working, thanks to your help.
I'm running on a Jetson Nano Developer Kit 2GB, and I'm setting memory_limit=256.
But now I am receiving this error, which seems to be related to TensorFlow. Here is my log:
Built networks (196096,) (7, 512, 64, 1) Generating...
2021-02-06 16:32:03.058473: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Not found: ./bin/ptxas not found Relying on driver to perform ptx compilation. This message will be only logged once.
2021-02-06 16:32:12.095268: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2,14GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-02-06 16:32:18.477486: E tensorflow/stream_executor/cuda/cuda_driver.cc:952] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.497364: E tensorflow/stream_executor/gpu/gpu_timer.cc:55] Internal: Error destroying CUDA event: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.497405: E tensorflow/stream_executor/gpu/gpu_timer.cc:60] Internal: Error destroying CUDA event: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.497597: I tensorflow/stream_executor/cuda/cuda_driver.cc:805] failed to allocate 8B (8 bytes) from device: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.497648: E tensorflow/stream_executor/stream.cc:5479] Internal: Failed to enqueue async memset operation: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.497757: E tensorflow/stream_executor/cuda/cuda_driver.cc:617] failed to load PTX text as a module: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.497799: E tensorflow/stream_executor/cuda/cuda_driver.cc:622] error log buffer (1024 bytes):
2021-02-06 16:32:18.506196: W tensorflow/core/kernels/gpu_utils.cc:69] Failed to check cudnn convolutions for out-of-bounds reads and writes with an error message: 'Failed to load PTX text as a module: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated'; skipping this check. This only means that we won't check cudnn for out-of-bounds reads and writes. This message will only be printed once.
2021-02-06 16:32:18.506314: I tensorflow/stream_executor/cuda/cuda_driver.cc:805] failed to allocate 8B (8 bytes) from device: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.545824: I tensorflow/stream_executor/stream.cc:4963] [stream=0x9418a1e0,impl=0x94189be0] did not memzero GPU location; source: 0x7ff39e55b8
2021-02-06 16:32:18.546030: E tensorflow/stream_executor/cuda/cuda_driver.cc:617] failed to load PTX text as a module: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.546073: E tensorflow/stream_executor/cuda/cuda_driver.cc:622] error log buffer (1024 bytes):
Traceback (most recent call last):
  File "melgan_generator.py", line 740, in <module>
    abwv = towave(speca, name=output_name, path=output_directory) #Convert and save wav
  File "melgan_generator.py", line 704, in towave
    ab = gen(a, training=False)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py", line 822, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py", line 717, in call
    convert_kwargs_to_constants=base_layer_utils.call_context().saving)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py", line 891, in _run_internal_graph
    output_tensors = layer(computed_tensors, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py", line 822, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "melgan_generator.py", line 404, in call
    dilation_rate=self.dilation_rate)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/backend.py", line 4954, in conv2d_transpose
    data_format=tf_data_format)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py", line 2246, in conv2d_transpose
    name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py", line 2317, in conv2d_transpose_v2
    name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_nn_ops.py", line 1253, in conv2d_backprop_input
    _ops.raise_from_not_ok_status(e, name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 6606, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: DNN Backward Data function launch failure : input shape([7,512,64,1]) filter shape([512,1,1,512]) [Op:Conv2DBackpropInput] name: G/conv_s_n2d_transpose/conv2d_transpose/
Any idea what this might be? Thanks so much in advance.
I think the issue here is the TF version. Try installing tensorflow==1.15.4+nv20.11. Also, it seems that the memory is a bit too small: the allocator ran out of memory trying to allocate 2,14GiB.
You can try to make the model smaller. (You can also try decreasing the samples to make the allocation smaller.) If you want to split the work, you can do several runs with different models and reload the previously saved model.
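One way to read "decrease the samples" is to feed the generator fewer spectrogram slices per call. The log above shows an input of shape [7,512,64,1], so seven slices are passed at once. The sketch below is an assumption, not something tested on a Jetson: the helper names are made up, and it presumes the generator treats each slice along the first axis independently when called with training=False. Whether this is enough to avoid the allocation failure is not certain.

```python
def split_batches(items, size):
    """Yield consecutive slices of at most `size` elements along the first axis."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_chunks(fn, items, size):
    """Call fn on each slice separately and return the per-slice results as a list."""
    return [fn(chunk) for chunk in split_batches(items, size)]


# Hypothetical use with the names from the traceback above:
#   parts = run_in_chunks(lambda a: gen(a, training=False), speca, 1)
#   ab = np.concatenate(parts, axis=0)   # needs numpy imported as np
```

Slicing works the same way on Python lists and on NumPy arrays, so the helper does not depend on either library.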
Thanks @ark626, I will try that TF installation. It's also worth noting that I'm running JetPack 4.5, whereas you are running version 4.4.x. Would that make a difference?
One last thing is that I'm running the MelGAN-VC model with these hyperparams:
hop=512 #hop size (window size = 6*hop)
sr=44100 #sampling rate
min_level_db=-100 #reference values to normalize data
ref_level_db=20
shape=64 #length of time axis of split spectrograms to feed to generator
vec_len=128 #length of vector generated by siamese vector
bs = 4 #batch size
delta = 2. #constant for siamese loss
At which sample rate and with what batch size did you manage to perform inference? Do you think running the model at a 16 kHz sample rate would help in fitting it into memory?
Thanks again for your help!
To reduce RAM usage you can reduce vec_len, e.g. to 64 or 32. The sampling rate sr should match that of the samples you use for training. I also extracted shape as a parameter, which can be reduced to 32 or 16. See the code below.
The sample rate will not help you in terms of RAM. It should fit your samples, so verify that all your samples have the same sr. The filters can be decreased down to 256. The batch size can be decreased to 2-3, which will save RAM.
For the sample rate, you can install sox, add it to your PATH, and create a .bat file with the content below. You can alter the sample rate; the -c 1 is important so that the output is mono.
for /R %%A in (*.wav) do if /i "%%~XA"==".wav" (
sox "%%A" -c 1 -r 20050 "%%A"DownSampled.wav
)
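The batch file above only runs on Windows. On the Jetson itself, roughly the same loop could be written in Python. This is a sketch under assumptions: sox must be installed and on the PATH, the function names are made up for illustration, and the output naming simply mirrors the batch file (the original name with DownSampled.wav appended). Unlike the /i comparison in the batch file, the *.wav match here is case-sensitive on Linux.

```python
import subprocess
from pathlib import Path

SUFFIX = "DownSampled.wav"  # same output naming as the batch file above


def sox_command(src, rate):
    """Build the sox argument list: mono (-c 1) output at `rate` Hz next to src."""
    src = Path(src)
    dst = src.with_name(src.name + SUFFIX)
    return ["sox", str(src), "-c", "1", "-r", str(rate), str(dst)]


def resample_tree(root, rate):
    """Run sox on every .wav below root, skipping files this script produced."""
    for wav in sorted(Path(root).rglob("*.wav")):
        if wav.name.endswith(SUFFIX):
            continue  # already converted on an earlier run
        subprocess.run(sox_command(wav, rate), check=True)
```

Calling resample_tree(".", rate) with the rate of your training samples converts everything below the current directory; check=True stops at the first file sox cannot process.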
🚀 Feature
Is there a release for aarch64?
Motivation
It'd be great to be able to deploy a PyTorch module on a Raspberry Pi for speech recognition.
Pitch
Alternatives
Additional context