Closed hbredin closed 3 years ago
This might also be a good time to switch to upcoming torchaudio default sox_io backend.
cc @mogwai wanna have a look?
Related to https://github.com/pytorch/audio/pull/1158
torchaudio nightly seems to support this already:
pip install --pre torchaudio -f https://download.pytorch.org/whl/nightly/torch_nightly.html
Do you want to wait to do this when the torchaudio version comes out where sox_io is the default backend?
`sox_io` is already available in 0.7 so I guess we do not have to wait to make the switch.
It becomes the default in torchaudio 0.8.0 for Linux and macOS. I remember that soundfile was faster in my benchmarks. Maybe it isn't in the newer versions.
We're back to soundfile for the increased speeds.
So essentially you want to be able to read from streams with the io module?
Yes.
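For context, the pattern being requested is handing an in-memory binary stream to the loader instead of a path. Here is a minimal illustration using only the stdlib `wave` module (torchaudio is not needed for the sketch; the `load(file_like)` call torchaudio would eventually support follows the same shape):

```python
import io
import struct
import wave

# Build a tiny mono 16 kHz WAV entirely in memory (10 ms of silence).
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 16-bit samples
    w.setframerate(16000)
    w.writeframes(struct.pack("<160h", *([0] * 160)))

# Rewind, then read from the stream as if it were a file on disk.
buf.seek(0)
with wave.open(buf, "rb") as w:
    print(w.getnframes(), w.getframerate())  # 160 16000
```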
This will be supported in torchaudio 0.8.0, which is to be released fairly soon?
```
pip install --upgrade --pre torchaudio -f https://download.pytorch.org/whl/nightly/torch_nightly.html
sudo apt install libncurses5
```
Then you can use this to test it:
```python
import torchaudio

torchaudio.set_audio_backend("soundfile")
torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False

with open('tests/data/dev00.wav', 'rb') as f:
    wav, sr = torchaudio.load(f)

print(wav.shape, sr)
```
Thanks for looking into this.
However, I think (but maybe I am wrong) that switching to torchaudio 0.8.0 will not be enough. We also have to support file-like objects everywhere `AudioFile` is used (and in `pyannote.audio.core.io` in particular).
We should be able to implement this with torchaudio 0.8.0. It might mean breaking changes for the `AudioFile` io API. I found these relevant PRs/issues:
https://github.com/pytorch/audio/issues/1072 https://github.com/pytorch/audio/pull/1158
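As a hypothetical sketch (the helper name and behavior below are my illustration, not pyannote's actual `AudioFile` API), "supporting file-like objects everywhere" could boil down to a small normalization step in the io layer:

```python
import io
from pathlib import Path
from typing import BinaryIO, Union

def as_binary_file(file: Union[str, Path, BinaryIO]) -> BinaryIO:
    """Return a readable binary stream for a path or an open file-like object.

    If the argument already looks like a file (has a .read method), it is
    passed through untouched and the caller keeps ownership; otherwise it is
    treated as a path and opened in binary mode.
    """
    if hasattr(file, "read"):
        return file
    return open(Path(file), "rb")

# Both a path and an in-memory stream would then be accepted downstream.
stream = as_binary_file(io.BytesIO(b"RIFF"))
print(stream.read(4))  # b'RIFF'
```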
Hi
I am trying to push for the file-like object support to be included in the 0.8.0 release, but some tests are failing randomly and I am still trying to figure out why. I am keeping a record at https://github.com/pytorch/audio/issues/1229.
The thing is that for some audio formats, the decoding finishes before it loads all the available data, so it does not return the expected number of frames. This kind of bug is hard for end users to notice, yet it could have a devastating impact (like getting wrong evaluation numbers).
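One way to guard against this class of bug is to compare the frame count reported by the container metadata with what was actually decoded. A hedged sketch with the stdlib `wave` module (with torchaudio, the analogous check would compare `torchaudio.info(...).num_frames` against the loaded waveform's length):

```python
import io
import struct
import wave

# In-memory 16 kHz mono WAV containing exactly 320 frames of silence.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(struct.pack("<320h", *([0] * 320)))
buf.seek(0)

with wave.open(buf, "rb") as w:
    expected = w.getnframes()  # frame count from the header
    raw = w.readframes(expected)  # what the decoder actually returns
    decoded = len(raw) // (w.getsampwidth() * w.getnchannels())

# A truncated decode would trip this check instead of silently passing.
assert decoded == expected, f"decoder stopped early: {decoded}/{expected} frames"
print(decoded, expected)  # 320 320
```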
So there is a good chance that I have to remove the file-like object support from 0.8.0.
Regarding https://github.com/pytorch/audio/issues/1072, it is more of a survey: we want to do something towards streaming support but we are not sure where to start. If you have thoughts, feel free to leave your comments there. We appreciate any feedback.
Thanks @mthrok for letting us know.
Update on the file-like object support. I resolved the issue, so the feature will be included in the 0.8.0 release, which is scheduled to happen next week.
Awesome. Thanks for the update @mthrok!
torchaudio 0.8.0 is released, so this can be implemented.
Great. Let me know if you need help or find a bug on the torchaudio side.
Hi,
I am trying to calculate embeddings of a given audio using the XVectorMFCC VoxCeleb model (https://huggingface.co/hbredin/SpeakerEmbedding-XVectorMFCC-VoxCeleb). My input is an mp3 byte-encoded string. Since the embedding model takes only wav input, I converted it into a wav BytesIO object. However, the model is not able to take the wav BytesIO object. If I write the BytesIO wav into a file and provide it as an input to the model, it is able to provide embeddings (I am trying to avoid the I/O operation here).
Could anyone help me with this? I am providing packages installed and code used.
Requirements:

```
(pyannote) sh-4.2$ pip freeze | grep torch
pytorch-lightning==1.2.7
pytorch-metric-learning==0.9.98
torch==1.8.1
torch-audiomentations==0.6.0
torchaudio==0.8.1
torchmetrics==0.2.0
torchvision==0.9.1
(pyannote) sh-4.2$ pip freeze | grep pyannote
pyannote.audio @ https://github.com/pyannote/pyannote-audio/archive/develop.zip
pyannote.core==4.1
pyannote.database==4.1
pyannote.metrics==3.0.1
pyannote.pipeline==2.0
```
Code:

```python
import ast
import io

from pydub import AudioSegment
from pyannote.audio import Inference

model = Inference("hbredin/SpeakerEmbedding-XVectorMFCC-VoxCeleb", device="cpu", window="whole")

audio_bytes = ast.literal_eval(data)
aud = AudioSegment.from_mp3(io.BytesIO(audio_bytes))

outputStream = io.BytesIO()
aud.set_frame_rate(16000)[:5000].export(outputStream, format="wav")

# Failing here. It works if I write the wav into a file and provide it as an input.
embeddings = model(outputStream)
print(embeddings[0])
```
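One detail worth checking independently of pyannote's file-like support: `export()` leaves the stream position at the end of the buffer, so any reader that starts from the current position sees end-of-file immediately. Rewinding with `seek(0)` before passing the buffer on is usually required. A minimal demonstration of the pitfall:

```python
import io

buf = io.BytesIO()
buf.write(b"RIFF....WAVEfmt ")  # stand-in for what export() wrote

print(buf.read())      # b'' -- position is at the end, a reader gets nothing
buf.seek(0)            # rewind before handing the buffer to a reader
print(buf.read()[:4])  # b'RIFF'
```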
Closing this issue as PR #640 has just been merged.
@karthikgali, as long as `soundfile` supports mp3 (I did not check), it means that you can now do something like:
```python
from pyannote.audio import Inference
from pyannote.core import Segment

inference = Inference("hbredin/SpeakerEmbedding-XVectorMFCC-VoxCeleb", window="whole")

with open('audio.mp3', 'rb') as fp:
    embedding = inference.crop(fp, Segment(3, 5))
```
This is not currently supported:
One has to do this instead:
This is a limitation that might be problematic (e.g. with `streamlit.file_uploader`, which returns a file handle).