kkoutini / PaSST

Efficient Training of Audio Transformers with Patchout
Apache License 2.0

audio inference #6

Open dagongji10 opened 2 years ago

dagongji10 commented 2 years ago

@kkoutini Thanks for sharing this nice work. I want to know how to read an audio file and run full inference on it. Can you show me an example? What preprocessing is needed?

kkoutini commented 2 years ago

Hi! For inference only, we prepared a separate repo (https://github.com/kkoutini/passt_hear21), which you can install:

pip install -e 'git+https://github.com/kkoutini/passt_hear21@0.0.9#egg=hear21passt' 

then use it for inference:

import torch

from hear21passt.base import load_model, get_scene_embeddings, get_timestamp_embeddings

model = load_model(mode="logits").cuda()
logits = model(wave_signal)

dagongji10 commented 2 years ago

In fact, I have already tried passt_hear21 for inference, but in the example the input is not an audio file. My question is: if I have an audio file, how do I turn it into a correct input? In other words, how do I get wave_signal above?

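import torch
import torchaudio

from hear21passt.base import load_model, get_scene_embeddings, get_timestamp_embeddings

# torchaudio.load returns a (channels, samples) tensor and the file's sampling rate
wave_signal, sr = torchaudio.load("test_audio.wav")
model = load_model(mode="logits").cuda()
# the model is on the GPU, so the input must be moved there as well
logits = model(wave_signal.cuda())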

Is that right? Do I need to do any other preprocessing?

kkoutini commented 2 years ago

That's correct. You just need to make sure that the signal has a 32 kHz sampling rate.
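
One possible way to satisfy the 32 kHz requirement is sketched below. It is not taken from the PaSST repositories: `wav_sample_rate` and `load_for_passt` are hypothetical helper names, the header check only works for PCM WAV files, and the mono mixdown assumes the model should see a single (1, samples) signal per file rather than one row per channel.

```python
import wave

TARGET_SR = 32000  # sampling rate named in this thread


def wav_sample_rate(path):
    """Return the sampling rate stored in a PCM WAV file header."""
    with wave.open(path, "rb") as f:
        return f.getframerate()


def load_for_passt(path, target_sr=TARGET_SR):
    """Hypothetical helper: load a file, mix it to mono, resample to target_sr."""
    # Imported lazily so the header check above can be used without torchaudio.
    import torchaudio

    waveform, sr = torchaudio.load(path)  # shape: (channels, samples)
    # Assumption: average the channels so the result is (1, samples).
    waveform = waveform.mean(dim=0, keepdim=True)
    if sr != target_sr:
        waveform = torchaudio.functional.resample(waveform, sr, target_sr)
    return waveform
```

With these helpers, `wav_sample_rate("test_audio.wav")` shows whether a file is already at 32 kHz, and `load_for_passt("test_audio.wav").cuda()` could be passed to the model in place of the raw `torchaudio.load` output.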