openspeech-team / openspeech

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.
https://openspeech-team.github.io/openspeech/
MIT License
678 stars, 114 forks

inference code #162

Closed: JaeungHyun closed this issue 2 years ago

JaeungHyun commented 2 years ago

🚀 Feature request

데이터셋을 μƒμ„±ν•΄μ„œ μΈνΌλŸ°μŠ€ν•˜λ‹ˆκΉŒ μƒλ‹Ήνžˆ 느린데

μ˜€λ””μ˜€ νŒŒμΌμ„ λ°”λ‘œ librosa둜 signal μΆ”μΆœν•΄μ„œ μΈνΌλŸ°μŠ€ν•΄μ„œ 결과만 λ°›λŠ” μ½”λ“œλŠ” μ—†μ„κΉŒμš”?

kospeech 에 inference μ½”λ“œλ₯Ό μ°Έκ³ ν•΄λ΄€λŠ”λ° λͺ¨λΈμ΄ μ •μƒμ μœΌλ‘œ λ™μž‘ν•˜μ§€ μ•ŠμŠ΅λ‹ˆλ‹€.

Motivation

Your contribution

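# Assumed to be defined elsewhere in the app: configs, n_fft, hop_length,
# model, tokenizer, save_data, and the FastAPI `app` instance.
import os
import librosa
import numpy as np
import torch
from fastapi import File, UploadFile
from torch import Tensor

def transform_input(signal):
    # Log-scaled mel spectrogram with shape (num_mels, frames)
    melspectrogram = librosa.feature.melspectrogram(
        y=signal,
        sr=configs['audio']['sample_rate'],
        n_mels=configs['audio']['num_mels'],
        n_fft=n_fft,
        hop_length=hop_length,
    )
    melspectrogram = librosa.power_to_db(melspectrogram, ref=np.max)
    return melspectrogram

def parse_audio(filepath: str) -> Tensor: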

    signal, sr = librosa.load(filepath, sr=None)
    signal = librosa.resample(signal, orig_sr=sr, target_sr=16000)
    feature = transform_input(signal)

    feature -= feature.mean()
    feature /= np.std(feature)

    feature = torch.FloatTensor(feature).transpose(0, 1)
    print(feature.shape)

    return feature

def inference(feature):
    with torch.no_grad():
        # Second argument is the input length tensor; it is hard-coded to 1 here.
        outputs = model(feature.unsqueeze(0), torch.Tensor([1]).to('cuda'))
    print(outputs)
    prediction = tokenizer.decode(outputs["predictions"].cpu().detach().numpy())
    print(prediction)

    return prediction

@app.post("/upload")
async def upload(file: UploadFile = File(...)):    
    filepath = save_data(file)

    # load file
    feature = parse_audio(filepath)

    feature = feature.to('cuda')

    prediction = inference(feature)
    os.remove(filepath)

    return {'prediction': prediction}

JaeungHyun commented 2 years ago

I have confirmed that the input produced by the data loader and the input from my own preprocessing are identical,

but the results differ once they are fed to the model.
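When the features match but the outputs do not, it may help to compare everything that is passed to the model, including the length argument. A minimal sketch, assuming both inputs have been converted to NumPy arrays first (for example with `.cpu().numpy()`); the helper name `compare_model_inputs` is hypothetical and not part of openspeech:

```python
import numpy as np


def compare_model_inputs(ref_feature, ref_length, my_feature, my_length, atol=1e-5):
    """Return a list of mismatches between two (feature, length) pairs."""
    problems = []
    ref_feature = np.asarray(ref_feature)
    my_feature = np.asarray(my_feature)

    # Compare shapes first; comparing values only makes sense if they agree.
    if ref_feature.shape != my_feature.shape:
        problems.append(f"feature shape {my_feature.shape} != {ref_feature.shape}")
    elif not np.allclose(ref_feature, my_feature, atol=atol):
        problems.append("feature values differ")

    # The length argument is a separate model input and can differ
    # even when the feature arrays are identical.
    if int(ref_length) != int(my_length):
        problems.append(f"input length {int(my_length)} != {int(ref_length)}")

    return problems
```

An empty list means both the features and the lengths agree; any entry points at the input that differs.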

JaeungHyun commented 2 years ago

Solved it :)

rkskekzzz commented 2 years ago

Hello! Which trained model did you use? I am writing inference code with the rnn_transducer model, but the results are not coming out well, so I wanted to ask!

JaeungHyun commented 2 years ago

@rkskekzzz

outputs = model(feature.unsqueeze(0), torch.Tensor([feature.shape[0]]).to('cuda'))

Doing it this way gave the correct results.
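The change above passes the actual number of frames as the input length instead of a constant 1. A minimal sketch of that input preparation, assuming `feature` has shape (frames, num_mels) as returned by `parse_audio`; the helper name `make_model_inputs` and the integer dtype are assumptions of this sketch (the line above uses a float tensor), not part of openspeech:

```python
import torch


def make_model_inputs(feature, device="cpu"):
    """Build (inputs, input_lengths) for a single utterance.

    feature: tensor of shape (frames, num_mels).
    """
    # Add a batch dimension: (frames, num_mels) -> (1, frames, num_mels)
    inputs = feature.unsqueeze(0).to(device)
    # One length per batch item: the number of frames, not a constant 1
    input_lengths = torch.tensor([feature.shape[0]], dtype=torch.long, device=device)
    return inputs, input_lengths
```

With a model loaded on the GPU, this could be used as `outputs = model(*make_model_inputs(feature, device='cuda'))`, assuming the model accepts integer lengths.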