mbzuai-nlp / ArTST


Simple Audio to Speech #8

Closed MohammadAminRazavi closed 1 month ago

MohammadAminRazavi commented 1 month ago

Hello, I hope you are doing well. I just want to use your framework and fine-tuned model for simple Arabic audio-to-text conversion. I've downloaded MGB2_ASR.pt and asr_spm.model, but I'm not sure how to use them. Will the code below work? And what should 'path-to-folder-with-checkpoints' be?

import torch
from artst.tasks.artst import ArTSTTask
from artst.models.artst import ArTSTTransformerModel

# Load the checkpoint
checkpoint = torch.load('MGB2_ASR.pt')
checkpoint['cfg']['task'].t5_task = 's2t'
checkpoint['cfg']['task'].data = 'path-to-folder-with-checkpoints'

task = ArTSTTask.setup_task(checkpoint['cfg']['task'])
model = ArTSTTransformerModel.build_model(checkpoint['cfg']['model'], task)
model.load_state_dict(checkpoint['model'])

import librosa
audio_input, sample_rate = librosa.load("path_to_your_audio_file.wav", sr=16000)

from transformers import Wav2Vec2Processor
processor = Wav2Vec2Processor.from_pretrained("path_to_your_local_processor")
input_values = processor(audio_input, return_tensors="pt", sampling_rate=16000).input_values

Theehawau commented 1 month ago

See the demo notebook.

https://github.com/mbzuai-nlp/ArTST/blob/main/demo-artst-asr.ipynb
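Before running any audio through the notebook, it may help to confirm that the file matches the 16 kHz rate used in the question's `librosa.load(..., sr=16000)` call. The sketch below is not part of ArTST. It uses only Python's standard `wave` module, the helper names `describe_wav` and `is_16k_mono` are hypothetical, and the assumption that the model expects 16 kHz mono input is inferred from the question's code rather than confirmed by the maintainers.

```python
import wave


def describe_wav(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / sample_rate
    return sample_rate, channels, duration


def is_16k_mono(path):
    """Check the assumed input format: 16 kHz sample rate, single channel."""
    sample_rate, channels, _ = describe_wav(path)
    return sample_rate == 16000 and channels == 1
```

Note that `librosa.load` with `sr=16000` already resamples on load and mixes down to mono by default, so this check mainly matters when the audio is read some other way.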