manhph2211 / Vietnamese-Speech-Recognition

This repo aims to build a web app for speech recognition. It's simple to use and understand.
MIT License

FileNotFoundError: [Errno 2] No such file or directory: 'checkpoints/best.pth' #1

Closed hphuc4244 closed 1 year ago

manhph2211 commented 1 year ago

Hi @hphuc4244,

Usually, I don't push checkpoint files to the repo because they can be too large. However, the pretrained model's checkpoint is quite good, so you can use it directly and fine-tune it on your custom dataset: https://huggingface.co/nguyenvulebinh/wav2vec2-base-vietnamese-250h

Hope this helps!

Max

hphuc4244 commented 1 year ago

Hello. Could you send me the checkpoint file privately so I can use it for training? Thank you very much.

haiquy572001 commented 1 year ago

Hello, could you please share the checkpoint file so I can train the model? Thank you.

manhph2211 commented 1 year ago

Hi both, I fine-tuned from this pretrained model: https://huggingface.co/nguyenvulebinh/wav2vec2-base-vietnamese-250h. You can get started quickly as follows:

import flash
from flash.audio import SpeechRecognition, SpeechRecognitionData
import torch
import sys
sys.path.append(".")

WAV2VEC_MODELS = ["facebook/wav2vec2-base-960h", "facebook/wav2vec2-large-960h-lv60", "nguyenvulebinh/wav2vec2-base-vietnamese-250h"]

# 1. Data
datamodule = SpeechRecognitionData.from_json(
    "file",
    "text",
    train_file="train.json",
    test_file="test.json",
    batch_size=128,
)

# 2. Build the task (backbone and processor both come from the Vietnamese pretrained model)
model = SpeechRecognition(
    backbone="nguyenvulebinh/wav2vec2-base-vietnamese-250h",
    processor_backbone="nguyenvulebinh/wav2vec2-base-vietnamese-250h",
)

# 3. Create the trainer and fine-tune the model if you want :)
# gpus=0 runs on CPU; strategy="freeze" keeps the backbone frozen during fine-tuning.
trainer = flash.Trainer(max_epochs=5, gpus=0)
trainer.finetune(model, datamodule=datamodule, strategy="freeze")

# 4. Predict on audio files!
datamodule = SpeechRecognitionData.from_files(
    predict_files=["demo/assets/database_sa1_Jan08_Mar19_cleaned_utt_0000000005-1.wav"],
    batch_size=1,
)
predictions = trainer.predict(model, datamodule=datamodule)
print(predictions)

# 5. Save the model!
# trainer.save_checkpoint("checkpoints/speech_recognition_model.pt")
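
The FileNotFoundError in the issue title comes up because checkpoints/best.pth is not shipped with the repo. Below is a minimal sketch, using only the Python standard library, of two guards around that path: one creates the checkpoints folder before a save, the other fails early with a clearer message when the file is missing. The helper names prepare_checkpoint_path and require_checkpoint are hypothetical and not part of this repo or of Flash.

```python
from pathlib import Path

# Path taken from the error message in this issue's title.
DEFAULT_CHECKPOINT = "checkpoints/best.pth"


def prepare_checkpoint_path(checkpoint_path=DEFAULT_CHECKPOINT):
    """Create the parent folder so a later save to this path does not fail on a missing directory."""
    path = Path(checkpoint_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


def require_checkpoint(checkpoint_path=DEFAULT_CHECKPOINT):
    """Return the path if the file exists; otherwise raise with a hint on how to produce it."""
    path = Path(checkpoint_path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found. The checkpoint is not shipped with the repo; "
            "fine-tune the pretrained model and save a checkpoint to this path first."
        )
    return path
```

With these helpers, step 5 could for instance become trainer.save_checkpoint(str(prepare_checkpoint_path())). Whether the web app can load a checkpoint in the format Flash saves is an assumption that this thread does not confirm.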