manyeyes / AliParaformerAsr

C# library for decoding Paraformer and SenseVoice models, used in automatic speech recognition (ASR)
Apache License 2.0
28 stars, 3 forks

Initialization takes a long time; after caching the recognizer object, the second call fails. How can this be fixed? #8

Closed toolgood closed 4 months ago

toolgood commented 4 months ago

The relevant code is as follows:

internal static partial class Program
{
    public static string applicationBase = AppDomain.CurrentDomain.BaseDirectory;
    [STAThread]
    private static void Main()
    {
        string modelName = "speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx";

        var t1 = OfflineRecognizer(modelName + "/example/1.wav");
        var t2 = OfflineRecognizer(modelName + "/example/2.wav"); // fails: returns empty
        var t3 = OfflineRecognizer(modelName + "/example/3.wav"); // fails: returns empty
        var t4 = OfflineRecognizer(modelName + "/example/4.wav"); // fails: returns empty

        //OfflineRecognizer();
        //OnlineRecognizer();
    }

    private static OfflineRecognizer offlineRecognizer;
    public static OfflineRecognizer initOfflineRecognizer()
    {
        if (offlineRecognizer == null) {
            string modelName = "speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx";
            string modelFilePath = applicationBase + "./" + modelName + "/model_quant.onnx";
            string configFilePath = applicationBase + "./" + modelName + "/asr.yaml";
            string mvnFilePath = applicationBase + "./" + modelName + "/am.mvn";
            string tokensFilePath = applicationBase + "./" + modelName + "/tokens.txt";
            offlineRecognizer = new OfflineRecognizer(modelFilePath, configFilePath, mvnFilePath, tokensFilePath);
        }
        return offlineRecognizer;
    }
    public static string OfflineRecognizer(string file)
    {
        if (!File.Exists(file)) { return ""; }
        AudioFileReader _audioFileReader = new AudioFileReader(file);
        byte[] datas = new byte[_audioFileReader.Length];
        _audioFileReader.Read(datas, 0, datas.Length);
        float[] sample = new float[datas.Length / sizeof(float)];
        Buffer.BlockCopy(datas, 0, sample, 0, datas.Length);

        string modelName = "speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx";

        OfflineRecognizer offlineRecognizer = initOfflineRecognizer();
        OfflineStream stream = offlineRecognizer.CreateOfflineStream();
        stream.AddSamples(sample);

        var result = offlineRecognizer.GetResult(stream);

        return result.Text;
    }

}
manyeyes commented 4 months ago

About decoding the second sample audio file: your usage is correct. This bug has been fixed and merged into the main branch. Please update the code and test again. Thank you.

About the long initialization time: loading the model generally takes about 5 seconds. This depends somewhat on the device environment; if you can provide more detailed information, that would help with optimization.
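For readers who want to check that the load cost is paid only once, here is a minimal sketch of the reuse pattern from the question, written with `Lazy<T>` instead of a manual null check. `SlowModel`, `ModelCache`, and the 200 ms sleep are hypothetical stand-ins for illustration only; they are not part of the AliParaformerAsr API, and the real `OfflineRecognizer` would take the place of `SlowModel`.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical stand-in for an object that is expensive to construct.
// It is NOT the AliParaformerAsr OfflineRecognizer.
sealed class SlowModel
{
    public static int LoadCount;

    public SlowModel()
    {
        Interlocked.Increment(ref LoadCount);
        Thread.Sleep(200); // simulate the model-loading delay
    }

    public string Decode(string name) => "decoded:" + name;
}

static class ModelCache
{
    // Lazy<T> with a factory defaults to ExecutionAndPublication mode,
    // so the factory runs at most once even if several threads race.
    private static readonly Lazy<SlowModel> Instance =
        new Lazy<SlowModel>(() => new SlowModel());

    public static SlowModel Get() => Instance.Value;
}

static class Demo
{
    static void Main()
    {
        foreach (var name in new[] { "1.wav", "2.wav", "3.wav" })
        {
            // Time each call: only the first one includes construction.
            var sw = Stopwatch.StartNew();
            string text = ModelCache.Get().Decode(name);
            sw.Stop();
            Console.WriteLine($"{name}: {text} ({sw.ElapsedMilliseconds} ms)");
        }
        Console.WriteLine($"model constructed {SlowModel.LoadCount} time(s)");
    }
}
```

Timing each call this way separates the one-time load cost from the per-file decoding cost, which is the kind of detail that helps when reporting slow initialization.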

toolgood commented 4 months ago

good