k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, and Rust.
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0

TTS bug #1081

Open janjanusek opened 3 months ago

janjanusek commented 3 months ago

Hello, I'm using your TTS library and must say it's very good, but I have no idea why, when I instantiate an OfflineTts model and call Generate, the first call returns a result, but a second call on the same instance throws an error. For the model I'm using this configuration:

    new OfflineTtsConfig()
    {
        Model = new OfflineTtsModelConfig()
        {
            Vits = new OfflineTtsVitsModelConfig()
            {
                Tokens = Description.Tokens.FullName,
                Model = Description.OnnxModel.FullName,
                DataDir = Description.NgDataDir.FullName,
                NoiseScale = 0,
                NoiseScaleW = 0
            },
            Provider = "cpu",
            NumThreads = 1,
            Debug = 1
        },
        MaxNumSentences = 1
    }

I'm using the org.k2fsa.sherpa.onnx package and have tried all possible versions. Alongside it I'm also referencing the ONNX Runtime in my project (for my other models), but that should not affect this package. My app runs on .NET 8, and I'm on Win11 with an EU-based language setting.

This is the error I'm getting on the second run for any input (even the same one):

    D:\a\sherpa-onnx\sherpa-onnx\sherpa-onnx/csrc/offline-tts-vits-impl.h:Generate:165 Raw text: John
    2024-07-06 19:33:02.2001917 [E:onnxruntime:, sequential_executor.cc:516 onnxruntime::ExecuteKernel] Non-zero status code returned while running GatherElements node. Name:'/dp/flows.7/GatherElements_3' Status Message: C:\a\_work\1\s\onnxruntime\core\providers\cpu\tensor\gather_elements.cc:154 onnxruntime::core_impl GatherElements op: Out of range value in index tensor

Worth noticing: when I create another instance, it works again for a single use.
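For context, the GatherElements failure above means some entry in the index tensor pointed outside the bounds of the data tensor. A minimal NumPy illustration (NumPy's `take_along_axis` is the closest analogue of ONNX GatherElements; this is not the sherpa-onnx code):

```python
# Illustration only: ONNX GatherElements picks values from `data` at
# positions given by an index tensor, like NumPy's take_along_axis.
# An index outside the data's bounds triggers the same class of
# "Out of range value in index tensor" failure seen in the log above.
import numpy as np

data = np.array([[1.0, 2.0], [3.0, 4.0]])

# Valid indices: every entry is < data.shape[1] == 2.
ok = np.take_along_axis(data, np.array([[0, 1], [1, 0]]), axis=1)

# Out-of-range index (3 >= 2) raises, mirroring the onnxruntime error.
try:
    np.take_along_axis(data, np.array([[0, 3], [1, 0]]), axis=1)
    raised = False
except IndexError:
    raised = True

print(ok.tolist())  # [[1.0, 2.0], [4.0, 3.0]]
print(raised)       # True
```

This suggests that on the second call, something upstream of the VITS duration predictor fed the model indices that no longer match the tensor shapes it produced.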

csukuangfj commented 3 months ago

could you please tell us which model you are using?

janjanusek commented 3 months ago

I tried almost all Piper models; no matter which one I use, the problem seems to persist. What is the difference between them?

janjanusek commented 3 months ago

And about a month ago it was working for me with no problem; I changed nothing and it stopped working. The only thing that came to mind was library version changes, but as I said, I'm using the onnxruntime and ML LLM packages alongside the sherpa packages.

csukuangfj commented 3 months ago

Could you post the complete code for reproducing?

We have never encountered such an issue before.

janjanusek commented 3 months ago

I'll post it in about 12 hours, thanks

janjanusek commented 3 months ago

Okay, I found a way to replicate the issue: apparently, once Microsoft.ML.OnnxRuntimeGenAI is installed in your project, it starts to happen. You can replicate the problem by adding the latest Microsoft.ML.OnnxRuntimeGenAI (v0.3.0) to the example offline-tts project and replacing the audio-generation line with:

    OfflineTtsGeneratedAudio audio = tts.Generate(options.Text, speed, sid);
    OfflineTtsGeneratedAudio audio1 = tts.Generate(options.Text, speed, sid);
    OfflineTtsGeneratedAudio audio2 = tts.Generate(options.Text, speed, sid);

csukuangfj commented 3 months ago

apparently Microsoft.ML.OnnxRuntimeGenAI once installed to your project it starts to happen.

If you uninstall it, will it fix the issue or not?

janjanusek commented 3 months ago

Absolutely, but how can installing another library break yours? Apparently there is some dynamic binding in sherpa-onnx causing this, I suppose. In my solution I need to use both libraries.

csukuangfj commented 3 months ago

sherpa-onnx also links to onnxruntime.dll

Please search onnxruntime.dll inside the sherpa-onnx package directory.

You can use sherpa-onnx's onnxruntime.dll to replace the one from Microsoft.ML.OnnxRuntimeGenAI and see if it works.

Otherwise, there are conflicts between different versions of onnxruntime.dll
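One way to confirm this kind of conflict is to list every copy of onnxruntime.dll that ends up under the build output. A small hypothetical Python helper (the "bin" directory name is an assumption; point it at your actual output folder):

```python
# Hypothetical diagnostic helper (not part of sherpa-onnx): walk a build
# output directory and report every copy of a given DLL, to spot version
# conflicts between packages such as sherpa-onnx and
# Microsoft.ML.OnnxRuntimeGenAI, which both ship an onnxruntime.dll.
import os

def find_copies(root, name="onnxruntime.dll"):
    """Return the full paths of every file named `name` under `root`."""
    hits = []
    for dirpath, _, files in os.walk(root):
        for f in files:
            if f.lower() == name.lower():
                hits.append(os.path.join(dirpath, f))
    return hits

if __name__ == "__main__":
    # Assumed output folder; adjust to your project's build directory.
    for path in find_copies("bin"):
        print(path, os.path.getsize(path))
```

If two copies with different sizes show up, whichever DLL the loader resolves first wins for the whole process, which is exactly the conflict described above.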

janjanusek commented 3 months ago

Okay, so I tried replacing the onnxruntime at build time with the one used by sherpa-onnx. I got sherpa-onnx working again, but the Phi-3 model using Microsoft.ML.OnnxRuntimeGenAI threw the following exception: The requested API version [18] is not available, only API versions [1, 17] are supported in this build. Current ORT Version is: 1.17.1

So in theory the approach you proposed is dirty but valid. I saw a PR for onnxruntime 1.18.1; is there any expected date when it could be available? I believe that will solve all the issues.

I understand that updating your library to the newest onnxruntime must be an unpleasant job, but in order to maintain this awesome project it is worth it.

csukuangfj commented 3 months ago

We also want to update to the latest onnxruntime. However, onnxruntime versions newer than 1.17.1 cause issues with sherpa-onnx.

Please see https://github.com/k2-fsa/sherpa-onnx/pull/906

Basically, you can compile sherpa-onnx from source with onnxruntime 1.18.1 and then use the DLL you generate to replace the one you downloaded into .NET.

If you have any issues after doing this at runtime, please see

https://github.com/microsoft/onnxruntime/issues/20808#issuecomment-2131227966

We support so many models that we have not found the time to use the above code to convert the existing models one by one. That is why we have not updated onnxruntime to the latest version.

janjanusek commented 3 months ago

I'll try that today and let you know, thanks

janjanusek commented 3 months ago

I tried to build it from your fork with onnxruntime version 1.18, but I got tons of errors while doing that. I believe my environment does not have the right dependencies? I don't know; look:

    24>onnxruntime.lib(onnxruntime_c_api.obj): Error LNK2038 : mismatch detected for '_ITERATOR_DEBUG_LEVEL': value '0' doesn't match value '2' in sherpa-onnx-offline-language-identification.obj
    24>onnxruntime.lib(onnxruntime_c_api.obj): Error LNK2038 : mismatch detected for 'RuntimeLibrary': value 'MT_StaticRelease' doesn't match value 'MTd_StaticDebug' in sherpa-onnx-offline-language-identification.obj
    24>onnxruntime.lib(error_code.obj): Error LNK2038 : mismatch detected for '_ITERATOR_DEBUG_LEVEL': value '0' doesn't match value '2' in sherpa-onnx-offline-language-identification.obj
    24>onnxruntime.lib(error_code.obj): Error LNK2038 : mismatch detected for 'RuntimeLibrary': value 'MT_StaticRelease' doesn't match value 'MTd_StaticDebug' in sherpa-onnx-offline-language-identification.obj
    24>onnxruntime.lib(allocator.obj): Error LNK2038 : mismatch detected for '_ITERATOR_DEBUG_LEVEL': value '0' doesn't match value '2' in sherpa-onnx-offline-language-identification.obj
    24>onnxruntime.lib(allocator.obj): Error LNK2038 : mismatch detected for 'RuntimeLibrary': value 'MT_StaticRelease' doesn't match value 'MTd_StaticDebug' in sherpa-onnx-offline-language-identification.obj
    24>onnxruntime.lib(onnxruntime_typeinfo.obj): Error LNK2038 : mismatch detected for '_ITERATOR_DEBUG_LEVEL': value '0' doesn't match value '2' in sherpa-onnx-offline-language-identification.obj
    24>onnxruntime.lib(onnxruntime_typeinfo.obj): Error LNK2038 : mismatch detected for 'RuntimeLibrary': value 'MT_StaticRelease' doesn't match value 'MTd_StaticDebug' in sherpa-onnx-offline-language-identification.obj
    24>onnxruntime.lib(tensor_type_and_shape.obj): Error LNK2038 : mismatch detected for '_ITERATOR_DEBUG_LEVEL': value '0' doesn't match value '2' in sherpa-onnx-offline-language-identification.obj
    24>onnxruntime.lib(tensor_type_and_shape.obj): Error LNK2038 : mismatch detected for 'RuntimeLibrary': value 'MT_StaticRelease' doesn't match value 'MTd_StaticDebug' in sherpa-onnx-offline-language-identification.obj
    24>onnxruntime.lib(abi_session_options.obj): Error LNK2038 : mismatch detected for '_ITERATOR_DEBUG_LEVEL': value '0' doesn't match value '2' in sherpa-onnx-offline-language-identification.obj
    24>onnxruntime.lib(abi_session_options.obj): Error LNK2038 : mismatch detected for 'RuntimeLibrary': value 'MT_StaticRelease' doesn't match value 'MTd_StaticDebug' in sherpa-onnx-offline-language-identification.obj
    24>onnxruntime.lib(onnxruntime_map_type_info.obj): Error LNK2038 : mismatch detected for '_ITERATOR_DEBUG_LEVEL': value '0' doesn't match value '2' in sherpa-onnx-offline-language-identification.obj
    24>onnxruntime.lib(onnxruntime_map_type_info.obj): Error LNK2038 : mismatch detected for 'RuntimeLibrary': value 'MT_StaticRelease' doesn't match value 'MTd_StaticDebug' in sherpa-onnx-offline-language-identification.obj
    24>onnxruntime.lib(onnxruntime_sequence_type_info.obj): Error LNK2038 : mismatch detected for '_ITERATOR_DEBUG_LEVEL': value '0' doesn't match value '2' in sherpa-onnx-offline-language-identification.obj
    24>onnxruntime.lib(onnxruntime_sequence_type_info.obj): Error LNK2038 : mismatch detected for 'RuntimeLibrary': value 'MT_StaticRelease' doesn't match value 'MTd_StaticDebug' in sherpa-onnx-offline-language-identification.obj
    22>LINK: Warning LNK4044 : unrecognized option '/Wl,-rpath,$ORIGIN'; ignored
    24>onnxruntime.lib(run_options.obj): Error LNK2038 : mismatch detected for '_ITERATOR_DEBUG_LEVEL': value '0' doesn't match value '2' in sherpa-onnx-offline-language-identification.obj
    24>onnxruntime.lib(run_options.obj): Error LNK2038 : mismatch detected for 'RuntimeLibrary': value 'MT_StaticRelease' doesn't match value 'MTd_StaticDebug' in sherpa-onnx-offline-language-identification.obj

The list goes on for a long time, but for simplicity I trimmed it here. I converted all the models with this simple script, but still could not make all of them work (some of them actually did, although when I changed the speed to anything different from 1, it generated extra-short audio).

import os

import onnx
from onnx import version_converter


def convert_model(model_path):
    """Upgrade a single ONNX model file in place to opset 21."""
    print(model_path)
    old_model = onnx.load(model_path)
    upgraded_model = version_converter.convert_version(old_model, 21)
    onnx.save(upgraded_model, model_path)


def apply_function_to_onnx_files(directory, function):
    """
    Recursively search for all ONNX files in the given directory and apply a
    function to each file's full path.

    :param directory: The directory path to search in.
    :param function: The function to apply to each ONNX file path.
    """
    for root, _, files in os.walk(directory):
        for file in files:
            if file.endswith('.onnx'):
                print(f'converting: "{file}"')
                full_path = os.path.join(root, file)
                function(full_path)


apply_function_to_onnx_files('your root path', convert_model)

If you want, I can update the script to handle archives like those in https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models, so it would pull the ONNX model out of each archive, convert it, and replace it within the archive.
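A sketch of that archive-processing idea (a hypothetical helper, assuming the .tar.bz2 layout used by the release assets; the converter is passed in as a callable so the opset-upgrade function from the script above, or any other conversion, can be plugged in):

```python
# Hypothetical sketch: unpack a .tar.bz2 model archive, run a converter over
# every .onnx file inside, and repack the archive in place. The `convert`
# callable is a parameter so any per-model conversion can be plugged in.
import os
import shutil
import tarfile
import tempfile

def convert_archive(archive_path, convert):
    """Apply `convert(path)` to each .onnx file inside a .tar.bz2 archive."""
    workdir = tempfile.mkdtemp()
    try:
        with tarfile.open(archive_path, "r:bz2") as tar:
            tar.extractall(workdir)
        for root, _, files in os.walk(workdir):
            for name in files:
                if name.endswith(".onnx"):
                    convert(os.path.join(root, name))
        # Repack in place, preserving the top-level directory layout.
        with tarfile.open(archive_path, "w:bz2") as tar:
            for entry in os.listdir(workdir):
                tar.add(os.path.join(workdir, entry), arcname=entry)
    finally:
        shutil.rmtree(workdir)
```

For example, `convert_archive("vits-piper-en_GB-vctk-medium.tar.bz2", convert_model)` would upgrade the model inside that release archive without changing anything else in it.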

My question to you is: since you already have this done, would you mind building the appropriate DLLs and publishing them here once you have time? My plan is to target linux 64-bit, macOS, and win 64-bit platforms, and I've already spent way too much time on this issue myself and have to move ahead with development.

Thanks and looking forward to your response.

janjanusek commented 3 months ago

I just now managed to get the vctk model with 109 speakers working, and I can work with that. So if you just prepare an onnxruntime-1.18 build of sherpa-onnx, that will be good enough.

I can update the Python script so you can automatically convert all your models to the newest opset.

janjanusek commented 3 months ago

Some additional information: when you run Coqui models on 1.18.1 it works, as do models using only a lexicon instead of an ng-data dir, but those are really not robust if the user makes a typo. So in the meantime I'll use the Coqui EN model with 109 speakers until you truly adapt to onnxruntime 1.18.1.

If you want that script, let me know; if not, you can close the ticket, and hopefully the new version will be supported within 1-2 months.

I would really like to use all the models with 6+ speakers, since the size-to-value ratio there is good.

csukuangfj commented 3 months ago

The latest NuGet package supports onnxruntime 1.18.0. Please re-try.

csukuangfj commented 3 months ago

so I'll use meanwhile coqiu model EN with 109

I suggest that you also try vits-piper-en_US-libritts_r-medium.tar.bz2

It has more than 900 speakers!

janjanusek commented 3 months ago

To make it absolutely transparent how to replicate the problem, please run the sherpa offline-tts example on version 1.10.13.

I tried to run the model 'vits-piper-en_GB-vctk-medium' with and without conversion to opset 21; it changed nothing.

I got this error (opset 21):

    Wrote to ./generated.wav succeeded!
    2024-07-12 08:39:29.7787850 [E:onnxruntime:, sequential_executor.cc:516 onnxruntime::ExecuteKernel] Non-zero status code returned while running Expand node. Name:'/dp/flows.5/Expand_25' Status Message: invalid expand shape
    Unhandled exception. System.Runtime.InteropServices.SEHException (0x80004005): External component has thrown an exception.
       at SherpaOnnx.OfflineTts.SherpaOnnxOfflineTtsGenerate(IntPtr handle, Byte[] utf8Text, Int32 sid, Single speed)
       at SherpaOnnx.OfflineTts.Generate(String text, Single speed, Int32 speakerId)
       at OfflineTtsDemo.Run(Options options) in C:\Users\codeNET\RiderProjects\sherpa-onnx\dotnet-examples\offline-tts\Program.cs:line 161
       at OfflineTtsDemo.Main(String[] args) in C:\Users\codeNET\RiderProjects\sherpa-onnx\dotnet-examples\offline-tts\Program.cs:line 82

I got this error (opset unchanged):

    Wrote to ./generated.wav succeeded!
    2024-07-12 08:42:15.4418995 [E:onnxruntime:, sequential_executor.cc:516 onnxruntime::ExecuteKernel] Non-zero status code returned while running Reshape node. Name:'/Reshape_1' Status Message: C:\a\_work\1\s\onnxruntime\core\providers\cpu\tensor\reshape_helper.h:30 onnxruntime::ReshapeHelper::ReshapeHelper i < input_shape.NumDimensions() was false. The dimension with value zero exceeds the dimension size of the input tensor.

    Unhandled exception. System.Runtime.InteropServices.SEHException (0x80004005): External component has thrown an exception.
       at SherpaOnnx.OfflineTts.SherpaOnnxOfflineTtsGenerate(IntPtr handle, Byte[] utf8Text, Int32 sid, Single speed)
       at SherpaOnnx.OfflineTts.Generate(String text, Single speed, Int32 speakerId)
       at OfflineTtsDemo.Run(Options options) in C:\Users\codeNET\RiderProjects\sherpa-onnx\dotnet-examples\offline-tts\Program.cs:line 161
       at OfflineTtsDemo.Main(String[] args) in C:\Users\codeNET\RiderProjects\sherpa-onnx\dotnet-examples\offline-tts\Program.cs:line 82

How to replicate: change the single generate call to generating in a loop like the following:

    for (int i = 0; i < 10; i++)
    {
        OfflineTtsGeneratedAudio audio = tts.Generate(options.Text, speed, sid);
        bool ok = audio.SaveToWaveFile(options.OutputFilename);
        if (ok)
        {
            Console.WriteLine($"Wrote to {options.OutputFilename} succeeded!");
        }
        else
        {
            Console.WriteLine($"Failed to write {options.OutputFilename}");
        }
    }

The first iteration will pass and the others will fail; the same thing happens when you generate speaker 0 and then speaker 1 in sequence.

csukuangfj commented 3 months ago

Thank you for reporting it. Will look into it during the weekend.

janjanusek commented 3 months ago

Alright, the tts demo works now. I appreciate the effort, but I was hoping you would figure out the problem, not roll back to the old onnxruntime version.

So how do we resolve this onnxruntime issue? Is it on the list any time soon?

btw, I don't believe it was ever a problem with the opset, because I now also run another model on opset 14 (due to a TF conversion bug) with onnxruntime 1.18.1.

I really do want to go with sherpa in a PROD environment. If you have any ideas I could try, I'm eager to hear from you.

I also believe the problem can be simulated by installing Microsoft.ML.OnnxRuntimeGenAI 0.3.0 into the tts example project.

janjanusek commented 3 months ago

Don't forget, the first run always works... so to me it looks like some data from the first run is kept and causes the issue on the reshape node, because creating a new instance makes it work again.

csukuangfj commented 3 months ago

Don't forget, the first run always works... so to me it looks like some data from the first run is kept and causes the issue on the reshape node, because creating a new instance makes it work again.

That is unexpected. I cannot understand it.

To me the model should be stateless.

janjanusek commented 3 months ago

I guess the model itself is, but I would suggest checking the idempotency of the C pipeline around the model. I don't know why it's happening either, but all the traces suggest exactly that.

janjanusek commented 2 months ago

Temporarily I resolved the issue with a packaged console application that I execute in a separate process with the required dependencies. It works, but it also adds extra size to the app package, so I'm really eager to see support for onnxruntime 1.18.1+.

csukuangfj commented 2 months ago

so I'm really eager to see support for onnxruntime 1.18.1+

Sorry, I have no idea how to fix it to support onnxruntime 1.18.1.

janjanusek commented 2 months ago

I know, no blame, dude; it's really difficult to tackle this, as it makes no sense for it to crash when stateless. We've already spent a long time on this topic. By the way, is it possible to create more models with many speakers in other languages? Could you give me some hint on where to start? I built a desktop app, but when I want to support many languages it can get really big with single-speaker models. 🤷🏼‍♂️

csukuangfj commented 2 months ago

We support models from https://github.com/rhasspy/piper

Would you be able to take a look at the piper documentation?

Once you have a model from piper, it is straightforward to convert it to sherpa-onnx.