google / lyra

A Very Low-Bitrate Codec for Speech Compression
Apache License 2.0

lyra_android_example.apk crashed. #147

Open TangYuFan opened 4 months ago

TangYuFan commented 4 months ago

Hello, the Android example compiled according to the README runs well on some phones but crashes on others.

The compilation scripts I tried:

1. `bazel build -c opt lyra/android_example:lyra_android_example --config=android_arm64 --copt=-DBENCHMARK`
2. `bazel build -c opt lyra/android_example:lyra_android_example --fat_apk_cpu=armeabi-v7a,arm64-v8a --copt=-DBENCHMARK`

Running the APK crashed on some phones. The error log is in the attached screenshot.

How can I solve this problem? Does the compilation script require any special flags? I package the encoding and decoding interfaces into a .so for Android phones, and some phones cannot run it properly.

JianhaoPeng commented 2 months ago

@TangYuFan Hi, I also encountered this problem; have you solved it yet? I tried upgrading the SDK version, but the build failed; only version 30 builds successfully. Also, can this APK be made to work on Android 15? Thank you so much.

TangYuFan commented 2 months ago

My colleague solved this problem; it was probably an issue with a destructor (see the attached screenshot).

JianhaoPeng commented 2 months ago

@TangYuFan Hi, thanks. I found this line in DenseStorage.h; should I comment it out? Can you tell me how you solved it? By the way, I found that the app didn't crash after I commented out the encode and decode methods in jni_lyra_benchmark_lib.cpp. Could we discuss it by email? I have struggled with this problem for weeks; my email is 2580979439@qq.com. Thanks a million.

TangYuFan commented 2 months ago

My colleague found through GDB debugging that it hangs in this destructor. Sorry, I don't know the specific modifications; my algorithm colleague fixed the problem and directly gave me the compiled .so dynamic library file.

JianhaoPeng commented 2 months ago

We're trying to implement the encode and decode methods separately using the .so dynamic library file; could you share it? I really appreciate your help.

TangYuFan commented 2 months ago

You can obtain the Android .so library file through the following steps; the idea comes from another issue (see the attached screenshot):

1. Add the codec dependencies to `lyra/android_example/BUILD`:

```
deps = [
    "//lyra:lyra_encoder",
    "//lyra:lyra_decoder",
]
```

2. Add the codec JNI functions to `jni_lyra_benchmark_lib.cc`:

```cpp
#include "lyra/lyra_encoder.h"
#include "lyra/lyra_decoder.h"

namespace {
std::unique_ptr<chromemedia::codec::LyraEncoder> encoder;
std::unique_ptr<chromemedia::codec::LyraDecoder> decoder;
}  // namespace
```

Next, you can expose JNI functions for initialization, release, encoding, decoding, etc. like this (minimal sketches of the encode and decode counterparts follow after these steps):

```cpp
extern "C" JNIEXPORT jint JNICALL
Java_com_lyra_LyraCodec_codecInit(JNIEnv* env, jobject this_obj,
                                  jint sampleRateHz, jint numChannels,
                                  jint bitrate, jboolean enableDtx,
                                  jstring modelPath) {
  const char* modelPathCStr = env->GetStringUTFChars(modelPath, nullptr);
  if (modelPathCStr == nullptr) {
    return 0;
  }
  ghc::filesystem::path model(modelPathCStr);
  env->ReleaseStringUTFChars(modelPath, modelPathCStr);
  encoder = chromemedia::codec::LyraEncoder::Create(sampleRateHz, numChannels,
                                                    bitrate, enableDtx, model);
  decoder = chromemedia::codec::LyraDecoder::Create(sampleRateHz, numChannels, model);
  return (encoder != nullptr) && (decoder != nullptr);
}
```

3. Compile `android_example` and extract the .so file.
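
As referenced in step 2, here is a minimal sketch of what the encode and decode JNI counterparts could look like. The Java-side class and method names (`com.lyra.LyraCodec.encode` / `decode`) are illustrative, and the `encoder`/`decoder` globals come from step 2; `Encode()`, `SetEncodedPacket()`, and `DecodeSamples()` are the public API declared in `lyra/lyra_encoder.h` and `lyra/lyra_decoder.h`:

```cpp
#include <jni.h>
#include <cstdint>
#include <vector>
#include "absl/types/span.h"

// Hypothetical encode counterpart: one 16-bit PCM frame in, one Lyra packet out.
extern "C" JNIEXPORT jbyteArray JNICALL
Java_com_lyra_LyraCodec_encode(JNIEnv* env, jobject /*this_obj*/,
                               jshortArray pcmFrame) {
  if (encoder == nullptr) return nullptr;
  const jsize num_samples = env->GetArrayLength(pcmFrame);
  std::vector<int16_t> samples(num_samples);
  env->GetShortArrayRegion(pcmFrame, 0, num_samples, samples.data());
  // LyraEncoder::Encode returns std::nullopt on failure.
  auto packet = encoder->Encode(absl::MakeConstSpan(samples));
  if (!packet.has_value()) return nullptr;
  jbyteArray out = env->NewByteArray(packet->size());
  env->SetByteArrayRegion(out, 0, packet->size(),
                          reinterpret_cast<const jbyte*>(packet->data()));
  return out;
}

// Hypothetical decode counterpart: one Lyra packet in, numSamples PCM samples out.
extern "C" JNIEXPORT jshortArray JNICALL
Java_com_lyra_LyraCodec_decode(JNIEnv* env, jobject /*this_obj*/,
                               jbyteArray packet, jint numSamples) {
  if (decoder == nullptr) return nullptr;
  const jsize packet_len = env->GetArrayLength(packet);
  std::vector<uint8_t> encoded(packet_len);
  env->GetByteArrayRegion(packet, 0, packet_len,
                          reinterpret_cast<jbyte*>(encoded.data()));
  if (!decoder->SetEncodedPacket(absl::MakeConstSpan(encoded))) return nullptr;
  auto decoded = decoder->DecodeSamples(numSamples);
  if (!decoded.has_value()) return nullptr;
  jshortArray out = env->NewShortArray(decoded->size());
  env->SetShortArrayRegion(out, 0, decoded->size(),
                           reinterpret_cast<const jshort*>(decoded->data()));
  return out;
}
```

Note the frame size: at 16 kHz Lyra operates on 320-sample (20 ms) frames, so each `Encode()` call should receive one such frame.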
smallpotato85 commented 1 month ago

Hi @TangYuFan Could you describe your solution in more detail? I have tried your suggestion but it is not working. It looks like your approach is just the same as using jni_lyra_benchmark_lib.cc, which is already suggested in the BUILD source code.

I made my own library through my_lyra_lib.cc. Because I want to use chromemedia::codec::DecodeFeatures() and chromemedia::codec::EncodeWav(), I called those functions in my_lyra_lib.cc and hooked up dependencies on //lyra/cli_example:decoder_main_lib and encoder_main_lib, building them by declaring cc_library targets in the BUILD file. (Sorry, I have not understood how .cc files can become libraries without the cc_library syntax.) Then I set up and built MainActivity.java through android_library() with a dependency on my_lyra_lib, and finally built my own lyra_android_example.so through android_binary() with a dependency on that library.

Please check my modifications and tell me what is wrong. I have wasted a couple of weeks on this problem... I really need your help.

Hi @JianhaoPeng It looks like you got a clue from TangYuFan's comment and arrived at a solution. Could you share how you built the .so?

Many thanks in advance

smallpotato85 commented 3 weeks ago

I found a solution, but it's a very hacky one. I don't know the details, but it looks like _Unwind_Backtrace() has some problem. I disassembled liblyra_android_example.so and patched _Unwind_Backtrace() into a no-op (it returns immediately when called). After this, the crash disappeared and I could decode and encode through Lyra as I wanted.

From searching, it looks like this function is related to exception handling in general code. I also tried to find an option to disable such exception processing around this function, but that failed too.

TangYuFan commented 1 week ago

Another approach is to port Lyra's models into my other CMake project and run inference with the latest TensorFlow Lite (version 2.19.0), then compile it for Android with the NDK, which eliminates issues outside of model inference itself. Please let me know if this is helpful. Here is the demo code:

include "iostream"

include <tensorflow/lite/version.h>

include <tensorflow/lite/model.h>

include <tensorflow/lite/kernels/register.h>

include <tensorflow/lite/string_util.h>

include

include <tensorflow/lite/signature_runner.h>

include

```cpp
// Reads a 16-bit PCM WAV file and slices it into 320-sample frames.
void readWavWith320Point(const std::string& wav_in,
                         std::vector<std::vector<int16_t>>& frames) {
  std::cout << "---------------------------------------------------" << std::endl;
  std::cout << "WAV: " << wav_in << std::endl;
  std::ifstream in_file(wav_in, std::ios::binary);
  if (!in_file) {
    std::cerr << "Error opening input file." << std::endl;
    return;
  }
  char riff_header[12];
  in_file.read(riff_header, sizeof(riff_header));
  char fmt_chunk[24];
  in_file.read(fmt_chunk, sizeof(fmt_chunk));
  if (std::strncmp(fmt_chunk, "fmt ", 4) != 0) {
    std::cerr << "Invalid fmt chunk." << std::endl;
    return;
  }
  uint16_t audio_format = *reinterpret_cast<uint16_t*>(fmt_chunk + 8);
  uint16_t num_channels = *reinterpret_cast<uint16_t*>(fmt_chunk + 10);
  uint32_t sample_rate = *reinterpret_cast<uint32_t*>(fmt_chunk + 12);
  uint16_t bits_per_sample = *reinterpret_cast<uint16_t*>(fmt_chunk + 22);
  std::cout << "audio_format: " << audio_format << std::endl;
  std::cout << "num_channels: " << num_channels << std::endl;
  std::cout << "sample_rate: " << sample_rate << std::endl;
  std::cout << "bits_per_sample: " << bits_per_sample << std::endl;
  char chunk_id[4];
  uint32_t chunk_size;
  while (in_file.read(chunk_id, sizeof(chunk_id))) {
    in_file.read(reinterpret_cast<char*>(&chunk_size), sizeof(chunk_size));
    if (std::strncmp(chunk_id, "data", 4) == 0) {
      break;
    }
    in_file.seekg(chunk_size, std::ios::cur);  // Skip the current chunk.
  }
  std::vector<int16_t> audio_data(chunk_size / sizeof(int16_t));
  in_file.read(reinterpret_cast<char*>(audio_data.data()), chunk_size);
  in_file.close();
  double duration = static_cast<double>(audio_data.size()) / (sample_rate * num_channels);
  std::cout << "Frame len: " << audio_data.size() << std::endl;
  std::cout << "Audio duration: " << duration << " seconds" << std::endl;
  const int frame_size = 320;
  int num_frames = audio_data.size() / frame_size;
  std::cout << "Number of frames: " << num_frames << std::endl;
  frames.clear();
  for (int i = 0; i < num_frames; ++i) {
    std::vector<int16_t> frame(audio_data.begin() + i * frame_size,
                               audio_data.begin() + (i + 1) * frame_size);
    frames.push_back(frame);
  }
}
```

```cpp
// Writes 16-bit mono/stereo PCM data out as a WAV file.
void write_wav(const std::string& filename, const std::vector<int16_t>& audio_data,
               uint32_t sample_rate, uint16_t num_channels) {
  // std::ios::trunc ensures an existing file is overwritten.
  std::ofstream out_file(filename, std::ios::binary | std::ios::trunc);
  if (!out_file) {
    std::cerr << "Error opening output file." << std::endl;
    return;
  }
  uint32_t data_size = audio_data.size() * sizeof(int16_t);
  // Write the WAV header.
  out_file.write("RIFF", 4);
  uint32_t chunk_size = 36 + data_size;
  out_file.write(reinterpret_cast<const char*>(&chunk_size), 4);
  out_file.write("WAVE", 4);
  out_file.write("fmt ", 4);
  uint32_t subchunk1_size = 16;
  out_file.write(reinterpret_cast<const char*>(&subchunk1_size), 4);
  uint16_t audio_format = 1;  // PCM
  out_file.write(reinterpret_cast<const char*>(&audio_format), 2);
  out_file.write(reinterpret_cast<const char*>(&num_channels), 2);
  out_file.write(reinterpret_cast<const char*>(&sample_rate), 4);
  uint32_t byte_rate = sample_rate * num_channels * 2;
  out_file.write(reinterpret_cast<const char*>(&byte_rate), 4);
  uint16_t block_align = num_channels * 2;
  out_file.write(reinterpret_cast<const char*>(&block_align), 2);
  uint16_t bits_per_sample = 16;
  out_file.write(reinterpret_cast<const char*>(&bits_per_sample), 2);
  out_file.write("data", 4);
  out_file.write(reinterpret_cast<const char*>(&data_size), 4);
  out_file.write(reinterpret_cast<const char*>(audio_data.data()), data_size);
}
```

```cpp
void PrintModelInputOutputShapes(const std::string& model_name,
                                 tflite::Interpreter* interpreter,
                                 const std::string& signature_name) {
  if (interpreter == nullptr) {
    std::cerr << "Failed to create interpreter!" << std::endl;
    return;
  }
  auto signature_runner = interpreter->GetSignatureRunner(signature_name.c_str());
  if (signature_runner == nullptr) {
    std::cerr << "Signature Runner not found for signature: " << signature_name << std::endl;
    return;
  }
  std::cout << "---------------------------------------------------" << std::endl;
  std::cout << "Model:" << model_name << ":" << std::endl;
  const std::vector<const char*>& input_names = signature_runner->input_names();
  std::cout << "Input Tensors for signature '" << signature_name << "':" << std::endl;
  for (const auto& input_name : input_names) {
    const TfLiteTensor* input_tensor = signature_runner->input_tensor(input_name);
    std::cout << "  Tensor Name: " << input_name << std::endl;
    std::cout << "  Shape: [";
    for (int j = 0; j < input_tensor->dims->size; ++j) {
      std::cout << input_tensor->dims->data[j]
                << (j < input_tensor->dims->size - 1 ? ", " : "");
    }
    std::cout << "]" << std::endl;
  }
  const std::vector<const char*>& output_names = signature_runner->output_names();
  std::cout << "Output Tensors for signature '" << signature_name << "':" << std::endl;
  for (const auto& output_name : output_names) {
    const TfLiteTensor* output_tensor = signature_runner->output_tensor(output_name);
    std::cout << "  Tensor Name: " << output_name << std::endl;
    std::cout << "  Shape: [";
    for (int j = 0; j < output_tensor->dims->size; ++j) {
      std::cout << output_tensor->dims->data[j]
                << (j < output_tensor->dims->size - 1 ? ", " : "");
    }
    std::cout << "]" << std::endl;
  }
}
```

```cpp
// LyraGAN: quantized features in, one synthesized PCM frame out.
void infer_lyragan(tflite::Interpreter* interpreter,
                   const std::vector<float>& features,
                   std::vector<int16_t>& frame) {
  TfLiteTensor* input_tensor = interpreter->tensor(interpreter->inputs()[0]);
  for (size_t i = 0; i < features.size(); ++i) {
    input_tensor->data.f[i] = features[i];
  }
  if (interpreter->Invoke() != kTfLiteOk) {
    std::cerr << "Failed to invoke interpreter!" << std::endl;
    return;
  }
  TfLiteTensor* output_tensor = interpreter->tensor(interpreter->outputs()[0]);
  if (!output_tensor) {
    std::cerr << "Failed to get output tensor!" << std::endl;
    return;
  }
  int output_size = output_tensor->dims->data[1];
  frame.resize(output_size);
  for (int i = 0; i < output_size; ++i) {
    // Output samples are in [-1, 1]; convert to int16_t.
    float restored_value = output_tensor->data.f[i];
    frame[i] = static_cast<int16_t>(restored_value * std::numeric_limits<int16_t>::max());
  }
}
```

```cpp
// SoundStream encoder: one PCM frame in, feature vector out.
void infer_soundstream(tflite::Interpreter* interpreter,
                       const std::vector<int16_t>& frame,
                       std::vector<float>& features) {
  TfLiteTensor* input_tensor = interpreter->tensor(interpreter->inputs()[0]);
  for (size_t i = 0; i < frame.size(); ++i) {
    // Normalize int16 samples to [-1, 1] (dividing by min() flips the sign).
    input_tensor->data.f[i] =
        static_cast<float>(frame[i]) / std::numeric_limits<int16_t>().min();
  }
  if (interpreter->Invoke() != kTfLiteOk) {
    std::cerr << "Failed to invoke interpreter!" << std::endl;
    return;
  }
  TfLiteTensor* output_tensor = interpreter->tensor(interpreter->outputs()[0]);
  if (!output_tensor) {
    std::cerr << "Failed to get output tensor!" << std::endl;
    return;
  }
  // The model outputs 64 feature values per frame.
  int output_size = output_tensor->dims->data[2];
  features.resize(output_size);
  for (int i = 0; i < output_size; ++i) {
    features[i] = output_tensor->data.f[i];
  }
}
```

```cpp
constexpr int kMaxNumQuantizedBits = 184;
int num_bits = 64;
int bits_per_quantizer = 0;

// RVQ encode: feature vector in, quantized bitstring out.
void infer_quantizer_rvq_encode(tflite::Interpreter* interpreter,
                                std::vector<float> features,
                                std::string& quantized_features) {
  tflite::SignatureRunner* encode_runner = interpreter->GetSignatureRunner("encode");
  bits_per_quantizer = encode_runner->output_tensor("output_1")->data.i32[0];
  if (num_bits % bits_per_quantizer != 0) {
    std::cerr << "The number of bits (" << num_bits
              << ") has to be divisible by the number of bits per quantizer ("
              << bits_per_quantizer << ")." << std::endl;
    return;
  }
  if (encode_runner->AllocateTensors() != kTfLiteOk) {
    std::cerr << "Failed to allocate tensors for encode runner." << std::endl;
    return;
  }
  float* input_data = encode_runner->input_tensor("input_frames")->data.f;
  std::copy(features.begin(), features.end(), input_data);
  const int required_quantizers = num_bits / bits_per_quantizer;
  encode_runner->input_tensor("num_quantizers")->data.i32[0] = required_quantizers;
  if (encode_runner->Invoke() != kTfLiteOk) {
    std::cerr << "Unable to invoke the encode runner." << std::endl;
    return;
  }
  const int32_t* nearest_neighbors = encode_runner->output_tensor("output_0")->data.i32;
  std::bitset<kMaxNumQuantizedBits> quantized_bits = 0;
  for (int i = 0; i < required_quantizers; ++i) {
    quantized_bits |= std::bitset<quantized_bits.size()>(nearest_neighbors[i])
                      << ((required_quantizers - i - 1) * bits_per_quantizer);
  }
  quantized_features = quantized_bits.to_string().substr(kMaxNumQuantizedBits - num_bits);
}
```

```cpp
// RVQ decode: quantized bitstring in, reconstructed feature vector out.
void infer_quantizer_rvq_decode(tflite::Interpreter* interpreter,
                                std::string quantized_features,
                                std::vector<float>& featuresOut) {
  tflite::SignatureRunner* decode_runner = interpreter->GetSignatureRunner("decode");
  const int required_quantizers = num_bits / bits_per_quantizer;
  const int max_num_quantizers = kMaxNumQuantizedBits / bits_per_quantizer;
  if (decode_runner->ResizeInputTensor("encoding_indices",
                                       {max_num_quantizers, 1, 1}) != kTfLiteOk) {
    std::cout << "Failed to resize the indices tensor to the required number of "
              << "quantizers (" << max_num_quantizers << ")." << std::endl;
    return;
  }
  if (decode_runner->AllocateTensors() != kTfLiteOk) {
    std::cout << "Unable to allocate tensors." << std::endl;
    return;
  }
  const std::bitset<kMaxNumQuantizedBits> quantized_bits(quantized_features);
  const std::bitset<kMaxNumQuantizedBits> quantizer_mask((1 << bits_per_quantizer) - 1);
  int32_t* indices = decode_runner->input_tensor("encoding_indices")->data.i32;
  for (int i = 0; i < required_quantizers; ++i) {
    indices[i] = static_cast<int32_t>(
        ((quantized_bits >> ((required_quantizers - i - 1) * bits_per_quantizer)) &
         quantizer_mask).to_ulong());
  }
  // Unused quantizer slots are marked with -1.
  for (int j = required_quantizers; j < max_num_quantizers; ++j) {
    indices[j] = -1;
  }
  if (decode_runner->Invoke() != kTfLiteOk) {
    std::cout << "Unable to invoke the decode runner." << std::endl;
    return;
  }
  const TfLiteTensor* features_tensor = decode_runner->output_tensor("output_0");
  const float* features = features_tensor->data.f;
  const int num_features = features_tensor->bytes / sizeof(features[0]);
  featuresOut = std::vector<float>(features, features + num_features);
}
```

```cpp
int main() {
  std::cout << "TensorFlow Lite Version: " << TFLITE_VERSION_STRING << std::endl;
  std::string lyragan = "/mnt/d/work/workspace/duijie2/src/lyra/lyra/model_coeffs/lyragan.tflite";
  std::string soundstream_encoder = "/mnt/d/work/workspace/duijie2/src/lyra/lyra/model_coeffs/soundstream_encoder.tflite";
  std::string quantizer = "/mnt/d/work/workspace/duijie2/src/lyra/lyra/model_coeffs/quantizer.tflite";
  auto lyragan_model = tflite::FlatBufferModel::BuildFromFile(lyragan.c_str());
  auto soundstream_encoder_model = tflite::FlatBufferModel::BuildFromFile(soundstream_encoder.c_str());
  auto quantizer_model = tflite::FlatBufferModel::BuildFromFile(quantizer.c_str());
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> lyragan_interpreter;
  tflite::InterpreterBuilder lyragan_builder(*lyragan_model, resolver);
  lyragan_builder(&lyragan_interpreter);
  std::unique_ptr<tflite::Interpreter> soundstream_interpreter;
  tflite::InterpreterBuilder soundstream_builder(*soundstream_encoder_model, resolver);
  soundstream_builder(&soundstream_interpreter);
  std::unique_ptr<tflite::Interpreter> quantizer_interpreter;
  tflite::InterpreterBuilder quantizer_builder(*quantizer_model, resolver);
  quantizer_builder(&quantizer_interpreter);
  PrintModelInputOutputShapes("Quantizer", quantizer_interpreter.get(), "encode");
  PrintModelInputOutputShapes("Quantizer", quantizer_interpreter.get(), "decode");
  PrintModelInputOutputShapes("Lyragan", lyragan_interpreter.get(), "serving_default");
  PrintModelInputOutputShapes("Soundstream", soundstream_interpreter.get(), "serving_default");
  std::string wav = "/mnt/d/work/workspace/duijie2/src/opencore-amr-nb/file/speech_16k.wav";
  std::vector<std::vector<int16_t>> frames;
  readWavWith320Point(wav, frames);
  std::vector<int16_t> allFramesOut;
  std::cout << "---------------------------------------------------" << std::endl;
  for (const auto& frameIn : frames) {
    // 320 PCM samples -> SoundStream features.
    std::vector<float> featuresIn;
    infer_soundstream(soundstream_interpreter.get(), frameIn, featuresIn);
    // Features -> quantized bitstring (the transmitted payload).
    std::string quantized_features;
    infer_quantizer_rvq_encode(quantizer_interpreter.get(), featuresIn, quantized_features);
    std::cout << "Quantized_Features:" << quantized_features << std::endl;
    // Bitstring -> features -> 320 PCM samples via LyraGAN.
    std::vector<float> featuresOut;
    infer_quantizer_rvq_decode(quantizer_interpreter.get(), quantized_features, featuresOut);
    std::vector<int16_t> frameOut;
    infer_lyragan(lyragan_interpreter.get(), featuresOut, frameOut);
    allFramesOut.insert(allFramesOut.end(), frameOut.begin(), frameOut.end());
  }
  std::cout << "---------------------------------------------------" << std::endl;
  std::cout << "Over.." << std::endl;
  std::string wavOut = "/mnt/d/work/workspace/duijie2/src/opencore-amr-nb/file/speech_16k_out.wav";
  write_wav(wavOut, allFramesOut, 16000, 1);
  return 0;
}
```
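
A note on the numbers in this demo, for anyone adapting it: a 320-sample frame at 16 kHz is 20 ms, i.e. 50 frames per second, so num_bits = 64 per frame gives 50 × 64 = 3200 bps, Lyra's 3.2 kbps mode. Likewise, kMaxNumQuantizedBits = 184 corresponds to the 9.2 kbps maximum (50 × 184 = 9200 bps).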