Open lvxinliang opened 3 weeks ago
@lvxinliang Hello there! I'm here to help you with bug fixes, answer questions, and guide you on becoming a contributor. Let's work together to resolve any issues you encounter.
The issue with Tencent Cloud ASR failing to recognize any content on an Android phone when using version 0.6.16 of the Dify app could be due to the first_channel_only parameter being set to 1 by default in the FlashRecognitionRequest class. This setting means that only the first audio channel is processed, which might not contain any data if the audio is recorded in a different channel.
To resolve this issue, you can try setting first_channel_only to 0 so that all audio channels are processed:
class FlashRecognitionRequest:
    def __init__(self, voice_format="mp3", engine_type="16k_zh"):
        self.engine_type = engine_type
        self.speaker_diarization = 0
        self.hotword_id = ""
        self.customization_id = ""
        self.filter_dirty = 0
        self.filter_modal = 0
        self.filter_punc = 0
        self.convert_num_mode = 1
        self.word_info = 0
        self.voice_format = voice_format
        self.first_channel_only = 1  # This might be causing the issue
        self.reinforce_hotword = 0
        self.sentence_max_length = 0

flash_recognition_request = FlashRecognitionRequest()
flash_recognition_request.set_first_channel_only(0)
This change should help in processing the audio content correctly, regardless of which channel the audio is recorded in [1].
To continue talking to Dosu, mention @dosu.
I changed it to 0 as you suggested and confirmed that the change took effect, but recognition still doesn't work on Android. It works fine on iPhone, so I suspect the audio is not being captured on Android.
        self.voice_format = voice_format
        self.first_channel_only = 0
        self.reinforce_hotword = 0
        self.sentence_max_length = 0
        logger.info("engine_type: %s, voice_format: %s, speaker_diarization: %s, hotword_id: %s, "
                    "customization_id: %s, filter_dirty: %s, filter_modal: %s, filter_punc: %s, convert_num_mode: %s, "
                    "word_info: %s, first_channel_only: %s, reinforce_hotword: %s, sentence_max_length: %s",
                    self.engine_type, self.voice_format, self.speaker_diarization, self.hotword_id,
                    self.customization_id, self.filter_dirty, self.filter_modal, self.filter_punc,
                    self.convert_num_mode, self.word_info, self.first_channel_only, self.reinforce_hotword,
                    self.sentence_max_length)
log:
2024-08-20 15:30:31,981.981 INFO [Dummy-2] [flash_recognizer.py:32] - engine_type: 16k_zh, voice_format: mp3, speaker_diarization: 0, hotword_id: , customization_id: , filter_dirty: 0, filter_modal: 0, filter_punc: 0, convert_num_mode: 1, word_info: 0, first_channel_only: 0, reinforce_hotword: 0, sentence_max_length: 0
2024-08-20 15:30:53,897.897 INFO [Dummy-3] [flash_recognizer.py:32] - engine_type: 16k_zh, voice_format: mp3, speaker_diarization: 0, hotword_id: , customization_id: , filter_dirty: 0, filter_modal: 0, filter_punc: 0, convert_num_mode: 1, word_info: 0, first_channel_only: 0, reinforce_hotword: 0, sentence_max_length: 0
2024-08-20 15:31:07,469.469 INFO [Dummy-4] [flash_recognizer.py:32] - engine_type: 16k_zh, voice_format: mp3, speaker_diarization: 0, hotword_id: , customization_id: , filter_dirty: 0, filter_modal: 0, filter_punc: 0, convert_num_mode: 1, word_info: 0, first_channel_only: 0, reinforce_hotword: 0, sentence_max_length: 0
When recording on an Android phone, I added code to dump the uploaded MP3 and found that the saved file sometimes has a length of 0:
# file: audio_service.py
file_content = file.read()
file_size = len(file_content)
# dump mp3 file
dump_file_name = f"/tmp/{int(time.time())}.mp3"
with open(dump_file_name, 'wb') as f:
    f.write(file_content)
logger.info("Audio file saved to %s, file size: %s", dump_file_name, file_size)
if file_size > FILE_SIZE_LIMIT:
    message = f"Audio size larger than {FILE_SIZE} mb"
    raise AudioTooLargeServiceError(message)
model_manager = ModelManager()
model_instance = model_manager.get_default_model_instance(
    tenant_id=app_model.tenant_id,
    model_type=ModelType.SPEECH2TEXT
)
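If the upload really is arriving empty, one option is to reject zero-length payloads before they reach the ASR model, so an empty recording shows up as an explicit error instead of an empty transcript. Below is a minimal sketch of such a guard, not Dify's actual code: EmptyAudioError, AudioTooLargeError, validate_audio_payload and the 30 MB limit are hypothetical names and values chosen for illustration.

```python
import logging

logger = logging.getLogger(__name__)

# Assumed limit for illustration only; the real FILE_SIZE_LIMIT value
# is not shown in this thread.
FILE_SIZE_LIMIT = 30 * 1024 * 1024


class EmptyAudioError(ValueError):
    """Hypothetical error for uploads that contain no bytes."""


class AudioTooLargeError(ValueError):
    """Hypothetical stand-in for the service's size-limit error."""


def validate_audio_payload(content: bytes, limit: int = FILE_SIZE_LIMIT) -> int:
    """Return the payload size, or raise if it is empty or over the limit."""
    size = len(content)
    if size == 0:
        # An empty upload points at the client-side recorder, not at ASR.
        logger.warning("Received an empty audio upload; skipping ASR")
        raise EmptyAudioError("Audio upload is empty")
    if size > limit:
        raise AudioTooLargeError(f"Audio size {size} exceeds limit {limit}")
    return size
```

Calling validate_audio_payload(file_content) right after file.read() would separate "nothing was recorded" from "ASR returned nothing", which narrows the problem down to the Android recording path.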
Self Checks
Dify version
0.6.16
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
✔️ Expected Behavior
ASR is executed successfully and returns a string
❌ Actual Behavior
ASR processing result is empty