filter the markdown image block for TTS input

langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.

https://dify.ai


filter the markdown image block for TTS input #9133

Open verigle opened 1 month ago

verigle commented 1 month ago

Self Checks

[X] I have searched for existing issues, including closed ones.
[X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[X] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
[X] Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell me about your story.

For TTS, I expect markdown image URLs to be removed from the TTS input text.

2. Additional context or comments

LLM output example:

this is a new picture:
![picture1](http://www.test.com/test.jpg)

There is no need for TTS to generate audio for the markdown image block (![picture1](http://www.test.com/test.jpg)). Is there any way to remove the image URL from the TTS input?

3. Can you help us with this feature?

[ ] I am interested in contributing to this feature.

crazywoola commented 1 month ago

Not sure what you are trying to say. What do you expect?

verigle commented 1 month ago

I expect the TTS not to read the markdown block that contains an image URL.

Example LLM output:

this is a new picture:
![picture1](http://www.test.com/test.jpg)

For TTS, only the words "this is a new picture" need to be read. However, the current TTS also reads the URL in ![picture1](http://www.test.com/test.jpg), which is not useful audio for the user.
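
A minimal sketch of the requested filtering, assuming the LLM output is available as plain text and can be preprocessed with a regular expression before it is handed to the TTS engine. `strip_markdown_images` and its `keep_alt_text` parameter are hypothetical names used for illustration; they are not part of Dify's API.

```python
import re

# Matches inline markdown images of the form ![alt text](url).
# Assumption: the URL itself contains no closing parenthesis.
MARKDOWN_IMAGE = re.compile(r"!\[([^\]]*)\]\([^)]*\)")


def strip_markdown_images(text: str, keep_alt_text: bool = False) -> str:
    """Remove markdown image blocks from text before sending it to TTS.

    Hypothetical helper for illustration only; not taken from Dify.
    """
    # Either drop the image entirely or keep only its alt text.
    replacement = r"\1" if keep_alt_text else ""
    cleaned = MARKDOWN_IMAGE.sub(replacement, text)
    # Drop empty lines (including those left behind by removed images)
    # and trailing whitespace on the remaining lines.
    lines = [line.rstrip() for line in cleaned.splitlines()]
    return "\n".join(line for line in lines if line)


llm_output = "this is a new picture:\n![picture1](http://www.test.com/test.jpg)"
print(strip_markdown_images(llm_output))
```

With the example above, only the sentence before the image would reach the TTS engine. Passing `keep_alt_text=True` would instead keep the alt text (`picture1`) so that it is read aloud. A regular expression like this does not cover every markdown form (for example, reference-style images or URLs containing parentheses), so a real implementation might prefer a markdown parser.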

verigle commented 1 month ago

Is there any plan to filter the markdown image block for TTS input?

dosubot[bot] commented 1 week ago

Hi, @verigle. I'm Dosu, and I'm helping the Dify team manage their backlog. I'm marking this issue as stale.

Issue Summary

Feature request to filter out markdown image blocks from TTS input.
You provided an example to clarify that TTS should read only descriptive text, not image URLs.
@crazywoola initially expressed confusion, which you addressed with further clarification.
No current plans or contributions towards implementing this feature.

Next Steps

Is this issue still relevant to the latest version of the Dify repository? If so, please comment to keep the discussion open.
If no further activity occurs, this issue will be automatically closed in 15 days.

Thank you for your understanding and contribution!