langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

create_blob_message only returns a link, not blob messages #6325

Open Leekinxun opened 1 month ago

Leekinxun commented 1 month ago

Self Checks

Dify version

0.6.11

Cloud or Self Hosted

Self Hosted (Source)

Steps to reproduce

I want to return an audio clip from a built-in tool, but only a link is returned, like "Link: /files/tools/b4dd3442-0f99-4072-9a65-40bd32e8bbdb.mp3". I can get the audio through audio_play. The relevant code is as follows:

    def call_tts_stream(text: str) -> Iterator[bytes]:
        tts_url = url
        tts_headers = build_tts_stream_headers()
        tts_body = build_tts_stream_body(text)

        response = requests.request("POST", tts_url, stream=True, headers=tts_headers, data=tts_body)
        for chunk in (response.raw):
            if chunk:
                if chunk[:5] == b'data:':
                    data = json.loads(chunk[5:])
                    if "data" in data and "extra_info" not in data:
                        if "audio" in data["data"]:
                            audio = data["data"]['audio']
                            yield audio

    def audio_play(audio_stream: Iterator[bytes]) -> bytes:
        audio = b""
        for chunk in audio_stream:
            if chunk is not None and chunk != '\n':
                decoded_hex = bytes.fromhex(chunk)
                mpv_process.stdin.write(decoded_hex)  # type: ignore
                mpv_process.stdin.flush()
                audio += decoded_hex

        return audio

    audio_chunk_iterator = call_tts_stream(query)
    audio = audio_play(audio_chunk_iterator)
    return self.create_blob_message(blob=audio,
                                    meta={'mime_type': 'audio/mpeg'}
                                    )

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

dosubot[bot] commented 1 month ago

@Leekinxun Hello there! I'm here to help you with any bugs, questions, or contributions while you wait for a human maintainer. Let's work together to solve this issue.

To ensure that the create_blob_message method returns the actual audio blob instead of just a link, you need to verify that the ToolInvokeMessage is correctly configured to handle the blob data. The code you provided seems correct, but let's ensure everything is set up properly.

Here's a refined version of your code:

import json
from collections.abc import Iterator

import requests

from core.tools.entities.tool_entities import ToolInvokeMessage

def call_tts_stream(text: str) -> Iterator[bytes]:
    tts_url = url
    tts_headers = build_tts_stream_headers()
    tts_body = build_tts_stream_body(text)

    response = requests.request("POST", tts_url, stream=True, headers=tts_headers, data=tts_body)
    for chunk in response.raw:
        if chunk:
            if chunk[:5] == b'data:':
                data = json.loads(chunk[5:])
                if "data" in data and "extra_info" not in data:
                    if "audio" in data["data"]:
                        audio = data["data"]['audio']
                        yield audio

def audio_play(audio_stream: Iterator[bytes]) -> bytes:
    audio = b""
    for chunk in audio_stream:
        if chunk is not None and chunk != '\n':
            decoded_hex = bytes.fromhex(chunk)
            mpv_process.stdin.write(decoded_hex)  # type: ignore
            mpv_process.stdin.flush()
            audio += decoded_hex

    return audio

audio_chunk_iterator = call_tts_stream(query)
audio = audio_play(audio_chunk_iterator)
return self.create_blob_message(blob=audio, meta={'mime_type': 'audio/mpeg'})
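The hex-decoding step in audio_play can be exercised on its own, without mpv or the TTS endpoint. The following is a minimal sketch, assuming (as the bytes.fromhex call implies) that each chunk is a hex string; collect_hex_audio is a hypothetical helper name and is not part of Dify:

```python
from collections.abc import Iterable
from typing import Optional


def collect_hex_audio(chunks: Iterable[Optional[str]]) -> bytes:
    """Join hex-encoded audio chunks into one bytes object."""
    audio = bytearray()
    for chunk in chunks:
        # Skip missing chunks, empty strings, and bare newlines,
        # similar to the check in audio_play.
        if chunk is None or chunk.strip() == "":
            continue
        audio += bytes.fromhex(chunk.strip())
    return bytes(audio)


# "494433" is the hex encoding of the ASCII characters "ID3".
sample = collect_hex_audio(["494433", None, "\n", "04"])
print(sample)
```

If this returns a bytes object that starts with an ID3 tag or an MPEG frame header, the value passed as blob is likely what the tool intends to send.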

Ensure that the create_blob_message method in the Tool class is implemented as follows:

def create_blob_message(self, blob: bytes, meta: dict = None, save_as: str = '') -> ToolInvokeMessage:
    """
    Create a blob message.

    :param blob: The blob data.
    :param meta: Metadata associated with the blob.
    :param save_as: A string indicating how the blob should be saved.
    :return: The blob message.
    """
    return ToolInvokeMessage(
        type=ToolInvokeMessage.MessageType.BLOB,
        message=blob,
        meta=meta,
        save_as=save_as
    )

This method should correctly return the blob message containing the audio data. If you are still seeing a link instead of the actual blob, ensure that the ToolInvokeMessage class and its handling in your application are correctly set up to process and return blob data [1][2].

To continue talking to Dosu, mention @dosu.

Leekinxun commented 1 month ago

@dosu Thanks for your response. I have reviewed the two sections you mentioned and confirmed that the backend output is as follows:

Outputs: [ToolInvokeMessage(type=<MessageType.BLOB: 'blob'>, message=b'ID3\x04\x00\x00\x00\x00\x00#TSSE\x00\x00\x00\x0f\x00\x00\x03Lavf58.29.100\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xfb\x98\xc4\x00\x00\xdcz\x1e\xfe\x07\xb0\xd7\xc3\xb0\xbd\xa4\xd0\xf3=y\x0f@\xdc\x88A\x12)x\x90\x0eB\x16\x85\xb7\xbb%\x08\x08\xf4\x91\x9f)\xf5^\xd8\xd9\xd4\xe7Z\xb1Y\xacJ\x87\xb7!\x0cj\xc8\xb3\xa4\xc8Y\x08\x02(\x1c\x020\x10j\xae\x11-\xf4\x84\xc1,\x1b\x93\xc4\xb3\xf6\xcf\xca\x88\x8c\x1d\x8db\xc5\x8c\x9f\xb0%\xab\xe0H\x8b!~\xcb\x16\x1c\x1c\x183fD\x83\x01!(7\x00q|G\xed\x87c0h\x8a\x10Lw\xcf\xbd\xd2\x13\x1cC\x12\xdf*\x13\x19yd+\xd7\xbfs\x84\xa2[Q>\x90\x98\x7f\x1d7\xe9\xb6q\x85\x8e^[~\xf7\x90L\x8a\xff\x8b\x0f\x1f\x9b\xdf\xd800\xa9\x85`<l\xfdy\x9a\xe3x\x89fb[\xe7vd\x1a(\xca\x1c\x18\x18E\x82\x19o7\xdb_g>\xf0\x18E\x01\xc2\xc72\x9dv\xc9g\xf9\x87\x93\xbed\x104-\xf4\xb2u\x18\xfa\xd1\x1e\xf7/t\xc8\xbd\x8c\xfd\xa1\xfb\xdf1\r\xa8\x16\xc8\x10\x00\x17\x88\x19\x14\xed\x8a\xcaC\x9d\xfch\x04\x17\x82\x0b\x01\x90 \x87{)\x08\x

However, the returned message is still a link: Link: /files/tools/b4dd3442-0f99-4072-9a65-40bd32e8bbdb.mp3

dosubot[bot] commented 1 month ago

To ensure that the create_blob_message method returns the actual audio blob instead of just a link, you need to pass the raw data of the audio file as a bytes object to the blob parameter. Here is an example of how you can do this:

def create_audio_blob_message(self, audio_data: bytes, mime_type: str = 'audio/mpeg') -> ToolInvokeMessage:
    """
    Create an audio blob message.

    :param audio_data: The raw data of the audio file.
    :param mime_type: The MIME type of the audio file.
    :return: The blob message.
    """
    meta = {'mime_type': mime_type}
    return self.create_blob_message(blob=audio_data, meta=meta)

In this example, audio_data should be the raw bytes of the audio file, and mime_type should be the appropriate MIME type for the audio file (e.g., 'audio/mpeg' for MP3 files). This will ensure that the method returns the actual audio blob.
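Before passing data as the blob, it can help to confirm that it really is raw MP3 bytes rather than a string or a file link. The following is a rough heuristic sketch, not Dify code; looks_like_mp3 is a hypothetical helper, and it only checks for an ID3v2 tag or an MPEG audio frame sync at the start of the data:

```python
def looks_like_mp3(data: bytes) -> bool:
    """Heuristic check that data is raw MP3 bytes, not a str or a link."""
    if not isinstance(data, (bytes, bytearray)) or len(data) < 3:
        return False
    # An ID3v2 tag starts with the ASCII characters "ID3".
    if data[:3] == b"ID3":
        return True
    # Otherwise expect an MPEG audio frame header: 11 sync bits set,
    # i.e. 0xFF followed by a byte whose top three bits are all 1.
    return data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


print(looks_like_mp3(b"ID3\x04\x00\x00"))
print(looks_like_mp3("/files/tools/example.mp3"))
```

A check like this only validates the input to create_blob_message; it says nothing about how the message is rendered afterwards.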

According to the Dify documentation, the create_blob_message method is designed to handle raw data of various file types, including audio. The blob parameter should be the raw data of the file, and the meta parameter should include metadata such as the MIME type. Its signature is:

def create_blob_message(self, blob: bytes, meta: dict = None, save_as: str = '') -> ToolInvokeMessage:
    """
    Create a blob message.

    :param blob: The blob.
    :param meta: The metadata of the blob.
    :param save_as: The name to save the blob as.
    :return: The blob message.
    """

Additionally, you can refer to the AudioBtn component in the Dify codebase, which demonstrates how to handle audio blobs. The loadAudio function in the component converts the audio response data into a blob and creates an object URL for it:

const loadAudio = async () => {
    const formData = new FormData()
    formData.append('text', removeCodeBlocks(value))
    formData.append('voice', removeCodeBlocks(voice))

    if (value !== '') {
        setAudioState('loading')

        let url = ''
        let isPublic = false

        if (params.token) {
            url = '/text-to-audio'
            isPublic = true
        }
        else if (params.appId) {
            if (pathname.search('explore/installed') > -1)
                url = `/installed-apps/${params.appId}/text-to-audio`
            else
                url = `/apps/${params.appId}/text-to-audio`
        }
        try {
            const audioResponse = await textToAudio(url, isPublic, formData)
            const blob_bytes = Buffer.from(audioResponse.data, 'latin1')
            const blob = new Blob([blob_bytes], { type: 'audio/wav' })
            const audioUrl = URL.createObjectURL(blob)
            audioRef.current!.src = audioUrl
        }
        catch (error) {
            setAudioState('initial')
            console.error('Error playing audio:', error)
        }
    }
}

By following this approach, you can ensure that the create_blob_message method returns the actual audio blob instead of just a link [1][2][3][4].
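The Buffer.from(audioResponse.data, 'latin1') call in loadAudio works because Latin-1 maps each code point from 0 to 255 to exactly one byte, so binary data carried in a string can be recovered without loss. A small Python sketch of that property, independent of Dify (the sample bytes are illustrative):

```python
raw = b"ID3\x04\x00\x00\xff\xfb\x98\xc4"

# Decoding as Latin-1 turns each byte into the code point of the same value.
as_text = raw.decode("latin1")

# Encoding back is lossless, which is why binary data survives the round trip.
restored = as_text.encode("latin1")

print(restored == raw)
print(len(as_text) == len(raw))
```

The same round trip would not be safe with UTF-8, where byte values above 0x7F are not valid on their own.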
