hegedustibor / htgo-tts

Text to speech package for Golang.
MIT License
165 stars, 44 forks

Text length #10

Open hajsf opened 2 years ago

hajsf commented 2 years ago

Hi. Is there a limit on the length of the text or the file? I am combining Arabic text to be spoken, but I found myself limited: if I add more words, I get an error that the file is corrupted. Is this related to a maximum allowed text length, or to the way I combine the text?

My code is:

    speech := htgotts.Speech{Folder: "audio", Language: voices.Arabic, Handler: &handlers.MPlayer{}}

    var reply strings.Builder
    text := "أهلا و سهلا"
    //  reply.WriteString("عفوا لم أفهمك")
    reply.WriteString(text)
    reply.WriteString(data.Text)
    //  reply.WriteString("غير واضح") // Not working after the data.Text!

    // The returned file path is not used here, so discard it to keep the snippet compilable.
    _, err := speech.CreateSpeechFile(reply.String(), "test")
    if err != nil {
        fmt.Println("Sorry could not save file, ", err)
    }

    speech.Speak(reply.String())

And the error I got is:

[image]

cybernamix commented 1 year ago

I found the same issue with English. There seems to be a text length restriction at approximately 200 characters: beyond that, the file is still created, but during playback the Windows MP3 player shows this error:

This file isn't playable. That might be because the file type is unsupported, the file extension is incorrect, or the file is corrupt.

I tried various players on Windows (VLC, Groove Music), all with the same result.

kendfss commented 1 year ago

Been looking up docs for this API and just found a bunch of threads of people advising each other not to use it, lol. It may be worth noting that:

...Google appears to be limiting the speech duration to 15 seconds...

That said, I've got a workaround that splits the text into chunks, makes the requests, and combines the responses into a bytes.Buffer. PR incoming, but here's a crude version:

// fetch splits text into chunks of at most chunkSize runes, requests the
// audio for each chunk, and concatenates the responses into one buffer.
// Requires imports: bytes, fmt, io, net/http, net/url.
func (speech Speech) fetch(text string) (io.Reader, error) {
    const chunkSize = 32

    // Split on runes rather than bytes so that multi-byte characters
    // (e.g. Arabic) are never cut in half.
    runes := []rune(text)

    urls := make([]string, 0, (len(runes)+chunkSize-1)/chunkSize)
    for start := 0; start < len(runes); start += chunkSize {
        end := start + chunkSize
        if end > len(runes) {
            end = len(runes) // final chunk may be shorter
        }
        chunk := string(runes[start:end])
        // textlen reflects the actual chunk length, including the last chunk.
        u := fmt.Sprintf("http://translate.google.com/translate_tts?ie=UTF-8&total=1&idx=0&textlen=%d&client=tw-ob&q=%s&tl=%s", end-start, url.QueryEscape(chunk), speech.Language)
        urls = append(urls, u)
    }

    buf := new(bytes.Buffer)
    for _, u := range urls {
        r, err := http.Get(u)
        if err != nil {
            return nil, err
        }
        if r.StatusCode != http.StatusOK {
            r.Body.Close()
            return nil, fmt.Errorf("tts request failed: %s", r.Status)
        }

        _, err = buf.ReadFrom(r.Body)
        r.Body.Close() // close on both the success and the error path
        if err != nil {
            return nil, err
        }
    }
    return buf, nil
}

All that said, there's also voicerss.org, which is also free for a few MBs per day. Perhaps we could use that instead and keep the Google stuff as a fallback for anybody who hasn't got a key, or in case the voicerss price plan changes. Generally speaking though, I suggest we refactor to the following API:

type (
    Engine interface {
        Fetch(text string) io.Reader
        Save(text, path string)
        ShouldFetch(text string) (buf io.Reader, err error)
        ShouldSave(text, path string) error
        Language(setting string)
        Voice(setting Language) // defer to Engine.Language for google
    }
    Player interface {
        PlayBuf(buf io.Reader)
        PlayFile(path string)
        ShouldPlayBuf(buf io.Reader) error
        ShouldPlayFile(path string) error
    }
)

func Google(language string) Engine
func VoiceRSS(language, key string) Engine
func Native(channels, bitDepth int) Player
func MPlayer() Player

thus allowing:

package main

import (
    "os"
    "strings"

    tts "github.com/hegedustibor/htgo-tts"
)

func main() {
    text := strings.Join(os.Args[1:], " ")

    e, p := tts.Google("en"), tts.Native(2, 2)
    p.PlayBuf(e.Fetch(text))
}

Or perhaps even just leave the players to external APIs. If we keep them, there should also be a CLI, imo.

drgrib commented 2 months ago

@hegedustibor I advise checking if the character length of the input is greater than 200 and returning an error if it is. Otherwise this just fails silently, at least on Mac.