k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker recognition, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0

Running TTS Engine On Node.js Without Saving Files On Server, Auto Delete Temp File Once It Is Generated #1263

Open studionexus-lk opened 3 weeks ago

studionexus-lk commented 3 weeks ago

TTS Service Implementation Guide

1. File Structure

The following is the recommended file structure for the TTS service:

tts-service/
├── server.js
├── package.json
├── public/
│   └── index.html
└── auto_en_us_904/
    ├── auto_en_us_904.onnx
    ├── tokens.txt
    └── espeak-ng-data/
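
Before starting the server, it can help to confirm that the model files are where the layout above expects them. The following is a minimal sketch using only Node.js built-ins; findMissingModelFiles is a hypothetical helper, not part of sherpa-onnx-node, and the file names are taken from the tree above.

```javascript
const fs = require('fs');
const path = require('path');

// Returns the names of expected model files or directories that are absent
// from modelDir. An empty array means the layout looks complete.
function findMissingModelFiles(modelDir, modelName) {
  const required = [`${modelName}.onnx`, 'tokens.txt', 'espeak-ng-data'];
  return required.filter(name => !fs.existsSync(path.join(modelDir, name)));
}

// Hypothetical use at the top of server.js:
//   const missing = findMissingModelFiles('./auto_en_us_904', 'auto_en_us_904');
//   if (missing.length > 0) {
//     console.error('Missing model files:', missing.join(', '));
//     process.exit(1);
//   }
```

A check like this only confirms that the paths exist; it does not validate the contents of the model.
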
2. Code Creation

2.1. Create the package.json File

Run the following command to initialize a new Node.js project:

npm init -y

Install the required packages:

npm install express body-parser sherpa-onnx-node

2.2. Create the server.js File

// server.js
const express = require('express');
const bodyParser = require('body-parser');
const { OfflineTts, writeWave } = require('sherpa-onnx-node');
const fs = require('fs');
const path = require('path');

const app = express();
const port = 3000;

app.use(bodyParser.json());
app.use(express.static(path.join(__dirname, 'public')));

const ttsConfig = {
  model: {
    vits: {
      model: './auto_en_us_904/auto_en_us_904.onnx',
      tokens: './auto_en_us_904/tokens.txt',
      dataDir: './auto_en_us_904/espeak-ng-data',
    },
    debug: true,
    numThreads: 1,
    provider: 'cpu',
  },
  maxNumSentences: 1,
};

const tts = new OfflineTts(ttsConfig);

app.post('/generate', (req, res) => {
  const text = req.body.text;
  if (typeof text !== 'string') {
    return res.status(400).send('Invalid input');
  }

  try {
    const audio = tts.generate({ text, sid: 0, speed: 1.0 });
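    // Use a unique temp file per request so concurrent requests do not overwrite each other
    const filename = path.join(__dirname, `temp-output-${Date.now()}-${Math.random().toString(36).slice(2)}.wav`);

    // Save the audio to a temp file
    writeWave(filename, { samples: audio.samples, sampleRate: audio.sampleRate });

    // Stream the file back to the client
    res.setHeader('Content-Type', 'audio/wav');
    const readStream = fs.createReadStream(filename);
    readStream.pipe(res);

    // Delete the temp file once the response closes, whether it completed or the client disconnected
    res.on('close', () => {
      readStream.destroy();
      fs.unlink(filename, err => {
        if (err) console.error('Failed to delete temp file:', err);
      });
    });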

  } catch (error) {
    console.error('Error processing TTS:', error);
    res.status(500).send('Error processing TTS');
  }
});

app.listen(port, () => {
  console.log(`TTS service listening at http://localhost:${port}`);
});
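
The handler above still writes a temporary WAV file before streaming it. If the goal is to avoid touching the disk at all, one option is to build the WAV bytes in memory and send the buffer directly. The following is a minimal sketch, not part of sherpa-onnx-node: encodeWav is a hypothetical helper, and it assumes audio.samples is a Float32Array of mono samples in the range [-1, 1].

```javascript
// Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file held in memory.
function encodeWav(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = Buffer.alloc(44 + dataSize);

  // RIFF header
  buffer.write('RIFF', 0, 'ascii');
  buffer.writeUInt32LE(36 + dataSize, 4);   // file size minus the first 8 bytes
  buffer.write('WAVE', 8, 'ascii');

  // "fmt " chunk
  buffer.write('fmt ', 12, 'ascii');
  buffer.writeUInt32LE(16, 16);             // chunk size for PCM
  buffer.writeUInt16LE(1, 20);              // audio format 1 = PCM
  buffer.writeUInt16LE(1, 22);              // number of channels (mono)
  buffer.writeUInt32LE(sampleRate, 24);
  buffer.writeUInt32LE(sampleRate * bytesPerSample, 28); // byte rate
  buffer.writeUInt16LE(bytesPerSample, 32); // block align
  buffer.writeUInt16LE(16, 34);             // bits per sample

  // "data" chunk: clamp each sample and scale it to a signed 16-bit integer
  buffer.write('data', 36, 'ascii');
  buffer.writeUInt32LE(dataSize, 40);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    buffer.writeInt16LE(Math.round(clamped * 32767), 44 + i * 2);
  }
  return buffer;
}

// Small demonstration with synthetic samples: 44 header bytes + 3 samples * 2 bytes.
const demoWav = encodeWav(new Float32Array([0, 0.5, -0.5]), 22050);
console.log(demoWav.length); // prints 50

// Hypothetical use inside the /generate handler, replacing writeWave and the temp file:
//   const audio = tts.generate({ text, sid: 0, speed: 1.0 });
//   res.setHeader('Content-Type', 'audio/wav');
//   res.send(encodeWav(audio.samples, audio.sampleRate));
```

With this approach there is nothing to delete afterwards, at the cost of holding the whole clip in memory for the duration of the response.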

2.3. Create the index.html File

Place this file in the public/ directory:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>TTS Service</title>
</head>
<body>
  <h1>Text-to-Speech Service</h1>
  <textarea id="textInput" rows="5" cols="50" placeholder="Enter text here"></textarea><br>
  <button onclick="generateSpeech()">Generate Speech</button>
  <br><br>
  <audio id="audioPlayer" controls></audio>

  <script>
    async function generateSpeech() {
      const textInput = document.getElementById('textInput').value;
      if (!textInput.trim()) {
        alert('Please enter some text');
        return;
      }

      try {
        const response = await fetch('http://localhost:3000/generate', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({ text: textInput }),
        });

        if (!response.ok) {
          throw new Error('HTTP error! Status: ' + response.status);
        }

        const blob = await response.blob();
        const audioUrl = URL.createObjectURL(blob);
        const audioPlayer = document.getElementById('audioPlayer');
        audioPlayer.src = audioUrl;
        audioPlayer.play();
      } catch (error) {
        console.error('Error generating speech:', error);
        alert('Error generating speech: ' + error.message);
      }
    }
  </script>
</body>
</html>
3. Running the Service

Navigate to the Project Directory: Open a terminal or command prompt and navigate to the directory containing your project.

cd path/to/tts-service

Start the Server: Run the following command to start the server.

node server.js

Access the Service: Open a web browser and navigate to http://localhost:3000. You should see the Text-to-Speech Service interface where you can enter text, click the button, and listen to the generated speech.
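
The service can also be checked from a script instead of the browser. The following is a rough sketch, assuming Node.js 18 or later (for the built-in fetch) and the port used in server.js; requestSpeech and looksLikeWav are hypothetical helper names, not part of any library.

```javascript
// Returns true if the buffer starts with a RIFF/WAVE header.
function looksLikeWav(buf) {
  return buf.length >= 12 &&
    buf.toString('ascii', 0, 4) === 'RIFF' &&
    buf.toString('ascii', 8, 12) === 'WAVE';
}

// Posts text to the /generate endpoint and resolves with the response body as a Buffer.
// baseUrl is an assumption that matches the port configured in server.js.
async function requestSpeech(text, baseUrl = 'http://localhost:3000') {
  const response = await fetch(`${baseUrl}/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text }),
  });
  if (!response.ok) {
    throw new Error(`HTTP error! Status: ${response.status}`);
  }
  return Buffer.from(await response.arrayBuffer());
}

// Example use while the server is running:
//   requestSpeech('Hello from the command line.')
//     .then(buf => console.log(`Received ${buf.length} bytes, WAV header: ${looksLikeWav(buf)}`))
//     .catch(err => console.error('Request failed:', err.message));
```

The header check only confirms that the response is a WAV container; it says nothing about the audio quality.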

csukuangfj commented 3 weeks ago

@studionexus-lk Thanks for sharing!

Would you like to update our doc to include it?

The source code for the doc is located at https://github.com/k2-fsa/sherpa/tree/master/docs/source/onnx

You can find how it is rendered at https://k2-fsa.github.io/sherpa/onnx/index.html

studionexus-lk commented 3 weeks ago

I wish I could, but I don't know how to use GitHub and lack experience with it. Can you update it, or let me know how, and then I'll do it on my side?

csukuangfj commented 3 weeks ago

but I don't know how to use GitHub and lack experience with it

Please read https://github.com/firstcontributions/first-contributions for how to contribute to a repo.

You can also search Google for how to contribute.


As for adding to the doc, we use Sphinx with reStructuredText (RST), which is similar to Markdown.

Please refer to https://www.sphinx-doc.org/en/master/tutorial/index.html for how to use Sphinx and RST.

You can also have a look at our existing doc source code.

If you have any issues, please feel free to ask.