neuml / txtai

💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
https://neuml.github.io/txtai
Apache License 2.0

Add missing pipelines to API #552

Open semack opened 12 months ago

semack commented 12 months ago

Hi guys,

First of all, thank you for the amazing job you do.

I didn't find an API endpoint for Text-To-Speech. I think a workflow can be used for this, but are there any plans to implement it in the API?

Kind regards, /Andriy

davidmezzetti commented 12 months ago

Thank you for the issue.

The plan going forward was to run pipelines through workflows rather than calling them directly when using the API.

davidmezzetti commented 11 months ago

Upon further review, there are only a few pipelines that aren't in the API, and it makes sense to have the routers. I've been pushing things more toward workflows, but it doesn't hurt to have pipelines, especially in the case of an LLM pipeline.

semack commented 11 months ago

Another thing I've faced: in my setup, txtai is hosted in a separate remote environment with a powerful GPU, and my custom software needs to use it remotely through the API. Some pipelines, like Textractor and Transcription, take a file name as an argument. Textractor works well with remote sources, but Transcription doesn't. Could this be fixed?

davidmezzetti commented 11 months ago

The pipelines are focused on a single task by design. That's where workflows come in. There are workflow steps for reading from URLs and cloud object storage.
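
For example, a minimal sketch of a workflow that retrieves a remote file to local storage and then transcribes it (the stt name is illustrative):

# Sketch: download remote audio to a local file, then transcribe it
workflow:
  stt:
    tasks:
      - task: retrieve
      - action: transcription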

semack commented 11 months ago

Hi David,

Thank you for pointing me in the right direction; the retrieve task helped and transcription now works well. I'm now having another problem with a workflow while trying to get TTS working in a Docker container.

docker-compose file

version: '3.4'
services:
  txtai-api:
    build:
      context: .
      dockerfile: txtai-api.Dockerfile
    ports:
      - 8000:8000
    volumes:
      - ./app.yml:/app/app.yaml:ro
      - ./.cache:/models
    environment:
      - CONFIG=/app/app.yaml
      - TRANSFORMERS_CACHE=/models
    #command: python -c "import tensorflow as tf;tf.test.gpu_device_name()"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities: [gpu]

txtai-api.Dockerfile

# Set base image
ARG BASE_IMAGE=neuml/txtai-gpu:latest
FROM $BASE_IMAGE

# Start server and listen on all interfaces
ENTRYPOINT ["uvicorn", "--host", "0.0.0.0", "txtai.api:app"]

app.yml

# Index file path
path: /tmp/index

# Allow indexing of documents
writable: True

# Embeddings index
embeddings:
  path: sentence-transformers/nli-mpnet-base-v2

# Extractive QA
extractor:
  path: distilbert-base-cased-distilled-squad

# Zero-shot labeling
labels:

# Similarity
similarity:

# Text segmentation
segmentation:
  sentences: true

# Text summarization
summary:

# Text extraction
textractor:
  join: true
  lines: false
  minlength: 100
  paragraphs: true
  sentences: false

# Transcribe audio to text
transcription:

# Text To Speech
texttospeech:

# Translate text between languages
translation:

# Workflow definitions
workflow:
  sumfrench:
    tasks:
      - action: textractor
        task: url
      - action: summary
      - action: translation
        args: ["fr"]
  sumspanish:
    tasks:
      - action: textractor
        task: url
      - action: summary
      - action: translation
        args: ["es"]
  tts:
    tasks:
      - action: texttospeech
  stt:
    tasks:
      - task: retrieve
      - action: transcription

Here is my call in C# (sorry, not Python), shown for context.

        public async Task<TextToSpeechResponse> Handle(TextToSpeechCommand request, CancellationToken cancellationToken)
        {
            var wf = new Workflow(_settings.BaseUrl);

            // Single-element workflow input: the text to synthesize
            var elements = new List<string> { request.Text };

            var data = await wf.WorkflowActionAsync("tts", elements);

            // Expect the workflow to return the audio payload as the first element
            var result = new TextToSpeechResponse
            {
                Binary = (byte[])data.FirstOrDefault()
            };

            return result;
        }
Logs from the container

root@debian-AI:/opt/docker/txtai# docker compose up
[+] Running 2/1
 ✔ Network txtai_default  Created  0.1s
 ✔ Container txtai-txtai-api-1  Created  0.0s
Attaching to txtai-txtai-api-1
txtai-txtai-api-1  | [nltk_data] Downloading package averaged_perceptron_tagger to
txtai-txtai-api-1  | [nltk_data]     /root/nltk_data...
txtai-txtai-api-1  | [nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
txtai-txtai-api-1  | [nltk_data] Downloading package cmudict to /root/nltk_data...
txtai-txtai-api-1  | [nltk_data]   Unzipping corpora/cmudict.zip.
txtai-txtai-api-1  | INFO:     Started server process [1]
txtai-txtai-api-1  | INFO:     Waiting for application startup.
txtai-txtai-api-1  | No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
txtai-txtai-api-1  | Using a pipeline without specifying a model name and revision in production is not recommended.
txtai-txtai-api-1  | No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
txtai-txtai-api-1  | Using a pipeline without specifying a model name and revision in production is not recommended.
Downloading (…)lve/main/config.yaml: 100%|██████████| 1.10k/1.10k [00:00<00:00, 540kB/s]
Downloading model.onnx: 100%|██████████| 133M/133M [00:02<00:00, 48.3MB/s]
txtai-txtai-api-1  | No model was supplied, defaulted to facebook/wav2vec2-base-960h and revision 55bb623 (https://huggingface.co/facebook/wav2vec2-base-960h).
txtai-txtai-api-1  | Using a pipeline without specifying a model name and revision in production is not recommended.
txtai-txtai-api-1  | Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at facebook/wav2vec2-base-960h and are newly initialized: ['wav2vec2.masked_spec_embed']
txtai-txtai-api-1  | You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
txtai-txtai-api-1  | INFO:     Application startup complete.
txtai-txtai-api-1  | INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
txtai-txtai-api-1  | INFO:     10.20.255.4:54510 - "POST /workflow HTTP/1.1" 500 Internal Server Error
txtai-txtai-api-1  | ERROR:    Exception in ASGI application
txtai-txtai-api-1  | Traceback (most recent call last):
txtai-txtai-api-1  |   File "/usr/local/lib/python3.8/dist-packages/fastapi/encoders.py", line 230, in jsonable_encoder
txtai-txtai-api-1  |     data = dict(obj)
txtai-txtai-api-1  | TypeError: cannot convert dictionary update sequence element #0 to a sequence
txtai-txtai-api-1  |
txtai-txtai-api-1  | During handling of the above exception, another exception occurred:
txtai-txtai-api-1  |
txtai-txtai-api-1  | Traceback (most recent call last):
txtai-txtai-api-1  |   File "/usr/local/lib/python3.8/dist-packages/fastapi/encoders.py", line 235, in jsonable_encoder
txtai-txtai-api-1  |     data = vars(obj)
txtai-txtai-api-1  | TypeError: vars() argument must have __dict__ attribute
txtai-txtai-api-1  |
txtai-txtai-api-1  | The above exception was the direct cause of the following exception:
txtai-txtai-api-1  |
txtai-txtai-api-1  | Traceback (most recent call last):
txtai-txtai-api-1  |   File "/usr/local/lib/python3.8/dist-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
txtai-txtai-api-1  |     result = await app(  # type: ignore[func-returns-value]
txtai-txtai-api-1  |   File "/usr/local/lib/python3.8/dist-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
txtai-txtai-api-1  |     return await self.app(scope, receive, send)
txtai-txtai-api-1  |   File "/usr/local/lib/python3.8/dist-packages/fastapi/applications.py", line 292, in __call__
txtai-txtai-api-1  |     await super().__call__(scope, receive, send)
txtai-txtai-api-1  |   File "/usr/local/lib/python3.8/dist-packages/starlette/applications.py", line 122, in __call__
txtai-txtai-api-1  |     await self.middleware_stack(scope, receive, send)
txtai-txtai-api-1  |   File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/errors.py", line 184, in __call__
txtai-txtai-api-1  |     raise exc
txtai-txtai-api-1  |   File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/errors.py", line 162, in __call__
txtai-txtai-api-1  |     await self.app(scope, receive, _send)
txtai-txtai-api-1  |   File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/exceptions.py", line 79, in __call__
txtai-txtai-api-1  |     raise exc
txtai-txtai-api-1  |   File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/exceptions.py", line 68, in __call__
txtai-txtai-api-1  |     await self.app(scope, receive, sender)
txtai-txtai-api-1  |   File "/usr/local/lib/python3.8/dist-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
txtai-txtai-api-1  |     raise e
txtai-txtai-api-1  |   File "/usr/local/lib/python3.8/dist-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
txtai-txtai-api-1  |     await self.app(scope, receive, send)
txtai-txtai-api-1  |   File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 718, in __call__
txtai-txtai-api-1  |     await route.handle(scope, receive, send)
txtai-txtai-api-1  |   File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 276, in handle
txtai-txtai-api-1  |     await self.app(scope, receive, send)
txtai-txtai-api-1  |   File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 66, in app
txtai-txtai-api-1  |     response = await func(request)
txtai-txtai-api-1  |   File "/usr/local/lib/python3.8/dist-packages/fastapi/routing.py", line 291, in app
txtai-txtai-api-1  |     content = await serialize_response(
txtai-txtai-api-1  |   File "/usr/local/lib/python3.8/dist-packages/fastapi/routing.py", line 179, in serialize_response
txtai-txtai-api-1  |     return jsonable_encoder(response_content)
txtai-txtai-api-1  |   File "/usr/local/lib/python3.8/dist-packages/fastapi/encoders.py", line 209, in jsonable_encoder
txtai-txtai-api-1  |     jsonable_encoder(
txtai-txtai-api-1  |   File "/usr/local/lib/python3.8/dist-packages/fastapi/encoders.py", line 238, in jsonable_encoder
txtai-txtai-api-1  |     raise ValueError(errors) from e
txtai-txtai-api-1  | ValueError: [TypeError('cannot convert dictionary update sequence element #0 to a sequence'), TypeError('vars() argument must have __dict__ attribute')]

Could you help me figure out the problem, please? I feel that something is missing.

Thank you in advance, Andriy

semack commented 11 months ago

When I use curl, I get the same result.

curl -X POST "http://localhost:8000/workflow" \
  -H "Content-Type: application/json" \
  -d '{"name":"tts", "elements":["Say something here"]}'

I figured out that the problem is in how the server builds the response when using TTS.

davidmezzetti commented 11 months ago

I'll have to look at this more closely, but it seems like it might be an issue with returning binary data as JSON.

semack commented 11 months ago

Yes, I have the same suspicion.

davidmezzetti commented 11 months ago

Well, instead of binary, I should say NumPy arrays, which are what is actually returned.
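
To illustrate, a minimal sketch of the underlying issue: the standard json encoder cannot serialize ndarray objects directly, while plain Python lists work fine.

import json

import numpy as np

# Serializing an ndarray directly raises a TypeError...
try:
    json.dumps({"audio": np.zeros(3)})
except TypeError as error:
    print(error)

# ...but converting to a plain Python list works
print(json.dumps({"audio": np.zeros(3).tolist()}))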

You can add your own custom pipeline that converts the waveforms to Python floats which are JSON serializable.

class Converter:
    def __call__(self, inputs):
        # tolist() recursively converts NumPy arrays to JSON-serializable Python lists
        return [x.tolist() for x in inputs]
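
A sketch of how this might be wired into the app.yml above, assuming the class is saved in a module named converter that is on the Python path (the module name is hypothetical; txtai config can reference custom pipelines by their full class path):

# Register the custom pipeline (hypothetical module name)
converter.Converter:

# Run the converter after text to speech so the output is JSON serializable
workflow:
  tts:
    tasks:
      - action: texttospeech
      - action: converter.Converter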

Or perhaps something that writes it to a WAV file and then base64 encodes that data, like what's in this notebook: https://github.com/neuml/txtai/blob/master/examples/40_Text_to_Speech_Generation.ipynb
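
Along those lines, a sketch of a pipeline that returns base64-encoded WAV data, assuming the soundfile package is installed; the WavEncoder name and the 22050 Hz sample rate are assumptions (the rate should match the underlying TTS model):

import base64
import io

import soundfile as sf

class WavEncoder:
    def __call__(self, inputs):
        results = []
        for waveform in inputs:
            # Write the raw waveform to an in-memory WAV file
            # NOTE: 22050 Hz is an assumed sample rate, match it to the TTS model
            buffer = io.BytesIO()
            sf.write(buffer, waveform, 22050, format="WAV")

            # Base64 encode the WAV bytes so they survive JSON serialization
            results.append(base64.b64encode(buffer.getvalue()).decode("utf-8"))

        return results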

Ultimately, I think write-to-WAV and base64-encode options could be good additions to the TTS pipeline.

semack commented 11 months ago

> Ultimately, I think write-to-WAV and base64-encode options could be good additions to the TTS pipeline.

This would be the best solution, IMHO. It could also be a Task, I guess. Thanks.