I just ran the commands for the Docker CUDA version. The server starts, but when I try to translate text using the example in Swagger it throws the error below. Language detection works fine.
$ docker run --rm --gpus all -e SERVER_PORT=7860 -p 7860:7860 nllb-api
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
0it [00:00, ?it/s]
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
[INFO] Starting granian (main PID: 1)
[INFO] Listening at: http://0.0.0.0:7860
[INFO] Spawning worker-1 with pid: 23
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 41165.47it/s]
/usr/local/lib/python3.12/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
warnings.warn(
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 25540.42it/s]
[INFO] Started worker-1
[INFO] Started worker-1 runtime-1
[2024-09-13 18:22:32 +0000] 200 "GET /schema/swagger HTTP/1.1" 10.0.3.1 in 7.29 ms
Application Exception
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/litestar/middleware/_internal/exceptions/middleware.py", line 159, in __call__
await self.app(scope, receive, capture_response_started)
File "/usr/local/lib/python3.12/site-packages/litestar/routes/http.py", line 80, in handle
response = await self._get_response_for_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/litestar/routes/http.py", line 132, in _get_response_for_request
return await self._call_handler_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/litestar/routes/http.py", line 152, in _call_handler_function
response_data, cleanup_group = await self._get_response_data(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/litestar/routes/http.py", line 205, in _get_response_data
data = await route_handler.fn(**parsed_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/app/server/api/v3/translate.py", line 57, in translate_get
return Translated(result=await TranslatorPool.translate(text, source, target))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/app/server/features/translator.py", line 130, in translate
return await wrap_future(
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/app/server/features/translator.py", line 69, in translate
results = self.translator.translate_batch(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Flash attention 2 is not supported
[2024-09-13 18:22:55 +0000] 500 "GET /v3/translate HTTP/1.1" 10.0.3.1 in 159.08 ms
$ curl 'http://127.0.0.1:7860/api/v3/translate?text=Hello&source=eng_Latn&target=spa_Latn'
{"detail":"Internal Server Error"}
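In case it helps with triage: the exception is raised inside CTranslate2's `translate_batch`, and Flash Attention 2 kernels generally require an Ampere-or-newer GPU (compute capability 8.0+). A tiny helper (hypothetical, just to illustrate the check; on a machine with PyTorch installed the capability pair comes from `torch.cuda.get_device_capability()`):

```python
def supports_flash_attention_2(major: int, minor: int) -> bool:
    """Return True if a GPU's compute capability meets the usual
    Flash Attention 2 minimum of 8.0 (Ampere or newer)."""
    return (major, minor) >= (8, 0)

# Examples: a T4 is sm_75 (pre-Ampere), an RTX 3090 is sm_86.
print(supports_flash_attention_2(7, 5))  # T4 -> False
print(supports_flash_attention_2(8, 6))  # RTX 3090 -> True
```

If the GPU in this container is pre-Ampere, that alone could explain the `RuntimeError: Flash attention 2 is not supported`.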