lianhao closed this 2 months ago
Commenting to take this issue.
Another potentially related issue: opea-project/GenAIExamples#568
Hi @lianhao, I got a similar "permission denied" error when trying to reproduce yours. I downloaded "test.docx" and tried to upload it with this curl command:
curl -v -X POST -H "Content-Type: multipart/form-data" -F "files=@./test.docx" http://localhost:6007/v1/dataprep
The request failed with a 500 Internal Server Error.
Attached is the output of docker logs dataprep-redis-server:
files:UploadFile(filename='test.docx', size=77397, headers=Headers({'content-disposition': 'form-data; name="files"; filename="test.docx"', 'content-type': 'application/octet-stream'}))
link_list:None
Parsing document ./uploaded_files/test.docx.
INFO: 172.17.0.1:52896 - "POST /v1/dataprep HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/user/.local/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 398, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/home/user/.local/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 174, in __call__
raise exc
File "/home/user/.local/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 172, in __call__
await self.app(scope, receive, send_wrapper)
File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/langsmith/run_helpers.py", line 486, in async_wrapper
raise e
File "/home/user/.local/lib/python3.11/site-packages/langsmith/run_helpers.py", line 472, in async_wrapper
function_result = await asyncio.create_task( # type: ignore[call-arg]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/comps/dataprep/redis/langchain/prepare_doc_redis.py", line 200, in ingest_documents
ingest_data_to_redis(
File "/home/user/comps/dataprep/redis/langchain/prepare_doc_redis.py", line 167, in ingest_data_to_redis
content = document_loader(path)
^^^^^^^^^^^^^^^^^^^^^
File "/home/user/comps/dataprep/utils.py", line 337, in document_loader
return load_docx(doc_path)
^^^^^^^^^^^^^^^^^^^
File "/home/user/comps/dataprep/utils.py", line 192, in load_docx
docx2txt.process(docx_path, save_path)
File "/home/user/.local/lib/python3.11/site-packages/docx2txt/docx2txt.py", line 103, in process
with open(dst_fname, "wb") as dst_f:
^^^^^^^^^^^^^^^^^^^^^
PermissionError: [Errno 13] Permission denied: './imgs/image1.png'
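The failing frame is docx2txt writing an extracted picture to ./imgs/image1.png, so this particular error should only appear for documents that actually embed images. A .docx file is a ZIP package, and Word conventionally stores embedded pictures under word/media/, so a quick stdlib-only check can tell which case a given file falls into. The helper name docx_media_files is hypothetical and not part of dataprep:

```python
import zipfile
from typing import IO, List, Union


def docx_media_files(docx: Union[str, IO[bytes]]) -> List[str]:
    """List the embedded media entries (pictures etc.) in a .docx package.

    A .docx file is a ZIP archive; Word conventionally stores embedded
    pictures under word/media/ (e.g. word/media/image1.png).
    """
    with zipfile.ZipFile(docx) as zf:
        return sorted(name for name in zf.namelist() if name.startswith("word/media/"))


# Usage (path taken from the upload above):
# print(docx_media_files("./test.docx"))
```

An empty list for a file that still fails to upload would suggest a cause other than image extraction.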
My suspicion is that the current dataprep version does not yet support docx files that contain png images. I tried uploading gaudi3_whitepaper and encountered a different issue:
Using CPU. Note: This module is much faster with a GPU.
files:UploadFile(filename='gaudi-3-ai-accelerator-white-paper.pdf', size=2390860, headers=Headers({'content-disposition': 'form-data; name="files"; filename="gaudi-3-ai-accelerator-white-paper.pdf"', 'content-type': 'application/pdf'}))
link_list:None
Parsing document ./uploaded_files/gaudi-3-ai-accelerator-white-paper.pdf.
Done preprocessing. Created 52 chunks of the original pdf
[ ingest chunks ] file name: gaudi-3-ai-accelerator-white-paper.pdf
[ ingest chunks ] Current batch: 0
INFO: 172.17.0.1:42152 - "POST /v1/dataprep HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
response.raise_for_status()
File "/home/user/.local/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: http://172.25.116.82:6006/
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/user/.local/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 398, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/home/user/.local/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 174, in __call__
raise exc
File "/home/user/.local/lib/python3.11/site-packages/prometheus_fastapi_instrumentator/middleware.py", line 172, in __call__
await self.app(scope, receive, send_wrapper)
File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/user/.local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/user/.local/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/langsmith/run_helpers.py", line 486, in async_wrapper
raise e
File "/home/user/.local/lib/python3.11/site-packages/langsmith/run_helpers.py", line 472, in async_wrapper
function_result = await asyncio.create_task( # type: ignore[call-arg]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/comps/dataprep/redis/langchain/prepare_doc_redis.py", line 200, in ingest_documents
ingest_data_to_redis(
File "/home/user/comps/dataprep/redis/langchain/prepare_doc_redis.py", line 176, in ingest_data_to_redis
return ingest_chunks_to_redis(file_name, chunks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/comps/dataprep/redis/langchain/prepare_doc_redis.py", line 127, in ingest_chunks_to_redis
_, keys = Redis.from_texts_return_keys(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/langchain_community/vectorstores/redis/base.py", line 423, in from_texts_return_keys
keys = instance.add_texts(texts, metadatas, keys=keys)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/langchain_community/vectorstores/redis/base.py", line 694, in add_texts
embeddings = embeddings or self._embeddings.embed_documents(list(texts))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/langchain_community/embeddings/huggingface_hub.py", line 116, in embed_documents
responses = self.client.post(
^^^^^^^^^^^^^^^^^
File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/inference/_client.py", line 304, in post
hf_raise_for_status(response)
File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 367, in hf_raise_for_status
raise HfHubHTTPError(message, response=response) from e
huggingface_hub.utils._errors.HfHubHTTPError:
403 Forbidden: None.
Cannot access content at: http://172.25.116.82:6006/.
If you are trying to create or update content, make sure you have a token with the `write` role.
I would recommend trying the ChatQnA example first, where dataprep-redis-server is one of the microservices launched from compose.yaml. There I was able to successfully run the above curl command for gaudi3_whitepaper.
@ctao456, your issue could be resolved by passing the HUGGINGFACEHUB_API_TOKEN environment variable to the container. We should still fix the docx file issue: opea-project/GenAIExamples#568 mentions another problem with a docx file that doesn't contain any pictures.
Hi @lianhao, thank you. I had already tried passing -e HUGGINGFACEHUB_API_TOKEN=$(HUGGINGFACEHUB_API_TOKEN) when starting the instance with docker run, and it still produced the printout above. I verified that TEI is running and that my HF API token has write access to baai/bge-base-en-v1.5, so I'm not sure of the reason.
However, with the same configuration, the docker instance launched from GenAIExamples/ChatQnA works.
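One detail worth double-checking, offered only as a hypothesis the thread never confirmed: in a POSIX shell, $(HUGGINGFACEHUB_API_TOKEN) is command substitution, not variable expansion. The shell tries to run a command named HUGGINGFACEHUB_API_TOKEN and substitutes its (empty) output, whereas ${HUGGINGFACEHUB_API_TOKEN} expands the variable. (In a Makefile, $(VAR) is variable expansion, so this only matters if the docker run line was typed into a shell.) The sketch runs both forms through sh to show the difference; the helper expand_in_sh and the token value hf_example are made up for illustration:

```python
import os
import subprocess


def expand_in_sh(expr: str, env: dict) -> str:
    """Echo `expr` through sh with `env` and return what the shell printed.

    Illustration only: shows how sh treats $(NAME) versus ${NAME}.
    """
    result = subprocess.run(
        ["sh", "-c", f'echo "{expr}"'],
        env=env,
        capture_output=True,  # for $(NAME), stderr gets a "not found" message
        text=True,
    )
    return result.stdout.strip()


# Hypothetical token value, used only for this demonstration.
env = {**os.environ, "HUGGINGFACEHUB_API_TOKEN": "hf_example"}

# ${NAME} is parameter expansion: the variable's value is substituted.
print(expand_in_sh("${HUGGINGFACEHUB_API_TOKEN}", env))  # prints: hf_example
# $(NAME) is command substitution: sh tries to run a command called
# HUGGINGFACEHUB_API_TOKEN, fails, and substitutes an empty string.
print(repr(expand_in_sh("$(HUGGINGFACEHUB_API_TOKEN)", env)))  # prints: ''
```

In a shell, the variable would instead be passed as -e HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}, or simply -e HUGGINGFACEHUB_API_TOKEN, which makes docker run copy the value from the calling environment.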
Good to hear about a potential solution for uploading docx files.
@ctao456, as for the ./imgs permission denied issue, I suspect it is related to the function at https://github.com/opea-project/GenAIComps/blob/main/comps/dataprep/utils.py#L191, which tries to create a temporary directory in a location where it doesn't have write permission. I would suggest creating the temporary directory with Python's tempfile module instead of writing your own mkdir/delete logic.
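A minimal sketch of that suggestion, assuming the helper only needs the extracted images while it is running: tempfile.TemporaryDirectory creates a fresh, uniquely named directory under the system temp location (normally writable by the container user, unlike ./imgs in the working directory) and deletes it, contents included, when the with-block exits. The function shape, the process parameter (added only so the sketch can be exercised without docx2txt installed) and the returned image list are hypothetical; what the real load_docx in comps/dataprep/utils.py does with the images is not shown in this thread.

```python
import os
import tempfile
from typing import Callable, List, Optional, Tuple


def load_docx(docx_path: str,
              process: Optional[Callable[[str, str], str]] = None) -> Tuple[str, List[str]]:
    """Hypothetical variant of load_docx that extracts images into a private
    temporary directory instead of a fixed ./imgs path."""
    if process is None:
        import docx2txt  # third-party; the same call that appears in the traceback
        process = docx2txt.process

    # TemporaryDirectory creates a unique directory under tempfile.gettempdir()
    # and removes it, contents included, when the with-block exits (even on error).
    with tempfile.TemporaryDirectory(prefix="docx_imgs_") as img_dir:
        # docx2txt.process(docx, img_dir) returns the text and writes
        # embedded images into img_dir.
        text = process(docx_path, img_dir)
        # Placeholder: the real helper would consume the images here (e.g. OCR)
        # while the directory still exists; this sketch only records their names.
        images = sorted(os.listdir(img_dir))
    return text, images
```

Using the context manager also guarantees cleanup when docx2txt raises, which hand-written mkdir/delete logic has to handle explicitly.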
Understood. Please feel free to submit a PR. Thanks.
Unfortunately, I don't have bandwidth to resolve this right now.
Please assign this bug to me. I have pending PRs to be submitted.
Completed, as PR #561 has been merged.
When I try to upload a docx file with embedded images (test.docx) to the dataprep-redis service (built and launched from here), I get the following error from curl:
Checking the dataprep-redis service logs shows the following errors: