openai / chatgpt-retrieval-plugin

The ChatGPT Retrieval Plugin lets you easily find personal or work documents by asking questions in natural language.
MIT License

upsert-file not working #152

Open hminooei opened 1 year ago

hminooei commented 1 year ago

Hi,

When I try to upsert an .md file to the datastore (Pinecone), the server responds that the file type is not supported.

Here is the log:

mimetype: application/octet-stream
file.file: <tempfile.SpooledTemporaryFile object at 0x1236cf0a0>
file:  <starlette.datastructures.UploadFile object at 0x1236cf6d0>
Error: Unsupported file type: application/octet-stream
Error: Unsupported file type: application/octet-stream
INFO:     127.0.0.1:65379 - "POST /upsert-file HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/Users/hminooei/Library/Caches/pypoetry/virtualenvs/chatgpt-retrieval-plugin-Gkq6VzhC-py3.10/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 407, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/Users/hminooei/Library/Caches/pypoetry/virtualenvs/chatgpt-retrieval-plugin-Gkq6VzhC-py3.10/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/Users/hminooei/Library/Caches/pypoetry/virtualenvs/chatgpt-retrieval-plugin-Gkq6VzhC-py3.10/lib/python3.10/site-packages/fastapi/applications.py", line 271, in __call__
    await super().__call__(scope, receive, send)
  File "/Users/hminooei/Library/Caches/pypoetry/virtualenvs/chatgpt-retrieval-plugin-Gkq6VzhC-py3.10/lib/python3.10/site-packages/starlette/applications.py", line 118, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/Users/hminooei/Library/Caches/pypoetry/virtualenvs/chatgpt-retrieval-plugin-Gkq6VzhC-py3.10/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/Users/hminooei/Library/Caches/pypoetry/virtualenvs/chatgpt-retrieval-plugin-Gkq6VzhC-py3.10/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/Users/hminooei/Library/Caches/pypoetry/virtualenvs/chatgpt-retrieval-plugin-Gkq6VzhC-py3.10/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/Users/hminooei/Library/Caches/pypoetry/virtualenvs/chatgpt-retrieval-plugin-Gkq6VzhC-py3.10/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/Users/hminooei/Library/Caches/pypoetry/virtualenvs/chatgpt-retrieval-plugin-Gkq6VzhC-py3.10/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/Users/hminooei/Library/Caches/pypoetry/virtualenvs/chatgpt-retrieval-plugin-Gkq6VzhC-py3.10/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/Users/hminooei/Library/Caches/pypoetry/virtualenvs/chatgpt-retrieval-plugin-Gkq6VzhC-py3.10/lib/python3.10/site-packages/starlette/routing.py", line 706, in __call__
    await route.handle(scope, receive, send)
  File "/Users/hminooei/Library/Caches/pypoetry/virtualenvs/chatgpt-retrieval-plugin-Gkq6VzhC-py3.10/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/Users/hminooei/Library/Caches/pypoetry/virtualenvs/chatgpt-retrieval-plugin-Gkq6VzhC-py3.10/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/Users/hminooei/Library/Caches/pypoetry/virtualenvs/chatgpt-retrieval-plugin-Gkq6VzhC-py3.10/lib/python3.10/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/Users/hminooei/Library/Caches/pypoetry/virtualenvs/chatgpt-retrieval-plugin-Gkq6VzhC-py3.10/lib/python3.10/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await dependant.call(**values)
  File "/Users/hminooei/code/chatgpt-retrieval-plugin/server/main.py", line 63, in upsert_file
    document = await get_document_from_file(file, metadata_obj)
  File "/Users/hminooei/code/chatgpt-retrieval-plugin/services/file.py", line 17, in get_document_from_file
    extracted_text = await extract_text_from_form_file(file)
  File "/Users/hminooei/code/chatgpt-retrieval-plugin/services/file.py", line 111, in extract_text_from_form_file
    raise e
  File "/Users/hminooei/code/chatgpt-retrieval-plugin/services/file.py", line 107, in extract_text_from_form_file
    extracted_text = extract_text_from_filepath(temp_file_path, mimetype)
  File "/Users/hminooei/code/chatgpt-retrieval-plugin/services/file.py", line 42, in extract_text_from_filepath
    raise e
  File "/Users/hminooei/code/chatgpt-retrieval-plugin/services/file.py", line 39, in extract_text_from_filepath
    extracted_text = extract_text_from_file(file, mimetype)
  File "/Users/hminooei/code/chatgpt-retrieval-plugin/services/file.py", line 84, in extract_text_from_file
    raise ValueError("Unsupported file type: {}".format(mimetype))
ValueError: Unsupported file type: application/octet-stream
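The log suggests the server decides on the content type attached to the uploaded file part (`application/octet-stream` here), not on the file's extension. One possible client-side workaround is to label the part explicitly when building the request. The sketch below assumes this; the helper names (`build_multipart_file`, `upsert_markdown`), the URL, and the bearer-token header are illustrative assumptions. It also assumes the server accepts `text/markdown`; if it does not, `text/plain` may be the safer label.

```python
import urllib.request
import uuid


def build_multipart_file(field: str, filename: str, data: bytes, content_type: str) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data with an explicit per-part Content-Type."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    # Return the body plus the request-level Content-Type header carrying the boundary
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upsert_markdown(url: str, token: str, filename: str, data: bytes) -> bytes:
    """Hypothetical helper: POST a Markdown file labelled text/markdown instead of octet-stream."""
    body, header = build_multipart_file("file", filename, data, "text/markdown")
    req = urllib.request.Request(
        url,
        data=body,
        method="POST",
        # Assumed auth scheme: a bearer token configured on the server
        headers={"Content-Type": header, "Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

For example, `upsert_markdown("http://localhost:8000/upsert-file", token, "notes.md", Path("notes.md").read_bytes())`, with the host, port, and token adjusted to your own deployment.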

Running on macOS 13.2.1 (22D68)

hminooei commented 1 year ago

I tried changing the file extension to .txt. Although Python's mimetypes.guess_type returns text/plain for it, sending the file through the endpoint gives:

Exception: Unsupported file type
mimetype: None 
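
A plausible explanation for `mimetype: None`, offered as an assumption since the server code is not shown in this thread: when no usable content type arrives, the type has to be guessed from a file path, and Python's `mimetypes.guess_type` looks only at the name's extension, never at the contents. If the path it sees is a temporary copy without an extension, the guess comes back empty even though the original filename would have worked. The `guess` helper and the path `/tmp/temp_file` below are hypothetical stand-ins.

```python
import mimetypes
from typing import Optional


def guess(path: str) -> Optional[str]:
    """Return the MIME type inferred from a path's extension, or None."""
    mimetype, _encoding = mimetypes.guess_type(path)
    return mimetype


# The original upload name has an extension, so the guess succeeds
print(guess("notes.txt"))       # prints text/plain
# An extensionless temporary copy gives the guess nothing to work with
print(guess("/tmp/temp_file"))  # prints None
```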
hminooei commented 1 year ago

I created PR #161 to cover the cases above.
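
The exact change in PR #161 isn't reproduced in this thread. As a hypothetical sketch of one way to handle both cases on the server, the upload's original filename can serve as a fallback whenever the client sends a generic or missing content type. `resolve_mimetype`, `GENERIC_TYPES`, and `EXTENSION_FALLBACKS` are illustrative names, not part of the plugin.

```python
import mimetypes
import os
from typing import Optional

# Content types that say nothing useful about the file (assumed list)
GENERIC_TYPES = {None, "", "application/octet-stream"}

# Explicit mappings checked before mimetypes, whose tables vary by platform
EXTENSION_FALLBACKS = {".md": "text/markdown", ".markdown": "text/markdown"}


def resolve_mimetype(content_type: Optional[str], filename: Optional[str]) -> Optional[str]:
    """Return a usable MIME type for an upload, or None if none can be determined."""
    # Trust a specific type sent by the client
    if content_type not in GENERIC_TYPES:
        return content_type
    # Without an original filename there is nothing left to infer from
    if not filename:
        return None
    ext = os.path.splitext(filename)[1].lower()
    if ext in EXTENSION_FALLBACKS:
        return EXTENSION_FALLBACKS[ext]
    # Fall back to the standard extension-based guess (may still be None)
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed
```

In a FastAPI handler this might be called as `resolve_mimetype(file.content_type, file.filename)`, since Starlette's `UploadFile` exposes both attributes. The `.md` mapping is checked first because the system MIME tables that `mimetypes` reads differ between platforms and may not know Markdown.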