jupyterlab / jupyter-ai

A generative AI extension for JupyterLab
https://jupyter-ai.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

UnprocessableEntityError when embedding using openai and a proxy #464

Open MoezGholami opened 10 months ago

MoezGholami commented 10 months ago

Description

Original issue on OpenAI community forum: https://community.openai.com/t/attributeerror-module-openai-has-no-attribute-error/486676

The new version of the openai package breaks the Jupyter AI notebook. Any command sent to the OpenAI models raises the following error:

AttributeError: module 'openai' has no attribute 'error'

The Jupyter notebook works with openai version 0.28 but fails with openai 1.2.3.

Reproduce

  1. pip install --upgrade openai
  2. Send any command to OpenAI's API in a Jupyter notebook.

Expected behavior

Jupyter AI should work with the new openai python package.

Context

Versions:

  1. jupyter_ai: 2.5.0
  2. openai: 1.2.3
  3. python: 3.11.6
  4. pip: 23.3.1
welcome[bot] commented 10 months ago

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively. You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

dlqqq commented 9 months ago

A quick dig through the LangChain PRs indicates that this was fixed last week and first released in v0.0.336. It should be fixable by bumping our LangChain version.

Edit: forgot to include link. https://github.com/langchain-ai/langchain/pull/13262

sqlreport commented 8 months ago

@dlqqq I have: python=3.10, jupyter_ai==1.9.0, langchain==0.0.350, openai==1.7.1

When I run the /learn command in the chat UI, I get an error: IndexError: list index out of range

sqlreport commented 8 months ago

When I change my documents to a .txt file, I get an error like the one below (the endpoint appears to expect string input instead):

Exception: UnprocessableEntityError("Error code: 422 - {'detail': [{'type': 'string_type', 'loc': ['body', 'input', 'str'], 'msg': 'Input should be a valid string', 'input': [[2465, 836, 374, 350, 6043, 12336]], 'url': 'https://errors.pydantic.dev/2.4/v/string_type'}, {'type': 'string_type', 'loc': ['body', 'input', 'list[str]', 0], 'msg': 'Input should be a valid string', 'input': [2465, 836, 374, 350, 6043, 12336], 'url': 'https://errors.pydantic.dev/2.4/v/string_type'}]}")
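The 422 body above is a Pydantic validation error: the proxy expects the embeddings `input` field to be a str or list[str], but it received lists of integer token IDs. A minimal stdlib sketch of the shape check the server appears to be performing — the validator below is illustrative, not the proxy's actual code:

```python
# Illustrative re-creation of the server-side shape check behind the 422.
# The real proxy uses Pydantic; this sketch mirrors only the rule
# "input must be a str or a list[str]".

def validate_embedding_input(body: dict) -> list[dict]:
    """Return a list of pydantic-style errors; empty if the body is valid."""
    value = body.get("input")
    if isinstance(value, str):
        return []  # a bare string is fine
    if isinstance(value, list) and all(isinstance(item, str) for item in value):
        return []  # a list of strings is fine
    return [{
        "type": "string_type",
        "loc": ["body", "input"],
        "msg": "Input should be a valid string",
        "input": value,
    }]

# A token-ID payload (what the client actually sent) fails the check,
# while plain text passes.
tokenized = {"input": [[2465, 836, 374, 350, 6043, 12336]]}
plain = {"input": "some document text"}  # placeholder text, not a real decode
print(validate_embedding_input(tokenized))  # non-empty: string_type error
print(validate_embedding_input(plain))      # []
```

This matches the two errors in the traceback: the payload is valid neither as a single string nor as a list of strings, because each element is itself a list of ints.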

JasonWeill commented 8 months ago

@sqlreport This is likely a regression after PR #551 was merged and released in versions 1.9.0 and 2.9.0. I can look at it.

JasonWeill commented 8 months ago

With tip-of-main in Jupyter AI and openai 1.6.1, I don't see this error with an OpenAI embedding model; /learn on a directory including text files works without errors.

sqlreport commented 8 months ago

@JasonWeill can you provide unit test code that mimics the openai call so I can troubleshoot my openai proxy? I have run OpenAI's example code with string input and it works without issue. It must be the way the chat UI generates the openai call, passing non-string data.
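In the meantime, a mock can capture exactly what payload would reach the embedding client without hitting the proxy at all. A hedged sketch — embed_chunks and the fake embedder below are hypothetical stand-ins, not Jupyter AI code:

```python
# Sketch: intercept the embedding call and inspect the payload.
# The MagicMock stands in for the real provider instance (em_provider_cls);
# swap in the actual class to see what it receives from the /learn path.
from unittest.mock import MagicMock

def embed_chunks(embedder, chunks):
    """Hypothetical stand-in for the per-chunk dispatch in the /learn handler."""
    return [embedder.embed_documents([chunk]) for chunk in chunks]

embedder = MagicMock()
embedder.embed_documents.return_value = [[0.0, 0.1]]  # dummy vector

embed_chunks(embedder, ["first chunk", "second chunk"])

# Inspect every payload that would have gone over the wire: if any element
# is a list of ints rather than a str, the 422 above is explained.
for call in embedder.embed_documents.call_args_list:
    (texts,) = call.args
    assert all(isinstance(t, str) for t in texts), f"non-string payload: {texts}"
print("all payloads were strings")
```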

JasonWeill commented 7 months ago

@sqlreport I looked at our existing codebase and I don't see unit tests for the /learn handler, sorry. There are two possible places where unexpected non-string data might come in:

  1. The split method in packages/jupyter-ai/jupyter_ai/document_loaders/directory.py: https://github.com/jupyterlab/jupyter-ai/blob/814eb44698a8e9a20737d671b41b5bc3f0594909/packages/jupyter-ai/jupyter_ai/document_loaders/directory.py#L51-L71 — this takes a file and converts it to chunks, according to one of several splitter classes.
  2. The get_embeddings method in the same file: https://github.com/jupyterlab/jupyter-ai/blob/814eb44698a8e9a20737d671b41b5bc3f0594909/packages/jupyter-ai/jupyter_ai/document_loaders/directory.py#L95-L104 — this creates a list of Dask delayed tasks to send one chunk at a time to the selected embedding model class (em_provider_cls). Dask executes these in learn.py, the /learn handler: https://github.com/jupyterlab/jupyter-ai/blob/814eb44698a8e9a20737d671b41b5bc3f0594909/packages/jupyter-ai/jupyter_ai/chat_handlers/learn.py#L161
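For reference, the flow those two methods implement can be sketched with the stdlib alone. The real code uses LangChain splitter classes and Dask delayed tasks; the naive fixed-size splitter and thread pool below are simplified stand-ins:

```python
# Simplified stand-ins for the split/get_embeddings flow in directory.py.
# Assumptions: a naive fixed-size splitter replaces the LangChain splitter
# classes, and a thread pool replaces Dask delayed tasks.
from concurrent.futures import ThreadPoolExecutor

def naive_split(text: str, chunk_size: int = 20) -> list[str]:
    """Stand-in for split(): break a document into fixed-size chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def fake_embed(chunk: str) -> list[float]:
    """Stand-in for the embedding model: one vector per chunk."""
    return [float(len(chunk))]

def get_embeddings(chunks: list[str]) -> list[list[float]]:
    """Stand-in for get_embeddings(): one task per chunk, executed together."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(fake_embed, chunks))

chunks = naive_split("a short document used to illustrate the chunking step")
vectors = get_embeddings(chunks)
print(len(chunks), len(vectors))  # one vector per chunk
```

The key property to check when debugging the 422 is at the boundary between these two steps: each element handed to the embedding provider should still be a str at that point.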
andrewbovey commented 3 months ago

@sqlreport Did you find a solution to the Exception: UnprocessableEntityError("Error code: 422 - {'detail': [{'type': 'string_type', 'loc': ['body', 'input', 'str'], 'msg': 'Input should be a valid string', 'input': [[2465, 836, 374, 350, 6043, 12336]], 'url': 'https://errors.pydantic.dev/2.4/v/string_type'}, {'type': 'string_type', 'loc': ['body', 'input', 'list[str]', 0], 'msg': 'Input should be a valid string', 'input': [2465, 836, 374, 350, 6043, 12336], 'url': 'https://errors.pydantic.dev/2.4/v/string_type'}]}")

Mine is UnprocessableEntityError: Error code: 422 - {'detail': [{'type': 'string_type', 'loc': ['body', 'input', 'str'], 'msg': 'Input should be a valid string', 'input': [[3923, 14071, 3956, 527, 1070, 30]]}, {'type': 'string_type', 'loc': ['body', 'input', 'list[str]', 0], 'msg': 'Input should be a valid string', 'input': [3923, 14071, 3956, 527, 1070, 30]}]}

langchain==0.2.1 openai==1.31.0