BadRequestError in "Cluster" stage

michelle123lam / lloom

Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM (CHI 2024 paper). LLooM automatically surfaces high-level concepts to analyze unstructured text.

https://stanfordhci.github.io/lloom

BSD 3-Clause "New" or "Revised" License


BadRequestError in "Cluster" stage #11

Closed andreawwenyi closed 4 months ago

andreawwenyi commented 4 months ago

Hi! Thanks for developing LLooM! I am eager to try it out on my dataset of 1000 documents, but I keep getting this error at the "Cluster" stage: BadRequestError: Error code: 400 - {'error': {'message': "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.", 'type': 'invalid_request_error', 'param': None, 'code': None}}. It seems to be related to the get_embedding function. I am using python==3.11, openai==1.33.0, and the default embedding model specified in LLooM. I don't get this error when I run on only 200 documents, but it appears once I increase the number of documents. Do you know what might be causing the issue and what a workaround could be? Thank you!

michelle123lam commented 4 months ago

Hi, thanks for trying out LLooM! Yes, this seems related to the same issue that prompted this prior pull request (and that came up in this external OpenAI community thread). Basically, the OpenAI embeddings API can fail when too many embeddings are requested at once, which is probably why the error only came up when you tested with the much larger dataset. This should be fixed once I update the production package to reflect the current state of the repo (which has the PR merged in). I'll go ahead and do that and follow up in this thread when it has been updated!
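For readers who hit the same 400 error before upgrading, a common workaround for "too many inputs in one request" is to split the texts into smaller batches and embed each batch separately. The sketch below illustrates that general technique only; it is not the code from the merged pull request, whose contents are not shown in this thread. The names chunked, embed_in_batches, embed_fn, and the default batch_size of 500 are all hypothetical choices made for illustration.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 500,  # assumed value; tune it to the API's actual limits
) -> List[List[float]]:
    """Embed `texts` one batch at a time and return the vectors in input order.

    `embed_fn` is a hypothetical callable that takes a list of strings and
    returns one embedding vector per string.
    """
    vectors: List[List[float]] = []
    for batch in chunked(texts, batch_size):
        batch_vectors = embed_fn(list(batch))
        if len(batch_vectors) != len(batch):
            raise RuntimeError("embed_fn returned a different number of vectors than inputs")
        vectors.extend(batch_vectors)
    return vectors


def fake_embed(batch: List[str]) -> List[List[float]]:
    """Stand-in embedder for offline testing: maps each text to its length."""
    return [[float(len(text))] for text in batch]
```

With the openai>=1.0 Python client, embed_fn could wrap client.embeddings.create(model=..., input=batch) and return [item.embedding for item in response.data]. Which model to pass, and which batch size is safe, are left open here because the thread does not state them.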

michelle123lam commented 4 months ago

Hi @andreawwenyi, as a follow-up, I've packaged the most recent commits into v0.7.4, which you can install by upgrading the text_lloom Python package!
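After upgrading (for example with pip install --upgrade text_lloom), it can help to confirm that the environment actually picked up v0.7.4 or later. The sketch below does this with the standard library's importlib.metadata. The helper names parse_version, is_at_least, and installed_version are hypothetical, and the version parsing is deliberately simple: it assumes plain dotted numeric versions such as 0.7.4.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '0.7.4' into a tuple of ints.

    Only the leading digits of each dot-separated piece are used, so a
    suffix like 'rc1' is ignored. This is a simplification, not full PEP 440.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def is_at_least(installed: str, required: str) -> bool:
    """Return True if `installed` is the same as or newer than `required`."""
    return parse_version(installed) >= parse_version(required)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

A typical check would call installed_version("text_lloom") and, if the result is not None, pass it to is_at_least(..., "0.7.4").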

andreawwenyi commented 4 months ago

Thank you so much for the fast turnaround! It's working great for me now.

On Wed, Jul 10, 2024 at 11:36 PM Michelle Lam wrote:

Closed #11 https://github.com/michelle123lam/lloom/issues/11 as completed.
