neuml / txtai

💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
https://neuml.github.io/txtai
Apache License 2.0

Upsert API fails with graph config when performed after /delete #435

Closed. akset2X closed this issue 1 year ago.

akset2X commented 1 year ago

I get the following error when I perform an upsert through the /upsert API. I am running with a graph configuration.

#config.yml
# Index file path
path: ./tmp/index

# Allow indexing of documents
writable: True

# Embeddings index
embeddings:
  path: sentence-transformers/all-MiniLM-L6-v2
  content: True
  functions:
  - name: graph
    function: graph.attribute
  expressions:
  - name: category
    expression: graph(indexid, 'category')
  - name: topic
    expression: graph(indexid, 'topic')
  - name: topicrank
    expression: graph(indexid, 'topicrank')
  graph:
    limit: 15
    minscore: 0.1
    topics:
      categories:
      - Society & Culture
      - Science & Mathematics
      - Health
      - Education & Reference
      - Computers & Internet
      - Sports
      - Business & Finance
      - Entertainment & Music
      - Family & Relationships
      - Politics & Government

The "/index" API is working fine, but I want to upsert into the index so that I can keep my old embeddings. Is there any solution?
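For reference, this is the client-side sequence being described (a sketch; the document ids and text are made up, and the routes match the ones visible in the server log below: POST /add followed by GET /upsert):

```python
import json

# Documents in the shape the txtai /add route accepts: a JSON list of
# {"id": ..., "text": ...} objects. These ids and texts are illustrative.
documents = [
    {"id": "doc1", "text": "Machine learning is a branch of AI"},
    {"id": "doc2", "text": "The match ended in a draw"},
]
payload = json.dumps(documents)

# Against the running server, the sequence is then:
#   POST http://127.0.0.1:8000/add     with `payload` as the JSON body
#   GET  http://127.0.0.1:8000/upsert  to apply pending documents to the index
```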

INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     127.0.0.1:57206 - "POST /batchsearch HTTP/1.1" 200 OK
INFO:     127.0.0.1:57206 - "POST /delete HTTP/1.1" 200 OK
INFO:     127.0.0.1:57206 - "POST /add HTTP/1.1" 200 OK
INFO:     127.0.0.1:57206 - "GET /upsert HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\uvicorn\protocols\http\h11_impl.py", line 407, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\uvicorn\middleware\proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\fastapi\applications.py", line 270, in __call__
    await super().__call__(scope, receive, send)
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\starlette\applications.py", line 124, in __call__
    await self.middleware_stack(scope, receive, send)
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\starlette\middleware\errors.py", line 184, in __call__
    raise exc
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\starlette\middleware\errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\starlette\middleware\exceptions.py", line 79, in __call__
    raise exc
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\starlette\middleware\exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 21, in __call__
    raise e
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\starlette\routing.py", line 706, in __call__
    await route.handle(scope, receive, send)
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\starlette\routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\starlette\routing.py", line 66, in app
    response = await func(request)
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\fastapi\routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\fastapi\routing.py", line 165, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\starlette\concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\txtai\api\routers\embeddings.py", line 85, in upsert
    application.get().upsert()
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\txtai\api\base.py", line 80, in upsert
    super().upsert()
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\txtai\app\base.py", line 400, in upsert
    self.embeddings.upsert(self.documents)
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\txtai\embeddings\base.py", line 180, in upsert
    self.graph.upsert(Search(self, True))
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\txtai\graph\base.py", line 415, in upsert
    self.infertopics()
  File "c:\users\ak\appdata\local\programs\python\python39\lib\site-packages\txtai\graph\base.py", line 541, in infertopics
    topic = Counter(self.attribute(x, "topic") for x in ids).most_common(1)[0][0]
TypeError: 'NoneType' object is not iterable

I think this is related to #421. The issue may be caused by the /delete API: when I freshly did /add and /upsert on a batch of text content, it worked fine. But when I /delete some of it and then /add and /upsert, I hit the error above.
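The failure shape in the traceback can be reproduced in miniature. This is a toy model, not txtai internals: it assumes (as the report suggests) that deleting the last member of a topic leaves behind a stale None entry, which a later topic-inference pass then tries to iterate:

```python
from collections import Counter

# Toy model of a graph's topic -> member ids mapping.
topics = {"sports": ["1", "2"], "health": ["3"]}

def delete(uid):
    """Remove uid from every topic, leaving a stale None entry when a
    topic loses its last member (the hypothesized failure mode)."""
    for name, ids in topics.items():
        if ids and uid in ids:
            ids.remove(uid)
            if not ids:
                topics[name] = None

delete("3")

# A topic-inference pass that assumes every topic still has members
# then fails exactly like infertopics in the traceback:
error = None
for name, ids in topics.items():
    try:
        Counter(x for x in ids).most_common(1)
    except TypeError as exc:
        error = str(exc)

print(error)  # 'NoneType' object is not iterable
```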

davidmezzetti commented 1 year ago

Ok, thank you for the additional context on this and linking to #421.

akset2X commented 1 year ago

Any update here?

Also, is there any option to configure topic generation to skip stopwords such as "has", "you", "yourself", etc., or a part-of-speech tagging option to keep only certain tags such as nouns (NN) and adjectives (JJ)? Please let me know. If there is no such option for now, it would be great if we could get those in the future.

davidmezzetti commented 1 year ago

Sorry, it's on my list to look at this after the 5.4 release goes out.

davidmezzetti commented 1 year ago

I'll be checking in a fix for this shortly. There was a bug where deleting from the index could leave topics with 0 entries.
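The shape of such a fix (a sketch of the general guard, not the actual txtai patch) is to make topic inference tolerate topics whose member list went missing or empty after deletes, rather than iterating it unconditionally:

```python
from collections import Counter

def most_common_topic(ids, topic_of):
    """Return the most common topic among ids, or None when the id list
    is missing or empty (e.g. after every member was deleted)."""
    if not ids:
        return None
    return Counter(topic_of(x) for x in ids).most_common(1)[0][0]

# With a stale None entry, inference now degrades gracefully.
# topic_of here is a hypothetical lookup; txtai uses graph attributes.
topics = {"1": "sports", "2": "sports", "3": "health"}
print(most_common_topic(["1", "2", "3"], topics.get))  # sports
print(most_common_topic(None, topics.get))             # None
```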

Regarding stopwords, see the stopwords configuration option: https://neuml.github.io/txtai/embeddings/configuration/#topics
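For anyone landing here for the stopwords part: per the configuration docs linked above, stopwords are a topics option, so a sketch of the config would look like the following (the word list is illustrative, taken from the question above):

```yaml
embeddings:
  graph:
    topics:
      stopwords:
      - has
      - you
      - yourself
```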