run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: SchemaLLMPathExtractor example code raises AssertionError #16623

Open samgalen opened 1 month ago

samgalen commented 1 month ago

Bug Description

The example code demonstrating how to use SchemaLLMPathExtractor produces an AssertionError when run.

Specifically, it fails the isinstance check in indices/property_graph/transformations/schema_llm.py that verifies the kg_schema produced by the LLM is a KGSchema.

The documentation is a little inconsistent about what the validation_schema input should be: one page shows a list of tuples and another shows a dict. However, after looking at the traceback, this does not seem to be the issue.
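
For reference, the two shapes shown in the docs look roughly like this (entity/relation names taken from the example below; this only illustrates the two formats, not which one the extractor ultimately expects):

# dict form: entity type -> allowed relation types
schema = {
    "PERSON": ["PART_OF", "HAS", "IS_A"],
    "PLACE": ["PART_OF", "HAS"],
    "THING": ["IS_A"],
}

# list-of-tuples form: (subject_type, relation_type, object_type)
schema = [
    ("PERSON", "PART_OF", "PLACE"),
    ("PERSON", "HAS", "THING"),
    ("THING", "IS_A", "PLACE"),
]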

Version

0.11.17

Steps to Reproduce

Here is the code to reproduce

I am using Ollama

from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama

llm = Ollama(model='llama3.2',
             base_url="http://localhost:11434",
             request_timeout=360.0)

embed_model = OllamaEmbedding(
    model_name="llama3.2",
    base_url="http://localhost:11434",
    ollama_additional_kwargs={"mirostat": 0},
)

Example code from [here](https://docs.llamaindex.ai/en/stable/module_guides/indexing/lpg_index_guide/):

from typing import Literal
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

# recommended uppercase, underscore separated
entities = Literal["PERSON", "PLACE", "THING"]
relations = Literal["PART_OF", "HAS", "IS_A"]
schema = {
    "PERSON": ["PART_OF", "HAS", "IS_A"],
    "PLACE": ["PART_OF", "HAS"],
    "THING": ["IS_A"],
}

kg_extractor = SchemaLLMPathExtractor(
    llm=llm,
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=schema,
    strict=True,  # if false, will allow triplets outside of the schema
    num_workers=4,
    max_triplets_per_chunk=10,
)

Using the example data

from llama_index.core import SimpleDirectoryReader
from llama_index.core import PropertyGraphIndex

documents = SimpleDirectoryReader("../data/paul_graham/").load_data()
index = PropertyGraphIndex.from_documents(documents, 
                                          show_progress=True,
                                          llm=llm,
                                          embed_model=embed_model,
                                          kg_extractors=[kg_extractor])

Relevant Logs/Tracebacks

Extracting paths from text with schema:   0%|          | 0/22 [00:00<?, ?it/s]

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[5], line 3
      1 from llama_index.core import PropertyGraphIndex
----> 3 index = PropertyGraphIndex.from_documents(documents, 
      4                                           show_progress=True,
      5                                           llm=llm,
      6                                           embed_model=embed_model,
      7                                           kg_extractors=[kg_extractor])

File ~/.local/lib/python3.11/site-packages/llama_index/core/indices/base.py:119, in BaseIndex.from_documents(cls, documents, storage_context, show_progress, callback_manager, transformations, **kwargs)
    110     docstore.set_document_hash(doc.get_doc_id(), doc.hash)
    112 nodes = run_transformations(
    113     documents,  # type: ignore
    114     transformations,
    115     show_progress=show_progress,
    116     **kwargs,
    117 )
--> 119 return cls(
    120     nodes=nodes,
    121     storage_context=storage_context,
    122     callback_manager=callback_manager,
    123     show_progress=show_progress,
    124     transformations=transformations,
    125     **kwargs,
    126 )

File ~/.local/lib/python3.11/site-packages/llama_index/core/indices/property_graph/base.py:134, in PropertyGraphIndex.__init__(self, nodes, llm, kg_extractors, property_graph_store, vector_store, use_async, embed_model, embed_kg_nodes, callback_manager, transformations, storage_context, show_progress, **kwargs)
    128 self._embed_kg_nodes = embed_kg_nodes
    129 self._override_vector_store = (
    130     vector_store is not None
    131     or not storage_context.property_graph_store.supports_vector_queries
    132 )
--> 134 super().__init__(
    135     nodes=nodes,
    136     callback_manager=callback_manager,
    137     storage_context=storage_context,
    138     transformations=transformations,
    139     show_progress=show_progress,
    140     **kwargs,
    141 )

File ~/.local/lib/python3.11/site-packages/llama_index/core/indices/base.py:77, in BaseIndex.__init__(self, nodes, objects, index_struct, storage_context, callback_manager, transformations, show_progress, **kwargs)
     75 if index_struct is None:
     76     nodes = nodes or []
---> 77     index_struct = self.build_index_from_nodes(
     78         nodes + objects,  # type: ignore
     79         **kwargs,  # type: ignore
     80     )
     81 self._index_struct = index_struct
     82 self._storage_context.index_store.add_index_struct(self._index_struct)

File ~/.local/lib/python3.11/site-packages/llama_index/core/indices/base.py:185, in BaseIndex.build_index_from_nodes(self, nodes, **build_kwargs)
    183 """Build the index from nodes."""
    184 self._docstore.add_documents(nodes, allow_update=True)
--> 185 return self._build_index_from_nodes(nodes, **build_kwargs)

File ~/.local/lib/python3.11/site-packages/llama_index/core/indices/property_graph/base.py:334, in PropertyGraphIndex._build_index_from_nodes(self, nodes, **build_kwargs)
    330 def _build_index_from_nodes(
    331     self, nodes: Optional[Sequence[BaseNode]], **build_kwargs: Any
    332 ) -> IndexLPG:
    333     """Build index from nodes."""
--> 334     nodes = self._insert_nodes(nodes or [])
    336     # this isn't really used or needed
    337     return IndexLPG()

File ~/.local/lib/python3.11/site-packages/llama_index/core/indices/property_graph/base.py:200, in PropertyGraphIndex._insert_nodes(self, nodes)
    198 # run transformations on nodes to extract triplets
    199 if self._use_async:
--> 200     nodes = asyncio.run(
    201         arun_transformations(
    202             nodes, self._kg_extractors, show_progress=self._show_progress
    203         )
    204     )
    205 else:
    206     nodes = run_transformations(
    207         nodes, self._kg_extractors, show_progress=self._show_progress
    208     )

File ~/.local/lib/python3.11/site-packages/nest_asyncio.py:30, in _patch_asyncio.<locals>.run(main, debug)
     28 task = asyncio.ensure_future(main)
     29 try:
---> 30     return loop.run_until_complete(task)
     31 finally:
     32     if not task.done():

File ~/.local/lib/python3.11/site-packages/nest_asyncio.py:98, in _patch_loop.<locals>.run_until_complete(self, future)
     95 if not f.done():
     96     raise RuntimeError(
     97         'Event loop stopped before Future completed.')
---> 98 return f.result()

File /apps/software/standard/core/jupyterlab/3.6.3-py3.11/lib/python3.11/asyncio/futures.py:203, in Future.result(self)
    201 self.__log_traceback = False
    202 if self._exception is not None:
--> 203     raise self._exception.with_traceback(self._exception_tb)
    204 return self._result

File /apps/software/standard/core/jupyterlab/3.6.3-py3.11/lib/python3.11/asyncio/tasks.py:267, in Task.__step(***failed resolving arguments***)
    263 try:
    264     if exc is None:
    265         # We use the `send` method directly, because coroutines
    266         # don't have `__iter__` and `__next__` methods.
--> 267         result = coro.send(None)
    268     else:
    269         result = coro.throw(exc)

File ~/.local/lib/python3.11/site-packages/llama_index/core/ingestion/pipeline.py:137, in arun_transformations(nodes, transformations, in_place, cache, cache_collection, **kwargs)
    135             cache.put(hash, nodes, collection=cache_collection)
    136     else:
--> 137         nodes = await transform.acall(nodes, **kwargs)
    139 return nodes

File ~/.local/lib/python3.11/site-packages/llama_index/core/indices/property_graph/transformations/schema_llm.py:390, in SchemaLLMPathExtractor.acall(self, nodes, show_progress, **kwargs)
    387 for node in nodes:
    388     jobs.append(self._aextract(node))
--> 390 return await run_jobs(
    391     jobs,
    392     workers=self.num_workers,
    393     show_progress=show_progress,
    394     desc="Extracting paths from text with schema",
    395 )

File ~/.local/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py:353, in Dispatcher.span.<locals>.async_wrapper(func, instance, args, kwargs)
    345 self.span_enter(
    346     id_=id_,
    347     bound_args=bound_args,
   (...)
    350     tags=tags,
    351 )
    352 try:
--> 353     result = await func(*args, **kwargs)
    354 except BaseException as e:
    355     self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

File ~/.local/lib/python3.11/site-packages/llama_index/core/async_utils.py:146, in run_jobs(jobs, show_progress, workers, desc)
    143 if show_progress:
    144     from tqdm.asyncio import tqdm_asyncio
--> 146     results = await tqdm_asyncio.gather(*pool_jobs, desc=desc)
    147 else:
    148     results = await asyncio.gather(*pool_jobs)

File ~/.local/lib/python3.11/site-packages/tqdm/asyncio.py:79, in tqdm_asyncio.gather(cls, loop, timeout, total, *fs, **tqdm_kwargs)
     76     return i, await f
     78 ifs = [wrap_awaitable(i, f) for i, f in enumerate(fs)]
---> 79 res = [await f for f in cls.as_completed(ifs, loop=loop, timeout=timeout,
     80                                          total=total, **tqdm_kwargs)]
     81 return [i for _, i in sorted(res)]

File ~/.local/lib/python3.11/site-packages/tqdm/asyncio.py:79, in <listcomp>(.0)
     76     return i, await f
     78 ifs = [wrap_awaitable(i, f) for i, f in enumerate(fs)]
---> 79 res = [await f for f in cls.as_completed(ifs, loop=loop, timeout=timeout,
     80                                          total=total, **tqdm_kwargs)]
     81 return [i for _, i in sorted(res)]

File /apps/software/standard/core/jupyterlab/3.6.3-py3.11/lib/python3.11/asyncio/tasks.py:605, in as_completed.<locals>._wait_for_one()
    602 if f is None:
    603     # Dummy value from _on_timeout().
    604     raise exceptions.TimeoutError
--> 605 return f.result()

File /apps/software/standard/core/jupyterlab/3.6.3-py3.11/lib/python3.11/asyncio/futures.py:203, in Future.result(self)
    201 self.__log_traceback = False
    202 if self._exception is not None:
--> 203     raise self._exception.with_traceback(self._exception_tb)
    204 return self._result

File /apps/software/standard/core/jupyterlab/3.6.3-py3.11/lib/python3.11/asyncio/tasks.py:267, in Task.__step(***failed resolving arguments***)
    263 try:
    264     if exc is None:
    265         # We use the `send` method directly, because coroutines
    266         # don't have `__iter__` and `__next__` methods.
--> 267         result = coro.send(None)
    268     else:
    269         result = coro.throw(exc)

File ~/.local/lib/python3.11/site-packages/tqdm/asyncio.py:76, in tqdm_asyncio.gather.<locals>.wrap_awaitable(i, f)
     75 async def wrap_awaitable(i, f):
---> 76     return i, await f

File ~/.local/lib/python3.11/site-packages/llama_index/core/instrumentation/dispatcher.py:353, in Dispatcher.span.<locals>.async_wrapper(func, instance, args, kwargs)
    345 self.span_enter(
    346     id_=id_,
    347     bound_args=bound_args,
   (...)
    350     tags=tags,
    351 )
    352 try:
--> 353     result = await func(*args, **kwargs)
    354 except BaseException as e:
    355     self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

File ~/.local/lib/python3.11/site-packages/llama_index/core/async_utils.py:139, in run_jobs.<locals>.worker(job)
    136 @dispatcher.span
    137 async def worker(job: Coroutine) -> Any:
    138     async with semaphore:
--> 139         return await job

File ~/.local/lib/python3.11/site-packages/llama_index/core/indices/property_graph/transformations/schema_llm.py:360, in SchemaLLMPathExtractor._aextract(self, node)
    353 try:
    354     kg_schema = await self.llm.astructured_predict(
    355         self.kg_schema_cls,
    356         self.extract_prompt,
    357         text=text,
    358         max_triplets_per_chunk=self.max_triplets_per_chunk,
    359     )
--> 360     triplets = self._prune_invalid_triplets(kg_schema)
    361 except ValueError:
    362     triplets = []

File ~/.local/lib/python3.11/site-packages/llama_index/core/indices/property_graph/transformations/schema_llm.py:272, in SchemaLLMPathExtractor._prune_invalid_triplets(self, kg_schema)
    270 def _prune_invalid_triplets(self, kg_schema: Any) -> Sequence[Triplet]:
    271     """Prune invalid triplets."""
--> 272     assert isinstance(kg_schema, self.kg_schema_cls)
    274     valid_triplets = []
    275     for triplet in kg_schema.triplets:

AssertionError:
dosubot[bot] commented 1 month ago

@samgalen Hello there! I'm here to help you with any bugs, questions, or contributions. Let me know how I can assist you while you wait for a human maintainer.

The AssertionError you're encountering in the SchemaLLMPathExtractor is likely due to a mismatch between the kg_schema produced and the expected class type defined by kg_schema_cls. This error occurs because the kg_schema is not an instance of the dynamically created schema class (kg_schema_cls) that the extractor expects.

To resolve this issue, ensure that the kg_validation_schema is correctly defined. The recommended format for kg_validation_schema is a list of tuples, as this is the format encouraged for future use [1]. The dictionary format is backward-compatible but may not align with the current implementation's expectations.

Here's what you can do:

  1. Use the List of Tuples Format: Change your kg_validation_schema to a list of tuples. For example:

    schema = [
       ("PERSON", "PART_OF", "PLACE"),
       ("PERSON", "HAS", "THING"),
       ("THING", "IS_A", "PLACE"),
    ]
  2. Check the Schema Class: Ensure that the kg_schema_cls is correctly constructed if you are not providing it explicitly. The code should dynamically create this class based on the entities and relations you define.

  3. Review the Documentation: Make sure that the example code you are following aligns with the latest documentation and codebase updates.

By ensuring that the kg_validation_schema is in the correct format and that the schema class is properly constructed, you should be able to avoid the AssertionError.
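
Putting points 1 and 2 together, a minimal sketch (this assumes the extractor accepts the list-of-tuples form and exposes the generated Pydantic class as kg_schema_cls; both assumptions come from the traceback and should be verified against the source):

from typing import Literal
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

entities = Literal["PERSON", "PLACE", "THING"]
relations = Literal["PART_OF", "HAS", "IS_A"]

kg_extractor = SchemaLLMPathExtractor(
    llm=llm,
    possible_entities=entities,
    possible_relations=relations,
    # list-of-tuples form: (subject_type, relation_type, object_type)
    kg_validation_schema=[
        ("PERSON", "PART_OF", "PLACE"),
        ("PERSON", "HAS", "THING"),
        ("THING", "IS_A", "PLACE"),
    ],
    strict=True,
)

# Inspect the dynamically generated class the LLM is asked to produce.
# The attribute name comes from the traceback; model_json_schema assumes Pydantic v2.
print(kg_extractor.kg_schema_cls)
print(kg_extractor.kg_schema_cls.model_json_schema())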

To continue talking to Dosu, mention @dosu.

samgalen commented 1 month ago

Changing to tuples produces the same error.

logan-markewich commented 1 month ago

Seems like the LLM isn't outputting the expected class. Probably that could raise a better error, but it's kind of expected with open-source LLMs.

Try the DynamicLLMPathExtractor maybe? It doesn't rely on structured outputs the way the schema extractor does.
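
Something roughly like this, reusing the llm, embed_model, and documents from the issue (I'm going from memory on the allowed_entity_types / allowed_relation_types kwarg names, so double-check them against the DynamicLLMPathExtractor docs):

from llama_index.core import PropertyGraphIndex
from llama_index.core.indices.property_graph import DynamicLLMPathExtractor

kg_extractor = DynamicLLMPathExtractor(
    llm=llm,
    max_triplets_per_chunk=10,
    num_workers=4,
    # these act as hints rather than a hard schema, so extraction is less restricted
    allowed_entity_types=["PERSON", "PLACE", "THING"],
    allowed_relation_types=["PART_OF", "HAS", "IS_A"],
)

index = PropertyGraphIndex.from_documents(
    documents,
    llm=llm,
    embed_model=embed_model,
    kg_extractors=[kg_extractor],
    show_progress=True,
)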

samgalen commented 1 month ago

@logan-markewich Yeah, replacing it with DynamicLLMPathExtractor doesn't cause this error; however, for my use case we were hoping for a more restricted set of relationships.

To clarify: in the following traceback lines from property_graph/transformations/schema_llm.py, is this asking the LLM to output a schema distinct from the one specified by the kg_validation_schema argument?

    354     kg_schema = await self.llm.astructured_predict(
    355         self.kg_schema_cls,
    356         self.extract_prompt,
    357         text=text,
    358         max_triplets_per_chunk=self.max_triplets_per_chunk,
    359     )
nomisto commented 1 month ago

I've encountered the same error using llama3.1 and mistral-nemo. I only got SchemaLLMPathExtractor to work with OpenAI GPT.

I investigated a little bit, but I am quite unfamiliar with the code.

The assertion gets raised because kg_schema isn't of type self.kg_schema_cls; instead it is a string containing the exception message "list indices must be integers or slices, not str".

Investigating further, I guess this is because language models other than GPT cannot handle the complex Pydantic prompt used for SchemaLLMPathExtractor (see below: a complex JSON schema function definition with "$defs" etc.).

E.g. output using mistral-nemo (which does not adhere to the schema)

{'model': 'mistral-nemo', 'created_at': '2024-10-25T12:11:30.352000481Z', 'message': {'role': 'assistant', 'content': '', 'tool_calls': [{'function': {'name': 'KGSchema', 'arguments': {'triplets': [['Paul Graham', 'wrote', 'Essay'], ['Essay', 'contains', 'text']]}}}]}, 'done_reason': 'stop', 'done': True, 'total_duration': 5230632717, 'load_duration': 62419239, 'prompt_eval_count': 1091, 'prompt_eval_duration': 72113000, 'eval_count': 43, 'eval_duration': 5052451000}

Compared with output using chatgpt:

ChatCompletion(id='chatcmpl-ALuEpAb6G0UoOLzuc7F81XxHOdYZf', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_4CRF7z285XuGxSdR8yMGL46w', function=Function(arguments='{"triplets":[{"subject":{"type":"PERSON","name":"Paul Graham"},"relation":{"type":"WORKED_ON"},"object":{"type":"ORGANIZATION","name":"Viaweb"}},{"subject":{"type":"ORGANIZATION","name":"Viaweb"},"relation":{"type":"WORKED_AT"},"object":{"type":"ORGANIZATION","name":"Yahoo"}},{"subject":{"type":"PERSON","name":"Paul Graham"},"relation":{"type":"WORKED_WITH"},"object":{"type":"PERSON","name":"Robert"}},{"subject":{"type":"PERSON","name":"Paul Graham"},"relation":{"type":"WORKED_WITH"},"object":{"type":"PERSON","name":"Trevor"}},{"subject":{"type":"ORGANIZATION","name":"Viaweb"},"relation":{"type":"PART_OF"},"object":{"type":"ORGANIZATION","name":"Yahoo"}},{"subject":{"type":"ORGANIZATION","name":"Yahoo"},"relation":{"type":"PART_OF"},"object":{"type":"ORGANIZATION","name":"Yahoo"}},{"subject":{"type":"PERSON","name":"Paul Graham"},"relation":{"type":"HAS"},"object":{"type":"ORGANIZATION","name":"Viaweb"}}]}', name='KGSchema'), type='function')]))], created=1729784431, model='gpt-3.5-turbo-0125', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=243, prompt_tokens=1272, total_tokens=1515, completion_tokens_details=CompletionTokensDetails(audio_tokens=None, reasoning_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=None, cached_tokens=0)))

I.e. mistral-nemo passes the triplets argument as a flat list of tuples, while chatgpt produces the nested structure with subject.type, subject.name, etc.

The request was the same for both:

{'messages': [{'content': 'Give the following text, extract the knowledge '
                          'graph according to the provided schema. Try to '
                          'limit to the output 10 extracted paths.s\n'
                          '-------\n'
                          "So I tried to paint, but I just didn't seem to have "
                          'any energy or ambition. Part of the problem was '
                          "that I didn't know many people in California. I'd "
                          'compounded this problem by buying a house up in the '
                          'Santa Cruz Mountains, with a beautiful view but '
                          'miles from anywhere. I stuck it out for a few more '
                          'months, then in desperation I went back to New '
                          'York, where unless you understand about rent '
                          "control you'll be surprised to hear I still had my "
                          'apartment, sealed up like a tomb of my old life. '
                          'Idelle was in New York at least, and there were '
                          'other people trying to paint there, even though I '
                          "didn't know any of them.\n"
                          '\n'
                          'When I got back to New York I resumed my old life, '
                          'except now I was rich. It was as weird as it '
                          'sounds. I resumed all my old patterns, except now '
                          "there were doors where there hadn't been. Now when "
                          'I was tired of walking, all I had to do was raise '
                          'my hand, and (unless it was raining) a taxi would '
                          'stop to pick me up. Now when I walked past charming '
                          'little restaurants I could go in and order lunch. '
                          'It was exciting for a while. Painting started to go '
                          'better. I experimented with a new kind of still '
                          "life where I'd paint one painting in the old way, "
                          'then photograph it and print it, blown up, on '
                          'canvas, and then use that as the underpainting for '
                          'a second still life, painted from the same objects '
                          "(which hopefully hadn't rotted yet).\n"
                          '\n'
                          'Meanwhile I looked for an apartment to buy. Now I '
                          'could actually choose what neighborhood to live in. '
                          'Where, I asked myself and various real estate '
                          'agents, is the Cambridge of New York? Aided by '
                          'occasional visits to actual Cambridge, I gradually '
                          "realized there wasn't one. Huh.\n"
                          '\n'
                          'Around this time, in the spring of 2000, I had an '
                          'idea. It was clear from our experience with Viaweb '
                          'that web apps were the future. Why not build a web '
                          'app for making web apps? Why not let people edit '
                          'code on our server through the browser, and then '
                          'host the resulting applications for them? [9] You '
                          'could run all sorts of services on the servers that '
                          'these applications could use just by making an API '
                          'call: making and receiving phone calls, '
                          'manipulating images, taking credit card payments, '
                          'etc.\n'
                          '\n'
                          "I got so excited about this idea that I couldn't "
                          'think about anything else. It seemed obvious that '
                          "this was the future. I didn't particularly want to "
                          'start another company, but it was clear that this '
                          'idea would have to be embodied as one, so I decided '
                          'to move to Cambridge and start it. I hoped to lure '
                          'Robert into working on it with me, but there I ran '
                          'into a hitch. Robert was now a postdoc at MIT, and '
                          "though he'd made a lot of money the last time I'd "
                          'lured him into working on one of my schemes, it had '
                          'also been a huge time sink. So while he agreed that '
                          'it sounded like a plausible idea, he firmly refused '
                          'to work on it.\n'
                          '\n'
                          "Hmph. Well, I'd do it myself then. I recruited Dan "
                          'Giffin, who had worked for Viaweb, and two '
                          'undergrads who wanted summer jobs, and we got to '
                          "work trying to build what it's now clear is about "
                          'twenty companies and several open source projects '
                          'worth of software. The language for defining '
                          'applications would of course be a dialect of Lisp. '
                          "But I wasn't so naive as to assume I could spring "
                          "an overt Lisp on a general audience; we'd hide the "
                          'parentheses, like Dylan did.\n'
                          '\n'
                          'By then there was a name for the kind of company '
                          'Viaweb was, an "application service provider," or '
                          "ASP. This name didn't last long before it was "
                          'replaced by "software as a service," but it was '
                          'current for long enough that I named this new '
                          'company after it: it was going to be called Aspra.\n'
                          '\n'
                          'I started working on the application builder, Dan '
                          'worked on network infrastructure, and the two '
                          'undergrads worked on the first two services (images '
                          'and phone calls). But about halfway through the '
                          "summer I realized I really didn't want to run a "
                          'company — especially not a big one, which it was '
                          "looking like this would have to be. I'd only "
                          'started Viaweb because I needed the money. Now that '
                          "I didn't need money anymore, why was I doing this? "
                          'If this vision had to be realized as a company, '
                          "then screw the vision. I'd build a subset that "
                          'could be done as an open source project.\n'
                          '\n'
                          'Much to my surprise, the time I spent working on '
                          'this stuff was not wasted after all.\n'
                          '-------\n',
               'role': 'user'}],
 'tools': [{'function': {'description': 'Knowledge Graph Schema.',
                         'name': 'KGSchema',
                         'parameters': {'$defs': {'Entity': {'properties': {'name': {'title': 'Name',
                                                                                     'type': 'string'},
                                                                            'type': {'description': 'Entity '
                                                                                                    'in '
                                                                                                    'a '
                                                                                                    'knowledge '
                                                                                                    'graph. '
                                                                                                    'Only '
                                                                                                    'extract '
                                                                                                    'entities '
                                                                                                    'with '
                                                                                                    'types '
                                                                                                    'that '
                                                                                                    'are '
                                                                                                    'listed '
                                                                                                    'as '
                                                                                                    'valid: '
                                                                                                    "typing.Literal['PERSON', "
                                                                                                    "'PLACE', "
                                                                                                    "'ORGANIZATION']",
                                                                                     'enum': ['PERSON',
                                                                                              'PLACE',
                                                                                              'ORGANIZATION'],
                                                                                     'title': 'Type',
                                                                                     'type': 'string'}},
                                                             'required': ['type',
                                                                          'name'],
                                                             'title': 'Entity',
                                                             'type': 'object'},
                                                  'Relation': {'properties': {'type': {'description': 'Relation '
                                                                                                      'in '
                                                                                                      'a '
                                                                                                      'knowledge '
                                                                                                      'graph. '
                                                                                                      'Only '
                                                                                                      'extract '
                                                                                                      'relations '
                                                                                                      'with '
                                                                                                      'types '
                                                                                                      'that '
                                                                                                      'are '
                                                                                                      'listed '
                                                                                                      'as '
                                                                                                      'valid: '
                                                                                                      "typing.Literal['HAS', "
                                                                                                      "'PART_OF', "
                                                                                                      "'WORKED_ON', "
                                                                                                      "'WORKED_WITH', "
                                                                                                      "'WORKED_AT']",
                                                                                       'enum': ['HAS',
                                                                                                'PART_OF',
                                                                                                'WORKED_ON',
                                                                                                'WORKED_WITH',
                                                                                                'WORKED_AT'],
                                                                                       'title': 'Type',
                                                                                       'type': 'string'}},
                                                               'required': ['type'],
                                                               'title': 'Relation',
                                                               'type': 'object'},
                                                  'Triplet': {'properties': {'object': {'$ref': '#/$defs/Entity'},
                                                                             'relation': {'$ref': '#/$defs/Relation'},
                                                                             'subject': {'$ref': '#/$defs/Entity'}},
                                                              'required': ['subject',
                                                                           'relation',
                                                                           'object'],
                                                              'title': 'Triplet',
                                                              'type': 'object'}},
                                        'additionalProperties': False,
                                        'properties': {'triplets': {'items': {'$ref': '#/$defs/Triplet'},
                                                                    'title': 'Triplets',
                                                                    'type': 'array'}},
                                        'required': ['triplets'],
                                        'type': 'object'},
                         'strict': False},
            'type': 'function'}]}
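
If you want to test a model in isolation, a rough sketch like the following reproduces just the structured-prediction step (the Entity/Relation/Triplet/KGSchema classes here are a hand-written stand-in for the dynamically generated kg_schema_cls, and structured_predict is the sync counterpart of the astructured_predict call in the traceback):

from typing import List, Literal

from pydantic import BaseModel
from llama_index.core import PromptTemplate
from llama_index.llms.ollama import Ollama

class Entity(BaseModel):
    type: Literal["PERSON", "PLACE", "THING"]
    name: str

class Relation(BaseModel):
    type: Literal["PART_OF", "HAS", "IS_A"]

class Triplet(BaseModel):
    subject: Entity
    relation: Relation
    object: Entity

class KGSchema(BaseModel):
    """Knowledge Graph Schema."""
    triplets: List[Triplet]

llm = Ollama(model="mistral-nemo", request_timeout=360.0)

prompt = PromptTemplate(
    "Given the following text, extract up to {max_triplets} knowledge graph "
    "triplets according to the provided schema.\n-------\n{text}\n-------\n"
)

# GPT-class models return a populated KGSchema instance here; weaker models often
# fail to follow the nested schema, which is what eventually trips the assert upstream
result = llm.structured_predict(
    KGSchema, prompt, text="Paul Graham worked on Viaweb.", max_triplets=3
)
print(result)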

So, bottom line: I don't think it will work with models other than chatgpt. However, I think this error should be caught and a better error description should be printed.

samgalen commented 3 weeks ago

@nomisto Thanks, that's helpful to know. Since this seems fundamental to how the SchemaLLMPathExtractor functions, I agree that the error should be caught and surfaced as a more descriptive exception.

From what I understand, this would not be terribly complex to add (at least if we're just talking about SchemaLLMPathExtractor). I'm happy to open a PR if there's agreement about how to handle this.
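
For concreteness, the kind of change I have in mind would replace the bare assert with a descriptive exception. A sketch (written as a subclass so it can also be tried without patching the library; this treats _prune_invalid_triplets as an overridable hook, which the traceback suggests but I haven't verified):

from typing import Any, Sequence

from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

class LoudSchemaLLMPathExtractor(SchemaLLMPathExtractor):
    """Same extractor, but fail descriptively when the LLM ignores the schema."""

    def _prune_invalid_triplets(self, kg_schema: Any) -> Sequence[Any]:
        if not isinstance(kg_schema, self.kg_schema_cls):
            # _aextract currently turns ValueError into an empty triplet list,
            # so raise a distinct exception type to make the failure visible
            raise TypeError(
                f"Expected structured output of type {self.kg_schema_cls.__name__}, "
                f"got {type(kg_schema).__name__}: {kg_schema!r}. The LLM likely did "
                "not follow the requested schema."
            )
        return super()._prune_invalid_triplets(kg_schema)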