langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

langchain.graph RDFGraph does not read .owl extension files #13115

Closed: will23332 closed this issue 10 months ago

will23332 commented 10 months ago

System Info

LangChain version: 0.0.332
Python version: 3.11.5

Who can help?

@hwchase17

When loading a local .owl file (the standard example, pizza.owl), the operation fails, and rdflib reports for every URI that it "does not look like a valid URI, trying to serialize this will break."

Here's the traceback

Traceback (most recent call last):

  File ~\AppData\Roaming\Python\Python311\site-packages\IPython\core\interactiveshell.py:3526 in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  Cell In[13], line 4
    graph = RdfGraph(

  File C:\Python311\Lib\site-packages\langchain\graphs\rdf_graph.py:159 in __init__
    self.graph.parse(source_file, format=self.serialization)

  File C:\Python311\Lib\site-packages\rdflib\graph.py:1501 in parse
    raise se

  File C:\Python311\Lib\site-packages\rdflib\graph.py:1492 in parse
    parser.parse(source, self, **args)

  File C:\Python311\Lib\site-packages\rdflib\plugins\parsers\notation3.py:2021 in parse
    p.loadStream(stream)

  File C:\Python311\Lib\site-packages\rdflib\plugins\parsers\notation3.py:479 in loadStream
    return self.loadBuf(stream.read())  # Not ideal

  File C:\Python311\Lib\site-packages\rdflib\plugins\parsers\notation3.py:485 in loadBuf
    self.feed(buf)

  File C:\Python311\Lib\site-packages\rdflib\plugins\parsers\notation3.py:511 in feed
    i = self.directiveOrStatement(s, j)

  File C:\Python311\Lib\site-packages\rdflib\plugins\parsers\notation3.py:532 in directiveOrStatement
    return self.checkDot(argstr, j)

  File C:\Python311\Lib\site-packages\rdflib\plugins\parsers\notation3.py:1214 in checkDot
    self.BadSyntax(argstr, j, "expected '.' or '}' or ']' at end of statement")

  File C:\Python311\Lib\site-packages\rdflib\plugins\parsers\notation3.py:1730 in BadSyntax
    raise BadSyntax(self._thisDoc, self.lines, argstr, i, msg)

  File <string>
BadSyntax

Reproduction

Steps to reproduce the behaviour:

  1. Get the source file from https://protege.stanford.edu/ontologies/pizza/pizza.owl and place it where the code runs.
  2. Use the following code:

    from langchain.chains import GraphSparqlQAChain
    from langchain.graphs import RdfGraph

    graph = RdfGraph(source_file="pizza.owl", standard="owl")
    graph.load_schema()
    print(graph.get_schema)

Expected behavior

For the graph to load and for graph.get_schema to show the classes and object properties.

dosubot[bot] commented 10 months ago

🤖

Hello @will23332,

Nice to meet you! I'm Dosu, an AI bot here to help you solve bugs, answer questions, and guide you in contributing to the LangChain repository. While we wait for a human maintainer, feel free to use me as a resource. I'm currently looking at the issue you've posted and will provide a full answer shortly.

dosubot[bot] commented 10 months ago

🤖

Based on the traceback you provided, the issue appears to be the serialization format used when initializing the RdfGraph class. Your code does not pass a serialization argument, so RdfGraph falls back to its default, "ttl" (Turtle). The traceback confirms this: rdflib is parsing the file with its Turtle/N3 parser (notation3.py), which cannot read an RDF/XML document.

To resolve this, specify the correct serialization format when initializing RdfGraph. .owl files are usually serialized as RDF/XML, so try setting the serialization parameter to "xml":

from langchain.graphs import RdfGraph

graph = RdfGraph(
    source_file="pizza.owl",
    serialization="xml",
    standard="owl"
)

This tells rdflib to parse the source file as an RDF/XML document, which is how pizza.owl is serialized. If your .owl file uses a different serialization, adjust the serialization parameter accordingly.
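If you're not sure which serialization a file uses, looking at its first few characters is usually enough. The helpers below are a hypothetical sketch (`sniff_serialization` and `guess_rdf_serialization` are not part of LangChain or rdflib); they recognise only a few common opening markers and otherwise fall back to "ttl", RdfGraph's default.

```python
def sniff_serialization(text: str) -> str:
    """Guess an rdflib format name from the start of an RDF document.

    Hypothetical helper (not a LangChain or rdflib API). It only checks a
    few common opening markers and falls back to "ttl".
    """
    # Drop a leading byte-order mark and any whitespace before the first token
    head = text.lstrip("\ufeff").lstrip()
    # RDF/XML, the usual serialization for .owl files such as pizza.owl
    if head.startswith("<?xml") or head.startswith("<rdf:RDF"):
        return "xml"
    # JSON-LD documents open with a JSON object or array
    if head.startswith(("{", "[")):
        return "json-ld"
    # Anything else (e.g. "@prefix ..." or bare triples): assume Turtle
    return "ttl"


def guess_rdf_serialization(path: str, sniff_chars: int = 2048) -> str:
    """Read the first few characters of a file and sniff its serialization."""
    with open(path, encoding="utf-8-sig", errors="ignore") as fh:
        return sniff_serialization(fh.read(sniff_chars))
```

For pizza.owl this should return "xml", so the result can be passed through as `RdfGraph(source_file="pizza.owl", serialization=guess_rdf_serialization("pizza.owl"), standard="owl")`. Note that an XML header can also mean OWL/XML rather than RDF/XML; rdflib does not read OWL/XML, so such a file would need to be re-saved as RDF/XML (for example from Protégé) first.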

If the issue persists, it might be due to the content of the pizza.owl file itself. In that case, you would need to check if the file is correctly formatted and doesn't contain any syntax errors.

Additionally, I found a similar issue in the LangChain repository: langchain.graph RDFGraph does not read OWL. The solution provided there was to modify the load_schema method of the RdfGraph class to use the correct queries (op_query_owl and dp_query_owl) to fetch object properties and data properties, respectively, when the standard is set to "owl". You might want to check if this solution applies to your case as well.
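The reason the standard matters: an OWL file such as pizza.owl declares its properties as `owl:ObjectProperty` and `owl:DatatypeProperty`, so a query that only looks for `rdf:Property` can come back empty unless a reasoner has inferred those extra triples. The sketch below picks property queries by standard. The names `PROPERTY_QUERIES` and `property_queries` and the SPARQL text are illustrative assumptions, not LangChain's actual op_query_owl / dp_query_owl.

```python
# Hypothetical sketch: names and query text are illustrative, not LangChain's.
PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
)

# Per-standard SPARQL queries for the properties a schema summary would list.
PROPERTY_QUERIES: dict[str, dict[str, str]] = {
    # RDF/RDFS vocabularies type their properties as rdf:Property
    "rdf": {"properties": "SELECT DISTINCT ?p WHERE { ?p rdf:type rdf:Property . }"},
    "rdfs": {"properties": "SELECT DISTINCT ?p WHERE { ?p rdf:type rdf:Property . }"},
    # OWL ontologies split them into object and datatype properties
    "owl": {
        "object_properties": "SELECT DISTINCT ?op WHERE { ?op rdf:type owl:ObjectProperty . }",
        "data_properties": "SELECT DISTINCT ?dp WHERE { ?dp rdf:type owl:DatatypeProperty . }",
    },
}


def property_queries(standard: str) -> dict[str, str]:
    """Return the full SPARQL queries (with prefixes) for the given standard."""
    if standard not in PROPERTY_QUERIES:
        raise ValueError(f"unsupported standard: {standard!r}")
    return {name: PREFIXES + query for name, query in PROPERTY_QUERIES[standard].items()}
```

With rdflib installed, each query could then be run against the parsed file, e.g. `g = rdflib.Graph()`, `g.parse("pizza.owl", format="xml")`, then `g.query(...)`.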

I hope this helps! If you have any further questions or if the issue persists, please provide more details so I can assist you better.

will23332 commented 10 months ago

Yes, that was the problem, I can confirm it works fine now!