dosumis closed this issue 9 months ago.
CC @hkir-dev
I encountered an uncaught TypeError while reproducing this, so I'm treating it as part of the same issue:
```
Traceback (most recent call last):
  File "/home/harry/ontogpt/.venv/bin/ontogpt", line 6, in <module>
    sys.exit(main())
  File "/home/harry/ontogpt/.venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/harry/ontogpt/.venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/harry/ontogpt/.venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/harry/ontogpt/.venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/harry/ontogpt/.venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/harry/ontogpt/src/ontogpt/cli.py", line 355, in extract
    write_extraction(results, output, output_format, ke)
  File "/home/harry/ontogpt/src/ontogpt/cli.py", line 104, in write_extraction
    exporter.export(results, output, knowledge_engine.schemaview)
  File "/home/harry/ontogpt/src/ontogpt/io/owl_exporter.py", line 52, in export
    output.write(str(doc).encode("utf-8"))  # type: ignore
  File "/usr/lib/python3.10/codecs.py", line 377, in write
    data, consumed = self.encode(object, self.errors)
TypeError: utf_8_encode() argument 1 must be str, not bytes
```
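The last frame is in codecs.py, where a StreamWriter encodes whatever it is given, so it only accepts str; calling .encode("utf-8") beforehand hands it bytes instead. Below is a minimal sketch, assuming the output is a codecs StreamWriter, a text stream, or a raw binary stream. write_document is a hypothetical helper for illustration, not ontogpt's actual fix.

```python
import codecs
import io


def write_document(output, text: str) -> None:
    """Hypothetical helper (not ontogpt's API): write serialized OWL to any stream.

    codecs.StreamWriter and io text streams encode for you, so they need str;
    a raw binary stream needs bytes.
    """
    if isinstance(output, (codecs.StreamWriter, io.TextIOBase)):
        output.write(text)  # the writer encodes; passing bytes here raises TypeError
    else:
        output.write(text.encode("utf-8"))  # raw binary stream: encode ourselves


# Reproduce the failure: a UTF-8 StreamWriter rejects pre-encoded bytes.
buffer = io.BytesIO()
writer = codecs.getwriter("utf-8")(buffer)
try:
    writer.write("Ontology()".encode("utf-8"))
except TypeError as err:
    print("TypeError:", err)

# The helper passes str to the StreamWriter, which encodes it exactly once.
write_document(writer, "Ontology()")
print(buffer.getvalue())
```

Checking the stream type keeps a single code path for stdout and for files; if the output is always known to be a text stream, simply writing str(doc) without encoding is the simpler option.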
I think that TypeError was the main problem. Here's the output OWL now:
```
Prefix( owl: = <http://www.w3.org/2002/07/owl#> )
Prefix( rdf: = <http://www.w3.org/1999/02/22-rdf-syntax-ns#> )
Prefix( rdfs: = <http://www.w3.org/2000/01/rdf-schema#> )
Prefix( xsd: = <http://www.w3.org/2001/XMLSchema#> )
Prefix( xml: = <http://www.w3.org/XML/1998/namespace> )
Prefix( linkml: = <https://w3id.org/linkml/> )
Prefix( gocam: = <http://w3id.org/ontogpt/gocam/> )
Prefix( GO: = <http://purl.obolibrary.org/obo/GO_> )
Prefix( CL: = <http://purl.obolibrary.org/obo/CL_> )
Prefix( core: = <http://w3id.org/ontogpt/core/> )
Prefix( NCIT: = <http://purl.obolibrary.org/obo/NCIT_> )
Prefix( RO: = <http://purl.obolibrary.org/obo/RO_> )
Prefix( shex: = <http://www.w3.org/ns/shex#> )
Prefix( schema: = <http://schema.org/> )
Ontology( <http://w3id.org/ontogpt/gocam>
AnnotationAssertion( rdfs:label HGNC:10798 "SFTPA1" )
AnnotationAssertion( rdfs:label HGNC:10799 "SFTPA2" )
AnnotationAssertion( rdfs:label HGNC:10801 "SFTPB" )
AnnotationAssertion( rdfs:label HGNC:10802 "SFTPC" )
AnnotationAssertion( rdfs:label HGNC:10803 "SFTPD" )
AnnotationAssertion( rdfs:label HGNC:33 "ABCA3" )
AnnotationAssertion( rdfs:label HGNC:14582 "LAMP3" )
AnnotationAssertion( rdfs:label <http://purl.obolibrary.org/obo/GO_0009058> "synthesis" )
AnnotationAssertion( rdfs:label <http://purl.obolibrary.org/obo/GO_0046903> "secretion" )
AnnotationAssertion( rdfs:label <http://purl.obolibrary.org/obo/GO_0015914> "phospholipid transport" )
AnnotationAssertion( rdfs:label <http://purl.obolibrary.org/obo/GO_0051235> "storage" )
AnnotationAssertion( rdfs:label <AUTO:regulation%20of%20surfactant%20metabolism%20and%20innate%20immunity> "regulation of surfactant metabolism and innate immunity" )
AnnotationAssertion( rdfs:label <AUTO:lowering%20surface%20tension%20in%20the%20alveoli%20and%20essential%20for%20normal%20respiratory%20function> "lowering surface tension in the alveoli and essential for normal respiratory function" )
AnnotationAssertion( rdfs:label <AUTO:spreading%20and%20stability%20of%20the%20surfactant%20film%20at%20the%20air-liquid%20interface%20of%20the%20alveolar%20surface> "spreading and stability of the surfactant film at the air-liquid interface of the alveolar surface" )
AnnotationAssertion( rdfs:label <AUTO:immune%20defense%20of%20the%20lungs%20and%20also%20plays%20a%20role%20in%20surfactant%20homeostasis> "immune defense of the lungs and also plays a role in surfactant homeostasis" )
AnnotationAssertion( rdfs:label <AUTO:transports%20phospholipids%20into%20lamellar%20bodies%20in%20AT2%20cells> "transports phospholipids into lamellar bodies in AT2 cells" )
AnnotationAssertion( rdfs:label <AUTO:involved%20in%20the%20storage%20and%20secretion%20of%20surfactant%20lipids%20and%20proteins%20from%20lamellar%20bodies> "involved in the storage and secretion of surfactant lipids and proteins from lamellar bodies" )
AnnotationAssertion( rdfs:label <AUTO:synthesis%20of%20pulmonary%20surfactant> "synthesis of pulmonary surfactant" )
AnnotationAssertion( rdfs:label <AUTO:secretion%20of%20pulmonary%20surfactant> "secretion of pulmonary surfactant" )
AnnotationAssertion( rdfs:label <AUTO:storage%20of%20surfactant%20lipids%20and%20proteins> "storage of surfactant lipids and proteins" )
AnnotationAssertion( rdfs:label <AUTO:lamellar%20bodies%20in%20AT2%20cells> "lamellar bodies in AT2 cells" )
AnnotationAssertion( rdfs:label <http://purl.obolibrary.org/obo/GO_0042599> "lamellar bodies" )
)
```
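The remaining warnings about unrecognized prefixes are likely because those prefixes (for example HGNC, which the output above uses without a Prefix declaration) aren't defined in the gocam schema or in several of the other schemas. I'll fix that in a separate PR. Otherwise, the OWL should now be written as expected.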
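The prefix warnings presumably refer to CURIEs whose prefix has no Prefix declaration. As a rough illustration, assuming the exporter emits OWL functional syntax like the output above, a regex-based check can list prefixes that are used but never declared. undeclared_prefixes is a hypothetical helper, and a real OWL parser would be more robust than these regular expressions.

```python
import re

# "Prefix( rdfs: = <...> )" declarations; the prefix name may be empty.
PREFIX_DECL = re.compile(r"Prefix\(\s*([A-Za-z_][\w.-]*)?:\s*=")
# CURIE-like tokens such as "HGNC:10798" that are not part of a longer token.
CURIE = re.compile(r"(?<![\w:])([A-Za-z_][\w.-]*):[\w.-]+")


def undeclared_prefixes(ofn_text: str) -> set[str]:
    """Return CURIE prefixes used in OWL functional syntax but never declared."""
    declared = set(PREFIX_DECL.findall(ofn_text))
    # Blank out full IRIs (<...>) and string literals ("...") so their
    # contents, e.g. <AUTO:...> IRIs, are not mistaken for CURIEs.
    body = re.sub(r'<[^>]*>|"[^"]*"', " ", ofn_text)
    used = set(CURIE.findall(body))
    return used - declared


sample = """Prefix( rdfs: = <http://www.w3.org/2000/01/rdf-schema#> )
Prefix( GO: = <http://purl.obolibrary.org/obo/GO_> )
Ontology( <http://w3id.org/ontogpt/gocam>
AnnotationAssertion( rdfs:label HGNC:10798 "SFTPA1" )
AnnotationAssertion( rdfs:label <http://purl.obolibrary.org/obo/GO_0009058> "synthesis" )
)"""

print(sorted(undeclared_prefixes(sample)))  # ['HGNC']
```

Run against the full output above, the only undeclared prefix this check should report is HGNC; the AUTO identifiers are written as full IRIs, so they do not show up as CURIEs.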
Awesome. Thanks for being so quick to fix!
Example:
```
ontogpt extract --model gpt-4 -t gocam -i AT2_pulmonary_surfactant_response.txt -o AT2_pulmonary_surf_gocam.owl -O owl
```
Fails with:
Default (YAML) output works fine.
Files: AT2_pulmonary_surfactant_response.txt