monarch-initiative / ontogpt

LLM-based ontological extraction tools, including SPIRES
https://monarch-initiative.github.io/ontogpt/
BSD 3-Clause "New" or "Revised" License

Failure to parse markdown formatting leads to errors #391

Closed caufieldjh closed 3 weeks ago

caufieldjh commented 1 month ago

When completion output contains the three backticks that open a markdown code block, the output parser mishandles it: the fence line has no colon, so the parser ignores that line, but it then fails to continue to the next line properly. This may point to other parsing bugs. For now, the log looks like this (without the escaping on line 4):

INFO:ontogpt.clients.openai_client:Storing payload of len: 1412
INFO:root:RAW TEXT: ```
...output text...
\```
ERROR:root:Line '```' does not contain a colon; ignoring
DEBUG:root:RAW: None
DEBUG:root:Grounding annotation object None
ERROR:root:Cannot ground None annotation, cls=Document
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
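
One possible workaround, sketched here under assumptions, is to drop fence-only lines before the text reaches the line-by-line "key: value" parser. The helper name `strip_code_fences` and the pattern `FENCE_RE` are hypothetical and are not part of OntoGPT; this is not necessarily how the issue was actually resolved.

```python
import re

# Hypothetical helper, not part of OntoGPT.
# Matches a line holding only a Markdown code fence, with an optional
# language tag (for example a bare fence or a fence followed by "json").
FENCE_RE = re.compile(r"^\s*```[\w-]*\s*$")


def strip_code_fences(text: str) -> str:
    """Return text with fence-only lines removed; other lines are kept as-is."""
    kept = [line for line in text.splitlines() if not FENCE_RE.match(line)]
    return "\n".join(kept)


raw = "```\nlabel: example\nauthors: A; B\n```"
print(strip_code_fences(raw))
```

With the fences removed, every remaining line in this sample contains a colon, so a colon-based parser would no longer hit the "does not contain a colon" branch on the first line.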

caufieldjh commented 1 month ago

This also happens if the response is raw JSON:

ERROR:root:Line '{' does not contain a colon; ignoring
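
For the raw JSON case, a minimal sketch of one option is to try decoding the response as a JSON object first and fall back to the line-based parser only when that fails. The function name `try_parse_json` is an assumption made for illustration and does not come from the OntoGPT codebase.

```python
import json
from typing import Any, Dict, Optional


def try_parse_json(text: str) -> Optional[Dict[str, Any]]:
    """Return the decoded object if text is a JSON object, otherwise None."""
    stripped = text.strip()
    # Cheap check first: only attempt decoding when the text looks like an object.
    if not stripped.startswith("{"):
        return None
    try:
        value = json.loads(stripped)
    except json.JSONDecodeError:
        return None
    # json.loads can return non-dict values; only accept objects here.
    return value if isinstance(value, dict) else None


print(try_parse_json('{"label": "example"}'))
print(try_parse_json("label: example"))
```

A caller would use the returned dict directly when it is not None, and otherwise hand the original text to the existing colon-based parsing path.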