monarch-initiative / ontogpt

LLM-based ontological extraction tools, including SPIRES
https://monarch-initiative.github.io/ontogpt/
BSD 3-Clause "New" or "Revised" License

NER eval encounters `AttributeError` when predicted object is `None` #253

caufieldjh closed this issue 1 year ago

caufieldjh commented 1 year ago

On a run of the EvalCTDNER:

Democratic Republic of the Congo; World Health Organization; hospitals; health centers; hemoglobin (Hb)
INFO:root:PARSING LINE: chemicals:
DEBUG:root:  FIELD: chemicals
ERROR:root:Line 'artesunate' does not contain a colon; ignoring
DEBUG:root:RAW: None
DEBUG:root:Grounding annotation object None
ERROR:root:Cannot ground None annotation, cls=ChemicalToDiseaseDocument
Traceback (most recent call last):
  File "/home/harry/ontogpt/.venv/bin/ontogpt", line 6, in <module>
    sys.exit(main())
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/harry/ontogpt/src/ontogpt/cli.py", line 1484, in eval
    eos = evaluator.eval()
  File "/home/harry/ontogpt/src/ontogpt/evaluation/ctd/eval_ctd_ner.py", line 309, in eval
    predicted_obj.chemicals = [t for t in predicted_obj.chemicals if included(t)]
AttributeError: 'NoneType' object has no attribute 'chemicals'

`predicted_obj` is `None`, which means either nothing got predicted or the result was never assigned, and the latter shouldn't happen. Luckily this is rare, but when it does occur the `AttributeError` isn't caught and the whole evaluation run stops.
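One way to keep the run alive would be to check for `None` before filtering. The following is only a sketch, not the code in `eval_ctd_ner.py`: the `Prediction` class and `filter_chemicals` helper are hypothetical stand-ins, and treating a missing prediction as an empty one is an assumption about how it should be scored.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class Prediction:
    """Hypothetical stand-in for the extracted document object."""

    chemicals: List[str] = field(default_factory=list)


def filter_chemicals(
    predicted_obj: Optional[Prediction],
    included: Callable[[str], bool],
) -> Prediction:
    """Filter predicted chemicals, tolerating a missing prediction."""
    if predicted_obj is None:
        # Nothing was extracted (or parsing failed). Assumption: score this
        # as an empty prediction instead of raising AttributeError.
        logger.warning("No predicted object; treating it as an empty prediction")
        return Prediction()
    predicted_obj.chemicals = [t for t in predicted_obj.chemicals if included(t)]
    return predicted_obj
```

With a guard like this, a `None` prediction would simply count as zero predicted chemicals for that document, and the warning leaves a trace in the log for later inspection.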

caufieldjh commented 1 year ago

For reference, this happened because the raw text response from the LLM looked like:

 chemicals:

artesunate

diseases:

delayed hemolytic anemia; severe malaria

entities:

Democratic Republic of the Congo; World Health Organization; hospitals; health centers; hemoglobin (Hb)

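so it didn't get parsed properly: each field name sits on its own line with its value a few lines below, so the parser saw an empty `chemicals:` field, ignored `artesunate` because that line has no colon, and ended up with a `None` annotation.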
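For illustration, a parser that tolerates this layout could carry the most recent field name forward and attach any following colon-free lines to it. This is a minimal sketch under stated assumptions, not ontogpt's actual parser: `parse_fields` is a hypothetical helper, values are assumed to be separated by semicolons, and any line containing a colon is assumed to start a new field.

```python
from typing import Dict, List, Optional


def parse_fields(text: str) -> Dict[str, List[str]]:
    """Parse 'field: v1; v2' responses, allowing values on later lines."""
    fields: Dict[str, List[str]] = {}
    current: Optional[str] = None  # most recently seen field name
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines carry no information
        if ":" in line:
            # Assumption: a colon always marks a new field, so a value that
            # itself contains a colon would be misread as a field name.
            name, _, value = line.partition(":")
            current = name.strip()
            fields.setdefault(current, [])
            line = value.strip()
            if not line:
                continue  # the value may follow on a later line
        elif current is None:
            continue  # text before the first field name is ignored
        fields[current].extend(v.strip() for v in line.split(";") if v.strip())
    return fields
```

Applied to the response above, this would assign `artesunate` to `chemicals` rather than discarding it, and it still handles the usual single-line form such as `chemicals: artesunate`.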