Before implementing the semantic evaluation described above, we should first evaluate the RDF triples component-wise, i.e., split them into subjects, predicates, and objects and score each component separately. This will help us better understand how LLMs perform on the different subtasks (entity extraction, relation extraction, and text classification).
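To make that concrete, here is a minimal sketch of component-wise scoring in Python. The tuple representation, the per-position exact-match F1, and the example triples are all illustrative assumptions, not taken from our pipeline:

```python
from collections import Counter

def component_f1(predicted, gold, position):
    """Exact-match F1 over one triple component: 0=subject, 1=predicate, 2=object."""
    pred = Counter(t[position] for t in predicted)
    true = Counter(t[position] for t in gold)
    overlap = sum((pred & true).values())  # multiset intersection
    precision = overlap / sum(pred.values()) if pred else 0.0
    recall = overlap / sum(true.values()) if true else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical predictions vs. ground truth, as (subject, predicate, object) tuples.
predicted = [("ex:SoilHealth", "ex:improves", "ex:CropYield")]
gold = [("ex:HealthySoils", "ex:improves", "ex:CropYield")]

for name, pos in [("subject", 0), ("predicate", 1), ("object", 2)]:
    print(name, component_f1(predicted, gold, pos))
# Subject F1 is 0.0 while predicate and object are 1.0, so the weakness sits
# in entity extraction; a whole-triple exact match would report 0 and hide that.
```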
Find and/or develop other possible metrics to evaluate different strategies for converting text to RDF statements. Current metrics like F1 score and exact match cannot match RDF triples semantically: they treat an RDF triple as a single string (exact match) or as three strings (general F1 score). Consequently, a string from an RDF triple is counted as correct only if it is identical to the ground truth; otherwise, it is wrong. For example, the concept "Soil Health" was sometimes given the URI "ex:SoilHealth" and other times "ex:HealthySoils". Semantically, the two are nearly equivalent, but under the current metrics at most one of them can count as correct, and every other spelling scores no points. This likely underestimates the performance of zero-shot learning, because without examples the model is much less likely to mint the same URI for a concept consistently.
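A rough sketch of what a softer comparison could look like: split the camel-cased local name into tokens, sort them, and score string similarity with difflib, so "ex:SoilHealth" and "ex:HealthySoils" come out close even though exact match rejects them. The tokenizer, the prefix handling, and the choice of SequenceMatcher are illustrative assumptions, not a settled proposal:

```python
import difflib
import re

def normalize(uri):
    """Split a camel-cased local name (after the prefix) into sorted lowercase words."""
    local = uri.split(":")[-1]                  # "ex:SoilHealth" -> "SoilHealth"
    words = re.findall(r"[A-Z]?[a-z]+", local)  # -> ["Soil", "Health"]
    return " ".join(sorted(w.lower() for w in words))

def similarity(a, b):
    """Order-insensitive token similarity between two URIs, in [0, 1]."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()

print(similarity("ex:SoilHealth", "ex:HealthySoils"))  # ~0.92: near match
print(similarity("ex:SoilHealth", "ex:SoilHealth"))    # 1.0: exact match
print(similarity("ex:SoilHealth", "ex:CropYield"))     # low: genuine mismatch
```

String-level similarity is only a first step; a similarity over label embeddings would handle synonyms that share no tokens at all.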
Possible solutions: