Alternative evaluation metrics beyond F1 score and exact match

Find and/or develop alternative metrics for evaluating different strategies for converting text to RDF statements. Current metrics such as F1 score and exact match cannot match RDF triples semantically: exact match treats an RDF triple as a single string, and the general F1 score treats it as three separate strings (subject, predicate, object). Consequently, a triple or one of its elements is recognized as correct only if it is exactly the same as the ground truth; otherwise, it is counted as wrong.

For example, the concept "Soil Health" was sometimes given the URI "ex:SoilHealth" and other times "ex:HealthySoils". Semantically, the two are very close. Under the current metrics, however, at most one of them can match the ground truth, and any other URI scores no points. This potentially underestimates the performance of zero-shot learning, because a zero-shot model is much less likely to define a concept's URI the same way the ground truth does.
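To make the failure mode concrete, here is a minimal Python sketch of the two scoring schemes described above. It assumes that "exact match" means a predicted triple counts only if it is identical to a gold triple, and that "general F1" means scoring each (position, term) element as its own string. The function names and toy triples are hypothetical and not taken from the project code.

```python
from collections import Counter

# A triple is (subject, predicate, object), each a URI/CURIE string.
Triple = tuple[str, str, str]


def f1(tp: int, n_pred: int, n_gold: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing matches."""
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


def triple_exact_f1(pred: list[Triple], gold: list[Triple]) -> float:
    """Exact match: a whole triple counts only if it is identical to a gold triple."""
    tp = sum((Counter(pred) & Counter(gold)).values())
    return f1(tp, len(pred), len(gold))


def element_f1(pred: list[Triple], gold: list[Triple]) -> float:
    """Element-level F1: score each (position, term) pair as its own string."""
    def items(triples: list[Triple]) -> Counter:
        return Counter((pos, term) for t in triples for pos, term in enumerate(t))

    p, g = items(pred), items(gold)
    tp = sum((p & g).values())
    return f1(tp, sum(p.values()), sum(g.values()))


gold = [("ex:SoilHealth", "rdf:type", "ex:Concept")]
pred = [("ex:HealthySoils", "rdf:type", "ex:Concept")]

print(triple_exact_f1(pred, gold))       # 0.0
print(round(element_f1(pred, gold), 3))  # 0.667
```

In both schemes `ex:HealthySoils` earns nothing as a term: the element-level score only rewards the unchanged predicate and object.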

Possible solutions:

[ ] RDF2vec
[ ] Convert RDF statements back to plain text, embed them and compute similarity
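The second option could be sketched as a "soft" F1: verbalize each triple as plain text, embed it, and credit each predicted (or gold) triple with its best cosine similarity to any triple on the other side. This is a minimal Python sketch under stated assumptions. The camelCase verbalizer, the `soft_f1` name, and the greedy best-match averaging are hypothetical choices. `embed` uses character-trigram counts purely as a dependency-free stand-in for a real sentence-embedding model, so it only captures surface overlap, not true synonymy.

```python
import math
import re
from collections import Counter

Triple = tuple[str, str, str]


def verbalize_term(term: str) -> str:
    """Hypothetical verbalizer: 'ex:SoilHealth' -> 'soil health'."""
    # Keep only the local name after any prefix, path, or fragment separator.
    local = term.split(":")[-1].split("/")[-1].split("#")[-1]
    # Split camelCase / PascalCase into lowercase words.
    words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", local)
    return " ".join(w.lower() for w in words) or local.lower()


def verbalize(triple: Triple) -> str:
    """Turn a triple back into a plain-text 'sentence'."""
    return " ".join(verbalize_term(t) for t in triple)


def embed(text: str) -> Counter:
    """Stand-in embedding (assumption): character-trigram counts.
    Replace with a real sentence-embedding model for actual evaluation."""
    padded = f"  {text} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def soft_f1(pred: list[Triple], gold: list[Triple]) -> float:
    """Soft precision/recall: each triple gets its best similarity to any
    triple on the other side (greedy best match, not one-to-one)."""
    if not pred or not gold:
        return 0.0
    p_vecs = [embed(verbalize(t)) for t in pred]
    g_vecs = [embed(verbalize(t)) for t in gold]
    precision = sum(max(cosine(p, g) for g in g_vecs) for p in p_vecs) / len(p_vecs)
    recall = sum(max(cosine(g, p) for p in p_vecs) for g in g_vecs) / len(g_vecs)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [("ex:SoilHealth", "rdf:type", "ex:Concept")]
pred_close = [("ex:HealthySoils", "rdf:type", "ex:Concept")]
pred_far = [("ex:CropYield", "rdf:type", "ex:Concept")]

print(round(soft_f1(gold, gold), 3))        # 1.0
print(round(soft_f1(pred_close, gold), 3))  # 0.77
print(round(soft_f1(pred_far, gold), 3))    # 0.49
```

Unlike exact match, `ex:HealthySoils` now earns more credit than an unrelated concept. With a real embedding model, `embed` would return dense vectors and `cosine` would operate on those instead. Greedy best-match averaging lets one gold triple credit several near-duplicate predictions; a one-to-one assignment (for example, the Hungarian algorithm) would be a stricter alternative.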

soilwise-he / soil-health-knowledge-graph
