jjakubassa / WDI-Project


Precision, recall and F1 = 0, but should be greater than 0 #10

Closed jjakubassa closed 10 months ago

jjakubassa commented 11 months ago

The gold standard contains one true match, and this correspondence is found, but all evaluation metrics are zero. The minimal datasets contain only two albums for Spotify and one for MusicBrainz.

[INFO ] 2023-11-08 19:30:44.590 [de.uni_mannheim.informatik.dws.wdi.ExerciseIdentityResolution.IR_using_linear_combination] - * Loading datasets    *
[INFO ] 2023-11-08 19:30:44.640 [de.uni_mannheim.informatik.dws.winter.model.io.XMLMatchableReader] - Loading 2 elements from spotify_min.xml
[INFO ] 2023-11-08 19:30:44.654 [de.uni_mannheim.informatik.dws.winter.model.io.XMLMatchableReader] - Loading 1 elements from MB_min.xml
[INFO ] 2023-11-08 19:30:44.656 [de.uni_mannheim.informatik.dws.wdi.ExerciseIdentityResolution.IR_using_linear_combination] - * Loading gold standard   *
[INFO ] 2023-11-08 19:30:44.658 [de.uni_mannheim.informatik.dws.winter.model.MatchingGoldStandard] - The gold standard has 2 examples
[INFO ] 2023-11-08 19:30:44.659 [de.uni_mannheim.informatik.dws.winter.model.MatchingGoldStandard] -    0 positive examples (0,00%)
[INFO ] 2023-11-08 19:30:44.660 [de.uni_mannheim.informatik.dws.winter.model.MatchingGoldStandard] -    2 negative examples (100,00%)
[INFO ] 2023-11-08 19:30:44.664 [de.uni_mannheim.informatik.dws.wdi.ExerciseIdentityResolution.IR_using_linear_combination] - * Running identity resolution *
[INFO ] 2023-11-08 19:30:44.669 [de.uni_mannheim.informatik.dws.winter.matching.algorithms.RuleBasedMatchingAlgorithm] - Starting Identity Resolution
[INFO ] 2023-11-08 19:30:44.669 [de.uni_mannheim.informatik.dws.winter.matching.algorithms.RuleBasedMatchingAlgorithm] - Blocking 1 x 2 elements
[INFO ] 2023-11-08 19:30:44.673 [de.uni_mannheim.informatik.dws.winter.matching.blockers.StandardBlocker] - created 1 blocking keys for first dataset
[INFO ] 2023-11-08 19:30:44.673 [de.uni_mannheim.informatik.dws.winter.matching.blockers.StandardBlocker] - created 2 blocking keys for second dataset
[INFO ] 2023-11-08 19:30:44.673 [de.uni_mannheim.informatik.dws.winter.matching.blockers.StandardBlocker] - created 1 blocks from blocking keys
[INFO ] 2023-11-08 19:30:44.679 [de.uni_mannheim.informatik.dws.winter.matching.blockers.AbstractBlocker] - Debug results written to file: data/output/debugResultsBlocking.csv
[INFO ] 2023-11-08 19:30:44.684 [de.uni_mannheim.informatik.dws.winter.matching.algorithms.RuleBasedMatchingAlgorithm] - Matching 1 x 2 elements after 0:00:00.011; 1 blocked pairs (reduction ratio: 0.5)
[INFO ] 2023-11-08 19:30:44.688 [de.uni_mannheim.informatik.dws.winter.matching.algorithms.RuleBasedMatchingAlgorithm] - Identity Resolution finished after 0:00:00.019; found 1 correspondences.
[WARN ] 2023-11-08 19:30:44.688 [de.uni_mannheim.informatik.dws.winter.matching.rules.MatchingRule] - No corresponding record for the Debug Log found in the Goldstandard!
[WARN ] 2023-11-08 19:30:44.688 [de.uni_mannheim.informatik.dws.winter.matching.rules.MatchingRule] - Please align the order of Data Sets in Goldstandard and Matching Rule!
[INFO ] 2023-11-08 19:30:44.690 [de.uni_mannheim.informatik.dws.winter.matching.rules.MatchingRule] - Debug results written to file: data/output/debugResultsMatchingRule.csv
[INFO ] 2023-11-08 19:30:44.691 [de.uni_mannheim.informatik.dws.winter.matching.rules.MatchingRule] - Debug results written to file: data/output/debugResultsMatchingRule.csv_short
[INFO ] 2023-11-08 19:30:44.695 [de.uni_mannheim.informatik.dws.wdi.ExerciseIdentityResolution.IR_using_linear_combination] - * Evaluating result   *
[INFO ] 2023-11-08 19:30:44.696 [de.uni_mannheim.informatik.dws.wdi.ExerciseIdentityResolution.IR_using_linear_combination] - Precision: 0,0000
[INFO ] 2023-11-08 19:30:44.696 [de.uni_mannheim.informatik.dws.wdi.ExerciseIdentityResolution.IR_using_linear_combination] - Recall: 0,0000
[INFO ] 2023-11-08 19:30:44.696 [de.uni_mannheim.informatik.dws.wdi.ExerciseIdentityResolution.IR_using_linear_combination] - F1: 0,0000
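
The log above reports a gold standard with 2 examples, 0 of them positive, and 1 found correspondence. If no gold standard pair is read as a positive example, no correspondence can count as a true positive, so all three metrics come out as zero. A minimal sketch of the standard definitions follows; the class `MatchMetrics` and the zero-denominator handling are assumptions for illustration, not the WInte.r evaluator code.

```java
final class MatchMetrics {
    // Precision = TP / (TP + FP); defined as 0 here when nothing was predicted (assumption)
    static double precision(int tp, int fp) {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    // Recall = TP / (TP + FN); defined as 0 here when there are no positives (assumption)
    static double recall(int tp, int fn) {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    // F1 = harmonic mean of precision and recall; 0 when both are 0
    static double f1(double p, double r) {
        return p + r == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Counts matching the log: 1 correspondence found, 0 positive gold standard examples
        int tp = 0, fp = 1, fn = 0;
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        System.out.println("Precision: " + p);
        System.out.println("Recall: " + r);
        System.out.println("F1: " + f1(p, r));
    }
}
```

With the true match read as a positive example, the same functions would be called with `tp = 1, fp = 0, fn = 0`.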

jjakubassa commented 11 months ago

The solution is to wrap the values in the gold standard CSV files in quotation marks, as in https://github.com/jjakubassa/WDI-Project/commit/83a9b4bc890c0bb6222df1cce5190288c752b543.
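
As a rough illustration of that fix, the sketch below wraps every field of a gold standard line in double quotes. It assumes a three-column layout (two record IDs and a match label) with unquoted fields that contain no commas. The class name `GoldStandardQuoter` and the sample IDs are hypothetical and not taken from the linked commit.

```java
import java.util.ArrayList;
import java.util.List;

final class GoldStandardQuoter {
    // Wraps each comma-separated field of one line in double quotes.
    // Assumption: input fields are unquoted and contain no commas themselves.
    static String quoteLine(String line) {
        List<String> quoted = new ArrayList<>();
        for (String field : line.split(",", -1)) {
            // Trim surrounding whitespace and double any embedded quote (usual CSV escaping)
            String value = field.trim().replace("\"", "\"\"");
            quoted.add("\"" + value + "\"");
        }
        return String.join(",", quoted);
    }

    public static void main(String[] args) {
        // Hypothetical gold standard line: ID in dataset 1, ID in dataset 2, match label
        System.out.println(quoteLine("spotify_1, mb_1, TRUE"));
    }
}
```

Applied to each line of a gold standard file, this yields rows such as `"spotify_1","mb_1","TRUE"`.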

jjakubassa commented 10 months ago

The same happened with WDC and Spotify, but this time the underlying problem was that the wrong dataset was loaded. Fixed in https://github.com/jjakubassa/WDI-Project/commit/41a9fe9588065f0143bf2ee95daa68ec286681e0.
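
A loading mistake like this could be caught early by checking that every ID in the gold standard occurs in the dataset that was actually loaded. The sketch below uses plain Java collections rather than the WInte.r API; the class `GoldStandardIdCheck` and the sample IDs are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class GoldStandardIdCheck {
    // Returns the gold standard IDs that do not occur in the loaded dataset
    static Set<String> missingIds(Set<String> datasetIds, List<String> goldStandardIds) {
        Set<String> missing = new LinkedHashSet<>(goldStandardIds);
        missing.removeAll(datasetIds);
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical IDs: the gold standard refers to WDC records, but Spotify records were loaded
        Set<String> loaded = new HashSet<>(Arrays.asList("spotify_1", "spotify_2"));
        List<String> gold = Arrays.asList("wdc_1", "wdc_2");
        Set<String> missing = missingIds(loaded, gold);
        if (!missing.isEmpty()) {
            System.out.println("Gold standard IDs not found in loaded dataset: " + missing);
        }
    }
}
```

A non-empty result suggests that the gold standard and the loaded dataset do not belong together.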