Closed: florisheijmans closed this issue 3 years ago
Hi,
The problem does indeed occur because of the jgrapht update. The double-quote characters (") are trimmed by default by the updated ConnectivityInspector, so the ids are not recognized as existing keys when the ground-truth reader processes them. We will take a closer look at this. To work around the issue for now, you can remove those characters from the DBLP input file or, of course, use a modified format altogether.
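A minimal sketch of that workaround, for anyone who wants to script it. The class name `QuoteStripper` and both file paths are hypothetical and not part of JedAIToolkit. It assumes the file is UTF-8 and that no field relies on quoting to protect an embedded separator, since every double quote is removed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of JedAIToolkit.
final class QuoteStripper {

    // Removes every double-quote character from one line of the CSV.
    static String stripQuotes(String line) {
        return line.replace("\"", "");
    }

    // Reads the input file, strips the quotes line by line and writes a cleaned copy.
    // Assumption: the file is encoded as UTF-8.
    static void clean(Path input, Path output) throws IOException {
        List<String> cleaned = Files.readAllLines(input, StandardCharsets.UTF_8)
                .stream()
                .map(QuoteStripper::stripQuotes)
                .collect(Collectors.toList());
        Files.write(output, cleaned, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // args[0] = original ground-truth file, args[1] = cleaned copy (both placeholders).
        clean(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Writing to a separate output file keeps the original ground truth untouched, so the change is easy to revert once the reader itself is fixed.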
Best regards, Manos
From: florisheijmans notifications@github.com
Sent: Thursday, January 14, 2021 5:11 PM
To: scify/JedAIToolkit JedAIToolkit@noreply.github.com
Cc: Subscribed subscribed@noreply.github.com
Subject: [scify/JedAIToolkit] GtCSVReader problems with jgrapht ConnectivityInspector (#44)
This issue arose when I attempted to reproduce the workflow in: org.scify.jedai.demoworkflows.CsvDblpAcm.java.
While the ground truth in DBLP-ACM_perfectMapping.csv is being read (specifically in the GtCSVReader.getDuplicatePairs method), the detection of connected components by the jgrapht package does not seem to work.
For some reason I obtain a single cluster of size 2225 and then 5375 more clusters of size 1, which is obviously incorrect since the csv contains about 2225 unique pairs (which should in turn produce 2225 clusters of size 2).
Have you seen this problem before? Maybe the jgrapht package expects a different format than it did previously?
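To sanity-check the expected outcome independently of jgrapht, a small union-find over the id pairs can be used. This is only a sketch: the class `PairClusters` and its methods are hypothetical, and it is not how GtCSVReader computes its clusters. If every pair is disjoint from the others, each cluster should have size 2.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of JedAIToolkit or jgrapht.
final class PairClusters {

    // Maps each id to its parent in the union-find forest.
    private final Map<String, String> parent = new HashMap<>();

    // Returns the representative of the cluster that contains id.
    private String find(String id) {
        parent.putIfAbsent(id, id);
        String root = id;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        // Path compression: point every visited id directly at the root.
        String current = id;
        while (!current.equals(root)) {
            String next = parent.get(current);
            parent.put(current, root);
            current = next;
        }
        return root;
    }

    // Registers one duplicate pair from the ground-truth file.
    void addPair(String first, String second) {
        String rootFirst = find(first);
        String rootSecond = find(second);
        if (!rootFirst.equals(rootSecond)) {
            parent.put(rootFirst, rootSecond);
        }
    }

    // Returns the size of every cluster, in no particular order.
    List<Integer> clusterSizes() {
        Map<String, Integer> sizes = new HashMap<>();
        for (String id : new ArrayList<>(parent.keySet())) {
            sizes.merge(find(id), 1, Integer::sum);
        }
        return new ArrayList<>(sizes.values());
    }
}
```

When feeding it real data, it may help to prefix the ids with their dataset name (for example a hypothetical `dblp:` and `acm:`) so that equal id strings from the two sources are not merged by accident.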
Thank you! That fixes the problem.