okgreece / Alignment

Alignment, a collaborative, system aided, user driven ontology/vocabulary matching and validation platform.
https://alignment.okfn.gr
MIT License

Similarities calculated by Silk differ for the same two projects #41

Closed OndrejZamazal closed 7 years ago

OndrejZamazal commented 7 years ago

I have encountered strange behaviour in the calculation of similarities by Silk. I created the same project (with the same project settings and the default Silk configuration) several times, and the similarities calculated by Silk were sometimes different. Is this possible?

skarampatakis commented 7 years ago

I don't think that this is possible. Could you describe how to reproduce it?

OndrejZamazal commented 7 years ago

This can be seen by comparing the computed similarities for the projects LuSe-cz-eu1 and JaZb-cz-eu1, for example the suggested links for Transport. The first project, LuSe-cz-eu1, has one link (after rerunning the similarity calculation there is no link for Transport at all, see below), while the second project, JaZb-cz-eu1, has three links for the Transport entity. The two projects have the same project settings, the same input files and the same default Silk configuration.

However, the calculated similarities can differ on each run after selecting "calculate similarities". For Transport in LuSe-cz-eu1, one run produces no suggested link, another produces just one, and another produces three.
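One way to pin down this kind of discrepancy is to export the suggested links from two runs and diff them as sets. The following is a minimal sketch only: the `diff_runs` helper and the `eu:` target names are hypothetical and are not taken from Alignment, Silk, or the actual projects.

```python
from typing import Dict, Set, Tuple

# A suggested link as (source entity, target entity).
Link = Tuple[str, str]


def diff_runs(run_a: Set[Link], run_b: Set[Link]) -> Dict[str, Set[Link]]:
    """Return the links found only in run A, only in run B, and in both."""
    return {
        "only_a": run_a - run_b,
        "only_b": run_b - run_a,
        "both": run_a & run_b,
    }


# Hypothetical suggested links for the Transport entity in two runs
# with identical settings (illustrative data, not real results).
run_1 = {("Transport", "eu:Transport")}
run_2 = {
    ("Transport", "eu:Transport"),
    ("Transport", "eu:TransportPolicy"),
    ("Transport", "eu:PublicTransport"),
}

result = diff_runs(run_1, run_2)
for name in ("only_a", "only_b", "both"):
    print(name, sorted(result[name]))
```

If the calculation were deterministic, `only_a` and `only_b` would both be empty for two runs with the same inputs and configuration.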

I have also encountered a situation where those two projects somehow (accidentally) share the calculated similarities.

skarampatakis commented 7 years ago

It seems that Silk's blocking feature was responsible for this behaviour. I have disabled it by default for now. Please check.
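For readers unfamiliar with blocking: in general, it restricts comparisons to entity pairs that share a block key, which saves time but can hide pairs that would otherwise be compared. The sketch below is a generic illustration with a deliberately naive, hypothetical block key; it is not Silk's implementation, and it does not explain why results varied between runs, only why disabling blocking can change which links are suggested.

```python
from collections import defaultdict
from itertools import product
from typing import Iterable, Set, Tuple


def block_key(label: str) -> str:
    """Hypothetical block key: the first character, case-sensitive."""
    return label[0]


def candidate_pairs(
    source: Iterable[str], target: Iterable[str], use_blocking: bool
) -> Set[Tuple[str, str]]:
    """Return the entity pairs that would be passed on for comparison."""
    source, target = list(source), list(target)
    if not use_blocking:
        # Without blocking, every source entity meets every target entity.
        return set(product(source, target))
    # With blocking, only entities that share a block key are paired.
    blocks = defaultdict(list)
    for t in target:
        blocks[block_key(t)].append(t)
    return {(s, t) for s in source for t in blocks.get(block_key(s), [])}


# Illustrative labels only.
source_labels = ["Transport"]
target_labels = ["Transport", "transport policy", "Public transport"]

with_blocking = candidate_pairs(source_labels, target_labels, use_blocking=True)
without_blocking = candidate_pairs(source_labels, target_labels, use_blocking=False)
print(len(with_blocking), len(without_blocking))
```

With this naive key, two of the three plausible targets never reach the comparison step when blocking is on.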

I don't understand the last part of your comment. What do you mean by

I have also encountered a situation where those two projects somehow (accidentally) share the calculated similarities.

?

OndrejZamazal commented 7 years ago

I checked it and it seems to behave correctly now. Thanks. My comment about shared similarities was just an attempt to explain the behaviour. It is no longer relevant.

Let me ask one question regarding the Silk configuration. Is it possible to change the minimum threshold? It is currently set to 0.3. But I guess this is a separate issue.
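To illustrate what a minimum threshold does to the suggested links, here is a minimal sketch. It assumes scores in the range 0 to 1 and an inclusive cut-off; the `filter_by_threshold` helper and the sample scores are hypothetical and do not reflect how Silk or Alignment apply the setting internally.

```python
from typing import List, Tuple

# A scored link as (source entity, target entity, similarity score).
ScoredLink = Tuple[str, str, float]


def filter_by_threshold(
    links: List[ScoredLink], min_threshold: float = 0.3
) -> List[ScoredLink]:
    """Keep only links whose score reaches the minimum threshold."""
    return [link for link in links if link[2] >= min_threshold]


# Illustrative candidates only, not real results.
candidates = [
    ("Transport", "eu:Transport", 0.95),
    ("Transport", "eu:Trade", 0.35),
    ("Transport", "eu:Tourism", 0.10),
]

kept_default = filter_by_threshold(candidates)       # threshold 0.3
kept_strict = filter_by_threshold(candidates, 0.5)   # stricter threshold
print(len(kept_default), len(kept_strict))
```

Raising the threshold trades fewer weak suggestions for the risk of dropping a correct but low-scoring link, which is why making it configurable matters for testing with domain experts.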

skarampatakis commented 7 years ago

This is related to the ability to change the Silk configuration or import custom configurations. I am working on this and I believe I can have a fix by the end of the week. At the moment, the only way to change the configuration is to edit the default Silk-LSL file. This is possible on a local installation.

OndrejZamazal commented 7 years ago

Being able to change the Silk configuration would be very helpful. I have prepared some test cases for domain experts, and I think that their testing should start once this new feature is available. Is there an estimate of when it will be available?

skarampatakis commented 7 years ago

This can be split into two main use cases.

  1. Users develop a Silk LSL settings file in the Silk Workbench or manually and then simply import it into Alignment. This is the easy part; we can have it today. Silk already provides a nice, user-friendly environment for developing such configuration files.

  2. Users develop a Silk LSL settings file from within Alignment, or edit an existing one there and change some features, for instance copying the default settings and changing the stop words or the comparison algorithm. This is already partially implemented, with some features missing, as it was just ported from the initial version of Alignment about a year ago. It is a bit tricky to achieve this with a user-friendly result.

If 1 is enough for your use case, I will try to have it ready and tested ASAP. For 2, I will need more time to make it fully functional.

skarampatakis commented 7 years ago

This is related to #3 and #4.

OndrejZamazal commented 7 years ago

I think that case 1 (import) should be enough for us for now. I can work with Alignment and continue with our test case at the end of this week or at the beginning of next week. Thanks.