kermitt2 / entity-fishing

A machine learning tool for fishing entities
http://nerd.readthedocs.io/
Apache License 2.0

Different results on several entity-fishing APIs #129

Open aa303554 opened 3 years ago

aa303554 commented 3 years ago

I am currently using 3 different "versions" of entity-fishing (2 online and one on a local server) and I get 3 different results.

On the science-miner online API (https://cloud.science-miner.com/nerd/)

image

On the huma-num online API (http://nerd.huma-num.fr/nerd/)

image

On a local version

image

As you can see on the 3 images above, I have different results.

On the local version, I have the same result as on the science-miner online API, but I don't have the domains.

On the huma-num version, the TYPE is always present, while on the 2 other versions it is absent whenever a link to Wikipedia is found.

I wanted to know whether there are display options to get the same result as huma-num, or whether it is a modified version of entity-fishing?

And why does the version I installed locally give the same results as your online API, but without the domains? Is there an option to display the domains?

kermitt2 commented 3 years ago

Hello !

I think the Huma-num version is 0.0.2 or 0.0.3, not sure, I am involved in it. That older version kept the NE type after a successful Wikidata disambiguation, which was removed in more recent versions (see #126). There is no plan to bring NE types back for successfully disambiguated named entities, because they often looked bad and inconsistent. Much richer type information is accessible from the Wikidata ID via the statements. One option would be to infer the NE type from the statements.
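As a rough illustration of that last option, here is a minimal sketch in Python. It assumes each statement is available as a dict with a `propertyId` and a `value` holding a Wikidata ID; that shape, the `infer_ne_type` helper and the small class-to-type table are assumptions for the example, not part of entity-fishing. P31 is the Wikidata "instance of" property.

```python
# Illustrative mapping from Wikidata classes (values of P31, "instance of")
# to coarse NE types. The table and the type labels are assumptions for
# this sketch; a real mapping would need many more classes.
CLASS_TO_NE_TYPE = {
    "Q5": "PERSON",            # human
    "Q515": "LOCATION",        # city
    "Q6256": "LOCATION",       # country
    "Q43229": "ORGANISATION",  # organization
}


def infer_ne_type(statements):
    """Return a coarse NE type from 'instance of' statements, or None.

    `statements` is assumed to be a list of dicts such as
    {"propertyId": "P31", "value": "Q5"}.
    """
    for statement in statements:
        if statement.get("propertyId") != "P31":
            continue
        ne_type = CLASS_TO_NE_TYPE.get(statement.get("value"))
        if ne_type is not None:
            return ne_type
    return None
```

A flat table like this only covers direct classes; following "subclass of" (P279) chains would be needed for broader coverage.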

About the domain missing in the local version, it's a bit strange. Do you have this local database with the following size:

lopez@work:~$ ll entity-fishing/data/db/db-en/domains/
total 152M
drwxrwxr-x  2 lopez lopez 4.0K Jun 12  2020 ./
drwxrwxr-x 24 lopez lopez 4.0K Sep 24  2020 ../
-rw-r--r--  1 lopez lopez 152M Jun 12  2020 data.mdb
-rw-r--r--  1 lopez lopez 8.0K Apr 27 22:41 lock.mdb

if not, I will need to check the uploaded db dump.

aa303554 commented 3 years ago

yes I have data.mdb at 152M and lock.mdb at 8.0K with a total of 152M

kermitt2 commented 3 years ago

mmm any error message at launch for your local version? Can you click on the Response tab and check if "domains" appears in the JSON response?
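If checking by eye in the Response tab is tedious, the same check can be scripted. This is a minimal sketch that assumes the response is a JSON object with an `entities` list whose items may carry `rawName` and `domains` fields; the `entities_with_domains` helper name is made up for the example.

```python
import json


def entities_with_domains(response_text):
    """Return the raw names of the entities that have a non-empty 'domains' field.

    `response_text` is the JSON body copied from the Response tab.
    The field names used here are assumptions about the response layout.
    """
    data = json.loads(response_text)
    return [
        entity.get("rawName")
        for entity in data.get("entities", [])
        if entity.get("domains")
    ]
```

An empty list for a text that yields domains on the online API would point to a problem in the local installation rather than in the display.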

I have the domains information for this example with the current master and my local version:

Screenshot from 2021-06-03 09-35-21

aa303554 commented 3 years ago

No error message, only a warning "scanned from multiple locations", and in the Response tab there are no domains. I also noticed that entity-fishing does not work with Java 11.

aa303554 commented 3 years ago

I solved my problem: the installation step that runs ./gradlew copyModels in grobid-ner did not work, so I had to copy the files from grobid/grobid-ner/resources/models to grobid/grobid-home/models myself. Everything works now, with Java 11 too.
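For anyone hitting the same thing, the manual copy can be scripted. This is only a sketch of the workaround, not a replacement for the gradle task; the `copy_models` helper is made up for the example, and it needs Python 3.8+ for `dirs_exist_ok`.

```python
import shutil
from pathlib import Path


def copy_models(src, dst):
    """Copy the model tree from src into dst, keeping files already in dst.

    Returns the sorted relative paths of all files present in dst afterwards.
    """
    src = Path(src)
    dst = Path(dst)
    if not src.is_dir():
        raise FileNotFoundError(f"model source directory not found: {src}")
    # dirs_exist_ok=True merges into an existing destination (Python 3.8+).
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sorted(p.relative_to(dst).as_posix() for p in dst.rglob("*") if p.is_file())


# Example call, using the paths mentioned above (adjust to your checkout):
# copy_models("grobid/grobid-ner/resources/models", "grobid/grobid-home/models")
```

Files in the destination with the same relative path are overwritten, so back up any customised models first.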

curtkohler commented 3 years ago

Yes, I also noticed that gradlew copyModels warns that it is using a deprecated call on my platform. It doesn't copy the models even though it reports a successful outcome for the command. Ended up copying the models by hand.

kermitt2 commented 3 years ago

Thanks @aa303554 and @curtkohler !

The copyModels thing is an issue from grobid-ner. The deprecated call is not a problem (at least until gradle 7.0 ;)

copyModels is indeed not working in grobid-ner. I have never been able to understand why and when even the simplest gradle copy task with a direct task call is executed or not, it is totally weird. Anyway, I tried rewriting the task and, although I don't know why it works now, the copy is done for me after https://github.com/kermitt2/grobid-ner/commit/24de91fa887d3dc3b4a0280aee378e39a1f0fb00

Thanks !