kermitt2 / entity-fishing

A machine learning tool for fishing entities
http://nerd.readthedocs.io/
Apache License 2.0

Why do the tokens show different results when running on localhost compared to the online demo? #56

Closed asdf5252 closed 6 years ago

asdf5252 commented 6 years ago

[Two screenshots attached: 2018-01-22 12-59-25 and 2018-01-22 12-59-33]

lfoppiano commented 6 years ago

Hi @asdf5252, could you please provide us the two queries?

asdf5252 commented 6 years ago

Hello @lfoppiano, I am not sure which two queries you mean.

My main concern is that the tokens show different output for the same text when I run it on localhost and in the nerd demo.

I understand that the total number of tokens can differ, but the explanation of each word also differs: the demo shows more information, while the same word on localhost has less. Why is that?

lfoppiano commented 6 years ago

Hi @asdf5252, I'd like to have the query so that I can easily copy it :-)

Regarding the services, I recommend that you use http://nerd.huma-num.fr instead; I'm sure it runs the latest stable version of entity-fishing.

I need to look at your issue more closely; we have a similar one about reproducibility in issue #51.

asdf5252 commented 6 years ago

Hello @lfoppiano, I am using the disambiguate text service and passing the following query:

{
    "text": "In signal processing, a filter is a device or process that removes some unwanted components or features from a signal. Filtering is a class of signal processing, the defining feature of filters being the complete or partial suppression of some aspect of the signal.",
    "shortText": "",
    "termVector": [],
    "language": {
        "lang": "en"
    },
    "entities": [],
    "onlyNER": false,
    "resultLanguages": [
        "de",
        "fr"
    ],
    "nbest": false,
    "sentence": false,
    "customisation": "generic"
}

The above text gives different output on localhost and in the nerd demo. See the screenshots below for both:

[Two screenshots attached: 2018-02-15 16-26-43 and 2018-02-15 16-27-21]
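A comparison like this can be made concrete without screenshots by saving the JSON response from each service and diffing the entities. The following is a minimal sketch, not part of entity-fishing. It assumes each response has an `entities` list whose items carry `offsetStart`, `offsetEnd`, `wikipediaExternalRef` and `wikidataId`; these field names are assumptions and may differ between versions. The helper names `index_entities` and `compare_responses` are hypothetical.

```python
def index_entities(response):
    """Map (offsetStart, offsetEnd) -> entity dict for one parsed response.

    The field names are assumptions about the response layout.
    """
    return {
        (e["offsetStart"], e["offsetEnd"]): e
        for e in response.get("entities", [])
    }


def compare_responses(local, online,
                      fields=("wikipediaExternalRef", "wikidataId")):
    """Report spans found in only one response and spans whose fields differ."""
    a, b = index_entities(local), index_entities(online)
    report = {
        # Spans recognised by only one of the two services.
        "only_local": sorted(set(a) - set(b)),
        "only_online": sorted(set(b) - set(a)),
        # (span, field, local value, online value) for shared spans.
        "differing": [],
    }
    for span in sorted(set(a) & set(b)):
        for field in fields:
            if a[span].get(field) != b[span].get(field):
                report["differing"].append(
                    (span, field, a[span].get(field), b[span].get(field))
                )
    return report
```

To use it, load each saved response with `json.load` and pass the two dictionaries to `compare_responses`. Keying on character offsets keeps the comparison independent of entity order in the response.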

lfoppiano commented 6 years ago

Thanks for the query. We are investigating a similar issue (#51), and its fix should also cover this problem.

asdf5252 commented 6 years ago

Hello @lfoppiano

Issue #51 has been closed, but my problem still exists. Here is the query:

{
    "text": "Empirical mode decomposition (EMD) has recently been pioneered by Huang et al. for adaptively representing nonstationary signals as sums of zero-mean amplitude modulation frequency modulation components. In order to better understand the way EMD behaves in stochastic situations involving broadband noise, we report here on numerical experiments based on fractional Gaussian noise. In such a case, it turns out that EMD acts essentially as a dyadic filter bank resembling those involved in wavelet decompositions. It is also pointed out that the hierarchy of the extracted modes may be similarly exploited for getting access to the Hurst exponent.",
    "shortText": "",
    "termVector": [],
    "language": {
        "lang": "en"
    },
    "entities": [],
    "onlyNER": false,
    "resultLanguages": [
        "de",
        "fr"
    ],
    "nbest": false,
    "sentence": false,
    "customisation": "generic"
}

Here is the output of the query from both localhost and the entity-fishing demo:

[Screenshot attached: 2018-02-19 18-24-38]

[Screenshot attached: 2018-02-19 18-24-49]

Note the difference between the two responses for the same token.

kermitt2 commented 6 years ago

Hello @asdf5252! You are comparing two different versions of the tool; the online version has not been updated. Issue #51 refers to different results from the same version/instance, and that problem has been solved. I will update the online version when version 0.0.3 is released, which should be before the end of this month.

asdf5252 commented 6 years ago

Hello @kermitt2

I am building (N)ERD 0.0.2. My main issue is the description of each token: a term on localhost has much less explanation than in the entity-fishing demo, i.e. the demo shows more information about the token.

I have just updated the repository and rebuilt it:

/Desktop/rahul/nerd$ git pull
remote: Counting objects: 593, done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 593 (delta 239), reused 245 (delta 239), pack-reused 346
Receiving objects: 100% (593/593), 15.38 MiB | 608.00 KiB/s, done.
Resolving deltas: 100% (310/310), completed with 85 local objects.
From https://github.com/kermitt2/nerd
   4080514..1e94a70  master          -> origin/master
   a126b27..213653f  0.0.3           -> origin/0.0.3
 * [new branch]      0.0.3-issue51-reproducibility -> origin/0.0.3-issue51-reproducibility
 * [new branch]      lmdb-embeddings -> origin/lmdb-embeddings
Updating 4080514..1e94a70
Fast-forward
 .../nerd/disambiguation/NerdEntity.java            | 24 ++++++++++------------
 src/main/webapp/nerd/nerd.js                       | 16 ++++++++++-----
 2 files changed, 22 insertions(+), 18 deletions(-)
grid@neo47:~/Desktop/rahul/nerd$ mvn -Dmaven.test=true jetty:run-war
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for com.scienceminer.nerd:nerd-service:war:0.0.2
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-jar-plugin is missing. @ line 47, column 21
[WARNING] 
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING] 
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING] 
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] Building (N)ERD 0.0.2
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] >>> jetty-maven-plugin:9.4.6.v20170531:run-war (default-cli) > package @ nerd-service >>>
[WARNING] The POM for org.grobid:grobid-core:jar:0.4.4 is missing, no dependency information available
[WARNING] The POM for org.grobid:grobid-ner:jar:0.4.4 is missing, no dependency information available
[WARNING] The POM for org.grobid:grobid-trainer:jar:0.4.4 is missing, no dependency information available
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ nerd-service ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 5 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.6.1:compile (default-compile) @ nerd-service ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 141 source files to /home/grid/Desktop/rahul/nerd/target/classes
[INFO] /home/grid/Desktop/rahul/nerd/src/main/java/com/scienceminer/nerd/disambiguation/NerdEngine.java: Some input files use or override a deprecated API.
[INFO] /home/grid/Desktop/rahul/nerd/src/main/java/com/scienceminer/nerd/disambiguation/NerdEngine.java: Recompile with -Xlint:deprecation for details.
[INFO] /home/grid/Desktop/rahul/nerd/src/main/java/com/scienceminer/nerd/kb/model/Redirect.java: Some input files use unchecked or unsafe operations.
[INFO] /home/grid/Desktop/rahul/nerd/src/main/java/com/scienceminer/nerd/kb/model/Redirect.java: Recompile with -Xlint:unchecked for details.
[INFO] 
[INFO] --- maven-jar-plugin:3.0.2:jar (make-a-jar) @ nerd-service ---
[INFO] Building jar: /home/grid/Desktop/rahul/nerd/target/nerd-service-0.0.2.jar
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ nerd-service ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 10 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.6.1:testCompile (default-testCompile) @ nerd-service ---
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] --- maven-surefire-plugin:2.20:test (default-test) @ nerd-service ---
[INFO] 
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running com.scienceminer.nerd.utilities.mediaWiki.TestMediaWikiParser
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/grid/.m2/repository/org/slf4j/slf4j-log4j12/1.7.25/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/grid/.m2/repository/org/grobid/grobid-core/0.4.4/grobid-core-0.4.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:WARN No appenders could be found for logger (com.scienceminer.nerd.utilities.mediaWiki.MediaWikiParser).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[INFO] Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.01 s - in com.scienceminer.nerd.utilities.mediaWiki.TestMediaWikiParser
[INFO] Running com.scienceminer.nerd.utilities.NerdPropertyTest
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.357 s - in com.scienceminer.nerd.utilities.NerdPropertyTest
[INFO] Running com.scienceminer.nerd.service.NerdQueryTest
[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.163 s - in com.scienceminer.nerd.service.NerdQueryTest
[INFO] Running com.scienceminer.nerd.disambiguation.TestWeightedTerm
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.036 s - in com.scienceminer.nerd.disambiguation.TestWeightedTerm
[INFO] Running com.scienceminer.nerd.disambiguation.TestProcessText
>>>>>>>> GROBID_HOME=/home/grid/Desktop/rahul/grobid/grobid-home

init upper level language independent environment
building Environment for upper knowledge base
Environment built - 9155139 concepts.
init Environment for language en
building Environment for language en
isLoaded: true
Environment built - 14651883 pages.
domains en / isLoaded: true
Warning: Orthopedic surgery is not a category found in Wikipedia.
Warning: Environment is not a category found in Wikipedia.
init Environment for language de
building Environment for language de
isLoaded: true
Environment built - 3523959 pages.
init Environment for language fr
building Environment for language fr
isLoaded: true
Environment built - 3631810 pages.
Setophaga ruticilla: 1.0
Setophaga: 0.0015192830367373664
ruticilla: 0.0
bird: 9.534052637295964E-5
washing machine: 0.45897020520637166
washing: 3.942409146377737E-4
machine: 3.896923023222438E-5

Other factors were also at play, said Felix Boni, head of research at James Capel in Mexico City, such as positive technicals and economic uncertainty in Argentina, which has put it and neighbouring Brazil's markets at risk.
Felix Boni  Felix Boni  PERSON  38  48  
James Capel James Capel BUSINESS    70  81  
Mexico  Mexico  LOCATION    85  91  
Argentina   Argentina   LOCATION    154 163 
Brazil  Brazil  LOCATION    199 205 
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.516 s - in com.scienceminer.nerd.disambiguation.TestProcessText
[INFO] Running com.scienceminer.nerd.disambiguation.SentenceTest
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 s - in com.scienceminer.nerd.disambiguation.SentenceTest
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 32, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] 
[INFO] --- maven-war-plugin:3.1.0:war (default-war) @ nerd-service ---
[INFO] Packaging webapp
[INFO] Assembling webapp [nerd-service] in [/home/grid/Desktop/rahul/nerd/target/nerd-service-0.0.2]
[INFO] Processing war project
[INFO] Copying webapp webResources [/home/grid/Desktop/rahul/nerd/src/main/webapp/WEB-INF] to [/home/grid/Desktop/rahul/nerd/target/nerd-service-0.0.2]
[INFO] Copying webapp webResources [/home/grid/Desktop/rahul/nerd/doc] to [/home/grid/Desktop/rahul/nerd/target/nerd-service-0.0.2]
[INFO] Copying webapp webResources [/home/grid/Desktop/rahul/nerd/lib] to [/home/grid/Desktop/rahul/nerd/target/nerd-service-0.0.2]
[INFO] Copying webapp resources [/home/grid/Desktop/rahul/nerd/src/main/webapp]
[INFO] Webapp assembled in [5134 msecs]
[INFO] Building war: /home/grid/Desktop/rahul/nerd/target/nerd-service-0.0.2.war
[INFO] 
[INFO] <<< jetty-maven-plugin:9.4.6.v20170531:run-war (default-cli) < package @ nerd-service <<<
[INFO] 
[INFO] 
[INFO] --- jetty-maven-plugin:9.4.6.v20170531:run-war (default-cli) @ nerd-service ---
[INFO] Configuring Jetty for project: (N)ERD
[INFO] Logging initialized @46066ms to org.eclipse.jetty.util.log.Slf4jLog
[INFO] Context path = /
[INFO] Tmp directory = /home/grid/Desktop/rahul/nerd/target/tmp
[INFO] Web defaults = org/eclipse/jetty/webapp/webdefault.xml
[INFO] Web overrides =  none
[INFO] jetty-9.4.6.v20170531
[INFO] Scanning elapsed time=4397ms
[INFO] DefaultSessionIdManager workerName=node0
[INFO] No SessionScavenger set, using defaults
[INFO] Scavenging every 660000ms
Feb 19, 2018 7:16:02 PM com.sun.jersey.api.core.PackagesResourceConfig init
INFO: Scanning for root resource and provider classes in the packages:
  com.scienceminer.nerd.service
Feb 19, 2018 7:16:02 PM com.sun.jersey.api.core.ScanningResourceConfig logClasses
INFO: Root resource classes found:
  class com.scienceminer.nerd.service.NerdRestService
Feb 19, 2018 7:16:02 PM com.sun.jersey.api.core.ScanningResourceConfig init
INFO: No provider classes found.
Feb 19, 2018 7:16:02 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.8 06/24/2011 12:17 PM'
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/grid/Desktop/rahul/nerd/target/nerd-service-0.0.2/WEB-INF/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/grid/Desktop/rahul/nerd/target/nerd-service-0.0.2/WEB-INF/lib/grobid-core-0.4.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19 Feb 2018 19:16.03 [INFO ] NerdRestService           - Init Servlet NerdRestService.
19 Feb 2018 19:16.03 [DEBUG] NerdServiceProperties     - Start NerdServiceProperties.getNewInstance
19 Feb 2018 19:16.03 [DEBUG] NerdServiceProperties     - Instantiating NerdServiceProperties
19 Feb 2018 19:16.03 [DEBUG] NerdServiceProperties     - Initiating property loading
19 Feb 2018 19:16.03 [WARN ] NerdServiceProperties     - Cannot load com.scienceminer.nerd.property.service, trying in other ways. 
19 Feb 2018 19:16.03 [DEBUG] NerdProperties            - synchronized getNewInstance
19 Feb 2018 19:16.03 [DEBUG] NerdProperties            - Initiating property loading
19 Feb 2018 19:16.03 [DEBUG] NerdProperties            - loading Nerd.properties
19 Feb 2018 19:16.03 [ERROR] NerdProperties            - Cannot set Nerd properties path from the context.
19 Feb 2018 19:16.03 [DEBUG] NerdProperties            - Checking Properties
19 Feb 2018 19:16.03 [INFO ] NerdRestService           - Init of Servlet NerdRestService finished.
19 Feb 2018 19:16.03 [INFO ] NerdRestService           - Init lexicon.
19 Feb 2018 19:16.03 [DEBUG] Lexicon                   - Get new instance of Lexicon
19 Feb 2018 19:16.03 [INFO ] Lexicon                   - Initiating dictionaries
19 Feb 2018 19:16.03 [INFO ] Lexicon                   - End of Initialization of dictionaries
19 Feb 2018 19:16.03 [INFO ] NerdRestService           - Init lexicon finished.
19 Feb 2018 19:16.03 [INFO ] NerdRestService           - Init KB resources.
19 Feb 2018 19:16.03 [DEBUG] UpperKnowledgeBase        - Get new instance of UpperKnowledgeBase
19 Feb 2018 19:16.03 [INFO ] UpperKnowledgeBase        - Init Lexicon
19 Feb 2018 19:16.03 [INFO ] UpperKnowledgeBase        - Lexicon initialized
19 Feb 2018 19:16.03 [INFO ] UpperKnowledgeBase        - 
Init Upper Knowledge base layer

init upper level language independent environment
building Environment for upper knowledge base
19 Feb 2018 19:16.03 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/wikidataIds.csv is not readable
19 Feb 2018 19:16.03 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/latest-all.json.bz2 is not readable
Environment built - 9155139 concepts.
19 Feb 2018 19:16.03 [INFO ] UpperKnowledgeBase        - Init English lower Knowledge base layer
init Environment for language en
building Environment for language en
19 Feb 2018 19:16.03 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/en/stats.csv is not readable
19 Feb 2018 19:16.03 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/en/page.csv is not readable
19 Feb 2018 19:16.03 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/en/label.csv is not readable
19 Feb 2018 19:16.03 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/en/pageLabel.csv is not readable
19 Feb 2018 19:16.03 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/en/pageLinkIn.csv is not readable
19 Feb 2018 19:16.03 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/en/pageLinkOut.csv is not readable
19 Feb 2018 19:16.03 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/en/categoryParents.csv is not readable
19 Feb 2018 19:16.03 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/en/articleParents.csv is not readable
19 Feb 2018 19:16.03 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/en/childCategories.csv is not readable
19 Feb 2018 19:16.03 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/en/childArticles.csv is not readable
19 Feb 2018 19:16.03 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/en/redirectTargetsBySource.csv is not readable
19 Feb 2018 19:16.03 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/en/redirectSourcesByTarget.csv is not readable
19 Feb 2018 19:16.03 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/en/wikidata.txt is not readable
19 Feb 2018 19:16.03 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/en/translations.csv is not readable
19 Feb 2018 19:16.03 [INFO ] KBLowerEnvironment        - Could not locate markup file in /mnt/data/wikipedia/latest/en
isLoaded: true
Environment built - 14651883 pages.
domains en / isLoaded: true
Warning: Orthopedic surgery is not a category found in Wikipedia.
Warning: Environment is not a category found in Wikipedia.
19 Feb 2018 19:16.04 [INFO ] UpperKnowledgeBase        - Init German lower Knowledge base layer
init Environment for language de
building Environment for language de
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/de/stats.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/de/page.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/de/label.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/de/pageLabel.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/de/pageLinkIn.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/de/pageLinkOut.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/de/categoryParents.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/de/articleParents.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/de/childCategories.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/de/childArticles.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/de/redirectTargetsBySource.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/de/redirectSourcesByTarget.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/de/wikidata.txt is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/de/translations.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBLowerEnvironment        - Could not locate markup file in /mnt/data/wikipedia/latest/de
isLoaded: true
Environment built - 3523959 pages.
19 Feb 2018 19:16.04 [INFO ] UpperKnowledgeBase        - Init French lower Knowledge base layer
init Environment for language fr
building Environment for language fr
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/fr/stats.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/fr/page.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/fr/label.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/fr/pageLabel.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/fr/pageLinkIn.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/fr/pageLinkOut.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/fr/categoryParents.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/fr/articleParents.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/fr/childCategories.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/fr/childArticles.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/fr/redirectTargetsBySource.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/fr/redirectSourcesByTarget.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/fr/wikidata.txt is not readable
19 Feb 2018 19:16.04 [INFO ] KBEnvironment             - /mnt/data/wikipedia/latest/fr/translations.csv is not readable
19 Feb 2018 19:16.04 [INFO ] KBLowerEnvironment        - Could not locate markup file in /mnt/data/wikipedia/latest/fr
isLoaded: true
Environment built - 3631810 pages.
19 Feb 2018 19:16.04 [INFO ] UpperKnowledgeBase        - End of Initialization of Wikipedia environments
19 Feb 2018 19:16.04 [INFO ] UpperKnowledgeBase        - Init Grobid
19 Feb 2018 19:16.04 [DEBUG] GrobidProperties          - synchronized getNewInstance
19 Feb 2018 19:16.04 [DEBUG] GrobidProperties          - Initiating property loading
19 Feb 2018 19:16.04 [DEBUG] GrobidProperties          - loading GROBID_HOME path
19 Feb 2018 19:16.04 [DEBUG] GrobidProperties          - loading grobid.properties
19 Feb 2018 19:16.04 [DEBUG] GrobidProperties          - Checking Properties
19 Feb 2018 19:16.04 [DEBUG] GrobidProperties          - loading pdf2xml path
19 Feb 2018 19:16.04 [DEBUG] GrobidProperties          - pdf2xml home directory set to /home/grid/Desktop/rahul/grobid/grobid-home/pdf2xml/lin-64
19 Feb 2018 19:16.04 [INFO ] LibraryLoader             - Loading external native CRF library
19 Feb 2018 19:16.04 [DEBUG] LibraryLoader             - The property org.grobid.home already exists. No mocking of context made.
19 Feb 2018 19:16.04 [DEBUG] LibraryLoader             - /home/grid/Desktop/rahul/grobid/grobid-home/lib/lin-64
19 Feb 2018 19:16.04 [INFO ] LibraryLoader             - Loading Wapiti native library...
19 Feb 2018 19:16.04 [INFO ] LibraryLoader             - Library crfpp loaded
>>>>>>>> GROBID_HOME=/home/grid/Desktop/rahul/grobid/grobid-home
19 Feb 2018 19:16.04 [INFO ] NerdRestService           - Init KB resources finished.
[INFO] Started o.e.j.m.p.JettyWebAppContext@750c9e57{/,file:///home/grid/Desktop/rahul/nerd/target/nerd-service-0.0.2/,AVAILABLE}{/home/grid/Desktop/rahul/nerd/target/nerd-service-0.0.2.war}
[INFO] Started ServerConnector@23f1857f{HTTP/1.1,[http/1.1]}{0.0.0.0:8090}
[INFO] Started @54083ms
[INFO] Started Jetty Server
kermitt2 commented 6 years ago

Do you mean all the Wikidata/wikipedia information associated to a disambiguated term? To have the same as the online version with respect to this, you need to use the development version, branch 0.0.3 which is documented here -> http://nerd.readthedocs.io/en/0.0.3/ - see section build. This 0.0.3 version will be merged with master normally later this month.

asdf5252 commented 6 years ago

Hello @kermitt2

Yes, my main concern is about the term explanation on localhost compared to the online version.

asdf5252 commented 6 years ago

Hello @kermitt2

RE :(To have the same as the online version with respect to this, you need to use the development version, branch 0.0.3 which is documented here -> http://nerd.readthedocs.io/en/0.0.3/ - see section build.)

The file mentioned in the documentation is not present: the documentation says that the path to grobid-home shall be indicated in the file data/config/mention.yaml, but I cannot find that file, so I have defined the path to grobid-home in src/main/resource/nerd.properties instead.

The documentation also says to unzip the first 6 db archive files under data/db/, but that directory is not present either, so I have unzipped them into data/wikipedia/.

After that, I built with mvn clean install, but it is still building (N)ERD 0.0.2.

I do not understand how to build (N)ERD 0.0.3 on localhost. For 0.0.3, is it necessary to have the GROBID development version 0.6.0-SNAPSHOT?

Here is the build output:

~/Desktop/0.0.3/nerd$ mvn clean install
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for com.scienceminer.nerd:nerd-service:war:0.0.2
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-jar-plugin is missing. @ line 47, column 21
[WARNING] 
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING] 
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING] 
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] Building (N)ERD 0.0.2
[INFO] ------------------------------------------------------------------------
[WARNING] The POM for org.grobid:grobid-core:jar:0.4.4 is missing, no dependency information available
[WARNING] The POM for org.grobid:grobid-ner:jar:0.4.4 is missing, no dependency information available
[WARNING] The POM for org.grobid:grobid-trainer:jar:0.4.4 is missing, no dependency information available
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ nerd-service ---
[INFO] Deleting /home/grid/Desktop/0.0.3/nerd/target
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ nerd-service ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 5 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.6.1:compile (default-compile) @ nerd-service ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 141 source files to /home/grid/Desktop/0.0.3/nerd/target/classes
[INFO] /home/grid/Desktop/0.0.3/nerd/src/main/java/com/scienceminer/nerd/evaluation/EvaluationDataGeneration.java: Some input files use or override a deprecated API.
[INFO] /home/grid/Desktop/0.0.3/nerd/src/main/java/com/scienceminer/nerd/evaluation/EvaluationDataGeneration.java: Recompile with -Xlint:deprecation for details.
[INFO] /home/grid/Desktop/0.0.3/nerd/src/main/java/com/scienceminer/nerd/kb/model/Redirect.java: Some input files use unchecked or unsafe operations.
[INFO] /home/grid/Desktop/0.0.3/nerd/src/main/java/com/scienceminer/nerd/kb/model/Redirect.java: Recompile with -Xlint:unchecked for details.
[INFO] 
[INFO] --- maven-jar-plugin:3.0.2:jar (make-a-jar) @ nerd-service ---
[INFO] Building jar: /home/grid/Desktop/0.0.3/nerd/target/nerd-service-0.0.2.jar
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ nerd-service ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 10 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.6.1:testCompile (default-testCompile) @ nerd-service ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 9 source files to /home/grid/Desktop/0.0.3/nerd/target/test-classes
[INFO] 
[INFO] --- maven-surefire-plugin:2.20:test (default-test) @ nerd-service ---
[INFO] 
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running com.scienceminer.nerd.utilities.mediaWiki.TestMediaWikiParser
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/grid/.m2/repository/org/slf4j/slf4j-log4j12/1.7.25/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/grid/.m2/repository/org/grobid/grobid-core/0.4.4/grobid-core-0.4.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:WARN No appenders could be found for logger (com.scienceminer.nerd.utilities.mediaWiki.MediaWikiParser).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[INFO] Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.117 s - in com.scienceminer.nerd.utilities.mediaWiki.TestMediaWikiParser
[INFO] Running com.scienceminer.nerd.utilities.NerdPropertyTest
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.164 s - in com.scienceminer.nerd.utilities.NerdPropertyTest
[INFO] Running com.scienceminer.nerd.service.NerdQueryTest
[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.184 s - in com.scienceminer.nerd.service.NerdQueryTest
[INFO] Running com.scienceminer.nerd.disambiguation.TestWeightedTerm
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.02 s - in com.scienceminer.nerd.disambiguation.TestWeightedTerm
[INFO] Running com.scienceminer.nerd.disambiguation.TestProcessText
>>>>>>>> GROBID_HOME=/home/grid/Desktop/rahul/grobid/grobid-home

init upper level language independent environment
building Environment for upper knowledge base
Environment built - 37413613 concepts.
init Environment for language en
building Environment for language en
isLoaded: true
Environment built - 14899737 pages.
domains en / isLoaded: true
Warning: Orthopedic surgery is not a category found in Wikipedia.
Warning: Environment is not a category found in Wikipedia.
init Environment for language de
building Environment for language de
isLoaded: true
Environment built - 3579552 pages.
init Environment for language fr
building Environment for language fr
isLoaded: true
Environment built - 3681264 pages.
Setophaga ruticilla: 1.0
Setophaga: 0.0013372708853410344
ruticilla: 0.0
bird: 7.054902824383859E-5
washing machine: 0.5362610476802396
washing: 3.910180049453826E-4
machine: 3.795203883595944E-5

Other factors were also at play, said Felix Boni, head of research at James Capel in Mexico City, such as positive technicals and economic uncertainty in Argentina, which has put it and neighbouring Brazil's markets at risk.
Felix Boni  Felix Boni  PERSON  38  48  
James Capel James Capel BUSINESS    70  81  
Mexico  Mexico  LOCATION    85  91  
Argentina   Argentina   LOCATION    154 163 
Brazil  Brazil  LOCATION    199 205 
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 24.96 s - in com.scienceminer.nerd.disambiguation.TestProcessText
[INFO] Running com.scienceminer.nerd.disambiguation.SentenceTest
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 s - in com.scienceminer.nerd.disambiguation.SentenceTest
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 32, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] 
[INFO] --- maven-war-plugin:3.1.0:war (default-war) @ nerd-service ---
[INFO] Packaging webapp
[INFO] Assembling webapp [nerd-service] in [/home/grid/Desktop/0.0.3/nerd/target/nerd-service-0.0.2]
[INFO] Processing war project
[INFO] Copying webapp webResources [/home/grid/Desktop/0.0.3/nerd/src/main/webapp/WEB-INF] to [/home/grid/Desktop/0.0.3/nerd/target/nerd-service-0.0.2]
[INFO] Copying webapp webResources [/home/grid/Desktop/0.0.3/nerd/doc] to [/home/grid/Desktop/0.0.3/nerd/target/nerd-service-0.0.2]
[INFO] Copying webapp webResources [/home/grid/Desktop/0.0.3/nerd/lib] to [/home/grid/Desktop/0.0.3/nerd/target/nerd-service-0.0.2]
[INFO] Copying webapp resources [/home/grid/Desktop/0.0.3/nerd/src/main/webapp]
[INFO] Webapp assembled in [3731 msecs]
[INFO] Building war: /home/grid/Desktop/0.0.3/nerd/target/nerd-service-0.0.2.war
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ nerd-service ---
[INFO] Installing /home/grid/Desktop/0.0.3/nerd/target/nerd-service-0.0.2.war to /home/grid/.m2/repository/com/scienceminer/nerd/nerd-service/0.0.2/nerd-service-0.0.2.war
[INFO] Installing /home/grid/Desktop/0.0.3/nerd/pom.xml to /home/grid/.m2/repository/com/scienceminer/nerd/nerd-service/0.0.2/nerd-service-0.0.2.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 45.562 s
[INFO] Finished at: 2018-02-22T17:53:27+05:30
[INFO] Final Memory: 33M/548M
[INFO] ------------------------------------------------------------------------
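The build log above reports `Building (N)ERD 0.0.2` even though the checkout directory is named `0.0.3`. The thread does not say why. One way to narrow it down is to check which git branch the working tree is on and which version the `pom.xml` declares. The following is a minimal sketch under those assumptions; the helper names `pom_version` and `current_branch` are hypothetical, and a branch may legitimately still carry an older version string in its POM.

```python
import subprocess
import xml.etree.ElementTree as ET

# Default namespace used by Maven POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def pom_version(pom_bytes):
    """Return the project's own <version> from POM content, or None."""
    root = ET.fromstring(pom_bytes)
    node = root.find("m:version", POM_NS)
    if node is None:
        # Fall back to a POM written without the namespace declaration.
        node = root.find("version")
    if node is None or not node.text:
        return None
    return node.text.strip()


def current_branch(repo_dir="."):
    """Return the branch name that HEAD points to in repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

For example, `pom_version(pathlib.Path("pom.xml").read_bytes())` and `current_branch()` can be printed side by side before running Maven. If the branch is not the expected one, `git checkout 0.0.3` would switch to the branch named in the comment above.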
asdf5252 commented 6 years ago

Hello @kermitt2

Any progress regarding this issue?

asdf5252 commented 6 years ago

Hello @lfoppiano @kermitt2

RE :(To have the same as the online version with respect to this, you need to use the development version, branch 0.0.3 which is documented here -> http://nerd.readthedocs.io/en/0.0.3/ - see section build.)

This works perfectly: with the new Wikidata data, the results on localhost no longer differ from the online version. Thank you.
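The two build logs earlier in the thread report different knowledge-base sizes (for example 9155139 versus 37413613 concepts in the upper knowledge base), which suggests that the two builds were loading different data. The following is a minimal sketch for pulling those figures out of saved logs so they can be compared directly. It assumes the `Environment built - N concepts.` / `pages.` line format seen in the logs above; the helper names `kb_sizes` and `compare_kb_sizes` are hypothetical.

```python
import re

# Matches lines such as "Environment built - 9155139 concepts."
ENV_LINE = re.compile(r"Environment built - (\d+) (concepts|pages)\.")


def kb_sizes(log_text):
    """Return (count, unit) pairs in the order they appear in the log."""
    return [(int(count), unit) for count, unit in ENV_LINE.findall(log_text)]


def compare_kb_sizes(log_a, log_b):
    """Pair up the sizes from two logs by position and keep those that differ.

    Assumes both logs initialise the environments in the same order.
    """
    pairs = zip(kb_sizes(log_a), kb_sizes(log_b))
    return [(unit, a, b) for (a, unit), (b, _) in pairs if a != b]
```

Pairing by position works here because both logs initialise the upper knowledge base first, followed by en, de and fr. A log that initialises languages in another order would need to be keyed by the preceding `init Environment for language` line instead.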