ziqizhang / jate

NEWS: JATE2.0 Beta.11 Released, see details below.
GNU Lesser General Public License v3.0

API? #34

Closed paris0120 closed 7 years ago

paris0120 commented 7 years ago

I'm wondering whether there is any way to use it as a library in my application. Could you provide some basic example code in the Wiki? I just want to use the algorithms.

Thank you.

jerrygaoLondon commented 7 years ago

Sorry for the lack of documentation. JATE2 can be used as a library without much effort.

As mentioned in Quick Start, you can either 1) download the jar from the Maven repository or 2) add the following configuration to your Maven project, along with Dragontools.

<dependency>
    <groupId>uk.ac.shef.dcs</groupId>
    <artifactId>jate</artifactId>
    <version>2.0-beta.1</version>
</dependency>

Once you have set up the JATE2 libraries, you can use all of the available ATE algorithms in your application or project. Our App shows an example of how to use the ATE algorithms and integrate them with Apache Solr. All available ATE implementations are subclasses of `uk.ac.shef.dcs.jate.algorithm.Algorithm` in the package `uk.ac.shef.dcs.jate.algorithm`. The current interface should be fairly straightforward to use: you provide a list of candidate terms and the corresponding features, and the method returns ranked terms modelled by `uk.ac.shef.dcs.jate.model.JATETerm`, with scores and other features/metadata. Since JATE2 relies on Solr for pre-processing and feature extraction, you have to either implement your own method or use Solr or our embedded Solr implementation (i.e., App*) to parse your corpus and extract candidates and features from it.

We will add more documentation in the near future.

Thanks for your interest.

paris0120 commented 7 years ago

I tried

AppCValue.main(("uk.ac.shef.dcs.jate.app.AppCValue -corpusDir " + corpusDir + " -o cvalue-terms.json " + solrDir + "/testdata/solr-testbed ACLRDTEC").split(" "));

but got:

uk.ac.shef.dcs.jate.JATEException: Cannot find expected field: jate_ngraminfo
    at uk.ac.shef.dcs.jate.util.SolrUtil.getTermVector(SolrUtil.java:36)
    at uk.ac.shef.dcs.jate.feature.FrequencyTermBasedFBMaster.build(FrequencyTermBasedFBMaster.java:39)
    at com.scholarfriend.maven.Epollo.Tools.AppCValue.extract(AppCValue.java:93)
    at com.scholarfriend.maven.Epollo.Tools.AppCValue.extract(AppCValue.java:85)
    at uk.ac.shef.dcs.jate.app.App.extract(App.java:285)

I have pdf, txt, and html files under the folder.

paris0120 commented 7 years ago

Logger: com.softcorporation.util.Logger
Mon Feb 27 01:33:46 EST 2017 loading exception data for lemmatiser...
Mon Feb 27 01:33:46 EST 2017 loading exception data for lemmatiser...
Mon Feb 27 01:33:47 EST 2017 loading exception data for lemmatiser...
Mon Feb 27 01:33:47 EST 2017 loading done
Mon Feb 27 01:33:47 EST 2017 loading done
Mon Feb 27 01:33:47 EST 2017 loading done
Mon Feb 27 01:33:47 EST 2017 loading exception data for lemmatiser...
Mon Feb 27 01:33:48 EST 2017 loading exception data for lemmatiser...
Mon Feb 27 01:33:48 EST 2017 loading exception data for lemmatiser...
Mon Feb 27 01:33:48 EST 2017 loading done
Mon Feb 27 01:33:48 EST 2017 loading done
2017-02-27 01:33:48 ERROR SolrCore:525 - [jateCore] Solr index directory 'A:\eclipse\lib\jate-master\testdata\solr-testbed\jateCore\data\index/' is locked. Throwing exception.
2017-02-27 01:33:48 ERROR CoreContainer:740 - Error creating core [jateCore]: Index locked for write for core 'jateCore'. Solr now longer supports forceful unlocking via 'unlockOnStartup'. Please verify locks manually!
org.apache.solr.common.SolrException: Index locked for write for core 'jateCore'. Solr now longer supports forceful unlocking via 'unlockOnStartup'. Please verify locks manually!
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:820)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:659)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:727)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:447)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:438)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.lucene.store.LockObtainFailedException: Index locked for write for core 'jateCore'. Solr now longer supports forceful unlocking via 'unlockOnStartup'. Please verify locks manually!
    at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:528)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:761)
    ... 9 more
Mon Feb 27 01:33:48 EST 2017 loading done
2017-02-27 01:33:48 ERROR SolrCore:525 - [GENIA] Solr index directory 'A:\eclipse\lib\jate-master\testdata\solr-testbed\GENIA\data\index/' is locked. Throwing exception.
2017-02-27 01:33:48 ERROR CoreContainer:740 - Error creating core [GENIA]: Index locked for write for core 'GENIA'. Solr now longer supports forceful unlocking via 'unlockOnStartup'. Please verify locks manually!
org.apache.solr.common.SolrException: Index locked for write for core 'GENIA'. Solr now longer supports forceful unlocking via 'unlockOnStartup'. Please verify locks manually!
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:820)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:659)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:727)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:447)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:438)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.lucene.store.LockObtainFailedException: Index locked for write for core 'GENIA'. Solr now longer supports forceful unlocking via 'unlockOnStartup'. Please verify locks manually!
    at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:528)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:761)
    ... 9 more
2017-02-27 01:33:48 INFO AppCValue:72 - Start CValue term ranking and filtering for whole index ...
uk.ac.shef.dcs.jate.JATEException: Cannot find expected field: jate_ngraminfo
    at uk.ac.shef.dcs.jate.util.SolrUtil.getTermVector(SolrUtil.java:36)
    at uk.ac.shef.dcs.jate.feature.FrequencyTermBasedFBMaster.build(FrequencyTermBasedFBMaster.java:39)
    at uk.ac.shef.dcs.jate.app.AppCValue.extract(AppCValue.java:86)
    at uk.ac.shef.dcs.jate.app.AppCValue.extract(AppCValue.java:77)
    at uk.ac.shef.dcs.jate.app.App.extract(App.java:285)
    at uk.ac.shef.dcs.jate.app.AppCValue.main(AppCValue.java:48)

I removed all the files in the data folder but still got these messages.
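The "Index locked for write" errors in the log refer to Lucene's lock on each core's index directory. As a rough diagnostic sketch (not part of JATE), the following lists any files named `write.lock` under a Solr home directory. The class and method names are hypothetical. Finding such a file does not by itself prove that a lock is currently held, and not finding one does not rule out another process or a second embedded Solr instance holding the index open.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; it is not part of JATE or Solr.
public class LockFileFinder {

    // Recursively collects regular files named "write.lock" under solrHome.
    static List<Path> findLockFiles(Path solrHome) throws IOException {
        try (Stream<Path> paths = Files.walk(solrHome)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().equals("write.lock"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the Solr home (e.g. the solr-testbed directory) as the first argument.
        Path solrHome = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path lock : findLockFiles(solrHome)) {
            System.out.println(lock);
        }
    }
}
```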

jerrygaoLondon commented 7 years ago

To run AppCValue programmatically, pass the run-time parameters to its main method as a string array, in the same order as in the command-line format.

The problem with your code is that you should not pass the class name as a parameter when you call AppCValue.main directly.

So try the following:

AppCValue.main(("-corpusDir " + corpusDir + " -o cvalue-terms.json " + solrDir + "/testdata/solr-testbed ACLRDTEC").split(" "));

To make it clearer, you can try the following code:

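// Arguments in the same order as the command-line format
String[] cvalueArgs = new String[6];
cvalueArgs[0] = "-corpusDir";
cvalueArgs[1] = "<YOUR_CORPUS_DIR>";      // replace with your corpus directory
cvalueArgs[2] = "-o";
cvalueArgs[3] = "<YOUR_JSON_FILE_PATH>";  // replace with the output JSON file path
cvalueArgs[4] = "<YOUR_SOLR_HOME_PATH>";  // replace with your Solr home path
cvalueArgs[5] = "<YOUR_SOLR_CORE_NAME>";  // replace with your Solr core name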

AppCValue.main(cvalueArgs);
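Building the array element by element also avoids a pitfall of the `split(" ")` approach: a path that contains a space would be broken into several arguments. Here is a minimal sketch of a helper that assembles the six arguments in the order shown above. `CValueArgs` and `build` are hypothetical names, not part of JATE, and the paths in `main` are placeholders.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; it is not part of JATE.
public class CValueArgs {

    // Assembles the arguments in the order used in this thread:
    // -corpusDir <corpusDir> -o <outputFile> <solrHome> <coreName>
    static String[] build(String corpusDir, String outputFile, String solrHome, String coreName) {
        return new String[] {"-corpusDir", corpusDir, "-o", outputFile, solrHome, coreName};
    }

    public static void main(String[] args) {
        String corpusDir = "C:/my corpus";                  // placeholder path containing a space
        String solrHome = "C:/jate/testdata/solr-testbed";  // placeholder Solr home

        // Each value stays a single array element, whatever characters it contains.
        String[] direct = build(corpusDir, "cvalue-terms.json", solrHome, "ACLRDTEC");

        // Splitting a concatenated string breaks the corpus path at the space.
        String[] viaSplit = ("-corpusDir " + corpusDir + " -o cvalue-terms.json "
                + solrHome + " ACLRDTEC").split(" ");

        System.out.println(direct.length + " " + Arrays.toString(direct));
        System.out.println(viaSplit.length + " " + Arrays.toString(viaSplit));

        // With JATE on the classpath, the array would then be passed on:
        // AppCValue.main(direct);
    }
}
```

With the placeholder paths above, `direct` has six elements while `viaSplit` has seven, because the corpus path is split in two.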

Hope it helps.

paris0120 commented 7 years ago

Thank you, it works.