Closed gacou54 closed 2 years ago
corpus.DocumentTerms is the tool to get the terms per document. corpus.CorpusTerms is the tool to get the terms of the entire corpus.
Here is an example of how to use corpus.DocumentTerms by running the jar:
    java -jar \
      ./target/trombone-5.2.1-SNAPSHOT-jar-with-dependencies.jar \
      storage=file \
      dataDirectory=./data/data_directory/ \
      tool=corpus.DocumentTerms \
      minRawFreq=100 \
      whiteList=de,the \
      file=./data/raw \
      outputFile=./data/results/output.json
Here is an example of how to use corpus.CorpusTerms by running the jar:
    java -jar \
      ./target/trombone-5.2.1-SNAPSHOT-jar-with-dependencies.jar \
      storage=file \
      dataDirectory=./data/data_directory/ \
      tool=corpus.CorpusTerms \
      minRawFreq=100 \
      whiteList=de,the \
      file=./data/raw \
      outputFile=./data/results/output.json
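The two commands differ only in the tool= value, so the shared parameters can be kept in one place. Here is a minimal sketch of that idea, not part of Trombone itself: the function names trombone_args and run_tool are hypothetical, and the parameter values are copied from the commands above.

```shell
#!/bin/sh

# Hypothetical helper: prints the Trombone arguments for one tool, one per line.
# $1 is the tool name (corpus.DocumentTerms or corpus.CorpusTerms),
# $2 is the path the JSON result should be written to.
trombone_args() {
    printf '%s\n' \
        "storage=file" \
        "dataDirectory=./data/data_directory/" \
        "tool=$1" \
        "minRawFreq=100" \
        "whiteList=de,the" \
        "file=./data/raw" \
        "outputFile=$2"
}

# Hypothetical wrapper: runs the jar with the arguments built above.
run_tool() {
    # The substitution is left unquoted on purpose so that each printed line
    # becomes one argument; this assumes no value contains whitespace.
    java -jar ./target/trombone-5.2.1-SNAPSHOT-jar-with-dependencies.jar \
        $(trombone_args "$1" "$2")
}
```

With these definitions loaded, run_tool corpus.DocumentTerms ./data/results/document_terms.json and run_tool corpus.CorpusTerms ./data/results/corpus_terms.json would write each result to its own file instead of both overwriting output.json. The two output file names are assumptions chosen for illustration.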