voyanttools / trombone

GNU General Public License v3.0
3 stars 1 forks source link

Trombone

This is an experimental rewrite of Trombone (version 4), the back-end system for Voyant Tools. Please note that everything here is in flux and likely to change. In fact, this entire codebase may be abandoned if significant difficulties are encountered. If you're interested in the current version of Voyant Tools (version 3.x available through [voyant-tools.org](http://voyant-tools.org/], please see [here] (http://dev.hermeneuti.ca/~hermeneutica/voyeur/trac/) instead.

Trombone is a multi-purpose text analysis library. It makes it relatively easy to create corpora from a variety of sources (URLs, files, strings, etc.) and in a variety of formats (XML, HTML, PDF, etc.). Some basic term frequency, distribution, and keyword-in-context functionality is available (intended for use by more sophisticated front-end interfaces like Voyant Tools).

Quick Reference

While awaiting more detailed documentation, here's a quick overview of how to use Trombone in stand-alone mode. The most important thing is to run org.voyanttools.trombone.Controller as a Java Application (in Eclipse, just right-click on the file, choose Run As and select Java Application). This should produce results in the console (not very interesting ones since we haven't provided any parameters). To run more interesting operations, edit the run configuration that was created when you just ran the application. Use quotes around key=value pairs that have spaces. Here are some useful parameters:

Build an executable jar

To build an executable jar run the command :

$ mvn clean compile assembly:single

A .jar named trombone-<version>-jar-with-dependencies.jar should be created. You can now run the jar with the desired tool, for example, with the corpus.DocumentContexts :

$ java -jar \
    ./target/trombone-5.2.1-SNAPSHOT-jar-with-dependencies.jar \
    tool=corpus.DocumentContexts \
    query=cash \
    query=mine \
    file=path/to/file-or-directory \
    outputFile=path/to/output/output.json

License

Trombone is currently under a GPLv3 license. See the license file.