To start the server, a Twitter properties file containing the
credentials for the Twitter API must be provided in the
resources directory (src/main/resources/twitter.properties):
consumerKey=YOUR_CONSUMER_KEY
consumerSecret=YOUR_CONSUMER_SECRET
accessToken=YOUR_ACCESS_TOKEN
accessTokenSecret=YOUR_ACCESS_TOKEN_SECRET
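A missing or blank credential is easy to overlook, so it can help to check the file before starting the server. The following is only a minimal sketch using java.util.Properties; the class name TwitterCredentialsCheck and its methods are hypothetical and not part of the project, and it assumes Java 9 or newer. It parses the file content from a string, whereas the application itself would read the file from the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of the project.
final class TwitterCredentialsCheck {
    // The four keys listed in the twitter.properties template above.
    static final List<String> REQUIRED_KEYS = List.of(
            "consumerKey", "consumerSecret", "accessToken", "accessTokenSecret");

    // Parses properties-file text into a Properties object.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the required keys that are absent or have a blank value.
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Properties props = parse("consumerKey=abc\nconsumerSecret=def\naccessToken=ghi\n");
        System.out.println("Missing keys: " + missingKeys(props));
    }
}
```

An empty result from missingKeys means all four credentials are present.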
To compile the sources and start the Tomcat server, run the command
gradlew bootRun
in your shell. Log messages appear during startup;
the server is ready when output like the following shows up:
2014-10-17 17:37:25.509 INFO 1151 --- [ main] s.b.c.e.t.TomcatEmbeddedServletContainer : Tomcat started on port(s): 8080/http
2014-10-17 17:37:25.519 INFO 1151 --- [ main] com.aic.Application : Started Application in 4.915 seconds (JVM running for 5.496)
If everything went well, the server is reachable at http://localhost:8080.
Currently the server offers two REST endpoints: one for registering a new
company and one for querying the sentiment for a company during a specific
time period. The registration endpoint can be reached at http://localhost:8080/register
and expects name as a POST parameter. The sentiment for
a company can be retrieved from the endpoint http://localhost:8080/sentiment,
which takes the four GET parameters name, classifier, startDate, and endDate. The latter
two parameters represent dates and therefore have to conform to the pattern yyyy-MM-dd.
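Since both date parameters have to follow the pattern yyyy-MM-dd, a client may want to check them before sending a request. The following is a rough sketch using java.time; the class SentimentDates and its methods are hypothetical and not part of the project, and the server's own parsing may be stricter or more lenient than this check.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical client-side helper, not part of the project.
final class SentimentDates {
    // The date pattern required for the two date parameters.
    static final DateTimeFormatter PATTERN = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Returns true if the text can be parsed with the pattern.
    static boolean isValid(String text) {
        try {
            LocalDate.parse(text, PATTERN);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Returns true if both dates parse and the end date is not before the start date.
    static boolean isValidRange(String startDate, String endDate) {
        if (!isValid(startDate) || !isValid(endDate)) {
            return false;
        }
        LocalDate start = LocalDate.parse(startDate, PATTERN);
        LocalDate end = LocalDate.parse(endDate, PATTERN);
        return !end.isBefore(start);
    }

    public static void main(String[] args) {
        System.out.println(isValidRange("2014-09-08", "2014-09-12"));
    }
}
```

Requiring that the end date is not before the start date is an assumption of this sketch; the text above does not say how the server treats a reversed range.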
The parameter classifier can be set to one of the following values: SVM_C_SVC,
SVM_NU_SVC, SMO, NAIVE_BAYES, BAYES_NET, or KNN. An example query may look like this:
http://localhost:8080/sentiment?name=Microsoft&classifier=KNN&startDate=2014-09-08&endDate=2014-09-12
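A sentiment query can also be assembled programmatically. The sketch below is one possible way to do so and not the project's own client code: the class SentimentQuery and its build method are hypothetical, the enum merely mirrors the classifier values listed above, and the parameter names startDate and endDate are taken from the parameter list given for the endpoint. It assumes Java 10 or newer for the URLEncoder overload that accepts a Charset.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper, not part of the project.
final class SentimentQuery {
    // Mirrors the classifier values accepted by the sentiment endpoint.
    enum Classifier { SVM_C_SVC, SVM_NU_SVC, SMO, NAIVE_BAYES, BAYES_NET, KNN }

    // Builds the GET URL; the company name is URL-encoded, dates are expected as yyyy-MM-dd.
    static String build(String baseUrl, String name, Classifier classifier,
                        String startDate, String endDate) {
        return baseUrl + "/sentiment"
                + "?name=" + URLEncoder.encode(name, StandardCharsets.UTF_8)
                + "&classifier=" + classifier.name()
                + "&startDate=" + startDate
                + "&endDate=" + endDate;
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8080", "Microsoft",
                Classifier.KNN, "2014-09-08", "2014-09-12"));
    }
}
```

Using an enum keeps invalid classifier names out of the query at compile time, and encoding the name ensures that company names containing spaces or ampersands do not break the query string.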
It is also possible to use the REST service via an AngularJS frontend, which can be reached at http://localhost:8080/.
Programming Language: Java (Tomcat Container)
Framework: Spring
Library: Twitter4j
Architecture: Client-Server
For preprocessing, the Stanford NLP implementation is used. The part-of-speech tagger makes use of the Penn Treebank; the corresponding tagset can be found here. We used the GATE Twitter part-of-speech tagging model to extend the tagging information specifically for Twitter data: HT (hashtags), UH ("lol", "ikr"), USR (username mentions), RT (retweet signifiers), and URL (URLs).
For classification, we used the data mining framework WEKA, with LibSVM as the support vector machine implementation.
Twitter API: Florian Taus
Preprocessing: Dominik Pichler, Christian Hotz-Behofsits
Classifier: Matthias Reisinger, Thomas Schmidleithner