[Procfile for heroku]
If Node.js is not installed on your machine, download and install the current version from http://nodejs.org/
-> Project running on version v0.10.24
If MongoDB is not installed on your machine, download and install the current version from http://www.mongodb.org/
-> Project running on version 2.4.8
If ElasticSearch is not installed on your machine, download and install the current version from http://www.elasticsearch.org/
-> Project running on version 0.90.11
You should index some documents (tweets) into an ElasticSearch index with the bulk API
INDEX NAME HAS TO BE : tweets
INDEX TYPE HAS TO BE : tweet
Command to run :
curl -s -XPOST localhost:9200/_bulk --data-binary @myfilename; echo
where myfilename has to be structured like this :
{"index": {"_index": "tweets", "_type": "tweet", "_id": "468510797367615488" }}
{ yourJsonTweet }
{"index": {"_index": "tweets", "_type": "tweet", "_id": "468509383576801280"}}
{ yourJsonTweet }
etc...
Documentation of the ElasticSearch Bulk API : http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
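If your tweets are already available as JSON objects, a small script can produce the bulk file for you. The sketch below is only an illustration, not part of the project: the function names are made up, and it assumes each tweet carries its Twitter id in an "id_str" field.

```python
import json


def build_bulk_lines(tweets, index="tweets", doc_type="tweet"):
    """Return a bulk-API body: one action line, then one source line, per tweet."""
    lines = []
    for tweet in tweets:
        # Assumption: the tweet id is stored under "id_str".
        action = {"index": {"_index": index, "_type": doc_type, "_id": tweet["id_str"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(tweet))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def write_bulk_file(tweets, path):
    """Write the bulk body to a file that curl can send with --data-binary."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(build_bulk_lines(tweets))
```

Calling write_bulk_file(my_tweets, "myfilename") would then give a file usable with the curl command above.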
The Node.js and Python scripts run in two different processes. They communicate and send data to each other through the ZeroRPC library : http://zerorpc.dotcloud.com/
Install the ZeroRPC libraries :
npm install zerorpc
pip install zerorpc
If you get errors, it might be because the zeroMQ and libtool libraries are not installed on your machine.
I succeeded in installing the zeroRPC library after executing this command :
sudo ARCHFLAGS=-Wno-error=unused-command-line-argument-hard-e
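To give a rough idea of how the Python process can expose functions to the Node.js process through ZeroRPC, here is a minimal sketch. The class name, the method and the port 4242 are hypothetical and not taken from the project; only the zerorpc.Server, bind and run calls come from the ZeroRPC library itself.

```python
class TweetService(object):
    """Public methods of this class become callable over ZeroRPC once served."""

    def count_tweets(self, tweets):
        # Hypothetical method: returns how many tweets the caller sent.
        return len(tweets)


def serve(endpoint="tcp://0.0.0.0:4242"):
    # Imported here so the class above can be tried without ZeroRPC installed.
    import zerorpc

    server = zerorpc.Server(TweetService())
    server.bind(endpoint)
    server.run()  # blocks and waits for calls from the other process
```

On the Node.js side, the zerorpc module would connect a client to the same endpoint and call the method by name with client.invoke.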
The system is composed of :
MongoDB database composed of two collections : users and scoredTweets. Users and labeled tweets are stored in those collections.
ElasticSearch index called tweets, composed of documents of type tweet, that makes the tweets searchable.
Python script where a Logistic Regression classifier predicts the relevance of unlabeled tweets, sorts all the tweets by relevance and sends back the top 20 tweets to the Node.js server.
Node.js application that handles the different types of requests performed by the user (ex: /search, /train) :
/search : request performed by the user
/affectscore : request performed by the user when clicking on the YES or NO buttons
/train : request performed when the user clicks on the TrainClassifier button
Commands to start the system :
elasticsearch-0.90.11/bin/elasticsearch -f
mongod
python topNtweets.py
npm install -d
node app.js
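The ranking step performed by the Python script (score every unlabeled tweet, sort by relevance, keep the top 20) can be sketched as follows. This is not the code of topNtweets.py: it assumes the classifier weights are already trained, and the "features" field, the weights and the function names are hypothetical.

```python
import heapq
import math


def relevance(features, weights, bias=0.0):
    """Logistic-regression probability that a tweet is relevant."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid for large positive or negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def top_n(tweets, weights, n=20, bias=0.0):
    """Return the n tweets with the highest predicted relevance, best first."""
    return heapq.nlargest(
        n, tweets, key=lambda t: relevance(t["features"], weights, bias)
    )
```

heapq.nlargest avoids sorting the whole list when only the first 20 tweets are sent back to the Node.js server.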