hupili / sns-router

Intelligent Cross Domain SNS Router

Dependencies #6

Open uxian opened 11 years ago

uxian commented 11 years ago

Besides Python 2.x (mine is 2.6.6; is this the oldest version that can run sns-router?), we also need:

If you want to set "ranking" (in queue.py) to "yes" in order to use the machine learning ranking algorithm -- PRP

to be continued

hupili commented 11 years ago

Thanks for the note. Are dateutil and httplib SNSAPI dependencies? As far as I remember, SNSRouter itself does not need them (maybe bottle needs them?).

As for the Python version, I'm not sure what the lowest working version is. When SNSAPI was started, didn't you say 2.7 was the requirement?

I'll make pymmseg-cpp and SogouW "soft" dependencies later. They are only feature extraction dependencies, and the RPR-SGD should be able to run without them. Users can plug in their own features. The architecture is not good at the moment... I was moving too fast during prototyping in order to meet the deadline... Later, one should be able to enable or disable them. Is python-dev the dependency you need in order to install pymmseg-cpp? If so, it should also be phased out later.
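A minimal sketch of what a "soft" dependency could look like, assuming feature extractors are plain callables that map a message text to a dict of features. This is not the actual SNSRouter code: `try_import`, `build_extractors`, `length_feature` and the factory mapping are hypothetical names used only for illustration.

```python
import importlib


def try_import(module_name):
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def length_feature(text):
    """A feature with no third-party dependency; always available."""
    return {"length": len(text)}


def build_extractors(optional_factories):
    """Build the list of active feature extractors.

    optional_factories maps a module name to a factory that receives the
    imported module and returns an extractor (hypothetical convention).
    Extractors whose module is missing are skipped instead of crashing.
    """
    extractors = [length_feature]
    skipped = []
    for module_name, factory in optional_factories.items():
        module = try_import(module_name)
        if module is None:
            skipped.append(module_name)
        else:
            extractors.append(factory(module))
    return extractors, skipped
```

With this shape, a missing segmenter only removes its own features, and the ranking code can still run on whatever extractors remain.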

uxian commented 11 years ago

SNSAPI needs dateutil in utils.py. As for httplib2... I don't really remember now.

Most of the Linux machines in our lab come with Python 2.6 pre-installed, and they run SNSAPI well. So the version should be no problem :D

Yes, python-devel is only needed for installing pymmseg-cpp. On Windows, VS2008 or MinGW is required to compile the C code.

hupili commented 11 years ago

I see the situation. :+1:

pymmseg-cpp is really a poor choice... Algorithm-wise, it is far from the state of the art... It was just for prototyping. I think there should be substitutes that can be installed with pip or easy_install.

uxian commented 11 years ago

Oh yes, pip and easy_install are worth trying.

uxian commented 11 years ago

I badly need your ranking algorithm. I'm willing to mark every status in order to filter out annoying messages. I think this is the greatest dependency for sns-router! It also makes sns-router a killer!!!

hupili commented 11 years ago

In the first month, I marked nearly every message I saw as "seen" and tried to tag them carefully... Sometimes I found myself being inconsistent (which harms the ranking algorithm)... Luckily, the result turned out to be not bad. The ranked version is much cleaner. I have checked the ratio:

# of messages tagged as (mark, gold, silver, bronze, interesting, news) / # of messages marked as "seen" 

After deploying the algorithm, the ratio becomes much higher.
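For illustration, the ratio could be computed roughly like this. It is only a sketch: the message layout (a dict with a `seen` flag and a `tags` collection) is an assumption, not SNSRouter's real storage format, and it assumes that tagged messages are counted among the ones marked as "seen".

```python
# Tags listed in the ratio above.
TAGS_OF_INTEREST = {"mark", "gold", "silver", "bronze", "interesting", "news"}


def tagged_ratio(messages):
    """Fraction of "seen" messages that carry at least one tag of interest.

    messages: iterable of dicts such as {"seen": True, "tags": ["gold"]}
    (hypothetical layout).
    """
    seen = [m for m in messages if m["seen"]]
    if not seen:
        return 0.0
    tagged = [m for m in seen if TAGS_OF_INTEREST & set(m["tags"])]
    return len(tagged) / len(seen)
```

A higher value means that a larger share of what is shown and marked as "seen" was worth tagging.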

As for word segmentation, my first pick was this one: http://ansj.sdapp.cn/

However, I'm not familiar with Java, and it would also be a considerable obstacle for other users. After we make the SNSRouter components RESTful, we can deploy a dedicated word segmentation server. Then everyone can use it without pain...

uxian commented 11 years ago

So, the way you train the computer program also trains you in how to evaluate a message :smiley:

A word segmentation server? Won't it be too slow to use, due to network latency? It is a good plan though.

hupili commented 11 years ago

Yes, network latency is a problem if we want to go large scale. SNSAPI and most of its descendants are meant for small-scale use. The number of new incoming messages per hour will normally be on the order of several hundred. The segmentation server (or other rich feature extraction services) can run either locally or on a separate server. When run locally, this interface will introduce some overhead (compared to using the feature extraction class directly). Anyway, it should be configurable.
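A rough sketch of how the "configurable" part might look, assuming a config dict with an optional `segmenter_url` key and a service that answers with JSON of the form `{"tokens": [...]}`. The key, the URL layout and the response format are all hypothetical; no such service is defined yet.

```python
import json
import urllib.parse
import urllib.request


def segment_locally(text):
    """Stand-in for calling a feature extraction class directly."""
    return text.split()


def segment_remotely(text, url, timeout=5.0):
    """Ask a (hypothetical) RESTful segmentation service for tokens."""
    query = urllib.parse.urlencode({"text": text})
    with urllib.request.urlopen(url + "?" + query, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["tokens"]


def make_segmenter(config):
    """Pick the local or remote segmenter based on configuration."""
    url = config.get("segmenter_url")
    if not url:
        # No server configured: stay in-process and avoid network latency.
        return segment_locally
    timeout = config.get("timeout", 5.0)
    return lambda text: segment_remotely(text, url, timeout)
```

Callers only see a function from text to tokens, so switching between a local class and a server (on localhost or elsewhere) becomes a configuration change.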

hupili commented 11 years ago

@uxian

While writing the report, I recalled that NetworkX is also a dependency; it is used for the graph induction part. Since I already had it installed, I used it directly. The underlying algorithm is Floyd (Floyd-Warshall), which can be implemented in about 10 lines. Later I should remove this dependency....
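For reference, a plain-Python Floyd-Warshall really is about ten lines. This is a generic sketch that assumes a directed graph with weighted edges given as a dict; how SNSRouter actually builds its graph is not shown here, and `floyd_warshall` is just an illustrative name.

```python
def floyd_warshall(nodes, edges):
    """All-pairs shortest path lengths.

    nodes: list of hashable node ids.
    edges: dict mapping (u, v) -> weight of the directed edge u -> v.
    Returns dist[u][v]; unreachable pairs stay at float("inf").
    """
    inf = float("inf")
    dist = {u: {v: (0 if u == v else inf) for v in nodes} for u in nodes}
    for (u, v), weight in edges.items():
        dist[u][v] = min(dist[u][v], weight)
    # Allow each node k in turn to act as an intermediate hop.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```

The triple loop is O(n^3), which should be acceptable at the small scale discussed above.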

hupili commented 11 years ago

@uxian, I just found a better way to organize dependencies:

Write a requirements.txt, and others can install the dependencies with a single command:

[sudo] pip install -r requirements.txt

For example, please see https://github.com/wong2/xiaohuangji

I learned a lot from this project. It's worth the time watching how they get things done.