Open uxian opened 11 years ago
Thanks for the note. Are dateutil and httplib dependencies of SNSAPI? As far as I remember, SNSRouter itself does not need them (maybe bottle needs them?).
As for the Python version, I'm not sure what the lowest working version is. When SNSAPI was started, you said 2.7 was the requirement?
I'll make pymmseg-cpp and SogouW a "soft" dependency later. They are only feature extraction dependencies, and the RPR-SGD should be able to run without them. Users can plug in their own features. The architecture is not good at the moment... I was moving too fast in the prototyping in order to catch the deadline... Later one should be able to enable or disable them. Is python-dev the dependency you need in order to install pymmseg-cpp? If so, it should also be phased out later.
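A minimal sketch of what such a "soft" dependency could look like in Python. The helper names (`optional_import`, `pick_feature_extractor`, `whitespace_features`) are hypothetical and not part of SNSRouter, and the whitespace fallback is only a stand-in for whatever default feature extraction the project would choose:

```python
import importlib  # in the standard library from Python 2.7 onwards


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def whitespace_features(text):
    """Fallback extractor (an assumption): split on whitespace only."""
    return text.split()


def pick_feature_extractor(module_name, build):
    """Use build(module) when the optional module is available,
    otherwise fall back to the dependency-free extractor."""
    module = optional_import(module_name)
    if module is None:
        return whitespace_features
    return build(module)
```

With this shape, a missing segmentation library only degrades the features instead of stopping the whole program, and users can pass their own `build` function to plug in other features.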
SNSAPI needs dateutil in utils.py. As for httplib2... I don't really remember now.
Most of the Linux machines in our lab come with Python 2.6 pre-installed, and they run SNSAPI fine. So the version should be no problem :D
Yes, python-devel is only needed to install pymmseg-cpp. On Windows, VS2008 or MinGW is required to compile the C code.
I see the situation. :+1:
`pymmseg-cpp` is really a poor choice... Algorithm-wise, it is far from the state of the art... Just for prototyping. I think there should be substitutes which can be installed with `pip` or `easy_install`.
Oh yes, pip and easy_install are worth trying.
I badly need your ranking algorithm. I'm willing to mark every status in order to filter out annoying messages. I think this is the most important dependency for sns-router! It also makes sns-router a killer!!!
In the first month, I marked nearly every message I saw as "seen" and tried to tag them carefully... Sometimes I found myself inconsistent (which harms the ranking algorithm)... Luckily, the result turned out to be not bad. The ranked version is much cleaner. I have checked the ratio:
# of messages tagged as (mark, gold, silver, bronze, interesting, news) / # of messages marked as "seen"
After deploying the algorithm, the ratio becomes much higher.
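A small sketch of how that ratio could be computed. The message layout (dicts with a `seen` flag and a `tags` list) is an assumption made for illustration, not the actual SNSRouter storage format; only the tag names come from the formula above:

```python
# Tag names taken from the ratio above.
TAGS = frozenset(["mark", "gold", "silver", "bronze", "interesting", "news"])


def tagged_ratio(messages):
    """(# of messages carrying at least one tag in TAGS)
    divided by (# of messages marked as seen).

    Each message is assumed to look like
    {"seen": bool, "tags": [str, ...]} (hypothetical layout).
    """
    seen = sum(1 for m in messages if m["seen"])
    if seen == 0:
        return 0.0  # avoid dividing by zero when nothing was marked
    tagged = sum(1 for m in messages if TAGS & set(m["tags"]))
    return tagged / float(seen)
```

Comparing the value over messages collected before and after deployment gives the kind of check described here.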
As for word segmentation, my initial pick is this one: http://ansj.sdapp.cn/
However, I'm not familiar with Java. It will also be a considerable obstacle for other users. After we make SNSRouter components RESTful, we can deploy a dedicated word segmentation server. Then everyone can use it without pain...
So, the way you train the program also trains you in how to evaluate a message :smiley:
A word segmentation server? Will it be too slow to use, due to network latency? It is a good plan though.
Yes, network latency is a problem if we want to go large scale. SNSAPI and most of its descendants are meant for small-scale use. The number of new incoming messages per hour will normally be on the order of several hundred. The segmentation server (or other rich feature extraction services) can run either locally or on a separate server. When run locally, this interface will introduce some overhead (compared to using the feature extraction class directly). Anyway, it should be configurable.
@uxian
While writing the report, I recalled that NetworkX is also a dependency; it is used for the graph induction part. Since I had already installed it, I used it directly. The underlying algorithm is Floyd's, which can be implemented in 10 lines. Later I should remove this dependency....
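For reference, a dependency-free sketch of the Floyd (Floyd-Warshall) all-pairs shortest path algorithm. This is the textbook version, not the code SNSRouter actually runs; the input format (a list of nodes plus a dict of directed edge weights) is an assumption:

```python
def floyd_warshall(nodes, edges):
    """All-pairs shortest path lengths.

    nodes: list of hashable node ids
    edges: dict mapping a directed pair (u, v) to a weight
    Returns a dict mapping (u, v) to the shortest distance,
    float("inf") when v cannot be reached from u.
    """
    inf = float("inf")
    dist = dict(((u, v), 0 if u == v else inf) for u in nodes for v in nodes)
    for (u, v), w in edges.items():
        dist[u, v] = min(dist[u, v], w)
    # Allow each node k in turn as an intermediate stop.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i, k] + dist[k, j] < dist[i, j]:
                    dist[i, j] = dist[i, k] + dist[k, j]
    return dist
```

The triple loop is cubic in the number of nodes, which should be fine at the small scale discussed in this thread.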
@uxian, I just found a better way to organize dependencies: write a `requirements.txt`, and others can install the dependencies with a single command:
[sudo] pip install -r requirements.txt
For example, please see https://github.com/wong2/xiaohuangji
I learned a lot from this project. It's worth the time watching how they get things done.
Besides python2.0 (mine is 2.6.6; is this the oldest version that can run sns-router?), we also need:
If you want to set "ranking" (in queue.py) to "yes" to use the machine-learning ranking algorithm -- PRP
to be continued