hupili / sns-router

Intelligent Cross Domain SNS Router

pymmseg not working #5

Open uxian opened 11 years ago

uxian commented 11 years ago

pymmseg is not working on my Windows machine.

The story starts here:

1. 'ranking' defaults to 'no' in queue.py, so I cannot open the config page, and I get this:

  File "C:\Users\lijunbo\Documents\GitHub\sns-router\views\config.tpl", line 74, in <module>
    %for (f, w) in q.score.feature_weight.iteritems():
AttributeError: 'NoneType' object has no attribute 'feature_weight'

2. So I checked queue.py and found that if 'ranking' is not 'yes', q.score will be None:

        if 'ranking' in self.queue_conf and self.queue_conf['ranking'] == "yes":
            from analysis import score
            from analysis import feature
            self.score = score.Score()
            self.Feature = feature.Feature
        else:
            self.score = None
            self._weight_feature = lambda m: 0

3. Then I set 'ranking' to 'yes' in queue.py, but when I ran srfe.py, an error occurred:

    from wordseg import wordseg_clean
  File "C:\Users\lijunbo\Documents\GitHub\sns-router\analysis\wordseg.py", line 4, in <module>
    import mmseg
ImportError: No module named mmseg

4. Then I tried adding the following line to srfe.py:

sys.path.append('pymmseg-cpp')

5. I got this:

  File "pymmseg-cpp\mmseg\__init__.py", line 2, in <module>
    from _mmseg import Dictionary as _Dictionary, Token, Algorithm
ImportError: No module named _mmseg

6. So it turns out that pymmseg is not working on my computer. I tried to install pymmseg, but its setup.py did not work.

T T

Can you find a way to solve this? @hupili

hupili commented 11 years ago

Er.. I'll sort out the logic part. When I defaulted ranking to "no", I did not fully check whether all components of SRFE still work. When things are right, SRFE should not require pymmseg (or anything related to feature extraction) to run.
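
One way the "ranking disabled" case could be made safe for the config page is a placeholder object that exposes an empty feature_weight, so the template loop has something to iterate over. This is only a sketch: NullScore and make_score are hypothetical names that do not exist in sns-router, and the real fix in the repository may look different.

```python
class NullScore(object):
    """Hypothetical stand-in for score.Score when ranking is disabled."""

    def __init__(self):
        # An empty mapping lets templates iterate without a None check.
        self.feature_weight = {}


def make_score(queue_conf):
    """Return a real scorer when ranking is 'yes', else the placeholder."""
    if queue_conf.get('ranking') == 'yes':
        # Imported lazily, as in queue.py, so pymmseg is only needed here.
        from analysis import score
        return score.Score()
    return NullScore()


score = make_score({'ranking': 'no'})
rows = list(score.feature_weight.items())  # nothing to render, but no crash
```

With something like this, the loop over q.score.feature_weight in config.tpl would render an empty table instead of raising AttributeError.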

As a quick fix, can you try running his setup.py build? The _mmseg module, which is compiled as a dynamic library, will then be in the _build folder. On Linux, a "*.so" file is created; I don't know about Windows. In my environment, I used his setup.py install --user to install the module...
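
To check whether the build actually produced an importable extension, a small probe like the one below may help. It is a sketch using only the standard library; try_import and HAVE_WORDSEG are hypothetical names, while mmseg and _mmseg are the module names from the tracebacks above.

```python
import importlib


def try_import(name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# mmseg is the Python package; _mmseg is the compiled extension it wraps.
mmseg = try_import('mmseg')
HAVE_WORDSEG = mmseg is not None
```

Code that depends on word segmentation could then test HAVE_WORDSEG instead of failing at import time.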

hupili commented 11 years ago

Can you try the latest snsrouter (with submodule "snsapi" updated)? I fixed the two config sections:

I tested it in a fresh environment. Now the FE without ranking should be OK....

@uxian

uxian commented 11 years ago

Reporting a tiny bug: on Windows, the third method below does not work, but the second one (the line without the leading #) works fine.

In snsapi/utils.py, line 135:

def utc2str(u):
    #return str(datetime.datetime.fromtimestamp(u))
    return _format_date(datetime.datetime.utcfromtimestamp(u))
    #return _format_date(datetime.datetime.fromtimestamp(u, tz.tzlocal()))

The traceback follows. You caught this exception before; I just commented out the "try/except" to see why I got 0 new messages when I opened 127.0.0.1/home_timeline.

  File "snsapi\snsapi\utils.py", line 138, in utc2str
    return _format_date(datetime.datetime.fromtimestamp(u, tz.tzlocal()))
  File "C:\Python27\lib\site-packages\dateutil\tz.py", line 92, in utcoffset
    if self._isdst(dt):
  File "C:\Python27\lib\site-packages\dateutil\tz.py", line 135, in _isdst
    return time.localtime(timestamp+time.timezone).tm_isdst
ValueError: (22, 'Invalid argument')

Now, after modifying the function utc2str, I can see 16 messages from your RSS feed. lol
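
A more defensive variant could try the local-time conversion first and fall back to UTC when the platform rejects the timestamp. The sketch below uses only the Python 3 standard library rather than dateutil and snsapi's _format_date, so it is not a drop-in patch; utc2str_safe, local_time and utc_time are hypothetical names.

```python
import datetime


def local_time(u):
    """Convert a Unix timestamp to an aware datetime in the local zone."""
    return datetime.datetime.fromtimestamp(u).astimezone()


def utc_time(u):
    """Convert a Unix timestamp to an aware datetime in UTC."""
    return datetime.datetime.fromtimestamp(u, datetime.timezone.utc)


def utc2str_safe(u, to_local=local_time):
    """Format a timestamp in local time, falling back to UTC on failure."""
    try:
        dt = to_local(u)
    except (ValueError, OSError, OverflowError):
        # Some platforms raise here for timestamps they cannot convert.
        dt = utc_time(u)
    return dt.strftime('%Y-%m-%d %H:%M:%S %z')
```

The to_local parameter exists only so the fallback path can be exercised without depending on the machine's time zone.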

I am still confused about how to auth my SinaWeiboStatus channel. Should I write a script to fetch the token? Or use snscli to get an xxx.token file and move it to the sns-router dir?

hupili commented 11 years ago

I forgot to document the configuration. You can do something like this (in channel.json):

  {
    "platform": "SinaWeiboStatus",
    "methods": "home_timeline,update,forward",
    "user_id": "",
    "user_name": "snsapi_test",
    "channel_name": "sina_account_1",
    "auth_info": {
      "save_token_file": "(default)",
      "callback_url": "http://127.0.0.1:8080/auth/second/",
      "cmd_fetch_code": "(local_webserver)",
      "cmd_request_url": "(default)"
    },
    "app_secret": "",
    "open": "yes",
    "app_key": "",
    "home_timeline":{
      "count": 100
    }
  },

The part that matters is "callback_url". SRFE intercepts the request_url and fetch_code methods of SNSBase, and the callback_url above is the endpoint that hands the authorization code to SRFE. You may want to change the IP and port according to your srfe.conf.
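
Adjusting the IP and port by hand for several channels is error-prone, so a small helper could rewrite callback_url in one pass. This is a sketch under two assumptions: that channel.json holds a list of channel objects like the one above, and that the path stays /auth/second/. set_callback_url is a hypothetical helper, not part of snsapi or SRFE.

```python
import json


def set_callback_url(channels, host, port, path='/auth/second/'):
    """Point every channel that has auth_info at the given host and port."""
    url = 'http://%s:%d%s' % (host, port, path)
    for ch in channels:
        if 'auth_info' in ch:
            ch['auth_info']['callback_url'] = url
    return channels


# Inline sample standing in for the contents of channel.json.
channels = json.loads('''[
  {
    "platform": "SinaWeiboStatus",
    "channel_name": "sina_account_1",
    "auth_info": {"callback_url": "http://127.0.0.1:8080/auth/second/"}
  }
]''')
set_callback_url(channels, '127.0.0.1', 8081)
```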

With this configured, you can complete the auth flow from the "config" page.

Alternatively, you can take the ".save" files from snscli and put them under SRFE.

uxian commented 11 years ago

Using callback_url is cooler, heihei.

I tested sns-router on Mac OS X 10.8.1, and it works perfectly! I will take some time to deal with bugs in the Windows env.

If I use 'http://127.0.0.1:8080/auth/second/' as Sina Weibo's callback_url, how do I configure it on open.weibo.com? I tried 'http://127.0.0.1:8080/', but it was not accepted; 'http://127.0.0.1/' was accepted, but an error occurred during auth.

hupili commented 11 years ago

I think http://127.0.0.1:8080/auth/second/ should work. That's what I put there. I think it is rejected because the "unauth" callback url is missing, for which I put http://vipc3.ie.cuhk.edu.hk:8080/unauth as a fake entry...

hupili commented 11 years ago

I added you as a collaborator on sns-router. Feel free to create a new (issue) branch containing fixes for your environment. When finished, just drop me a message to pull it into the "dev" branch. I can do some cross-checking and follow what's going on. I'll also brief you on changes in the same way. @uxian

hupili commented 11 years ago

@uxian , there may be other problems arising from channel configuration. I put my configuration in the following page:

https://github.com/hupili/sns-router/wiki/Channel-Configuration

Some notes and explanations will be added, e.g. renren does not accept 127.* addresses, so I use the IE server as a bouncer.

uxian commented 11 years ago

/wiki/Channel-Configuration is sweet.

Sina Weibo no longer supports a callback_url like "http://127.0.0.1:8080/auth/second/"; no port is allowed, or at least I can't set it like that. I suggest you do not change your callback_url, or you will not be able to set it back.

I also tested sns-router on my Linux server. It works well, except that it's a little bit hard to set ranking to "yes". I will keep trying.

hupili commented 11 years ago

Er? That's strange.. I configured a callback url that includes a port less than one month ago... I tried to search for an official announcement but found nothing. There are some questions on the Internet but no answers.

Briefly note some ways to work around this:

uxian commented 11 years ago

Cool! I just found an idle server to run sns-router on port 80, which is your first way. The other two ways are so cool!

hupili commented 11 years ago

@uxian I did not change my current callback_url for testing (I'm afraid I won't get it back..). I'm just wondering whether Sina is filtering other patterns, like "127.0.0.1". So, would these equivalents of http://127.0.0.1:8080/auth/second/ work?

Anyway, I think a universal bouncer is needed. e.g. on Renren, the callback_url must be something reachable from the public Internet...

uxian commented 11 years ago

No, none of them works. More bad examples:

http://localhost/auth/second/
http://www.snsrouter.com:8000/auth/second/
http://www.snsrouter.com:8000/auth.php

What works:

http://127.0.0.1/au.php
http://127.0.0.1/auth/second/

I think their rules are:

A universal bouncer is a great and useful idea!

hupili commented 11 years ago

I deployed a simple bouncer:

Test urls:

https://snsapi.ie.cuhk.edu.hk/aux/bouncer/redir/localhost/8080/?code=testcode
https://snsapi.ie.cuhk.edu.hk/aux/bouncer/redir/127.0.0.1/8080/?code=testcode

So one can configure its callback url to be:

https://snsapi.ie.cuhk.edu.hk/aux/bouncer/redir/127.0.0.1/8080/?

or

https://snsapi.ie.cuhk.edu.hk/aux/bouncer/redir/127.0.0.1/8080/

Whether the trailing "?" is needed depends on the OSN's convention.

The target address is restricted to localhost addresses, but the port can be chosen freely.

Code is in the snsapi-website repo:

https://github.com/hupili/snsapi-website/tree/master/aux/bouncer
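
The callback url for a given local port can be assembled mechanically from the pattern in the test urls above. The sketch below only builds the string and mirrors the stated "localhost only, any port" restriction on the client side. bouncer_callback_url and ALLOWED_HOSTS are hypothetical names, and the allowed host list is an assumption taken from the two test urls; the server-side code is in the repo linked above.

```python
BOUNCER_BASE = 'https://snsapi.ie.cuhk.edu.hk/aux/bouncer/redir'
# Assumption: the two hosts that appear in the test urls.
ALLOWED_HOSTS = ('localhost', '127.0.0.1')


def bouncer_callback_url(host, port, trailing_question_mark=False):
    """Build a bouncer callback url for a local SRFE instance."""
    if host not in ALLOWED_HOSTS:
        raise ValueError('bouncer target must be a localhost address')
    port = int(port)
    if not 0 < port < 65536:
        raise ValueError('port out of range')
    url = '%s/%s/%d/' % (BOUNCER_BASE, host, port)
    # Some OSNs expect the "?" to be part of the registered url.
    return (url + '?') if trailing_question_mark else url
```

For example, bouncer_callback_url('127.0.0.1', 8080) yields the second form shown above, and passing trailing_question_mark=True yields the first.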

hupili commented 11 years ago

If you have other deployment experience, you can collect them here

https://github.com/hupili/sns-router/wiki/System-Deployment-Case-Study

hupili commented 11 years ago

@uxian

Just restructured the feature extraction part. Now features can be enabled from autoweight.json. autoweight.json.example should be able to run directly. Currently, only topic depends on pymmseg. We can try out the other features with the whole flow now.

"operation" is added to the frontend with some brief explanations. You can just execute them sequentially.

Hope it gets through this time~

The way queue.py accesses the training logic is very kludgy. It's just an assembly of the code in analysis, in which I dumped more data than needed for offline analysis. Later I will cut the non-essential work and make the new functions clearer.

The latest code is on dev.

uxian commented 11 years ago

When I switched to dev, updated autoweight.json, and set queue.json to 'yes', I got this error:

Traceback (most recent call last):
  File "srfe.py", line 47, in <module>
    q = SRFEQueue(sp)
  File "/home/lijunbo/Github/sns-router/queue.py", line 51, in __init__
    from ranking import score
  File "/home/lijunbo/Github/sns-router/ranking/score.py", line 21, in <module>
    from feature import Feature
  File "/home/lijunbo/Github/sns-router/ranking/feature.py", line 23, in <module>
    from wordseg import wordseg_clean
  File "/home/lijunbo/Github/sns-router/ranking/wordseg.py", line 17, in <module>
    mmseg.Dictionary.load_dictionaries()
  File "/usr/lib64/python2.6/site-packages/pymmseg_cpp-1.0.0-py2.6-linux-x86_64.egg/mmseg/__init__.py", line 20, in load_dictionaries
    raise IOError("Cannot open '%s'" % d)
IOError: Cannot open 'kdb/words.merged.dic'

So I commented out "from wordseg import wordseg_clean", and got this:

Traceback (most recent call last):
  File "srfe.py", line 47, in <module>
    q = SRFEQueue(sp)
  File "/home/lijunbo/Github/sns-router/queue.py", line 53, in __init__
    self.score = score.Score()
  File "/home/lijunbo/Github/sns-router/ranking/score.py", line 29, in __init__
    self.load_weight(fn_weight)
  File "/home/lijunbo/Github/sns-router/ranking/score.py", line 34, in load_weight
    self.feature_weight = json.loads(open(fn, 'r').read())
IOError: [Errno 2] No such file or directory: 'conf/weights.json'

So I created conf/weights.json and wrote {} to it. At last I can run srfe.py with queue.py :D
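
The same workaround could be done in code by creating the file with an empty object when it is missing. This is a sketch of the manual step above, not the fix that went into the repository; ensure_json_file is a hypothetical helper.

```python
import json
import os


def ensure_json_file(path, default=None):
    """Create `path` with `default` (an empty object) if absent, then load it."""
    if default is None:
        default = {}
    if not os.path.exists(path):
        dirname = os.path.dirname(path)
        if dirname and not os.path.isdir(dirname):
            os.makedirs(dirname)
        with open(path, 'w') as f:
            json.dump(default, f)
    with open(path, 'r') as f:
        return json.load(f)
```

Calling ensure_json_file('conf/weights.json') before loading the weights would return {} on a fresh checkout instead of raising IOError.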

hupili commented 11 years ago

fixed:

Do you experience errors when using the "Operation" panel? (which then creates useful 'weights.json')

uxian commented 11 years ago

Actually, when I open 'http://127.0.0.1:8080/config', the Feature Weight table and the Tags table are empty. I am sure that ranking in queue.py is 'yes', and autoweight.json is the default one, which contains 12 preferences and 4 features. I found no errors or exceptions about this, but I will try to figure it out, since this may be due to environment issues.

When I clicked Prepare Training Data in http://127.0.0.1:8080/operation, an error was thrown:

Traceback (most recent call last):
  File "bottle/bottle.py", line 763, in _handle
    return route.call(**args)
  File "bottle/bottle.py", line 1572, in wrapper
    rv = callback(*a, **ka)
  File "bottle/bottle.py", line 3132, in wrapper
    result = func(*args, **kwargs)
  File "srfe.py", line 83, in wrapper_check_login
    return func(*al, **ad)
  File "srfe.py", line 153, in operation_prepare_training_data
    re = q.prepare_training_data()
  File "/home/lijunbo/Github/sns-router/queue.py", line 631, in prepare_training_data
    from analysis.select_samples import select_samples
  File "/home/lijunbo/Github/sns-router/analysis/select_samples.py", line 19, in <module>
    from feature import Feature
  File "/home/lijunbo/Github/sns-router/analysis/feature.py", line 22, in <module>
    from wordseg import wordseg_clean
  File "/home/lijunbo/Github/sns-router/analysis/wordseg.py", line 17, in <module>
    mmseg.Dictionary.load_dictionaries()
  File "/usr/lib64/python2.6/site-packages/pymmseg_cpp-1.0.0-py2.6-linux-x86_64.egg/mmseg/__init__.py", line 20, in load_dictionaries
    raise IOError("Cannot open '%s'" % d)
IOError: Cannot open 'kdb/words.merged.dic'

wow, sns-router is so desperate for words.dic :D

hupili commented 11 years ago

@uxian , I see. Old logic in "analysis" is not decoupled yet. I should handle the "Operations" related ones first.

For the "tags" table under "config", can you add your own tags using the button under the same headline? By default, there are no tags, and users define tags according to their own criteria.

Also, I just realized that I can make one words.merged.dic here for you to download... Anyway, users will get the same dict if they operate in the same way... One can prepare his own wordseg dict...

hupili commented 11 years ago

Another issue I can foresee is encoding in wordseg-related operations. The pymmseg module assumes utf-8 encoding. On other platforms, transcoding is needed before feeding the message into pymmseg.
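
The transcoding step might look like the sketch below, which normalizes input to UTF-8 bytes before it reaches the segmenter. to_utf8 is a hypothetical helper, and the default source encoding of 'gbk' is only an assumption for illustration; the real source encoding depends on the platform and the channel.

```python
def to_utf8(text, source_encoding='gbk'):
    """Return UTF-8 bytes for text that may arrive in another encoding."""
    if isinstance(text, bytes):
        # Decode from the platform encoding first (assumed, not detected).
        text = text.decode(source_encoding)
    return text.encode('utf-8')
```

Text that is already unicode is simply encoded; byte strings are decoded from the assumed source encoding first.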

uxian commented 11 years ago

I can add a "tag" on the config page, but I cannot set relations between tags (assign a "father"). And where are user-added tags stored? They do not seem to be in autoweight.conf.

hupili commented 11 years ago

Sorry for the confusion about "parent". It's not implemented in the backend yet. I added a description of the desired function in a new issue.

The tags are stored in srfe_queue.db. Three tables, msg, tag, and msg_tag, are laid out in the usual way. The "msg_tag" table only makes sense together with the "tag" table. Ideally, some json-confs should also be moved into this sqlite db, so that users only need to take this single file wherever they move their Router.

sqlite> select * from tag;
1|null|0|
2|mark|1|
3|gold|1|
4|silver|1|
5|bronze|1|
6|news|1|
7|interesting|1|
8|shit|0|
9|nonsense|1|
10|text|0|
11|tech|1|
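
The same listing can be produced from Python with the standard sqlite3 module. This is a sketch: list_tags is a hypothetical helper, and it uses SELECT * because the column names of the tag table are not shown here.

```python
import sqlite3


def list_tags(db_path):
    """Return all rows of the tag table as a list of tuples."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute('SELECT * FROM tag').fetchall()
    finally:
        conn.close()
```

Calling list_tags('srfe_queue.db') in the sns-router directory should return the rows shown above, one tuple per tag.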

uxian commented 11 years ago

I see. Here is another dependency, './kdb/tdict.pickle'. When I click 'Prepare Training Data', this error appears:

Traceback (most recent call last):
  File "bottle/bottle.py", line 763, in _handle
    return route.call(**args)
  File "bottle/bottle.py", line 1572, in wrapper
    rv = callback(*a, **ka)
  File "bottle/bottle.py", line 3132, in wrapper
    result = func(*args, **kwargs)
  File "srfe.py", line 83, in wrapper_check_login
    return func(*al, **ad)
  File "srfe.py", line 153, in operation_prepare_training_data
    re = q.prepare_training_data()
  File "/home/lijunbo/Github/sns-router/queue.py", line 631, in prepare_training_data
    from analysis.select_samples import select_samples
  File "/home/lijunbo/Github/sns-router/analysis/select_samples.py", line 19, in <module>
    from feature import Feature
  File "/home/lijunbo/Github/sns-router/analysis/feature.py", line 247, in <module>
    class Feature(object):
  File "/home/lijunbo/Github/sns-router/analysis/feature.py", line 259, in Feature
    feature_extractors.append(FeatureTopic(env))
  File "/home/lijunbo/Github/sns-router/analysis/feature.py", line 162, in __init__
    self.tdict = Serialize.loads(open(fn_tdict).read())
IOError: [Errno 2] No such file or directory: './kdb/tdict.pickle'
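
Since these missing-file errors surface one at a time, a preflight check could report all of them at once before ranking is enabled. The file list below is collected from the tracebacks in this thread and may be incomplete; missing_files and REQUIRED_FILES are hypothetical names.

```python
import os

# Paths taken from the IOError messages in this thread.
REQUIRED_FILES = [
    'kdb/words.merged.dic',
    'kdb/tdict.pickle',
    'conf/weights.json',
]


def missing_files(paths, base_dir='.'):
    """Return the subset of `paths` that do not exist under `base_dir`."""
    return [p for p in paths
            if not os.path.isfile(os.path.join(base_dir, p))]
```

Running missing_files(REQUIRED_FILES) from the sns-router directory lists whatever still needs to be prepared.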