youzan / YZSpamFilter

Youzan spam content filtering tool

Missing classifier.py file #3

Open kud1 opened 8 years ago

kud1 commented 8 years ago

Running filter.py fails with `ImportError: Bad magic number in /YZSpamFilter/classifier.pyc`. Is the classifier.py source file missing?

michaelshing commented 8 years ago

It's caused by a Python version mismatch.

jasperyang commented 5 years ago

Has this file been provided yet?

AlbertChanX commented 4 years ago

This is the file: 🗄️ classifier.txt

myfingerhurt commented 1 year ago

After 3 hours of digging, I successfully got this code working. Here are some tips:

  1. Converting from Python 2.7 to Python 3 is not hard; most of the changes are `print` statements. Here is a regex that does it for you: find `(print)(\s+.+$)` and replace with `$1\($2\)`. This works in Notepad++.
  2. Flask needs to be reinstalled; the latest version should work. The correct import is `from flask_restful import abort, Resource`. To reinstall: `pip uninstall Flask`, `pip uninstall flask-restful`, then `pip install flask-restful`.
  3. I made some changes because Python 3 supports Unicode natively: the `isinstance(x, str)` and `isinstance(x, unicode)` checks are no longer necessary; `re.compile(r'([\u4e00-\u9fa5]+)')` needed a tiny change; `liststr = [word for word in liststr if word not in stops]` was changed for the same reason; and you should delete `reload(sys)` and `sys.setdefaultencoding('utf-8')`.
  4. The old test command was not URL-encoded. Here is a better one: `curl -G -v "http://127.0.0.1:5060/api/spamfilter" --data-urlencode "query=赚钱test宝妈tes日赚学生兼职*.@打字员"`
  5. This piece of the code is missing; you should be able to work out where to put it.

    ```python
    import math as _math

    # exp and min are bound as default arguments so they are looked up
    # as fast locals inside the function (a common CPython speed idiom).
    def chi2Q(x2, v, exp=_math.exp, min=min):
        '''Return prob(chisq >= x2, with v degrees of freedom).

        v must be even.
        '''
        assert v & 1 == 0
        # XXX If x2 is very large, exp(-m) will underflow to 0.
        m = x2 / 2.0
        sum = term = exp(-m)
        for i in range(1, v // 2):
            term *= m / i
            sum += term

        # With small x2 and large v, accumulated roundoff error, plus error in
        # the platform exp(), can cause this to spill a few ULP above 1.0.  For
        # example, chi2Q(100, 300) on my box has sum == 1.0 + 2.0**-52 at this
        # point.  Returning a value even a teensy bit over 1.0 is no good.
        return min(sum, 1.0)
    ```
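The print-statement conversion from tip 1 can also be done in Python with `re.sub`. This is a rough sketch, not a full Python 2 to 3 converter: the helper name `convert_print_statements` is hypothetical, and like the Notepad++ regex it does not handle `print >>f, x`, trailing-comma prints, or lines ending in a `# comment`. The `2to3` tool from older Python 3 releases covers those cases more reliably.

```python
import re

# A Python 2 print statement: optional indentation, the keyword "print",
# at least one space or tab, then the rest of the line.
# "printer = 1" and "print(x)" do not match because "print" must be
# followed by whitespace.
_PRINT_STMT = re.compile(r'^([ \t]*)print[ \t]+(.+)$', re.MULTILINE)


def convert_print_statements(source):
    """Hypothetical helper: rewrite `print x` as `print(x)`, line by line.

    Rough sketch only: `print >>f, x`, trailing-comma prints and trailing
    `# comments` are NOT handled and need manual fixes.
    """
    return _PRINT_STMT.sub(r'\1print(\2)', source)
```

For example, `convert_print_statements("print x, y\n")` returns `"print(x, y)\n"`. Python's replacement syntax uses `\1`/`\2` where Notepad++ uses `$1`/`$2`.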
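Tip 3 boils down to the fact that Python 3 `str` is already Unicode, so the CJK regex and the stop-word filter work on text directly without `decode()` or `setdefaultencoding`. A minimal sketch, assuming the filter keeps only runs of Chinese characters and then drops stop words; `clean_tokens` is a hypothetical name, and the real code may segment words (for example with a tokenizer) before filtering, which this sketch skips.

```python
import re

# Pattern quoted in tip 3: runs of CJK Unified Ideographs U+4E00..U+9FA5.
# In Python 3 the re module understands \uXXXX escapes in str patterns.
CJK_RUN = re.compile(r'([\u4e00-\u9fa5]+)')


def clean_tokens(text, stops):
    """Hypothetical helper: extract Chinese runs, then remove stop words.

    Each contiguous run of Chinese characters is treated as one "word";
    this is a simplification for illustration.
    """
    liststr = CJK_RUN.findall(text)
    # Same list comprehension as in tip 3; no isinstance(x, unicode) checks.
    return [word for word in liststr if word not in stops]
```

With the test query from tip 4, `clean_tokens("赚钱test宝妈tes日赚学生兼职*.@打字员", {"宝妈"})` returns `["赚钱", "日赚学生兼职", "打字员"]`.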
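The curl command in tip 4 uses `-G --data-urlencode`, which appends `?query=<percent-encoded UTF-8>` to the URL. A minimal Python sketch that builds the same URL, assuming the endpoint and parameter name from tip 4; `build_query_url` is a hypothetical helper. `quote_via=quote` percent-encodes spaces as `%20` (like curl) instead of `+`.

```python
from urllib.parse import quote, urlencode


def build_query_url(base, query):
    """Hypothetical helper: mimic `curl -G --data-urlencode "query=..."`.

    urlencode() encodes the text as UTF-8 and percent-escapes everything
    except letters, digits and "_.-~" (safe='' is the default).
    """
    return base + "?" + urlencode({"query": query}, quote_via=quote)
```

For example, `build_query_url("http://127.0.0.1:5060/api/spamfilter", "赚钱test")` gives `http://127.0.0.1:5060/api/spamfilter?query=%E8%B5%9A%E9%92%B1test`. The resulting URL can then be fetched with `urllib.request.urlopen` while the Flask server is running.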
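The thread does not show where `chi2Q` is called. It returns the upper-tail probability of a chi-square distribution with an even number of degrees of freedom, which is exactly what Fisher's method needs: if n probabilities are independent and uniform, `-2 * sum(ln p_i)` follows a chi-square distribution with 2n degrees of freedom. Assuming the classifier combines per-word probabilities this way (as SpamBayes-style filters do), here is a minimal sketch; `fisher_combine` is a hypothetical name, and `chi2Q` is repeated so the block runs on its own.

```python
import math


def chi2Q(x2, v):
    """Prob(chisq >= x2) for even v degrees of freedom (same math as above)."""
    assert v & 1 == 0
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_combine(probs):
    """Hypothetical: combine probabilities with Fisher's method.

    Every p must be > 0 (math.log(0) raises ValueError).
    -2 * sum(ln p) is chi-square with 2n degrees of freedom under the
    null hypothesis, so its tail probability is the combined p-value.
    """
    x2 = -2.0 * sum(math.log(p) for p in probs)
    return chi2Q(x2, 2 * len(probs))
```

A quick sanity check: with 2 degrees of freedom, `chi2Q(x2, 2)` equals `exp(-x2 / 2)`, so `fisher_combine([p])` returns `p` itself (up to rounding).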