qinxuye / cola

A high-level distributed crawling framework.

Unable to log in with the dev version #41

Closed hitalex closed 9 years ago

hitalex commented 9 years ago

The error output is as follows:

    Exception in thread Thread-3:
    Traceback (most recent call last):
      File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
        self.run()
      File "/usr/lib/python2.7/threading.py", line 504, in run
        self.__target(*self.__args, **self.__kwargs)
      File "/home/kqc/github/cola/cola/job/container.py", line 123, in run
        self.init()
      File "/home/kqc/github/cola/cola/job/container.py", line 88, in init
        self.init_tasks()
      File "/home/kqc/github/cola/cola/job/container.py", line 104, in init_tasks
        is_local=self.is_local, job_name=self.job_name)
      File "/home/kqc/github/cola/cola/job/task.py", line 81, in __init__
        self.prepare()
      File "/home/kqc/github/cola/cola/job/task.py", line 102, in prepare
        self.executor.login()
      File "/home/kqc/github/cola/cola/job/executor.py", line 151, in login
        if not self._login(shuffle=random):
      File "/home/kqc/github/cola/cola/job/executor.py", line 174, in _login
        login_result = self.job_desc.login_hook(self.opener, kw)
      File "/home/kqc/github/cola/contrib/kweibo/__init__.py", line 40, in login_hook
        return loginer.login()
      File "/home/kqc/github/cola/contrib/kweibo/login.py", line 107, in login
        json_data = json.loads(regex.search(text).group(1))
      File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
        return _default_decoder.decode(s)
      File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
        raise ValueError("No JSON object could be decoded")
    ValueError: No JSON object could be decoded

    Counters during running: {}
    Processing shutting down
    Shutdown finished
    Job id:8ZcGfAqHmzc finished, spend 0.34 seconds for running

I wonder whether I have been completely banned...?

qinxuye commented 9 years ago

You can tell whether you have been banned by logging in to your account.

Please confirm that first.

hitalex commented 9 years ago

The account is fine; I can log in with it from a browser.

qinxuye commented 9 years ago

Everything works normally in my tests here. Debug it and check what the page content is at the point where the error occurs.
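One way to do that kind of debugging is to write each fetched page to disk and open it afterwards. The following is only a minimal sketch: `dump_page`, `step`, and `out_dir` are hypothetical names invented for illustration and are not part of cola.

```python
import io
import os


def dump_page(content, step, out_dir="login_debug"):
    """Write one fetched page to out_dir so it can be inspected later.

    Hypothetical helper for this sketch; not part of cola.
    """
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    # Response bodies are often bytes; decode leniently so an unexpected
    # encoding does not abort the debugging run.
    if isinstance(content, bytes):
        content = content.decode("utf-8", "replace")
    path = os.path.join(out_dir, "%s.html" % step)
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return path
```

Calling it once after every request in the login flow, with a distinct `step` label each time, would leave one HTML file per step to compare against a successful run.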

hitalex commented 9 years ago

In login.py:

    postdata = urllib.urlencode(postdata)
    text = self.opener.open(login_url, postdata)

the text returned here is a blank page that contains only a script, shown below (the HTML tags do not seem to be displayed here?):

Sina Visitor System

qinxuye commented 9 years ago

This problem is rather strange; I have not run into it before.

hitalex commented 9 years ago

Could my IP have been banned? I can still log in from a browser, though. Under normal circumstances, in this code:

    postdata = urllib.urlencode(postdata)
    text = self.opener.open(login_url, postdata)

    # Fix for new login changed since about 2014-3-28
    ajax_url_regex = re.compile('location\.replace\(\'(.*)\'\)')
    matches = ajax_url_regex.search(text)

what is ajax_url_regex supposed to match?

qinxuye commented 9 years ago

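That code was added as a fix for a case where the URL gets rewritten during login. Please work through the login logic and look at what page content each step returns. Record everything up to the failure (including the HTML that triggers the error), package it, and send it to me so I can take a look.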
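To illustrate what the pattern quoted above looks for: it searches the response for a JavaScript redirect of the form location.replace('...') and captures the URL inside the quotes. A small sketch follows; the sample page and its URL are made up for illustration and are not real server output.

```python
import re

# Same pattern as in the quoted snippet, written as a raw string.
ajax_url_regex = re.compile(r"location\.replace\('(.*)'\)")

# Hypothetical response body; the page the login server really returns may differ.
sample_page = (
    "<html><head><script>"
    "location.replace('http://example.com/ajaxlogin?retcode=0');"
    "</script></head></html>"
)

matches = ajax_url_regex.search(sample_page)
# group(1) is the URL inside location.replace('...');
# None means the page contained no such redirect.
redirect_url = matches.group(1) if matches else None
print(redirect_url)
```

If a step returns a page without such a redirect, the search returns None, which is one thing worth checking for when comparing the recorded pages.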

qin@qinxuye.me

hitalex commented 9 years ago

Today it works again without any problem, strange... Thank you very much!