Program occasionally terminates with an uncaught exception

Lotus-95 commented 4 years ago

This happened twice, last night and again during the day today. Partway through a run, the following exception output appears:

[INFO] loop, 09:30:11,
[INFO] loop, 09:30:11, ======== Loop 719 ========
[INFO] loop, 09:30:11,
[INFO] loop, 09:30:11, > Current tasks
[INFO] loop, 09:30:11, ------------------------------
[INFO] loop, 09:30:11, 01. Course(数据可视化, 0, 信息科学技术学院)
[INFO] loop, 09:30:11, 02. Course(凸优化, 0, 数学科学学院)
[INFO] loop, 09:30:11, ------------------------------
[INFO] loop, 09:30:11,
[INFO] loop, 09:30:11, > Current client: 1, (qsize: 3)
[INFO] loop, 09:30:11,
[INFO] loop, 09:30:11, Get SupplyCancel page 1
[ERROR] loop.error, 09:30:11, list index out of range
Exception in thread Elective:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/username/PKUAutoElective/autoelective/loop.py", line 621, in run_elective_loop
    raise e
  File "/home/username/PKUAutoElective/autoelective/loop.py", line 328, in run_elective_loop
    plans = get_courses_with_detail(tables[0])
  File "/home/username/PKUAutoElective/autoelective/parser.py", line 71, in get_courses_with_detail
    name, classno, school, status, = map(lambda ix: t[ix].xpath('.//text()')[0], ixs)
  File "/home/username/PKUAutoElective/autoelective/parser.py", line 71, in <lambda>
    name, classno, school, status, = map(lambda ix: t[ix].xpath('.//text()')[0], ixs)
IndexError: list index out of range

Alice-space commented 4 years ago

The log shows that your class number is 0. Please check your config file.

Lotus-95 commented 4 years ago

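> The log shows that your class number is 0. Please check your config file.

Hi, I checked. The courses I want to take really do have class number 0 in the elective system. It looks like quite a few entries in the graduate course list have class number 0?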

zhongxinghong commented 4 years ago

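The format of the returned page seems to be off, but I can't tell whether the problem is in t[ix] or in xpath('.//text()')[0]. Could you change line 328 to save a copy of the page so we can look at it? Something like the following, where 'path_to_save_error_page.html' can be generated with time.time() or similar: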

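try:
    plans = get_courses_with_detail(tables[0])
except IndexError:
    # r.content is bytes, so open in binary write mode ('wb') without an encoding argument
    with open('path_to_save_error_page.html', 'wb') as fp:
        fp.write(r.content)
    raise  # re-raise with the original traceback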

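The next time the error occurs, you can analyze the format of the returned page :)

If you can't figure it out, feel free to contact me by email.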
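As a minimal sketch of the time.time() suggestion, the helper below writes the raw response bytes to a file with a timestamp in its name, so repeated failures do not overwrite each other. The function name `save_error_page`, the `error_page_` prefix, and the `directory` parameter are illustrative assumptions, not part of PKUAutoElective.

```python
import os
import time


def save_error_page(content: bytes, directory: str = ".") -> str:
    """Write raw response bytes to a timestamped HTML file and return its path.

    Hypothetical helper for illustration; not part of the project.
    """
    # Millisecond timestamp keeps filenames distinct across quick retries.
    filename = "error_page_%d.html" % int(time.time() * 1000)
    path = os.path.join(directory, filename)
    # The content is bytes, so the file is opened in binary write mode.
    with open(path, "wb") as fp:
        fp.write(content)
    return path
```

Inside the except block, it would be called as `save_error_page(r.content)` before re-raising.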

Yixuan-Wang commented 4 years ago

I ran into this error too.

Console output

```
[INFO] loop, 18:36:51, > Current client: 1, (qsize: 2)
[INFO] loop, 18:36:51,
[INFO] loop, 18:36:51, Get SupplyCancel page 1
[DEBUG] hook, 18:38:08, Dump request https://elective.pku.edu.cn/elective2008/edu/pku/stu/elective/controller/supplement/SupplyCancel.do to D:\Utilities\Elec\PKUAutoElective\log\request\1900014136\2020-09-23_18.37.42+0800.%2Felective2008%2Fedu%2Fpku%2Fstu%2Felective%2Fcontroller%2Fsupplement%2FSupplyCancel.do.gz
[ERROR] loop.error, 18:38:08, list index out of range
Traceback (most recent call last):
  File "D:\Utilities\Elec\PKUAutoElective\autoelective\loop.py", line 327, in run_elective_loop
    elected = get_courses(tables[1])
IndexError: list index out of range
[INFO] loop, 18:38:08,
[INFO] loop, 18:38:08, ======== END Loop 8159 ========
[INFO] loop, 18:38:08, Main loop sleep 4.683490673681483 s
[INFO] loop, 18:38:08,
Exception in thread Elective:
Traceback (most recent call last):
  File "C:\Users\tom-y\scoop\apps\python\current\lib\threading.py", line 932, in _bootstrap_inner
    self.run()
  File "C:\Users\tom-y\scoop\apps\python\current\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "D:\Utilities\Elec\PKUAutoElective\autoelective\loop.py", line 621, in run_elective_loop
    raise e
  File "D:\Utilities\Elec\PKUAutoElective\autoelective\loop.py", line 327, in run_elective_loop
    elected = get_courses(tables[1])
IndexError: list index out of range
```

The connection seems to have been closed prematurely, leaving the HTML truncated so that it cannot be parsed correctly.
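One rough way to test the truncation hypothesis on a saved page is to check whether the body still ends with a closing `</html>` tag. This is only a heuristic sketch: the function name `looks_truncated` is hypothetical, and it assumes that a complete page from the server ends with `</html>`, which has not been confirmed in this thread.

```python
def looks_truncated(content: bytes) -> bool:
    """Return True if the response body does not end with a closing </html> tag.

    Heuristic only; assumes a complete page ends with </html>.
    """
    # Ignore trailing whitespace and letter case before comparing.
    tail = content.rstrip().lower()
    return not tail.endswith(b"</html>")
```

A body that stops in the middle of a tag is flagged, while a well-terminated document is not.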

Corresponding request log

```
(earlier content omitted) 公共基础日语（一）全校任选 4.0 6.0 黄博典 1 外国语学院
```

zhongxinghong commented 4 years ago

@Yixuan-Wang This request log looks strange. Could you describe exactly how you obtained it?

Yixuan-Wang commented 4 years ago

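I had debug_dump_request enabled. The last dump before this IndexError was generated at 18.37.42, and after extracting it with 7zip the log looked incomplete like this...

I just noticed that although my error is of the same type as the one reported in this issue, it seems to be raised at a different location...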

zhongxinghong commented 4 years ago

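Did you load the log file with pickle_gzip_load from utils.py? It looks as if the file was only gunzipped and the contents are still pickle-serialized, with the data structures used by requests mixed in ...

If you extracted the gzip directly, try loading the file with pickle_gzip_load instead. The code is roughly as follows: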

from autoelective.utils import pickle_gzip_load
r = pickle_gzip_load(path_to_pgz_log_file)  # load requests.Response object from pgz dump file
with open(path_to_save_html, 'wb') as fp:
    fp.write(r.content)

If you did use pickle_gzip_load, then this may be a pickle/gzip version compatibility issue ...
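To illustrate why a plain gzip extraction shows requests internals mixed in with the page, here is a minimal sketch of a pickle-inside-gzip dump. It assumes the helper pairs `pickle` with `gzip` as its name suggests. The real implementation in autoelective.utils is not shown in this thread, so `sketch_dump` and `sketch_load` are hypothetical stand-ins.

```python
import gzip
import pickle


def sketch_dump(obj, path):
    """Serialize obj with pickle and compress the result with gzip."""
    with gzip.open(path, "wb") as fp:
        pickle.dump(obj, fp)


def sketch_load(path):
    """Undo both layers: gunzip first, then unpickle.

    Only unpickle files you trust; pickle can execute code on load.
    """
    with gzip.open(path, "rb") as fp:
        return pickle.load(fp)
```

A tool such as 7zip removes only the gzip layer. What it leaves behind is the raw pickle stream, in which the HTML bytes sit among serialization opcodes and object attributes.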

Yixuan-Wang commented 4 years ago

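I had previously just extracted the log directly with 7zip 🤦 After loading it with pickle_gzip_load, the last few lines of the resulting HTML are (all at the same indentation level; indentation and blank lines removed):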

<td class="datagrid" align="center"><span style="width: 60">全校任选</span></td>
<td class="datagrid" align="center"><span style="width: 30">4.0</span></td>
<td class="datagrid" align="center"><span style="width: 45">6.0</span></td>
<td class="datagrid"><span style="width: 40%">黄博典</span></td>
<td class="datagrid" align="center"><span style="width: 30">1</span></td>
<td class="datagrid"><span style="width: 85">外国语学院</span></td>
<td class="datagrid" align="center"><span style="width: 30"> </span></td>
<td class="datagrid"><span styl

zhongxinghong commented 4 years ago

v5.0.1 adds handling for this abnormal page, so the program no longer terminates when it occurs.
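The actual v5.0.1 change is not shown in this thread. As a generic sketch of the idea, a guard can check that the expected tables exist before indexing them, and the loop can treat an incomplete page as a recoverable condition instead of letting IndexError kill the thread. All names below (`UnexpectedPageError`, `pick_tables`, `run_once`) are hypothetical.

```python
class UnexpectedPageError(Exception):
    """Raised when a fetched page lacks the expected tables (hypothetical)."""


def pick_tables(tables):
    """Return (plans_table, elected_table), or raise if the page is incomplete."""
    if len(tables) < 2:
        raise UnexpectedPageError("expected 2 tables, found %d" % len(tables))
    return tables[0], tables[1]


def run_once(tables):
    """Process one page; return None instead of crashing on a bad page."""
    try:
        return pick_tables(tables)
    except UnexpectedPageError as e:
        # Report the problem and let the caller retry on the next loop iteration.
        print("skip this round: %s" % e)
        return None
```

With a guard like this, a truncated response costs one loop iteration rather than ending the run.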

zhongxinghong / PKUAutoElective

Program occasionally terminates with an uncaught exception #44