hect0x7 / JMComic-Crawler-Python

Python API for JMComic | Provides a Python API for accessing JMComic (禁漫天堂), supporting both the web and mobile clients | JMComic GitHub Actions downloader 🚀
https://jmcomic.readthedocs.io/zh-cn/latest/option_file_syntax/#
MIT License
556 stars, 1.18k forks

categories_filter_gen returns an empty result #212

Closed Yunxi-awa closed 4 months ago

Yunxi-awa commented 5 months ago

A strange bug: visiting https://18comic-c.art/albums/?page=1&o=mr&t=a directly shows that the page does have content.

import sys; print('Python %s on %s' % (sys.version, sys.platform))
E:\Python\3.11.6\python.exe -X pycache_prefix=C:\Users\云熙awa\AppData\Local\JetBrains\PyCharmCE2023.3\cpython-cache D:/Pycharm/plugins/python-ce/helpers/pydev/pydevd.py --multiprocess --client 127.0.0.1 --port 54780 --file E:\PythonProject\JMDownload\new\apiWeb.py 
Connected to pydev debugger (build 233.13763.11)
2024-02-03 16:06:21:【plugin.invoke】Invoking plugin: [login]
2024-02-03 16:06:21:【html】https://18comic-c.art/login
2024-02-03 16:06:22:【plugin.login】Login succeeded
2024-02-03 16:06:59:【html】https://18comic-c.art/albums/?page=1&o=mr&t=a
**Code I entered while debugging: next(next(jmclt.categories_filter_gen()).iter_id())**
PyDev console: starting.
2024-02-03 16:07:10:【html】https://18comic-c.art/albums/?page=1&o=mr&t=a
Traceback (most recent call last):
  File "D:\Pycharm\plugins\python-ce\helpers\pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<input>", line 1, in <module>
StopIteration

It feels like I have stirred up a whole nest of bugs. I have nearly turned jmcomic inside out.

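hect0x7 commented 5 months ago

  1. Please post your code. Even if you typed it in while debugging, please rewrite it as normal code so that we can discuss it.
  2. Have you tested any of the more general domains? For example, 18comic.vip or jmcomic.me

Yunxi-awa commented 5 months ago

> 1. Please post your code. Even if you typed it in while debugging, please rewrite it as normal code so that we can discuss it.
> 2. Have you tested any of the more general domains? For example, 18comic.vip or jmcomic.me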

Purpose of the function: get the latest album_id. Calling code:

jmclt = config.normal.DEFAULT_JMCOMIC_OPTION.new_jm_client()
total_aid = getLatestAid(jmclt)

The problematic function:

def getLatestAid(jmclient):
    return int(next(next(jmclient.categories_filter_gen()).iter_id()))

I tested the other domains and they work fine, but the catch is that I got banned on them 😅
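The StopIteration in the traceback above is what `next()` raises when an iterator has nothing left to yield, so either of the two nested `next()` calls can trigger it when no albums are parsed. A minimal sketch of a more defensive version, assuming only that the client yields page objects exposing `iter_id()` as used in this thread; `FakePage` and `get_latest_aid` are hypothetical stand-ins so the example runs without jmcomic or network access:

```python
from typing import Iterable, Iterator, Optional


class FakePage:
    """Hypothetical stand-in for one page of category results (not a jmcomic class)."""

    def __init__(self, album_ids: Iterable[str]):
        self._album_ids = list(album_ids)

    def iter_id(self) -> Iterator[str]:
        # Mirrors the iter_id() call used in the thread.
        return iter(self._album_ids)


def get_latest_aid(pages: Iterator[FakePage]) -> Optional[int]:
    """Return the first album ID of the first page, or None if nothing was parsed."""
    first_page = next(pages, None)  # None when the generator yields no page at all
    if first_page is None:
        return None
    first_id = next(first_page.iter_id(), None)  # None when the page has no IDs
    return int(first_id) if first_id is not None else None


print(get_latest_aid(iter([FakePage(["123456", "123455"])])))  # 123456
print(get_latest_aid(iter([FakePage([])])))                    # None
```

With the real client, the call would presumably be `get_latest_aid(jmclient.categories_filter_gen())`. Returning `None` only hides the symptom; the empty page itself still has to be fixed in the parser.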

hect0x7 commented 5 months ago

I gave it a try. Most likely the regex no longer matches the page. JmPageTool.pattern_html_category_album_info_list needs to be updated.
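To illustrate why a regex mismatch produces an empty result rather than an error, here is a small sketch. The pattern and both HTML snippets are made up for illustration; they are not the library's real `pattern_html_category_album_info_list` nor the site's real markup.

```python
import re

# Hypothetical pattern written against the "old" markup (NOT the library's real pattern).
OLD_PATTERN = re.compile(r'<a href="/album/(\d+)/"><img src="[^"]*" title="([^"]*)"')

# Made-up markup the pattern was written for.
old_html = '<a href="/album/100/"><img src="a.jpg" title="First"></a>'
# Same data after a made-up layout change: an extra class and swapped attribute order.
new_html = '<a class="thumb" href="/album/100/"><img title="First" src="a.jpg"></a>'

print(OLD_PATTERN.findall(old_html))  # [('100', 'First')]
print(OLD_PATTERN.findall(new_html))  # []
```

The second call fails silently: the page has content, but the parser sees none, which is consistent with the behaviour reported in this issue.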

Yunxi-awa commented 5 months ago

> I gave it a try. Most likely the regex no longer matches the page. JmPageTool.pattern_html_category_album_info_list needs to be updated.

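JMComic's site layout has changed recently. Honestly, XPath and CSS selectors would work much better, but a refactor would be too much work. All I can say is that it is time to pay off the technical debt 😪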
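A rough sketch of the structure-based approach mentioned here, using the limited XPath subset in Python's standard `xml.etree.ElementTree`. It assumes well-formed markup, which real pages often are not, so a real refactor would more likely use a dedicated HTML parser such as lxml or BeautifulSoup. The snippets and the `/album/<id>/` URL shape are assumptions for illustration, and `album_ids` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List


def album_ids(html_fragment: str) -> List[str]:
    """Collect album IDs from links shaped like /album/<digits>/ (assumed URL shape)."""
    root = ET.fromstring(html_fragment)
    ids = []
    # Select every <a> element that has an href, wherever it sits in the tree.
    for link in root.findall(".//a[@href]"):
        parts = link.get("href").strip("/").split("/")
        if len(parts) >= 2 and parts[0] == "album" and parts[1].isdigit():
            ids.append(parts[1])
    return ids


# Two made-up layouts carrying the same data.
old_html = '<div><a href="/album/100/"><img src="a.jpg" title="First"/></a></div>'
new_html = '<div><a class="thumb" href="/album/100/"><img title="First" src="a.jpg"/></a></div>'

print(album_ids(old_html))  # ['100']
print(album_ids(new_html))  # ['100']
```

Because the query targets elements and attributes instead of the exact character sequence, extra attributes or a different attribute order do not affect the result.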

hect0x7 commented 4 months ago

v2.5.6 has been adapted to the new layout.