Closed: Yunxi-awa closed this issue 5 months ago
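Which method are you using? get_html_domain_all fetches all domains by visiting the 禁漫 (JM) publish page. I know about the official one; its result should be the same as get_html_domain_all.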
Code I ran: jmclt.get_html_domain_all(). It looks like the publish page has been banned:
Traceback (most recent call last):
  File "D:\Pycharm\plugins\python-ce\helpers\pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<input>", line 1, in <module>
  File "E:\Python\3.11.6\Lib\site-packages\jmcomic\jm_client_interface.py", line 476, in get_html_domain_all
    return JmModuleConfig.get_html_domain_all(postman or self.get_root_postman())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Python\3.11.6\Lib\site-packages\common\util\decorator_util.py", line 63, in func_exec
    attr = func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "E:\Python\3.11.6\Lib\site-packages\jmcomic\jm_config.py", line 251, in get_html_domain_all
    resp = postman.get(cls.JM_PUB_URL)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Python\3.11.6\Lib\site-packages\common\postman\postman_api.py", line 125, in get
    return self.__get__()(url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Python\3.11.6\Lib\site-packages\curl_cffi\requests\__init__.py", line 92, in request
    return s.request(
           ^^^^^^^^^^
  File "E:\Python\3.11.6\Lib\site-packages\curl_cffi\requests\session.py", line 699, in request
    raise RequestsError(str(e), e.code, rsp) from e
curl_cffi.requests.errors.RequestsError: Failed to perform, ErrCode: 35, Reason: 'BoringSSL SSL_connect: Connection was reset in connection to jmcomic.ltd:443 '. This may be a libcurl error, See https://curl.se/libcurl/c/libcurl-errors.html first for more details.
Yes, it does look like it has been banned.
Scraping the source hosted on GitHub should avoid the problem, since GitHub is generally reachable.
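The idea of falling back from a blocked source to GitHub can be sketched roughly as below. This is only an illustration, not code from jmcomic: `first_available_domains` is a hypothetical helper name, and each "source" is assumed to be a zero-argument callable that returns a collection of domain strings or raises when the host is unreachable.

```python
from typing import Callable, Iterable, Set


def first_available_domains(sources: Iterable[Callable[[], Iterable[str]]]) -> Set[str]:
    """Try each domain source in order and return the first non-empty result.

    Hypothetical helper for illustration; not part of jmcomic.
    """
    errors = []
    for source in sources:
        try:
            domains = set(source())
        except Exception as exc:  # e.g. a connection reset from a blocked host
            errors.append(exc)
            continue
        if domains:
            return domains
    raise RuntimeError(f'all domain sources failed: {errors!r}')
```

With something like this, the publish-page lookup could be listed first and the GitHub lookup second, so a connection reset on the first source would not abort the whole call.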
The next version will add support for fetching domains via GitHub. The code is as follows:
class JmModuleConfig:

    @classmethod
    def get_html_domain_all_via_github(cls,
                                       postman=None,
                                       template='https://jmcmomic.github.io/go/{}.html',
                                       index_range=(300, 309)
                                       ):
        # NOTE: this snippet calls postman.get() directly, so the caller must pass a postman
        domain_set = set()

        def fetch_domain(url):
            # Fetch one page without following redirects and collect its domains
            resp = postman.get(url, allow_redirects=False)
            text = resp.text
            from .jm_toolkit import JmcomicText
            for domain in JmcomicText.analyse_jm_pub_html(text):
                # Skip entries that start with jm365.work
                if domain.startswith('jm365.work'):
                    continue
                domain_set.add(domain)

        from common import multi_thread_launcher
        # One task per page: index_range[0] up to, but not including, index_range[1]
        multi_thread_launcher(
            iter_objs=[template.format(i) for i in range(*index_range)],
            apply_each_obj_func=fetch_domain,
        )
        return domain_set
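One detail worth spelling out: `index_range=(300, 309)` is unpacked into `range()`, so pages 300 through 308 are requested and 309 is not. The standalone sketch below reproduces just the URL generation and the `jm365.work` filter so they can be checked without network access. `build_page_urls` and `drop_excluded` are hypothetical names introduced here, not jmcomic functions.

```python
def build_page_urls(template, index_range):
    """Expand the template for every index in [start, stop)."""
    start, stop = index_range
    return [template.format(i) for i in range(start, stop)]


def drop_excluded(domains, excluded_prefixes=('jm365.work',)):
    """Return the domains that do not start with any excluded prefix."""
    prefixes = tuple(excluded_prefixes)
    return {d for d in domains if not d.startswith(prefixes)}


urls = build_page_urls('https://jmcmomic.github.io/go/{}.html', (300, 309))
print(len(urls), urls[0], urls[-1])
```

If page 309 or later ever needs to be covered, the upper bound of `index_range` would have to be raised by the caller.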
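Thanks to the author, I can keep working on my database plugin (although it has already evolved into a standalone project built on the jmcomic library).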
It only returned the 18comic-cn.vip site, which is banned in my region. In reality there are also 18comic-c.art and 18comic-c.xyz.
This is the official source repository for the 禁漫天堂 publish page: https://github.com/jmcmomic/jmcmomic.github.io/blob/main/go/304.html The HTML stored there contains the latest domains.
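To illustrate how domains might be pulled out of such an HTML page, here is a minimal sketch using a regular expression. It is an assumption-laden stand-in, not the logic of `JmcomicText.analyse_jm_pub_html`: the hostname pattern, the `extract_domains` name, and the keyword filter (`'comic'`, `'jm'`) are all choices made for this example, and the sample HTML is invented.

```python
import re

# Assumed pattern: dot-separated hostname labels followed by an alphabetic TLD.
_HOST_RE = re.compile(
    r'\b(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}\b',
    re.IGNORECASE,
)


def extract_domains(html, keywords=('comic', 'jm')):
    """Collect hostname-like tokens that contain one of the keywords."""
    hosts = {h.lower() for h in _HOST_RE.findall(html)}
    return {h for h in hosts if any(k in h for k in keywords)}


sample = (
    '<a href="https://18comic-c.art">18comic-c.art</a>'
    '<p>18comic-c.xyz</p>'
    '<script src="app.min.js"></script>'
)
print(sorted(extract_domains(sample)))
```

The keyword filter is crude: it discards tokens such as `app.min.js`, but it would also keep any unrelated hostname that happens to contain `jm` or `comic`, so real pages may need a stricter rule.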