hect0x7 / JMComic-Crawler-Python

Python API for JMComic | Provides a Python API for accessing JMComic (禁漫天堂), supporting both the web and mobile clients | GitHub Actions downloader for JMComic 🚀
https://jmcomic.readthedocs.io/zh-cn/latest/option_file_syntax/#
MIT License
557 stars 1.18k forks

Possible bug: publish and update dates are not matched correctly #193

Closed Yunxi-awa closed 5 months ago

Yunxi-awa commented 5 months ago

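I wrote a simple Python program to download albums and their attributes, overriding the module's JmDownloader class. When writing the publish date and update date to the database, I found that both values were 0. Neither changing the regular expressions in jm_toolkit nor switching the api mode fixed it. Finally, I added a print directly in the JmAlbumDetail class in jm_entity, and the values were 0 there as well. My initial guess is that the problem is in the module, but I could not find the cause of the bug.

The program code is below. The option file only configures the login plugin; since it contains the account and password, it is not shown.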

import time
import jmcomic
import sqlite3
import queue
import threading
import sys

albumData_q = queue.Queue(64)
album_conn = sqlite3.connect("../album.db", check_same_thread=False)
# album_conn.execute('''CREATE TABLE Album(
#                         ID          INTEGER KEY NOT NULL UNIQUE,
#                         name        TEXT NOT NULL,
#                         chapter     TEXT,
#                         chapterID   TEXT,
#                         chapterName TEXT,
#                         chapterPage TEXT,
#                         author      TEXT,
#                         actor       TEXT,
#                         tag         TEXT,
#                         pub_date    TEXT,
#                         upd_date    TEXT);''')

getOption = jmcomic.create_option("opt.yml")
jm_log = jmcomic.JmModuleConfig.jm_log

baseDir = "E:/JMAlbum/"

# The class where the problem appears
class superDownloader(jmcomic.JmDownloader):
    def __init__(self, option: jmcomic.JmOption):
        super().__init__(option)
        self.data: dict = {}
        self.chapterID: list = []
        self.chapterName: list = []
        self.chapterPage: list = []

    def after_album(self, album: jmcomic.JmAlbumDetail):
        super().after_album(album)
        self.data = {"ID": album.album_id, "name": album.name, "chapter": len(album),
                     "chapterID": self.chapterID, "chapterName": self.chapterName, "chapterPage": self.chapterPage,
                     "author": album.authors, "actor": album.actors, "tag": album.tags,
                     # album.pub_date and album.update_date turn out to be 0 here
                     "pub_date": album.pub_date, "upd_date": album.update_date}
        albumData_q.put(self.data)
        jmcomic.default_jm_logging("album.after.q", "report succeeded")
        self.option.call_all_plugin(
            'after_album',
            album=album,
            downloader=self,
        )

    def after_photo(self, photo: jmcomic.JmPhotoDetail):
        super().after_photo(photo)
        self.chapterID.append(photo.photo_id)
        self.chapterName.append(photo.name)
        print(len(photo))
        self.chapterPage.append(len(photo))
        jmcomic.default_jm_logging("photo.after.q", "refresh succeeded")
        self.option.call_all_plugin(
            'after_photo',
            photo=photo,
            downloader=self,
        )

# Thread that writes the collected data to the database
class dataBaseExecutor(threading.Thread):
    def __init__(self):
        super().__init__()
        self.album_cs_w = album_conn.cursor()

    def run(self):
        for i in range(sys.maxsize):
            data: dict = albumData_q.get(True, None)
            # sqlite3 cannot bind lists, so every value is stored as text
            row: list = [str(value) for value in data.values()]
            print(row)
            placeholders = ", ".join("?" * len(row))
            sql = f"INSERT INTO Album VALUES ({placeholders});"
            try:
                self.album_cs_w.execute(sql, row)
                jmcomic.default_jm_logging("db", "insert succeeded")
            except Exception as error:
                jmcomic.default_jm_logging("db",
                                           f"insert failed: sql={sql} error={error} data={row}")
            # Commit once every 32 rows
            if i % 32 == 31:
                album_conn.commit()
                jmcomic.default_jm_logging("db", "commit succeeded")

    def quit(self):
        album_conn.commit()
        self.album_cs_w.close()

# Main thread
def main():
    dbMaster = dataBaseExecutor()
    dbMaster.daemon = True
    dbMaster.start()
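    album_id = 4  # album ID (aid)
    try:
        jmcomic.download_album(album_id, getOption, superDownloader)
    except jmcomic.JmcomicException as e:
        # Match against the message text; `in e` on the exception object raises TypeError.
        # "本子不存在" means "album does not exist".
        if "本子不存在" in str(e):
            data = {"ID": album_id, "name": "", "chapter": "",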
                    "chapterID": "", "chapterName": "", "chapterPage": "",
                    "author": "", "actor": "", "tag": "",
                    "pub_date": "", "upd_date": ""}
            albumData_q.put(data)
    time.sleep(10)
    album_conn.commit()

if __name__ == "__main__":
    main()
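
The script above relies on a daemon thread, a fixed `time.sleep(10)`, and a connection shared across threads, so rows can be lost if the process exits before the last commit. One common alternative is sketched below: the writer thread owns its own connection and stops when it receives a sentinel. This is only an illustration under stated assumptions: `writer` and `STOP` are hypothetical names, the table `Album` is assumed to exist already, and none of this is part of jmcomic.

```python
import queue
import sqlite3
import threading

STOP = object()  # sentinel that tells the writer thread to finish


def writer(db_path: str, rows: "queue.Queue", batch: int = 32) -> None:
    """Consume rows from the queue and insert them into table Album."""
    conn = sqlite3.connect(db_path)  # connection used by this thread only
    pending = 0
    while True:
        row = rows.get()
        if row is STOP:
            break
        # One "?" per column, e.g. "?, ?, ?" for a three-column row
        placeholders = ", ".join("?" * len(row))
        conn.execute(f"INSERT INTO Album VALUES ({placeholders})", row)
        pending += 1
        if pending >= batch:
            conn.commit()
            pending = 0
    conn.commit()  # flush whatever is left before exiting
    conn.close()
```

The main thread would start `threading.Thread(target=writer, args=(path, rows))`, put one tuple per album on `rows`, then put `STOP` and call `join()`, which replaces the fixed sleep with a deterministic shutdown.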
hect0x7 commented 5 months ago

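This is actually intentional rather than a bug. The root cause is that the album detail endpoint of the JMComic app does not return fields such as the update date and publish date, whereas the web page does. So when JmApiClient is used, these fields can only be given special placeholder values to stay compatible with the entity class. The corresponding code is at: https://github.com/hect0x7/JMComic-Crawler-Python/blob/fc2f2a908d9922f4daefe1df6bda364988b1c237/src/jmcomic/jm_toolkit.py#L679-L682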
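
Because the API client fills these fields with a placeholder, code that stores them may want to treat the placeholder as "missing" rather than as a real date. A minimal sketch, assuming the placeholder is 0 as observed in the report above (the actual value is whatever the linked jm_toolkit.py lines assign); `normalize_date` and `PLACEHOLDER_DATES` are hypothetical names, not part of jmcomic:

```python
from typing import Optional

# Assumption: text forms that mean "no date available".
# "0" matches the value observed in the report; adjust to the library's actual placeholder.
PLACEHOLDER_DATES = {"", "0"}


def normalize_date(value) -> Optional[str]:
    """Return the date as a string, or None if it is a placeholder."""
    text = "" if value is None else str(value).strip()
    return None if text in PLACEHOLDER_DATES else text
```

With this, `normalize_date(album.pub_date)` can be stored as NULL when the API client was used, instead of writing a misleading `0`.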

hect0x7 commented 5 months ago

You can try the web client instead: just set client.impl = 'html' in the option.

Example code:

from jmcomic import *

op = create_option_by_env()
cl = op.new_jm_client(impl='html')

album = cl.get_album_detail(123)
print(album.pub_date)
print(album.update_date)

Output:

2024-01-13 16:57:00:【html】https://18comic.vip/album/123
2018-03-12
2022-12-05
Yunxi-awa commented 5 months ago

Frustratingly, html requests fail for some reason, even though the page opens fine in the browser, and F12 shows no redirect either.

hect0x7 commented 5 months ago

Does your option configure the same proxy as your browser?

Yunxi-awa commented 5 months ago

The browser has no proxy configured at all.

hect0x7 commented 5 months ago

Can I see your option configuration? Remember to remove any sensitive information.

Yunxi-awa commented 5 months ago

This is all of it, nothing more.

download:
  image:
    suffix: .jpg
  threading:
    image: 32
    photo: 16

dir_rule:
  base_dir: E:/JMAlbum/
  rule: Bd_Aid_Pindex

plugins:
  after_init:
    - plugin: login
      kwargs:
        username: ***
        password: ***
Yunxi-awa commented 5 months ago

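I found an error in the official documentation: https://jmcomic.readthedocs.io/en/stable/option_file_syntax/ In my tests, if the option does not define impl, the client is not limited to html; it uses api as well.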

hect0x7 commented 5 months ago

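Your option does not configure a proxy either. In that case jmcomic uses the system proxy by default, and the browser also follows the system proxy by default, so in theory they should behave the same. Is the domain your browser visits the same as the one jmcomic visits?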
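
One way to compare the two sides is to print the proxy settings that Python itself detects. The sketch below uses only the standard library; whether jmcomic resolves the system proxy through the same mechanism is an assumption, so treat the result as a hint rather than as what jmcomic actually uses.

```python
import urllib.request

# Proxies the standard library detects from environment variables
# (HTTP_PROXY, HTTPS_PROXY, ...) and, on Windows and macOS, from system settings.
detected = urllib.request.getproxies()

if detected:
    for scheme, url in sorted(detected.items()):
        print(f"{scheme}: {url}")
else:
    print("no proxy detected")
```

If this prints a proxy while the browser is configured to connect directly, or the other way around, the two are not taking the same network path.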

hect0x7 commented 5 months ago

You're right that the documentation is wrong. The default Client type depends on JmModuleConfig.DEFAULT_CLIENT_IMPL, which I recently changed to api; the documentation has not been updated yet.

Yunxi-awa commented 5 months ago

They are the same, but the HTTP request keeps failing with: Failed to perform, ErrCode: 35, Reason: 'BoringSSL SSL_connect: Connection was reset in connection to 18comic-cn.vip:443 '. This may be a libcurl error, See https://curl.se/libcurl/c/libcurl-errors.html first for more details. I also found this site: https://jcomic-cn.vip

hect0x7 commented 5 months ago

Connection was reset means the site is blocked: the network you are using cannot reach 18comic-cn.vip. But by that logic, visiting 18comic-cn.vip in your browser should also give you Connection was reset, shouldn't it?

Yunxi-awa commented 5 months ago

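That is exactly the confusing part; we're back to square one. I'll check the firewall again. I recently reinstalled the system on this computer.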

Yunxi-awa commented 5 months ago

I checked everything: no system proxy is in use, Defender is not misbehaving, no VPN is running, and the network ports are not occupied.

hect0x7 commented 5 months ago

Try a different domain? For example:

client:
  impl: html
  domain:
    html: 18comic-cool.art
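
Trying domains one at a time by editing the option file can also be automated. A minimal sketch of that idea, assuming a caller-supplied `fetch` function that raises an exception when a domain is unreachable; `first_working_domain` is a hypothetical helper, not a jmcomic API:

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def first_working_domain(
    domains: Iterable[str],
    fetch: Callable[[str], T],
) -> Tuple[Optional[str], Optional[T]]:
    """Call fetch(domain) for each domain and return the first success."""
    for domain in domains:
        try:
            return domain, fetch(domain)
        except Exception as error:  # broad on purpose: this is only a probe
            print(f"{domain}: {error}")
    return None, None
```

Here `fetch` would wrap whatever call builds a client for that domain and requests an album; that wiring depends on the jmcomic version and is not shown.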
Yunxi-awa commented 5 months ago

Same problem. It looks like this won't be simple to solve, so I'll make do with api for now.

Yunxi-awa commented 5 months ago

I found something strange: I can access 18comic.vip over a direct connection.

hect0x7 commented 5 months ago

Huh??

hect0x7 commented 5 months ago

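You can access it from mainland China without a proxy? Wow. Which region are you in, if you don't mind sharing? 😮 Visit https://ipinfo.io/ and take a look.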

Yunxi-awa commented 5 months ago

The location it reports is not where I actually am; it is off by a few hundred kilometers, in Jiangsu, haha.

hect0x7 commented 5 months ago

Can jmcomic access 18comic.vip?

Yunxi-awa commented 5 months ago

It works on and off.