JMComic's redirects are confusing: is there a way to get the albumID after a redirect?

Yunxi-awa commented 5 months ago

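For example: visiting aid=139 automatically redirects to aid=120853, which causes errors when storing the data for 139. Is there a way to find out the redirected aid when fetching the data?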

hect0x7 commented 5 months ago

If you call get_album_detail(139), the returned album.album_id should be 120853.

Yunxi-awa commented 5 months ago

Huh? I did try that. Let me try again.

hect0x7 commented 5 months ago

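This looks like another place where the HTML and API implementations are inconsistent: the HTML implementation returns 120853, but the API implementation does not.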
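
A minimal sketch of how the redirect could be detected under that assumption: ask the HTML client for the album and compare the ID that comes back with the ID that was requested. The names `RedirectInfo` and `resolve_album_id` are hypothetical and not part of jmcomic; the helper only assumes the client has a `get_album_detail` method returning an object with an `album_id` attribute, as described in this thread.

```python
from typing import NamedTuple


class RedirectInfo(NamedTuple):
    """Pair of the ID that was asked for and the ID the site answered with."""
    requested_id: str
    resolved_id: str

    @property
    def redirected(self) -> bool:
        # A redirect happened if the site answered with a different ID.
        return self.requested_id != self.resolved_id


def resolve_album_id(client, requested_id) -> RedirectInfo:
    """Hypothetical helper (not a jmcomic API).

    `client` is assumed to expose get_album_detail(album_id) and to return
    an object whose album_id reflects the redirect target.
    """
    album = client.get_album_detail(requested_id)
    # Normalize both IDs to str so that 139 and '139' compare equal.
    return RedirectInfo(str(requested_id), str(album.album_id))
```

Passing a client created with `impl='html'` should report a redirect for 139 if the behaviour described above holds; with the API client the two IDs would presumably come back equal.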

hect0x7 commented 5 months ago

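There may not be a good solution for this. In practice, using 139 as the album_id still reaches the target album, which is enough for reading and downloading. Do you have a use case that really requires the "redirected albumID"?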

Yunxi-awa commented 5 months ago

To make categorization easier.

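Say aid = 139: an album's main tag can be read from search results. But the data fetched through the API is that of aid = 120853, so a search based on that data can never find the main tag under aid = 139, and the search is wasted.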

hect0x7 commented 5 months ago

I don't quite follow. Could you explain with code?

Yunxi-awa commented 5 months ago

The code is too long 😂 Let me explain in more detail:

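For example, I want to get the main tag of aid = 139, but JMComic provides no direct interface for it, so the only way is to search for the album and read the tag from the search results.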

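However, fetching the data for aid = 139 redirects to aid = 120853, so the search results never contain an entry for aid = 139. To avoid wasting resources, I need to detect whether a redirect happened; then, when writing to the database, I can simply copy the data of aid = 120853 to aid = 139.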
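
The "copy on redirect" idea can be sketched roughly as follows, using an in-memory SQLite database. The table layout, the `save_album` helper and the sample category/description values are all made up for illustration; only the two aids come from this thread.

```python
import sqlite3


def save_album(conn, requested_id, resolved_id, category, description):
    """Store the data under the resolved ID, plus an alias row if redirected.

    Hypothetical helper: the albums table and its columns are assumptions.
    """
    insert = (
        "INSERT OR REPLACE INTO albums (aid, redirect_to, category, description) "
        "VALUES (?, ?, ?, ?)"
    )
    # The canonical row lives under the ID the site actually answered with.
    conn.execute(insert, (resolved_id, None, category, description))
    if requested_id != resolved_id:
        # Copy the same data to the requested ID and remember where it points,
        # so the old ID never has to be searched for again.
        conn.execute(insert, (requested_id, resolved_id, category, description))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE albums ("
    "aid INTEGER PRIMARY KEY, redirect_to INTEGER, category TEXT, description TEXT)"
)
# Placeholder values; in practice they would come from the search result.
save_album(conn, 139, 120853, "example-category", "example description")
```

Keeping a `redirect_to` column, rather than only duplicating the row, leaves a record of which IDs are aliases, which may help if the data is refreshed later.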

Yunxi-awa commented 5 months ago

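The search here doesn't simply walk through page=1, 2, 3, ...; the algorithm starts from a page estimated from the aid's relative position, which saves resources.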

    def getType(self, aid: int, authors: list, name: str) -> tuple[str, str]:
        """
        Look up an album's category title and description via site search.

        param::aid     album_id
        param::authors author list
        param::name    album name

        Note: jmclt, total_aid and apiValue are defined elsewhere (not shown).
        """
        def fetch(lst: list):
            for item in lst:
                page1 = jmclt.search_site(item)
                if not page1.total:
                    # No results for this keyword; also avoids dividing by zero below.
                    continue
                # Estimate the page that should contain aid from its relative position.
                first_search_pagination = math.floor(aid / (total_aid / page1.total))
                # Offsets -1, +2, -3, +4, ... visit pages p, p-1, p+1, p-2, p+2, ...
                for i in (-i if i % 2 == 1 else i for i in range(1, max(page1.total, page1.total + 1 - first_search_pagination) * 2)):
                    if page1.total >= first_search_pagination > 1:
                        pages.append(jmclt.search_site(item, first_search_pagination))
                    elif first_search_pagination == 1:
                        # Page 1 was already fetched above; reuse it.
                        pages.append(page1)
                    first_search_pagination += i
        names: list = apiValue.findNameAndAttributes(name)
        # https://2dfan.com/tags/Escu:de/
        pages = []
        fetch(authors + names)
        for page in pages:
            for _, info in page.content:
                if aid == int(info["id"]):
                    return info["category"]["title"], info["description"]
        jm_log("getType", "No search results found!")
        return "Error", "Error"
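
The page-ordering idea in the code above can be isolated into a small generator, sketched here under some assumptions: results are taken to be ordered by aid (as the estimate above implies; if the site lists newest first, the estimate would need to be mirrored), and the helper name `zigzag_pages` and its parameters are hypothetical.

```python
import math
from typing import Iterator


def zigzag_pages(aid: int, max_aid: int, page_count: int) -> Iterator[int]:
    """Yield every page number in 1..page_count, closest to the estimate first.

    Hypothetical helper: max_aid is the largest known aid (must be > 0) and
    page_count is the number of result pages for the keyword.
    """
    if page_count < 1:
        return
    # Proportional estimate of the page holding aid, clamped to a valid page.
    start = math.ceil(aid / max_aid * page_count)
    start = min(max(start, 1), page_count)
    yield start
    # Fan out around the estimate: start-1, start+1, start-2, start+2, ...
    for step in range(1, page_count):
        if start - step >= 1:
            yield start - step
        if start + step <= page_count:
            yield start + step
```

A caller would fetch pages in this order and stop as soon as the target aid is found, so that only a few requests are made when the estimate is close.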

hect0x7 commented 5 months ago

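First, to pour a little cold water on this: I don't think fetching this "main tag" (it is actually called category) is very meaningful. That said, if I were implementing this requirement, I would do it as follows. The code below successfully gets the category for 139: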

from jmcomic import *
op = create_option_by_env()
api_cl = op.new_jm_client(impl='api')
html_cl = op.new_jm_client(impl='html')

def fetch_category_by_album_id(album_id):
    # The HTML client follows the redirect, so album.album_id is the redirected ID
    album = html_cl.get_album_detail(album_id)

    def check_is_same_album(aid, atitle):
        if aid == album.album_id or atitle == album.title:
            return True

        if atitle.startswith(album.title):
            return True

        # More checks can be added here, depending on the situation
        return False

    # Search by (part of) the album's name
    page = api_cl.search_site(album.oname[-6:-1])
    # print(len(page), page.total)
    for aid, ainfo in page.content:
        atitle = ainfo['name']
        if check_is_same_album(aid, atitle):
            # Found it
            category = ainfo['category']['title']
            return category

    # No matching album in the search results
    return None
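
The matching step can also be written as a pure function over the search results, which makes it testable without network access. This is only a sketch: `find_category` is a hypothetical name, and the result shape (pairs of an ID and a dict with `name` and `category.title`) is taken from the code in this thread. IDs are compared as strings because the snippets above treat them as both int and str.

```python
def find_category(results, album_id, album_title):
    """Return the category title of the first matching result, or None.

    Hypothetical helper: results is an iterable of (aid, info) pairs shaped
    like page.content in the code above.
    """
    wanted_id = str(album_id)
    for aid, info in results:
        title = info.get("name", "")
        # Match on ID first; fall back to a title-prefix match. The truthiness
        # check matters because every string starts with the empty string.
        if str(aid) == wanted_id or (album_title and title.startswith(album_title)):
            return info["category"]["title"]
    return None
```

When a redirect is involved, the ID passed in would be the redirected one, since that is the ID that appears in search results.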

Yunxi-awa commented 5 months ago

Not every Korean comic has the Korean-comic tag (that is the only purpose 🤣). Thanks for pointing the way!

hect0x7 / JMComic-Crawler-Python

JMComic's redirects are confusing: is there a way to get the albumID after a redirect? #209