lgc-NB2Dev / YetAnotherPicSearch

Yet another picture search plugin for nonebot2
GNU General Public License v3.0

Some images trigger lxml.etree.ParserError: Document is empty #29

Closed dingzhenbaohujiaran closed 2 years ago

dingzhenbaohujiaran commented 2 years ago

Versions of YetAnotherPicSearch, go-cqhttp, nonebot, Python, and the operating system

YetAnotherPicSearch: 1.6.0
go-cqhttp: 1.0.0-rc3
nonebot: 2.0.0beta.5
python: 3.8
OS: Windows Server 2012 R2

List of installed Python packages

aiocache==0.11.1 aiodns==3.0.0 aiofiles==0.8.0 aiohttp==3.8.1 aiosignal==1.2.0 anyio==3.6.1 APScheduler==3.9.1 arrow==1.2.2 asgiref==3.5.2 async-timeout==4.0.2 attrs==21.4.0 backports.zoneinfo==0.2.1 bbcode==1.1.0 beautifulsoup4==4.11.1 binaryornot==0.4.4 Brotli==1.0.9 cchardet==2.1.7 certifi==2022.5.18.1 cffi==1.15.1 chardet==5.0.0 charset-normalizer==2.0.12 click==8.1.3 cn2an==0.5.17 colorama==0.4.4 cookiecutter==1.7.3 cssselect==1.1.0 cycler==0.11.0 diskcache==5.4.0 emoji==1.7.0 fastapi==0.79.0 feedparser==6.0.10 fonttools==4.33.3 frozenlist==1.3.0 greenlet==1.1.2 h11==0.12.0 h2==4.1.0 hpack==4.0.0 httpcore==0.15.0 httptools==0.4.0 httpx==0.23.0 httpx-socks==0.7.3 hyperframe==6.0.1 idna==3.3 imageio==2.19.2 importlib-metadata==4.12.0 jieba==0.42.1 Jinja2==3.1.2 jinja2-time==0.2.0 kiwisolver==1.4.3 loguru==0.6.0 lxml==4.9.1 Markdown==3.4.1 MarkupSafe==2.1.1 matplotlib==3.5.2 msgpack==1.0.3 multidict==6.0.2 nb-cli==0.6.7 nonebot-adapter-onebot==2.1.3 nonebot-plugin-abbrreply==1.1 nonebot-plugin-abstract==1.0. nonebot-plugin-apscheduler==0 nonebot-plugin-crazy-thursday nonebot-plugin-heweather==0.6 nonebot-plugin-hikarisearch== nonebot-plugin-htmlrender==0. nonebot-plugin-imageutils==0. nonebot-plugin-memes==0.3.2 nonebot-plugin-moyu==0.3.0 nonebot-plugin-petpet==0.3.9 nonebot-plugin-picsearcher==0 nonebot-plugin-pixiv==1.0.7 nonebot-plugin-pixivrank-sear nonebot-plugin-setu==1.1.1 nonebot-plugin-smart-reply==0 nonebot2==2.0.0b5 numpy==1.22.3 opencv-python==4.5.5.64 opencv-python-headless==4.5.5 packaging==21.3 picimagesearch==3.3.11 Pillow==9.1.1 pinyin==0.4.0 playwright==1.24.1 poyo==0.5.0 proces==0.1.2 prompt-toolkit==3.0.30 pycares==4.2.2 pycparser==2.21 pydantic==1.9.1 pyee==8.1.0 pyfiglet==0.8.post1 Pygments==2.12.0 pygtrie==2.4.2 pymdown-extensions==9.5 pyparsing==3.0.9 pyquery==1.4.3 python-dateutil==2.8.2 python-dotenv==0.20.0 python-markdown-math==0.8 python-slugify==6.1.2 python-socks==2.0.3 pytz==2022.1 pytz-deprecation-shim==0.1.0. 
PyYAML==6.0 requests==2.28.1 rfc3986==1.5.0 ruamel.yaml==0.17.21 ruamel.yaml.clib==0.2.6 sgmllib3k==1.0.0 six==1.16.0 sniffio==1.2.0 soupsieve==2.3.2.post1 starlette==0.19.1 tenacity==8.0.1 text-unidecode==1.3 tomlkit==0.10.2 tqdm==4.64.0 typing_extensions==4.2.0 tzdata==2022.1 tzlocal==4.2 ujson==5.3.0 urllib3==1.26.9 uvicorn==0.18.2 watchfiles==0.16.1 watchgod==0.8.2 wcwidth==0.2.5 webdav4==0.9.7 websockets==10.1 win32-setctime==1.1.0 yarl==1.8.1 yetanotherpicsearch==1.6.1 youth-version-of-setu4==0.0.9 zhconv==1.4.3 zipp==3.8.1

How to reproduce

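Searching with certain images raises lxml.etree.ParserError: Document is empty. For the same image, the error appears at first, and a search repeated some time later succeeds. Two images, however, currently fail with this error every time. Attachments: failing image 1, failing image 2, process screenshot 1, process screenshot 2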

Expected behavior

Results are returned normally, no matter how low the similarity is

Actual behavior

A small number of images trigger lxml.etree.ParserError: Document is empty

NekoAria commented 2 years ago

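I cannot reproduce this; the results returned by the search site may have changed since then. Even if you post the log from that time, it may not reveal anything. Still, please post it so we can at least take a look.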

dingzhenbaohujiaran commented 2 years ago

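The problem has not occurred again so far. However, my SauceNao API quota is used up, and even after registering a new account and switching to its API key, the service still reports as temporarily unavailable. Should I open a new issue for that, or keep asking in this thread?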

NekoAria commented 2 years ago

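There is no fix for this unless you pay for SauceNao or switch to a different proxy node. Its daily usage limit is tied to your IP, so changing the API key makes no difference.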
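The reasoning in this reply can be illustrated with a toy model. This is only a sketch of a quota keyed on the client IP, as the reply describes; the class name `PerIpDailyQuota`, the limit of 2, and the example addresses and keys are all hypothetical and say nothing about how SauceNao actually implements its limits.

```python
from collections import Counter


class PerIpDailyQuota:
    """Toy model of a daily quota that is keyed on the client IP only."""

    def __init__(self, limit):
        self.limit = limit
        self.used = Counter()  # requests counted per IP

    def allow(self, ip, api_key):
        # api_key is accepted but deliberately ignored: in this model only
        # the source IP decides whether a request is over quota.
        if self.used[ip] >= self.limit:
            return False
        self.used[ip] += 1
        return True


quota = PerIpDailyQuota(limit=2)  # hypothetical limit, not a real SauceNao number
results = [
    quota.allow("203.0.113.5", "key-A"),
    quota.allow("203.0.113.5", "key-A"),
    quota.allow("203.0.113.5", "key-B"),   # new key, same IP: still refused
    quota.allow("198.51.100.7", "key-A"),  # same key, different IP (another proxy node)
]
print(results)  # [True, True, False, True]
```

Under this model, a new API key changes nothing because the counter never looks at it, while a different exit IP starts from a fresh count.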

dingzhenbaohujiaran commented 2 years ago

OK, thank you for your patient explanation. Have a nice day.

Mashiro69 commented 1 year ago

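I have the same problem. When the image similarity is below roughly 60%, the search result is returned successfully, and then a second message is sent saying that the search for that image failed with E: ParserError('Document is empty'). Images with higher similarity work fine.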

07-11 20:27:56 [INFO] nonebot | Event will be handled by Matcher(type='message', module=YetAnotherPicSearch)
07-11 20:27:57 [INFO] nonebot_plugin_gocqhttp | [3374222] 发送群 Neuro 的消息: [{"type":  ... (123780849)
07-11 20:28:03 [INFO] nonebot_plugin_gocqhttp | [3374222] 发送群 85xxxx40  的合并转发消息: [{"type":  ... (1885834855)
07-11 20:28:03 [ERROR] YetAnotherPicSearch | 该图 [https://gchat.qpic.cn/gchatpic_new/0/0-0-B6FA0FC135D85948ED554AC842833A72/0] 搜图失败
Traceback (most recent call last):
  File "<string>", line 17, in <module>
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/nonebot/__init__.py", line 309, in run
    get_driver().run(*args, **kwargs)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/nonebot/drivers/fastapi.py", line 198, in run
    uvicorn.run(
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/uvicorn/main.py", line 578, in run
    server.run()
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/uvicorn/server.py", line 61, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/nonebot/message.py", line 467, in check_and_run_matcher
    await _run_matcher(
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/nonebot/message.py", line 419, in _run_matcher
    await matcher.run(bot, event, state, stack, dependency_cache)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/nonebot/internal/matcher/matcher.py", line 753, in run
    await self.simple_run(bot, event, state, stack, dependency_cache)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/nonebot/internal/matcher/matcher.py", line 728, in simple_run
    await handler(
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/nonebot/dependencies/__init__.py", line 108, in __call__
    return await cast(Callable[..., Awaitable[R]], self.call)(**values)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/YetAnotherPicSearch/__init__.py", line 272, in handle_image_search
    await image_search(
> File "/root/NoneBot/.venv/lib/python3.11/site-packages/YetAnotherPicSearch/__init__.py", line 109, in image_search
    bot, event, await extra_handle(url, client), index
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/YetAnotherPicSearch/utils.py", line 206, in wrapper
    result = await func(*args, **kwargs)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/YetAnotherPicSearch/ascii2d.py", line 15, in ascii2d_search
    color_res = await ascii2d_color.search(file=_file)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/PicImageSearch/ascii2d.py", line 63, in search
    return Ascii2DResponse(resp_text, resp_url)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/PicImageSearch/model/ascii2d.py", line 77, in __init__
    data = PyQuery(fromstring(resp_text, parser=utf8_parser))
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/lxml/html/__init__.py", line 873, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/lxml/html/__init__.py", line 761, in document_fromstring
    raise etree.ParserError(
lxml.etree.ParserError: Document is empty

56c460561374e8875a6dd7a7522e9d95_720 image

NekoAria commented 1 year ago

It looks like the resp_text received by the Ascii2d result parser is abnormal; I cannot reproduce it on my side. Please change line 77 of /root/NoneBot/.venv/lib/python3.11/site-packages/PicImageSearch/model/ascii2d.py on your machine to:

from loguru import logger
try:
    data = PyQuery(fromstring(resp_text, parser=utf8_parser))
except Exception:
    logger.info(resp(resp_text))

Then restart the bot, try to reproduce the problem, and post the log output that this prints.

Mashiro69 commented 1 year ago

I changed line 77 of ascii2d.py to:

from loguru import logger
……
class Ascii2DResponse:
    def __init__(self, resp_text: str, resp_url: str):
        utf8_parser = HTMLParser(encoding="utf-8")
        #from loguru import logger
        try:
            data = PyQuery(fromstring(resp_text, parser=utf8_parser))
        except Exception:
            logger.info(resp(resp_text))
        #data = PyQuery(fromstring(resp_text, parser=utf8_parser))
        self.origin: PyQuery = data  # raw data
        # parsed results
        self.raw: List[Ascii2DItem] = [
            Ascii2DItem(i) for i in data.find("div.row.item-box").items()
        ]
        self.url: str = resp_url

Searching for the same image again raised this error:

07-12 19:36:32 [ERROR] YetAnotherPicSearch | 该图 [https://gchat.qpic.cn/gchatpic_new/0/0-0-56C460561374E8875A6DD7A7522E9D95/0] 搜图失败
Traceback (most recent call last):
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/PicImageSearch/model/ascii2d.py", line 80, in __init__
    data = PyQuery(fromstring(resp_text, parser=utf8_parser))
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/lxml/html/__init__.py", line 873, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/lxml/html/__init__.py", line 761, in document_fromstring
    raise etree.ParserError(
lxml.etree.ParserError: Document is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 17, in <module>
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/nonebot/__init__.py", line 309, in run
    get_driver().run(*args, **kwargs)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/nonebot/drivers/fastapi.py", line 198, in run
    uvicorn.run(
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/uvicorn/main.py", line 578, in run
    server.run()
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/uvicorn/server.py", line 61, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/nonebot/message.py", line 467, in check_and_run_matcher
    await _run_matcher(
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/nonebot/message.py", line 419, in _run_matcher
    await matcher.run(bot, event, state, stack, dependency_cache)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/nonebot/internal/matcher/matcher.py", line 753, in run
    await self.simple_run(bot, event, state, stack, dependency_cache)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/nonebot/internal/matcher/matcher.py", line 728, in simple_run
    await handler(
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/nonebot/dependencies/__init__.py", line 108, in __call__
    return await cast(Callable[..., Awaitable[R]], self.call)(**values)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/YetAnotherPicSearch/__init__.py", line 272, in handle_image_search
    await image_search(
> File "/root/NoneBot/.venv/lib/python3.11/site-packages/YetAnotherPicSearch/__init__.py", line 101, in image_search
    bot, event, await extra_handle(url, client), index
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/YetAnotherPicSearch/utils.py", line 206, in wrapper
    result = await func(*args, **kwargs)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/YetAnotherPicSearch/ascii2d.py", line 15, in ascii2d_search
    color_res = await ascii2d_color.search(file=_file)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/PicImageSearch/ascii2d.py", line 63, in search
    return Ascii2DResponse(resp_text, resp_url)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/PicImageSearch/model/ascii2d.py", line 82, in __init__
    logger.info(resp(resp_text))
NameError: name 'resp' is not defined

NekoAria commented 1 year ago

Sorry, I made a typo: it should be repr, not resp.

from loguru import logger
try:
    data = PyQuery(fromstring(resp_text, parser=utf8_parser))
except Exception:
    logger.info(repr(resp_text))

Mashiro69 commented 1 year ago

07-12 20:44:50 [INFO] PicImageSearch | ''
07-12 20:44:50 [ERROR] YetAnotherPicSearch | 该图 [https://gchat.qpic.cn/gchatpic_new/0/0-0-56C460561374E8875A6DD7A7522E9D95/0] 搜图失败
Traceback (most recent call last):
  File "<string>", line 17, in <module>
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/nonebot/__init__.py", line 309, in run
    get_driver().run(*args, **kwargs)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/nonebot/drivers/fastapi.py", line 198, in run
    uvicorn.run(
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/uvicorn/main.py", line 578, in run
    server.run()
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/uvicorn/server.py", line 61, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/nonebot/message.py", line 467, in check_and_run_matcher
    await _run_matcher(
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/nonebot/message.py", line 419, in _run_matcher
    await matcher.run(bot, event, state, stack, dependency_cache)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/nonebot/internal/matcher/matcher.py", line 753, in run
    await self.simple_run(bot, event, state, stack, dependency_cache)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/nonebot/internal/matcher/matcher.py", line 728, in simple_run
    await handler(
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/nonebot/dependencies/__init__.py", line 108, in __call__
    return await cast(Callable[..., Awaitable[R]], self.call)(**values)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/YetAnotherPicSearch/__init__.py", line 272, in handle_image_search
    await image_search(
> File "/root/NoneBot/.venv/lib/python3.11/site-packages/YetAnotherPicSearch/__init__.py", line 101, in image_search
    bot, event, await extra_handle(url, client), index
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/YetAnotherPicSearch/utils.py", line 206, in wrapper
    result = await func(*args, **kwargs)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/YetAnotherPicSearch/ascii2d.py", line 15, in ascii2d_search
    color_res = await ascii2d_color.search(file=_file)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/PicImageSearch/ascii2d.py", line 63, in search
    return Ascii2DResponse(resp_text, resp_url)
  File "/root/NoneBot/.venv/lib/python3.11/site-packages/PicImageSearch/model/ascii2d.py", line 84, in __init__
    self.origin: PyQuery = data  # 原始数据
UnboundLocalError: cannot access local variable 'data' where it is not associated with a value

NekoAria commented 1 year ago

Is your network environment unable to reach Ascii2d normally?
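The UnboundLocalError in the preceding traceback is a side effect of the diagnostic patch rather than a new fault. The except branch logs the body and then falls through, so data is never assigned before it is used. A minimal sketch of a diagnostic wrapper that avoids this by re-raising is shown here. It uses the standard logging module instead of loguru, and `parse_with_diagnostics`, the `parse` parameter, and the logger name are hypothetical names for illustration, not part of PicImageSearch.

```python
import logging

logger = logging.getLogger("ascii2d_debug")  # hypothetical logger name


def parse_with_diagnostics(resp_text, parse):
    """Call parse(resp_text); on failure, log the raw body and re-raise."""
    try:
        return parse(resp_text)
    except Exception:
        # %r logs repr(resp_text), so an empty body shows up as ''.
        logger.info("unparseable response body: %r", resp_text)
        # Re-raise so the caller sees the original parser error instead of a
        # later UnboundLocalError from a variable that was never assigned.
        raise
```

In the patched constructor, this could be called with something like `lambda t: fromstring(t, parser=utf8_parser)` as the `parse` argument, so the original ParserError still propagates after the body has been logged.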

Mashiro69 commented 1 year ago

Access works fine. [image]

NekoAria commented 1 year ago

What about searching by uploading the image, or by sending the image link? The problem may be in that request.
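Since the logged body in this thread was the empty string '', one way to make such failures easier to read is to check the body before handing it to the HTML parser. This is a minimal sketch under that assumption; `EmptyResponseError` and `require_body` are hypothetical names and are not part of PicImageSearch or YetAnotherPicSearch.

```python
class EmptyResponseError(RuntimeError):
    """Raised when a search site returns a body with nothing to parse."""


def require_body(resp_text, source):
    """Return resp_text unchanged, or raise a descriptive error if it is blank."""
    if resp_text is None or not resp_text.strip():
        raise EmptyResponseError(
            f"{source} returned an empty response body; "
            "check the request (status code, proxy, upload) before parsing"
        )
    return resp_text
```

A caller would wrap the text before parsing, for example `fromstring(require_body(resp_text, "Ascii2d"), parser=utf8_parser)`, so the error message points at the request rather than at the parser.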