Bilibili upload notification (B站投稿) and infopush raise errors

sudingquan commented 4 years ago

[2020-08-17 17:30:20,269 B站投稿提醒] ERROR: Opening and ending tag mismatch: img line 35 and p, line 36, column 13 (<string>, line 36)
Traceback (most recent call last):
  File "/home/sudingquan/HoshinoBot/hoshino/service.py", line 325, in wrapper
    ret = await func()
  File "/home/sudingquan/HoshinoBot/hoshino/modules/shebot/biliVideo/__init__.py", line 49, in check_BiliVideo
    video = BV.parse_xml()
  File "/home/sudingquan/HoshinoBot/hoshino/modules/shebot/biliVideo/__init__.py", line 24, in parse_xml
    rss = etree.XML(self.xml)
  File "src/lxml/etree.pyx", line 3216, in lxml.etree.XML
  File "src/lxml/parser.pxi", line 1896, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1784, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1141, in lxml.etree._BaseParser._parseDoc
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File "<string>", line 36
lxml.etree.XMLSyntaxError: Opening and ending tag mismatch: img line 35 and p, line 36, column 13

Job "check (trigger: cron[minute='*/5', second='30'], next run at: 2020-08-17 17:30:30 CST)" raised an exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/apscheduler/executors/base_py3.py", line 29, in run_coroutine_job
    retval = await job.func(*job.args, **job.kwargs)
  File "/home/sudingquan/HoshinoBot/hoshino/modules/shebot/infoPush/__init__.py", line 76, in check
    if info.check_update():
  File "/home/sudingquan/HoshinoBot/hoshino/modules/shebot/infoPush/__init__.py", line 31, in check_update
    if self.parse_xml().get('pubDate') != self._latest:
  File "/home/sudingquan/HoshinoBot/hoshino/modules/shebot/infoPush/__init__.py", line 18, in parse_xml
    rss = etree.XML(self.xml)
  File "src/lxml/etree.pyx", line 3216, in lxml.etree.XML
  File "src/lxml/parser.pxi", line 1896, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1784, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1141, in lxml.etree._BaseParser._parseDoc
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File "<string>", line 36
lxml.etree.XMLSyntaxError: Opening and ending tag mismatch: img line 35 and p, line 36, column 13

zangxx66 commented 4 years ago

Around line 162 of util4sh.py, try changing base_url to https://rsshub.app.

sudingquan commented 4 years ago

Uh, well...

[2020-08-18 12:47:37,189 pcr国服推送] ERROR: Cannot connect to host rsshub.app:443 ssl:default [Cannot assign requested address]
Job "check (trigger: cron[minute='*/5', second='30'], next run at: 2020-08-18 12:50:30 CST)" raised an exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/apscheduler/executors/base_py3.py", line 29, in run_coroutine_job
    retval = await job.func(*job.args, **job.kwargs)
  File "/home/sudingquan/HoshinoBot/hoshino/modules/shebot/infoPush/__init__.py", line 78, in check
    if info.check_update():
  File "/home/sudingquan/HoshinoBot/hoshino/modules/shebot/infoPush/__init__.py", line 31, in check_update
    if self.parse_xml().get('pubDate') != self._latest:
  File "/home/sudingquan/HoshinoBot/hoshino/modules/shebot/infoPush/__init__.py", line 18, in parse_xml
    rss = etree.XML(self.xml)
  File "src/lxml/etree.pyx", line 3216, in lxml.etree.XML
  File "src/lxml/parser.pxi", line 1895, in lxml.etree._parseMemoryDocument
ValueError: can only parse strings

mujiwob commented 4 years ago

I switched the RSS source to my own instance and I get the same problem. Does RSSHub need some kind of configuration?

zangxx66 commented 4 years ago

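Huh, so rsshub.app sits behind Cloudflare. That CDN is not friendly to data centers in mainland China; my server is overseas, which explains why I had no problems.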

mujiwob commented 4 years ago

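> Huh, so rsshub.app sits behind Cloudflare. That CDN is not friendly to data centers in mainland China; my server is overseas, which explains why I had no problems.

Hmm, so the only options are to go through a proxy or move to an overseas server?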

zangxx66 commented 4 years ago

The problem seems to be in handle_xml_img(). I made some changes to this module a while ago; the code below is for reference only. /shebot/infopush/__init__.py

async def handle_xml_img(xml_str: str) -> str:
    # Collect the src URL of every <img> tag in the feed description.
    for label in re.findall('<img src="(.+?)".+?>', xml_str):
        if re.search("(.+?)", label):  # true for any non-empty URL
            pic = await R.image_from_url(label)
            # Replaces the first tag of any kind still present in xml_str
            # with the image segment.
            xml_str = re.sub('<.+?>', str(pic), xml_str, count=1)
    return xml_str
...
...
@nonebot.scheduler.scheduled_job('cron', minute='*/5', second='30')
async def check():
    for sv in _inf_svs:
        for info in _inf_svs[sv]:
            try:
                await info.get()
            except Exception as ex:
                sv.logger.error(ex)
            if info.check_update():
                _latest_data[info.route] = info._latest
                save_config(_latest_data, _latest_path)
                # Log message: "detected a new {sv.name} update"
                sv.logger.info(f'检查到{sv.name}消息更新')
                data = info.parse_xml()
                title = data['title']
                link = data['link']
                desc = await handle_xml_img(data['desc'])
                res = html2text.html2text(desc)
                # Drop markdown image links of the form ![](http...?format=png&name=...)
                txt = re.sub(r'!\[\]\(http\S*\?format=(png|jpg|bmp|gif|jpeg|webp)&name=\S*\)', '', res)
                # `author` is not assigned anywhere in this excerpt.
                await broadcast(f'{author}\n\n{txt}\n{link}', sv_name=sv.name)
            else:
                # Log message: "no new {sv.name} update detected"
                sv.logger.info(f'未检查到{sv.name}消息更新')
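The earlier log in this thread shows a connection error followed by `ValueError: can only parse strings`, which suggests the update check still ran after the fetch had failed. One possible way to avoid that is to skip a feed as soon as its fetch raises. The sketch below is self-contained and does not use the real shebot classes: `FeedSketch`, `check_all`, and the example routes are hypothetical stand-ins.

```python
import asyncio
import logging
from typing import Optional

logger = logging.getLogger("infopush_sketch")


class FeedSketch:
    """Hypothetical stand-in for the info objects in the thread."""

    def __init__(self, route: str, payload: Optional[str], fail: bool = False):
        self.route = route
        self._payload = payload
        self._fail = fail
        self.xml: Optional[str] = None

    async def get(self) -> None:
        # Simulates the HTTP fetch; a real implementation would query RSSHub.
        if self._fail:
            raise ConnectionError(f"cannot fetch {self.route}")
        self.xml = self._payload

    def check_update(self) -> bool:
        # Mimics the failure mode from the log: parsing a non-string raises.
        if not isinstance(self.xml, str):
            raise ValueError("can only parse strings")
        return True


async def check_all(feeds: list) -> list:
    """Return the routes that were fetched successfully and report an update."""
    updated = []
    for feed in feeds:
        try:
            await feed.get()
        except Exception as ex:
            logger.error(ex)
            continue  # skip the update check: there is nothing valid to parse
        if feed.check_update():
            updated.append(feed.route)
    return updated
```

With one reachable and one unreachable feed, `asyncio.run(check_all(...))` logs the connection error and carries on with the remaining feeds instead of raising.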

util4sh.py

@classmethod
async def image_from_url(cls, url: str, cache=True) -> 'MessageSegment':
    return MessageSegment.image(f'{url}')

mujiwob commented 4 years ago

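I found that when I changed the RSS source earlier I had added an extra / at the end. Emmm, how embarrassing. By the way, now it looks like this; does that mean the subscription feed wasn't fetched? [image]

I just tried pinging rsshub.app from the server: 100% packet loss.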
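A stray trailing slash like the one described here can be made harmless by normalizing both parts before joining them. This is a minimal sketch, not code from util4sh.py: `build_feed_url` and the route `/some/route` are hypothetical names used only for illustration.

```python
def build_feed_url(base_url: str, route: str) -> str:
    """Join a base URL and a route with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + route.lstrip("/")


# Both spellings of the base URL give the same result.
print(build_feed_url("http://localhost:1200", "/some/route"))
print(build_feed_url("http://localhost:1200/", "/some/route"))
```

Both calls print `http://localhost:1200/some/route`.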

zangxx66 commented 4 years ago

100% packet loss is not normal. Try following the tutorial to deploy RSSHub yourself.

mujiwob commented 4 years ago

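I did deploy RSSHub with npm on the same server as the bot. From my own computer I can open the RSSHub page at the server's ip:1200, but pinging rsshub.app from the server still gives 100% packet loss.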
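Ping only exercises ICMP, so it can help to also test the TCP ports the bot actually uses (443 for rsshub.app, 1200 for the local instance mentioned above). The following is a rough diagnostic sketch using only the standard library; `can_connect` is a hypothetical helper, not part of the plugin.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    for host, port in [("rsshub.app", 443), ("localhost", 1200)]:
        print(host, port, can_connect(host, port))
```

If the local instance is reachable but rsshub.app is not, that matches the picture in this thread of a self-hosted RSSHub working while the public one is unreachable from the server.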

zangxx66 commented 4 years ago

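If you host it yourself, what you see when you visit the server is your own RSSHub instance, not rsshub.app. My guess is that your server is in mainland China, so reaching rsshub.app is difficult. A self-hosted RSSHub uses the same subscription routes, although subscriptions such as Twitter probably won't work.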

mujiwob commented 4 years ago

Yes, indeed. Once I removed the blocked routes from the subscription routes, everything worked.

yuban01652 commented 4 years ago

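The cause of the error is that bilibili applies anti-scraping measures against rsshub.app. To fix it, deploy RSSHub yourself on the bot's server and change self.base_url on line 152 of util4sh.py to http://localhost:1200.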
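If the upstream returns something other than a valid RSS document (one plausible reading of the `img`/`p` tag mismatch above), a defensive parse keeps the scheduled job from crashing. The sketch below uses the standard library's `xml.etree.ElementTree` as a stand-in for the `lxml` call in the plugin; `parse_latest_pub_date` is a hypothetical helper and the sample feed is made up.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_latest_pub_date(payload: object) -> Optional[str]:
    """Return the first item's pubDate, or None if payload is not a usable RSS feed."""
    if not isinstance(payload, (str, bytes)):
        return None  # the fetch failed, so there is nothing to parse
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        return None  # e.g. an HTML page containing an unclosed <img>
    if root.tag != "rss":
        return None  # well-formed XML, but not an RSS document
    return root.findtext("channel/item/pubDate")


sample = (
    '<rss version="2.0"><channel><item>'
    "<pubDate>Mon, 17 Aug 2020 09:30:00 GMT</pubDate>"
    "</item></channel></rss>"
)
print(parse_latest_pub_date(sample))
print(parse_latest_pub_date('<html><body><p><img src="x"></p></body></html>'))
```

The first call prints the date string and the second prints `None`, so the caller can log the failure and retry on the next scheduled run.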

pcrbot / plugins-for-Hoshino

Bilibili upload notification (B站投稿) and infopush raise errors #5