Closed GuoZhaoHui628 closed 2 years ago
Hi @GuoZhaoHui628 :
You need to check whether your configuration is missing doc_source.
For the exact configuration, see: https://github.com/liuli-io/liuli/blob/main/liuli_config/wechat.json
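For anyone checking the same thing, here is a minimal Python sketch that reports which top-level keys a task config is missing. Only doc_source is named in the reply above; the other entries in REQUIRED_KEYS and the helper name missing_keys are assumptions for illustration, not Liuli's actual validation.

```python
import json

# Only "doc_source" comes from the thread; the other keys are assumed
# from the shape of the example configs and may not be mandatory.
REQUIRED_KEYS = ("name", "doc_source", "collector")


def missing_keys(config_text: str, required=REQUIRED_KEYS) -> list:
    """Return the required top-level keys absent from a JSON config."""
    config = json.loads(config_text)
    return [key for key in required if key not in config]


if __name__ == "__main__":
    sample = '{"name": "wechat", "collector": {}}'
    print(missing_keys(sample))
```

An empty list means every key in REQUIRED_KEYS is present; it says nothing about whether the values are valid.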
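@howie6879 Thanks. I copied the configuration from the article "基于Liuli构建纯净的RSS公众号信息流" (building a clean RSS feed of WeChat official accounts with Liuli), and that produced the problem above. After changing it to match https://github.com/liuli-io/liuli/blob/main/liuli_config/wechat.json, that error no longer appears.
Then running it produced the following problems: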
[2022:02:18 11:41:44] INFO Liuli 采集器开始执行!
[2022:02:18 11:42:21] ERROR Liuli playwright 抓取出错: 老胡的储物柜 strTimeout 30000ms exceeded.
=========================== logs ===========================
navigating to "https://weixin.sogou.com/", waiting until "load"
============================================================
[2022:02:18 11:42:21] INFO Liuli 来自 None 的文章抓取失败! 👉 None/None/None
[2022:02:18 11:42:54] INFO Liuli playwright 匹配公众号 是不是很酷(isnt_it_cool) 成功! 正在提取最新文章: 软件工程师和算法竞赛
[2022:02:18 11:42:58] INFO Liuli 来自 liuli_wechat 的文章持久化失败! 👉 是不是很酷 there are no users authenticated, full error: {'ok': 0.0, 'errmsg': 'there are no users authenticated', 'code': 13, 'codeName': 'Unauthorized'}
[2022:02:18 11:42:58] INFO Liuli 🤗 微信公众号文章更新完毕(0/2)
[2022:02:18 11:42:58] INFO Liuli 采集器执行完毕!
[2022:02:18 11:42:58] INFO Liuli 处理器(after_collect): 开始执行!
[2022:02:18 11:42:58] INFO Liuli 处理器(after_collect): ad_marker 正在执行...
[2022:02:18 11:42:58] ERROR Liuli 执行失败!there are no users authenticated, full error: {'ok': 0.0, 'errmsg': 'there are no users authenticated', 'code': 13, 'codeName': 'Unauthorized'}
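My goal is to generate an RSS feed as described in your official-account article. Since it is deployed locally, I filled in this machine's address as the actual IP: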
LL_DOMAIN="http://192.168.31.118:8765"
Could you tell me what is causing the errors above? Did I misconfigure something?
Please check your network.
Hi @GuoZhaoHui628 :
Has the problem been solved?
@howie6879 Not yet. It can extract the articles, but I can't find the RSS feed.
[2022:02:18 22:04:40] INFO Liuli 采集器开始执行!
[2022:02:18 22:04:51] INFO Liuli playwright 匹配公众号 老胡的储物柜(howie_locker) 成功! 正在提取最新文章: 我的周刊(第026期)
[2022:02:18 22:04:55] INFO Liuli 来自 liuli_wechat 的文章持久化失败! 👉 老胡的储物柜 there are no users authenticated, full error: {'ok': 0.0, 'errmsg': 'there are no users authenticated', 'code': 13, 'codeName': 'Unauthorized'}
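It is still a problem with your configuration. This shows that your database is misconfigured, so authentication failed.
First check https://github.com/liuli-io/liuli/blob/main/docker-compose.yaml
Then check pro.env
If you find nothing wrong, post both configurations here.
From what you say, it is probably because I didn't fill in the MongoDB username and password. I copied the file directly; this is my first time using MongoDB and I don't really understand it, so I left that part alone. Embarrassing.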
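# ====================================== System environment ======================================#
# Treat the current directory as a module
PYTHONPATH=${PYTHONPATH}:${PWD}
# ======================================= Database =======================================#
# MongoDB username
LL_M_USER=""
# MongoDB password
LL_M_PASS=""
# MongoDB IP
# Leave this line unchanged when starting with Docker Compose
LL_M_HOST="liuli_mongodb"
# MongoDB port
LL_M_PORT="27017"
# MongoDB DB; best left unchanged
LL_M_DB="liuli"
# ====================================== API service ======================================#
# Flask: whether to enable Flask debug mode
LL_FLASK_DEBUG=0
# Flask IP
LL_HOST="0.0.0.0"
# Flask port
LL_HTTP_PORT=8765
# Access domain; if you have no domain, use the machine's actual address (it must be reachable externally), e.g. http://192.168.0.1:8765
LL_DOMAIN="http://192.168.31.118:8765"
# Number of workers started for the Flask service
LL_WORKERS=1
# ======================================= Sender =======================================#
# Sender terminal configuration: after setting the keys in the environment variables, list the terminals to distribute to in sender.sender_list of the startup config
# Currently supported: ding [DingTalk], wecom [WeCom], tg [Telegram], Bark
# Token required when the sender terminal is DingTalk
LL_D_TOKEN=""
# WeCom sender configuration; if no users or departments are configured, messages go to all users in all departments by default
LL_WECOM_ID=""
LL_WECOM_AGENT_ID="-1"
LL_WECOM_SECRET=""
# WeCom recipient users (user accounts, case-insensitive); separate multiple users with ;
LL_WECOM_TO_USER=""
# WeCom recipient departments (department names); separate multiple departments with ;
LL_WECOM_PARTY=""
# Telegram terminal configuration
LL_TG_CHAT_ID=""
LL_TG_TOKEN=""
# Bark push URL
LL_BARK_URL=""
# ======================================= Backup =======================================#
# Backup currently supports: github, mongodb
# With mongodb, the database address configured above is used for backup by default
# With github, fill in the settings below
# Repository access token
LL_GITHUB_TOKEN=""
# Repository where articles are saved, e.g. howie6879/liuli_backup; the repository name must be liuli_backup
LL_GITHUB_REPO=""
# Access domain; custom or default. Required when github is the backup target. For my own backup repository the address is: https://howie6879.github.io/liuli_backup/
LL_GITHUB_DOMAIN=""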
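@howie6879 What should I do now? This MongoDB was set up by the one-click install.
I see that your tutorial above says:
For local development, if `MongoDB` was set up with the method above, leave the following unchanged: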
LL_M_USER=""
LL_M_PASS=""
LL_M_HOST=""
LL_M_PORT="27017"
LL_M_DB="liuli"
Use this configuration as a reference:
docker-compose:
version: "3"
services:
  liuli_api:
    image: liuliio/api:v0.1.2
    restart: always
    container_name: liuli_api
    ports:
      - "8765:8765"
    volumes:
      - ./pro.env:/data/code/pro.env
    links:
      - liuli_mongodb
    depends_on:
      - liuli_mongodb
    networks:
      - liuli-network
  liuli_schedule:
    image: liuliio/schedule:v0.2.1
    restart: always
    container_name: liuli_schedule
    volumes:
      - ./pro.env:/data/code/pro.env
      - ./liuli_config:/data/code/liuli_config
    links:
      - liuli_mongodb
    depends_on:
      - liuli_mongodb
    networks:
      - liuli-network
  liuli_mongodb:
    image: mongo:3.6
    restart: always
    container_name: liuli_mongodb
    environment:
      - MONGO_INITDB_ROOT_USERNAME=liuli
      - MONGO_INITDB_ROOT_PASSWORD=liuli
    ports:
      - "27027:27017"
    volumes:
      - ./mongodb_data:/data/db
    command: mongod
    networks:
      - liuli-network
networks:
  liuli-network:
    driver: bridge
pro.env:
PYTHONPATH=${PYTHONPATH}:${PWD}
LL_M_USER="liuli"
LL_M_PASS="liuli"
LL_M_HOST="liuli_mongodb"
LL_M_PORT="27017"
LL_M_DB="admin"
LL_M_OP_DB="liuli"
LL_FLASK_DEBUG=0
LL_HOST="0.0.0.0"
LL_HTTP_PORT=8765
LL_DOMAIN=""
LL_WORKERS=1
LL_D_TOKEN=""
LL_WECOM_ID=""
LL_WECOM_AGENT_ID=
LL_WECOM_SECRET=""
LL_GITHUB_TOKEN=""
LL_GITHUB_REPO="your_username/liuli_backup"
LL_GITHUB_DOMAIN="https://your_username.github.io/liuli_backup"
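A note on why LL_M_DB is "admin" here: with the official mongo image, the root user created from MONGO_INITDB_ROOT_USERNAME lives in the admin database, so a client has to authenticate against admin even when the data is stored in liuli. The Python sketch below only illustrates that idea by building a connection URI from the LL_M_* values. build_mongo_uri is a hypothetical helper; how Liuli itself assembles its connection is not shown in this thread.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user: str, password: str, host: str, port: str, auth_db: str) -> str:
    """Build a MongoDB URI; credentials are included only when both are set."""
    if user and password:
        # quote_plus escapes characters such as '@' or ':' in credentials
        credentials = f"{quote_plus(user)}:{quote_plus(password)}@"
        return f"mongodb://{credentials}{host}:{port}/?authSource={auth_db}"
    # With empty credentials the client connects unauthenticated, which a
    # server started with a root user will reject on the first real operation.
    return f"mongodb://{host}:{port}/"


if __name__ == "__main__":
    print(build_mongo_uri("liuli", "liuli", "liuli_mongodb", "27017", "admin"))
    print(build_mongo_uri("", "", "liuli_mongodb", "27017", "admin"))
```

The second call mirrors the earlier pro.env with empty LL_M_USER and LL_M_PASS, which is consistent with the "there are no users authenticated" error in the logs.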
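@howie6879 I'll give it a try; the network here is terrible.
@GuoZhaoHui628 I just fixed a bug related to WeChat timestamps. I'm not sure whether it affects you, but either way, update the image first: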
docker pull liuliio/schedule:v0.2.1
cd <your liuli folder>
docker-compose down
docker-compose up
Strange: after updating, it fails to start instead. The version on my side is liuliio/schedule:v0.2.1
Loading .env environment variables...
Loading .env environment variables...
Traceback (most recent call last):
File "src/liuli_schedule.py", line 19, in <module>
from src.backup.action import backup_doc
File "/data/code/src/backup/action.py", line 12, in <module>
from src.backup.backup_factory import backup_factory
File "/data/code/src/backup/backup_factory.py", line 14, in <module>
from src.backup.base import BackupBase
File "/data/code/src/backup/base.py", line 9, in <module>
from src.config import Config
File "/data/code/src/config/__init__.py", line 5, in <module>
from .config import Config
File "/data/code/src/config/config.py", line 11, in <module>
class Config:
File "/data/code/src/config/config.py", line 76, in Config
WECOM_AGENT_ID = int(os.getenv("LL_WECOM_AGENT_ID", "-1"))
ValueError: invalid literal for int() with base 10: ''
The liuliio/api:v0.1.2 log is as follows:
[2022-02-19 08:47:02 +0800] [9] [INFO] Starting gunicorn 20.1.0
[2022-02-19 08:47:02 +0800] [9] [INFO] Listening at: http://0.0.0.0:8765 (9)
[2022-02-19 08:47:02 +0800] [9] [INFO] Using worker: gevent
[2022-02-19 08:47:02 +0800] [12] [INFO] Booting worker with pid: 12
[2022-02-19 08:47:02 +0800] [12] [ERROR] Exception in worker process
Traceback (most recent call last):
File "/root/.local/share/virtualenvs/code-nY5aaahP/lib/python3.9/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
worker.init_process()
File "/root/.local/share/virtualenvs/code-nY5aaahP/lib/python3.9/site-packages/gunicorn/workers/ggevent.py", line 146, in init_process
super().init_process()
File "/root/.local/share/virtualenvs/code-nY5aaahP/lib/python3.9/site-packages/gunicorn/workers/base.py", line 134, in init_process
self.load_wsgi()
File "/root/.local/share/virtualenvs/code-nY5aaahP/lib/python3.9/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
self.wsgi = self.app.wsgi()
File "/root/.local/share/virtualenvs/code-nY5aaahP/lib/python3.9/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/root/.local/share/virtualenvs/code-nY5aaahP/lib/python3.9/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
return self.load_wsgiapp()
File "/root/.local/share/virtualenvs/code-nY5aaahP/lib/python3.9/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
return util.import_app(self.app_uri)
File "/root/.local/share/virtualenvs/code-nY5aaahP/lib/python3.9/site-packages/gunicorn/util.py", line 359, in import_app
mod = importlib.import_module(module)
File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 790, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "/data/code/src/api/http_app.py", line 10, in <module>
from src.api.views import bp_api, bp_backup, bp_rss
File "/data/code/src/api/views/__init__.py", line 8, in <module>
from .bp_api_v1 import bp_api
File "/data/code/src/api/views/bp_api_v1.py", line 11, in <module>
from src.config import Config
File "/data/code/src/config/__init__.py", line 5, in <module>
from .config import Config
File "/data/code/src/config/config.py", line 11, in <module>
class Config:
File "/data/code/src/config/config.py", line 76, in Config
WECOM_AGENT_ID = int(os.getenv("LL_WECOM_AGENT_ID", "-1"))
The configuration files are as follows:
pro.env
PYTHONPATH=${PYTHONPATH}:${PWD}
LL_M_USER="liuli"
LL_M_PASS="liuli"
LL_M_HOST="liuli_mongodb"
LL_M_PORT="27017"
LL_M_DB="admin"
LL_M_OP_DB="liuli"
LL_FLASK_DEBUG=0
LL_HOST="0.0.0.0"
LL_HTTP_PORT=8765
LL_DOMAIN=""
LL_WORKERS=1
LL_D_TOKEN=""
LL_WECOM_ID=""
LL_WECOM_AGENT_ID=
LL_WECOM_SECRET=""
LL_GITHUB_TOKEN=""
LL_GITHUB_REPO="your_username/liuli_backup"
LL_GITHUB_DOMAIN="https://your_username.github.io/liuli_backup"
docker-compose.yaml
version: "3"
services:
  liuli_api:
    image: liuliio/api:v0.1.2
    restart: always
    container_name: liuli_api
    ports:
      - "8765:8765"
    volumes:
      - ./pro.env:/data/code/pro.env
    links:
      - liuli_mongodb
    depends_on:
      - liuli_mongodb
    networks:
      - liuli-network
  liuli_schedule:
    image: liuliio/schedule:v0.2.1
    restart: always
    container_name: liuli_schedule
    volumes:
      - ./pro.env:/data/code/pro.env
      - ./liuli_config:/data/code/liuli_config
    links:
      - liuli_mongodb
    depends_on:
      - liuli_mongodb
    networks:
      - liuli-network
  liuli_mongodb:
    image: mongo:3.6
    restart: always
    container_name: liuli_mongodb
    environment:
      - MONGO_INITDB_ROOT_USERNAME=liuli
      - MONGO_INITDB_ROOT_PASSWORD=liuli
    ports:
      - "27027:27017"
    volumes:
      - ./mongodb_data:/data/db
    command: mongod
    networks:
      - liuli-network
networks:
  liuli-network:
    driver: bridge
default.json
{
"name": "wechat",
"author": "liuli_team",
"doc_source": "liuli_wechat",
"collector": {
"wechat_sougou": {
"wechat_list": [
"老胡的储物柜", "是不是很酷"
],
"delta_time": 5,
"spider_type": "playwright"
}
},
"processor": {
"before_collect": [],
"after_collect": [{
"func": "ad_marker",
"cos_value": 0.6
}, {
"func": "to_rss",
"doc_source_list": ["liuli_wechat"],
"link_source": "github"
}]
},
"sender": {
"sender_list": ["wecom"],
"query_days": 7,
"delta_time": 3
},
"backup": {
"backup_list": ["github", "mongodb"],
"query_days": 7,
"delta_time": 3,
"init_config": {},
"after_get_content": [{
"func": "str_replace",
"before_str": "data-src=\"",
"after_str": "src=\"https://images.weserv.nl/?url="
}]
},
"schedule": {
"period_list": [
"00:10",
"12:10",
"21:10"
]
}
}
default.json
Look at this configuration: you enabled sender and backup, but you haven't filled in any of the settings they require.
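That doesn't seem to be the cause. As far as I can tell this only affects things like backup, so the backup would fail; but fetching articles failed earlier too, and just now even startup failed. I just removed sender and backup and ran it again, and the error is the same.
I only want to see whether this project can generate RSS feeds for official accounts; I have no other requirements.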
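> That doesn't seem to be the cause. As far as I can tell this only affects things like backup, so the backup would fail; but fetching articles failed earlier too, and just now even startup failed. I just removed sender and backup and ran it again, and the error is the same.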
The LL_WECOM_AGENT_ID= line in the env you posted is malformed; change it to match my .env.
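For anyone hitting the same traceback: an empty LL_WECOM_AGENT_ID= line still defines the variable, as an empty string, so the "-1" default passed to os.getenv is never used and int('') raises ValueError. The Python sketch below reproduces that and shows one tolerant way to parse it. env_int is a hypothetical helper, not part of Liuli; the fix actually applied in this thread is simply to write LL_WECOM_AGENT_ID="-1" in pro.env.

```python
import os


def env_int(name: str, default: int) -> int:
    """Read an integer environment variable, treating empty as unset."""
    raw = os.getenv(name, "").strip()
    if not raw:
        return default
    return int(raw)


if __name__ == "__main__":
    # Simulate what an empty "LL_WECOM_AGENT_ID=" line produces.
    os.environ["LL_WECOM_AGENT_ID"] = ""
    try:
        # Same pattern as the line shown in the traceback.
        int(os.getenv("LL_WECOM_AGENT_ID", "-1"))
    except ValueError as exc:
        print("pattern from the traceback fails:", exc)
    print("tolerant parse:", env_int("LL_WECOM_AGENT_ID", -1))
```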
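> I only want to see whether this project can generate RSS feeds for official accounts; I have no other requirements.
Then use this configuration, modified as follows: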
PYTHONPATH=${PYTHONPATH}:${PWD}
LL_M_USER="liuli"
LL_M_PASS="liuli"
LL_M_HOST="liuli_mongodb"
LL_M_PORT="27017"
LL_M_DB="admin"
LL_M_OP_DB="liuli"
LL_FLASK_DEBUG=0
LL_HOST="0.0.0.0"
LL_HTTP_PORT=8765
# Must be your server's real access address. If this is left empty, the GitHub settings below must be configured
LL_DOMAIN=""
LL_WORKERS=1
LL_D_TOKEN=""
LL_WECOM_ID=""
LL_WECOM_AGENT_ID="-1"
LL_WECOM_SECRET=""
# Best to configure this, because when your RSS is accessed, it points to this backup address on GitHub
LL_GITHUB_TOKEN=""
LL_GITHUB_REPO="your_username/liuli_backup"
LL_GITHUB_DOMAIN="https://your_username.github.io/liuli_backup"
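I see. It works now. Thank you for your patient help.
Out of curiosity I just glanced at your weekly; the quality is high, so I have gained another information source. I make something similar, 破茧日报, though it is not original content: "憋个大招,认真向你介绍我的一个…".
Here is a brief summary of my configuration and process, in the hope that it helps those who come later (the author's project keeps being updated, so treat this as reference only).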
1. Follow the tutorial for the one-click install.
2. Configuration files:
docker-compose.yaml
version: "3"
services:
  liuli_api:
    image: liuliio/api:v0.1.2
    restart: always
    container_name: liuli_api
    ports:
      - "8765:8765"
    volumes:
      - ./pro.env:/data/code/pro.env
    links:
      - liuli_mongodb
    depends_on:
      - liuli_mongodb
    networks:
      - liuli-network
  liuli_schedule:
    image: liuliio/schedule:v0.2.1
    restart: always
    container_name: liuli_schedule
    volumes:
      - ./pro.env:/data/code/pro.env
      - ./liuli_config:/data/code/liuli_config
    links:
      - liuli_mongodb
    depends_on:
      - liuli_mongodb
    networks:
      - liuli-network
  liuli_mongodb:
    image: mongo:3.6
    restart: always
    container_name: liuli_mongodb
    environment:
      - MONGO_INITDB_ROOT_USERNAME=liuli
      - MONGO_INITDB_ROOT_PASSWORD=liuli
    ports:
      - "27027:27017"
    volumes:
      - ./mongodb_data:/data/db
    command: mongod
    networks:
      - liuli-network
networks:
  liuli-network:
    driver: bridge
pro.env
PYTHONPATH=${PYTHONPATH}:${PWD}
LL_M_USER="liuli"
LL_M_PASS="liuli"
LL_M_HOST="liuli_mongodb"
LL_M_PORT="27017"
LL_M_DB="admin"
LL_M_OP_DB="liuli"
LL_FLASK_DEBUG=0
LL_HOST="0.0.0.0"
LL_HTTP_PORT=8765
# Must be your server's real access address. If this is left empty, the GitHub settings below must be configured
LL_DOMAIN="192.168.31.118"
LL_WORKERS=1
LL_D_TOKEN=""
LL_WECOM_ID=""
LL_WECOM_AGENT_ID="-1"
LL_WECOM_SECRET=""
# Best to configure this, because when your RSS is accessed, it points to this backup address on GitHub
LL_GITHUB_TOKEN="ghp_ZEZzfgp88ysHRaYV7C0BtFvl6bJA5G36ZtQQ"
LL_GITHUB_REPO="GuoZhaoHui628/liuli_backup"
LL_GITHUB_DOMAIN="https://GuoZhaoHui628.github.io/liuli_backup"
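Because I deployed locally, LL_DOMAIN here is set to the local address. The three LL_GITHUB_* parameters after it are obtained by following the backup configuration guide.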
default.json
{
"name": "wechat",
"author": "liuli_team",
"doc_source": "liuli_wechat",
"collector": {
"wechat_sougou": {
"wechat_list": [
"老胡的储物柜", "是不是很酷"
],
"delta_time": 5,
"spider_type": "playwright"
}
},
"processor": {
"before_collect": [],
"after_collect": [{
"func": "ad_marker",
"cos_value": 0.6
}, {
"func": "to_rss",
"doc_source_list": ["liuli_wechat"],
"link_source": "github"
}]
},
"sender": {
"sender_list": ["wecom"],
"query_days": 7,
"delta_time": 3
},
"backup": {
"backup_list": ["github", "mongodb"],
"query_days": 7,
"delta_time": 3,
"init_config": {},
"after_get_content": [{
"func": "str_replace",
"before_str": "data-src=\"",
"after_str": "src=\"https://images.weserv.nl/?url="
}]
},
"schedule": {
"period_list": [
"00:10",
"12:10",
"21:10"
]
}
}
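@GuoZhaoHui628 You're welcome. Your daily is also very high quality. Does it have an RSS address? Having looked at it, I would be glad to subscribe.
This way of sharing is well suited to a newsletter; for reference, here is the one I use: https://howie6879.zhubai.love/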
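@howie6879 @GuoZhaoHui628
I installed successfully following this method, with no errors, but the problem is that the RSS address cannot be accessed!
For the domain I entered LL_DOMAIN="domain.com", not an IP address. Is there something wrong with that?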
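> @howie6879 @GuoZhaoHui628
> I installed successfully following this method, with no errors, but the problem is that the RSS address cannot be accessed!
> For the domain I entered LL_DOMAIN="domain.com", not an IP address. Is there something wrong with that?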
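Fill in your host's public IP.
@howie6879 So only an IP works, and there is no way to use a domain name? Also, isn't there the backup on GitHub? Is there a way to deliver the RSS through GitHub, and what would that address be? If RSS can only be served via an IP, then I don't think liuli is suitable for me.
> @howie6879 So only an IP works, and there is no way to use a domain name? Also, isn't there the backup on GitHub? Is there a way to deliver the RSS through GitHub, and what would that address be? If RSS can only be served via an IP, then I don't think liuli is suitable for me.
Please look up how IP addresses and domain names relate to each other.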
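One thing that may be worth checking in this situation: the example in the pro.env template is a full URL with scheme and port (http://192.168.0.1:8765), while LL_DOMAIN="domain.com" is a bare host name. Whether Liuli requires the scheme is an assumption based only on that template comment. The Python sketch below is a quick sanity check for the value; looks_like_base_url is a hypothetical helper.

```python
from urllib.parse import urlsplit


def looks_like_base_url(value: str) -> bool:
    """True when the value has an http(s) scheme and a host part."""
    parts = urlsplit(value)
    return parts.scheme in ("http", "https") and bool(parts.hostname)


if __name__ == "__main__":
    for candidate in ("domain.com", "http://192.168.31.118:8765"):
        print(candidate, looks_like_base_url(candidate))
```

A domain name works the same way as an IP here, provided it resolves to the host and the port (8765 in the configs above) is reachable.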
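Hi all, following this method everything else works, but I get the message "采集器类型不存在 wechat_sougou - {'wechat_list': ['老胡的储物柜', '是不是很酷'], 'delta_time': 5, 'spider_type': 'playwright'} -No module named 'src.collector.wechat_sougou'" (collector type does not exist). How should I handle this? Thanks.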
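> Hi all, following this method everything else works, but I get the message "采集器类型不存在 wechat_sougou - {'wechat_list': ['老胡的储物柜', '是不是很酷'], 'delta_time': 5, 'spider_type': 'playwright'} -No module named 'src.collector.wechat_sougou'" (collector type does not exist). How should I handle this? Thanks.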
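This problem has been solved by using "wechat.json". I can now subscribe in an RSS client, but opening the original article link gives an error. It is already recorded in my own liuli_backup (https://github.com/chenxuanli1990/liuli_backup). How can I fix this? Thanks.
Check whether your project's CI has finished.
@chenxuanli1990 I took a look; your article title is invalid.
The run log is as follows. What is the problem here?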