he0119 / nonebot-plugin-wordcloud

A word cloud plugin for NoneBot2
https://pypi.org/project/nonebot-plugin-wordcloud/
MIT License
81 stars 7 forks

Feature: Can the word cloud exclude certain users? #77

Open Ohdmire opened 1 year ago

Ohdmire commented 1 year ago

Several bots in the group send a lot of repetitive messages, and those messages account for a large share of the total.

he0119 commented 1 year ago

Not at the moment, but it looks like this feature is needed after all. I'll find some time to add it later.

MSDNicrosoft commented 1 year ago

Surprise, it's me again (

As with the other issue, I've already implemented this one as well.


Here is my implementation for reference:

msg_records = await get_messages_plain_text(
    user_ids=[user_id] if user_id else None,
    group_ids=[group_id],
    exclude_user_ids=[bot_id, *[f"{user}" for user in config.wordcloud_ignore_users]],
    time_start=start_time.astimezone(ZoneInfo("UTC")),
    time_stop=stop_time.astimezone(ZoneInfo("UTC")),
)

The key part is the fourth line, the exclude_user_ids argument:

exclude_user_ids = [bot_id, *[f"{user}" for user in config.wordcloud_ignore_users]]

# Note: bot_id is also a str

config.wordcloud_ignore_users is a config option that you can implement yourself.
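
A minimal sketch of how such a config option might feed the exclusion list. The `Config` dataclass and the `build_exclude_user_ids` helper are hypothetical stand-ins for illustration, not the plugin's actual code; the only thing taken from the comment above is that every ID ends up as a `str`.

```python
from dataclasses import dataclass, field


@dataclass
class Config:
    """Hypothetical stand-in for the plugin's config object."""

    # IDs may be written as ints or strings in a config file.
    wordcloud_ignore_users: list = field(default_factory=list)


def build_exclude_user_ids(bot_id: str, config: Config) -> list:
    """Return the bot's own ID plus all ignored users, as unique strings."""
    candidates = [bot_id, *config.wordcloud_ignore_users]
    # dict.fromkeys removes duplicates while keeping the original order.
    return list(dict.fromkeys(str(user) for user in candidates))
```

Converting to `str` in one place avoids a mismatch between integer IDs in the config and string IDs in the stored records.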

he0119 commented 1 year ago

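I'm wondering whether this needs a database so that it can be configured per group. I was busy writing ob12 support, so I never got around to this. Now that datastore supports migration scripts, changing the database has become much easier.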

MSDNicrosoft commented 1 year ago

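A database may make it harder for users to edit the configuration. My suggestion is to store this part of the configuration in a serializable file such as JSON.


Now for a personal aside:

Files like this need to be read and written safely. I'd recommend threading.Lock() for that, or alternatively a queue.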
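
A rough sketch of the lock-protected JSON file idea, under stated assumptions: the `JsonConfigStore` class, its method names, and the file layout are all hypothetical. It guards read-modify-write cycles with `threading.Lock()` and swaps the file in with `os.replace`, so a reader never sees a half-written file.

```python
import json
import os
import threading
from pathlib import Path


class JsonConfigStore:
    """Hypothetical JSON-backed settings store guarded by a lock."""

    def __init__(self, path):
        self._path = Path(path)
        self._lock = threading.Lock()

    def _read(self) -> dict:
        # Caller must hold the lock.
        if not self._path.exists():
            return {}
        return json.loads(self._path.read_text(encoding="utf-8"))

    def load(self) -> dict:
        with self._lock:
            return self._read()

    def update(self, key, value) -> None:
        with self._lock:
            data = self._read()
            data[key] = value
            # Write to a temporary file first, then replace the real file.
            tmp = self._path.with_name(self._path.name + ".tmp")
            tmp.write_text(
                json.dumps(data, ensure_ascii=False, indent=2),
                encoding="utf-8",
            )
            os.replace(tmp, self._path)
```

The lock only protects threads inside one process. Code that runs purely on an asyncio event loop might prefer `asyncio.Lock`, and a user editing the file by hand while the bot is running is not covered either.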

he0119 commented 1 year ago

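Configuration would definitely be done through commands, just like the existing daily scheduled-send setting, so I don't think there is much to worry about. Personally, I don't really like using files anymore; all kinds of reads and writes are more comfortable with a database than with files.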

MSDNicrosoft commented 1 year ago

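Even so, I'd still like to say this, though how it gets implemented is entirely up to you and you are free to reject my suggestion:

There are bound to be users who prefer to edit a config file by hand (like me

Users may also want to export their configuration, and so on.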

he0119 commented 1 year ago

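Hahaha, that just gave me an idea: the word cloud plugin could ship an nb-cli script that supports importing and exporting the configuration. That feels perfect. We can think about which commands would be most useful.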

MSDNicrosoft commented 1 year ago

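I think that works.

Separately, I have a question. I'm not sure whether this is a datastore problem or a chatrecorder problem:

When the protocol side (for example go-cqhttp) keeps running while NoneBot is disconnected, and NoneBot is started again after a fairly long time, the protocol side reports a large backlog of chat messages to NoneBot, which produces the following error: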

[2023-01-23 20:26:40] [ ERROR ]    nonebot     | Error when running EventPostProcessors
Traceback (most recent call last):
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\engine\base.py", line 1900, in _execute_context
    self.dialect.do_execute(
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\engine\default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\dialects\sqlite\aiosqlite.py", line 100, in execute
    self._adapt_connection._handle_exception(error)
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\dialects\sqlite\aiosqlite.py", line 228, in _handle_exception
    raise error
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\dialects\sqlite\aiosqlite.py", line 82, in execute
    self.await_(_cursor.execute(operation, parameters))
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\util\_concurrency_py3k.py", line 68, in await_only
    return current.driver.switch(awaitable)
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\util\_concurrency_py3k.py", line 121, in greenlet_spawn
    value = await result
  File "D:\Path\Python311\Lib\site-packages\aiosqlite\cursor.py", line 37, in execute
    await self._execute(self._cursor.execute, sql, parameters)
  File "D:\Path\Python311\Lib\site-packages\aiosqlite\cursor.py", line 31, in _execute
    return await self._conn._execute(fn, *args, **kwargs)
  File "D:\Path\Python311\Lib\site-packages\aiosqlite\core.py", line 137, in _execute
    return await future
  File "D:\Path\Python311\Lib\site-packages\aiosqlite\core.py", line 110, in run
    result = function()
> sqlite3.OperationalError: database is locked

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "[redacted for privacy]\bot.py", line 59, in <module>
    nonebot.run(app="__mp_main__:app")
  File "D:\Path\Python311\Lib\site-packages\nonebot\__init__.py", line 273, in run
    get_driver().run(*args, **kwargs)
  File "D:\Path\Python311\Lib\site-packages\nonebot\drivers\fastapi.py", line 187, in run
    uvicorn.run(
  File "D:\Path\Python311\Lib\site-packages\uvicorn\main.py", line 569, in run
    server.run()
  File "D:\Path\Python311\Lib\site-packages\uvicorn\server.py", line 60, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "D:\Path\Python311\Lib\asyncio\runners.py", line 190, in run
    return runner.run(main)
  File "D:\Path\Python311\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "D:\Path\Python311\Lib\asyncio\base_events.py", line 640, in run_until_complete
    self.run_forever()
  File "D:\Path\Python311\Lib\asyncio\windows_events.py", line 321, in run_forever
    super().run_forever()
  File "D:\Path\Python311\Lib\asyncio\base_events.py", line 607, in run_forever
    self._run_once()
  File "D:\Path\Python311\Lib\asyncio\base_events.py", line 1919, in _run_once
    handle._run()
  File "D:\Path\Python311\Lib\asyncio\events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "D:\Path\Python311\Lib\site-packages\nonebot\adapters\onebot\v11\bot.py", line 194, in handle_event
    await handle_event(self, event)
>> File "D:\Path\Python311\Lib\site-packages\nonebot\message.py", line 333, in handle_event
    await asyncio.gather(*coros)
  File "D:\Path\Python311\Lib\site-packages\nonebot\utils.py", line 157, in run_coro_with_catch
    return await coro
  File "D:\Path\Python311\Lib\site-packages\nonebot\dependencies\__init__.py", line 108, in __call__
    return await cast(Callable[..., Awaitable[R]], self.call)(**values)
  File "D:\Path\Python311\Lib\site-packages\nonebot_plugin_chatrecorder\__init__.py", line 48, in record_recv_msg_v11
    await session.commit()
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\ext\asyncio\session.py", line 583, in commit
    return await greenlet_spawn(self.sync_session.commit)
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\util\_concurrency_py3k.py", line 128, in greenlet_spawn
    result = context.switch(value)
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\orm\session.py", line 1451, in commit
    self._transaction.commit(_to_root=self.future)
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\orm\session.py", line 829, in commit
    self._prepare_impl()
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\orm\session.py", line 808, in _prepare_impl
    self.session.flush()
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\orm\session.py", line 3386, in flush
    self._flush(objects)
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\orm\session.py", line 3525, in _flush
    with util.safe_reraise():
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\util\langhelpers.py", line 70, in __exit__
    compat.raise_(
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\util\compat.py", line 208, in raise_
    raise exception
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\orm\session.py", line 3486, in _flush
    flush_context.execute()
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\orm\unitofwork.py", line 456, in execute
    rec.execute(self)
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\orm\unitofwork.py", line 630, in execute
    util.preloaded.orm_persistence.save_obj(
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\orm\persistence.py", line 245, in save_obj
    _emit_insert_statements(
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\orm\persistence.py", line 1238, in _emit_insert_statements
    result = connection._execute_20(
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\engine\base.py", line 1705, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\sql\elements.py", line 333, in _execute_on_connection
    return connection._execute_clauseelement(
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\engine\base.py", line 1572, in _execute_clauseelement
    ret = self._execute_context(
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\engine\base.py", line 1943, in _execute_context
    self._handle_dbapi_exception(
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\engine\base.py", line 2124, in _handle_dbapi_exception
    util.raise_(
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\util\compat.py", line 208, in raise_
    raise exception
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\engine\base.py", line 1900, in _execute_context
    self.dialect.do_execute(
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\engine\default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\dialects\sqlite\aiosqlite.py", line 100, in execute
    self._adapt_connection._handle_exception(error)
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\dialects\sqlite\aiosqlite.py", line 228, in _handle_exception
    raise error
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\dialects\sqlite\aiosqlite.py", line 82, in execute
    self.await_(_cursor.execute(operation, parameters))
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\util\_concurrency_py3k.py", line 68, in await_only
    return current.driver.switch(awaitable)
  File "D:\Path\Python311\Lib\site-packages\sqlalchemy\util\_concurrency_py3k.py", line 121, in greenlet_spawn
    value = await result
  File "D:\Path\Python311\Lib\site-packages\aiosqlite\cursor.py", line 37, in execute
    await self._execute(self._cursor.execute, sql, parameters)
  File "D:\Path\Python311\Lib\site-packages\aiosqlite\cursor.py", line 31, in _execute
    return await self._conn._execute(fn, *args, **kwargs)
  File "D:\Path\Python311\Lib\site-packages\aiosqlite\core.py", line 137, in _execute
    return await future
  File "D:\Path\Python311\Lib\site-packages\aiosqlite\core.py", line 110, in run
    result = function()
> sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO nonebot_plugin_chatrecorder_messagerecord (bot_type, bot_id, platform, time, type, detail_type, message_id, message, plain_text, user_id, group_id, guild_id, channel_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]
[parameters: ('OneBot V11', '[redacted for privacy]', 'qq', '2023-01-23 08:07:22.000000', 'message', 'group', '[redacted for privacy]', '[{"type": "image", "data": {"file": "314db052c1847c0b51794ce3eff22482.image", "subType": "0", "url": "[redacted for privacy]"}}]', '', '[redacted for privacy]', '[redacted for privacy]', None, None)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)

I've highlighted in blue the lines that I consider most important.

My guess is that the database queue filled up.

he0119 commented 1 year ago

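Looks like it's a chatrecorder issue. I'm not too sure either; SQLite probably just can't cope well with highly concurrent workloads (

I haven't used sqlalchemy + aiosqlite much myself.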
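
For context, a small sketch of how `database is locked` can arise with plain `sqlite3`. It is not a reproduction of the chatrecorder setup: the table name and the `demo_locked` function are made up for illustration, and `timeout=0` is chosen so the second writer fails immediately instead of waiting. SQLite allows only one writer at a time, so a second connection that tries to write while the first holds the write lock gets this error once its busy timeout runs out.

```python
import sqlite3


def demo_locked(db_path):
    """Show one writer blocking another, then succeeding after the commit."""
    # isolation_level=None puts the connection in autocommit mode, so
    # transactions are controlled explicitly with BEGIN / COMMIT.
    writer_a = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    writer_b = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    writer_a.execute("CREATE TABLE IF NOT EXISTS msg (text TEXT)")

    writer_a.execute("BEGIN IMMEDIATE")  # take the write lock
    writer_a.execute("INSERT INTO msg VALUES ('from a')")

    error = None
    try:
        # The write lock is held by writer_a, and timeout=0 means no waiting.
        writer_b.execute("INSERT INTO msg VALUES ('from b')")
    except sqlite3.OperationalError as exc:
        error = str(exc)

    writer_a.execute("COMMIT")  # release the write lock
    writer_b.execute("INSERT INTO msg VALUES ('from b')")  # now succeeds

    count = writer_a.execute("SELECT COUNT(*) FROM msg").fetchone()[0]
    writer_a.close()
    writer_b.close()
    return error, count
```

A longer busy timeout or funnelling all writes through a single writer are common ways to reduce such errors; whether either fits chatrecorder is a question for that project.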

MSDNicrosoft commented 1 year ago

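By the way, are we getting off-topic? (

This mostly happens because, while developing, I sometimes write code that affects NoneBot itself, and I'm not very good at this (

Also, is there a good way to contact you? Discussing it here really would be off-topic.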

he0119 commented 1 year ago

Yeah, please open an issue over at chatrecorder. As for contact, I'm in the NoneBot tech discussion group.

wei-z-git commented 1 year ago

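Besides bot users, some emoji such as /汪汪 and /斜眼 probably need to be excluded as well.

That said, if exclusion is based on the bot's user_id, per-group settings seem unnecessary, because a bot should be excluded everywhere.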
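
One way the emoji exclusion could look, as a sketch only. It assumes that such emoji show up in the recorded plain text as a slash followed by the emoji name (as in /汪汪), and the `strip_face_text` helper is hypothetical rather than part of the plugin.

```python
import re


def strip_face_text(text, face_names):
    """Remove '/name' emoji tokens from a message's plain text."""
    if not face_names:
        return text
    # Try longer names first so a short name cannot clip a longer one.
    ordered = sorted(face_names, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape("/" + name) for name in ordered))
    return pattern.sub("", text)
```

Running this before word segmentation keeps the emoji names out of the word counts without touching the stopword list.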

he0119 commented 1 year ago

That's easy: when group_id is empty, the user is excluded in all groups.
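
A sketch of that rule, assuming exclusions are stored as `(group_id, user_id)` pairs where a `group_id` of `None` means "all groups". The storage shape and the `excluded_users_for` function name are illustrative assumptions, not the plugin's schema.

```python
from typing import Iterable, Optional, Set, Tuple


def excluded_users_for(
    group_id: str,
    rules: Iterable[Tuple[Optional[str], str]],
) -> Set[str]:
    """Collect users excluded in this group, including global exclusions."""
    return {
        user_id
        for rule_group, user_id in rules
        # None marks a global rule that applies to every group.
        if rule_group is None or rule_group == group_id
    }
```

With this shape, a bot can be registered once with an empty group, while ordinary users can still be excluded in a single group only.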

wei-z-git commented 1 year ago

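Also, I just looked at how jieba handles stopwords and roughly understand it now. I have a rough, not very reliable idea about blocking:

As for config import and export, how about this: (1) the user sends the /导出配置 (export config) command; (2) the bot posts the configuration as JSON in the QQ chat; (3) the user copies it; (4) the user sends /导入配置 (import config) and pastes the JSON. I'm not sure whether that is feasible. (It seems this would make the configuration close to stateless on the server side.)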
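
The copy-and-paste round trip could be sketched roughly as below. The JSON layout, the `version` field, and both function names are assumptions made for illustration; the point is that pasted text is validated before it is accepted.

```python
import json


def export_config(ignore_users) -> str:
    """Serialize the ignore list to a JSON string that a user can copy."""
    users = sorted({str(user) for user in ignore_users})
    return json.dumps({"version": 1, "ignore_users": users}, ensure_ascii=False)


def import_config(text: str) -> list:
    """Parse pasted JSON and return the ignore list, or raise ValueError."""
    data = json.loads(text)  # invalid JSON raises a ValueError subclass
    if not isinstance(data, dict) or data.get("version") != 1:
        raise ValueError("unsupported config format")
    users = data.get("ignore_users")
    if not isinstance(users, list) or not all(isinstance(u, str) for u in users):
        raise ValueError("ignore_users must be a list of strings")
    return users
```

Since the JSON passes through a chat message, anything a user pastes should be treated as untrusted input, which is why the import side checks types instead of using the data directly.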

he0119 commented 1 year ago

0.4.8 adds a config option for excluding specified users. A more sophisticated version will have to wait until later (

HuangArmagh commented 1 year ago

The scheduled word cloud still counts the excluded users. Is this a bug?

he0119 commented 1 year ago

It really is. I forgot to apply the exclusion there (