noneplugin / nonebot-plugin-chatrecorder

适用于 Nonebot2 的聊天记录插件
MIT License
54 stars 8 forks source link

迁移聊天记录数据的脚本占用内存与数据量相关,数据量大时候会直接out of memory #24

Closed LambdaYH closed 1 year ago

LambdaYH commented 1 year ago

假设50w条聊天记录。

he0119 commented 1 year ago

确实得一部分一部分迁移。不过你的数据库居然这么大的吗?虽然模型定义的时候给字符串长度比较大,一般都达不到吧😂。(我 25w 条聊天记录,整个数据库才 56MB)

LambdaYH commented 1 year ago

确实得一部分一部分迁移。不过你的数据库居然这么大的吗?虽然模型定义的时候给字符串长度比较大,一般都达不到吧😂。(我 25w 条聊天记录,整个数据库才 56MB)

我确实是看着变量随便算一下的XD,没有具体测过,但是我4G的服务器一启动迁移就直接炸了。刚刚看了一下我72w条数据有345m。

he0119 commented 1 year ago

试了一下,内存占用确实比较大。

0.4.1 版本的迁移脚本
Filename: C:\Users\hmy01\Works\Working\Bot\nonebot-plugin-chatrecorder\nonebot_plugin_chatrecorder\migrations\902a51ac4032_add_session.py

Line #    Mem usage    Increment  Occurrences   Line Contents
    22     85.6 MiB     85.6 MiB           1   @profile
    23                                         def upgrade() -> None:
    24                                             # ### commands auto generated by Alembic - please adjust! ###
    25     85.6 MiB      0.0 MiB           1       Base = automap_base()
    26     86.3 MiB      0.7 MiB           1       Base.prepare(autoload_with=op.get_bind())
    27     86.3 MiB      0.0 MiB           1       MessageRecord = Base.classes.nonebot_plugin_chatrecorder_messagerecord
    28     86.3 MiB      0.0 MiB           1       SessionModel = Base.classes.nonebot_plugin_session_sessionmodel
    29    942.9 MiB    -10.0 MiB           2       with Session(op.get_bind()) as session:
    30     86.3 MiB      0.0 MiB           1           logger.warning("正在迁移聊天记录数据,请不要关闭程序...")
    31
    32     86.3 MiB      0.0 MiB           1           statement = select(MessageRecord)
    33    784.3 MiB    698.0 MiB           1           messages = session.scalars(statement).all()
    34
    35    784.3 MiB      0.0 MiB           1           bulk_insert_sessions = {}
    36    784.3 MiB      0.0 MiB           1           message_id_session_key_map = {}
    37    808.8 MiB      0.0 MiB      246353           for message in messages:
    38    808.8 MiB      0.0 MiB      246352               bot_id = message.bot_id
    39    808.8 MiB      0.0 MiB      246352               bot_type = message.bot_type
    40    808.8 MiB      0.0 MiB      246352               platform = message.platform
    41    808.8 MiB      0.0 MiB      246352               level = "LEVEL0"
    42    808.8 MiB      0.0 MiB      246352               if message.detail_type == "private":
    43    808.1 MiB      0.0 MiB          74                   level = "LEVEL1"
    44    808.8 MiB      0.0 MiB      246278               elif message.detail_type == "group":
    45    808.8 MiB      0.0 MiB      245812                   level = "LEVEL2"
    46    808.8 MiB      0.0 MiB         466               elif message.detail_type == "channel":
    47    808.8 MiB      0.0 MiB         466                   level = "LEVEL3"
    48    808.8 MiB      0.0 MiB      246352               id1 = message.user_id
    49    808.8 MiB      0.0 MiB      246352               id2 = message.group_id or message.channel_id
    50    808.8 MiB      0.0 MiB      246352               id3 = message.guild_id
    51
    52    808.8 MiB     14.5 MiB      246352               session_key = (bot_id, bot_type, platform, level, id1, id2, id3)
    53                                                     # 保存 message id 和 session key 的对应关系
    54    808.8 MiB     10.0 MiB      246352               message_id_session_key_map[message.id] = session_key
    55    808.8 MiB      0.0 MiB      246352               if session_key not in bulk_insert_sessions:
    56    808.8 MiB      0.0 MiB          62                   bulk_insert_sessions[session_key] = {
    57    808.8 MiB      0.0 MiB          62                       "bot_id": bot_id,
    58    808.8 MiB      0.0 MiB          62                       "bot_type": bot_type,
    59    808.8 MiB      0.0 MiB          62                       "platform": platform,
    60    808.8 MiB      0.0 MiB          62                       "level": level,
    61    808.8 MiB      0.0 MiB          62                       "id1": id1,
    62    808.8 MiB      0.0 MiB          62                       "id2": id2,
    63    808.8 MiB      0.0 MiB          62                       "id3": id3,
    64                                                         }
    65
    66    808.8 MiB      0.0 MiB           1           session_key_id_map = {}
    67    808.8 MiB      0.0 MiB           1           if bulk_insert_sessions:
    68                                                     # 读取已经存在的 session
    69    808.8 MiB      0.0 MiB           1               for session_obj in session.scalars(select(SessionModel)).all():
    70                                                         session_key = (
    71                                                             session_obj.bot_id,
    72                                                             session_obj.bot_type,
    73                                                             session_obj.platform,
    74                                                             session_obj.level,
    75                                                             session_obj.id1,
    76                                                             session_obj.id2,
    77                                                             session_obj.id3,
    78                                                         )
    79                                                         session_key_id_map[session_key] = session_obj.id
    80
    81                                                     # 更新新插入的 session
    82    808.9 MiB      0.1 MiB           2               session.execute(
    83    808.8 MiB      0.0 MiB           1                   insert(SessionModel),
    84    808.8 MiB      0.0 MiB          66                   [
    85    808.8 MiB      0.0 MiB          62                       session_dict
    86    808.8 MiB      0.0 MiB          63                       for key, session_dict in bulk_insert_sessions.items()
    87    808.8 MiB      0.0 MiB         125                       if key not in session_key_id_map  # 去重
    88                                                         ],
    89                                                     )
    90    808.9 MiB      0.0 MiB          63               for session_obj in session.scalars(select(SessionModel)).all():
    91    808.9 MiB      0.0 MiB          62                   session_key = (
    92    808.9 MiB      0.0 MiB          62                       session_obj.bot_id,
    93    808.9 MiB      0.0 MiB          62                       session_obj.bot_type,
    94    808.9 MiB      0.0 MiB          62                       session_obj.platform,
    95    808.9 MiB      0.0 MiB          62                       session_obj.level,
    96    808.9 MiB      0.0 MiB          62                       session_obj.id1,
    97    808.9 MiB      0.0 MiB          62                       session_obj.id2,
    98    808.9 MiB      0.0 MiB          62                       session_obj.id3,
    99                                                         )
   100    808.9 MiB      0.0 MiB          62                   session_key_id_map[session_key] = session_obj.id
   101
   102                                                 # 更新 message 的 session id
   103    808.9 MiB      0.0 MiB           1           bulk_update_messages = []
   104    841.1 MiB    -31.7 MiB      246353           for message_id, session_key in message_id_session_key_map.items():
   105    841.1 MiB    -61.4 MiB      492704               bulk_update_messages.append(
   106    841.1 MiB     18.4 MiB      246352                   {
   107    841.1 MiB    -31.7 MiB      246352                       "id": message_id,
   108    841.1 MiB    -11.9 MiB      246352                       "session_id": session_key_id_map[session_key],
   109                                                         }
   110                                                     )
   111    841.1 MiB      0.0 MiB           1           if bulk_update_messages:
   112    952.9 MiB    111.7 MiB           1               session.execute(update(MessageRecord), bulk_update_messages)
   113    952.9 MiB      0.0 MiB           1           logger.warning("聊天记录数据迁移完成!")
   114
   115    943.1 MiB      0.2 MiB           3       with op.batch_alter_table(
   116    942.9 MiB      0.0 MiB           1           "nonebot_plugin_chatrecorder_messagerecord", schema=None
   117    942.9 MiB      0.0 MiB           1       ) as batch_op:
   118    942.9 MiB      0.0 MiB           1           batch_op.alter_column("session_id", existing_type=sa.Integer(), nullable=False)
   119
   120                                             # ### end Alembic commands ###
尝试每次只处理 1w 条记录
Filename: C:\Users\hmy01\Works\Working\Bot\nonebot-plugin-chatrecorder\nonebot_plugin_chatrecorder\migrations\902a51ac4032_add_session.py

Line #    Mem usage    Increment  Occurrences   Line Contents
    23     85.7 MiB     85.7 MiB           1   @profile
    24                                         def upgrade() -> None:
    25                                             # ### commands auto generated by Alembic - please adjust! ###
    26     85.7 MiB      0.0 MiB           1       Base = automap_base()
    27     86.6 MiB      0.9 MiB           1       Base.prepare(autoload_with=op.get_bind())
    28     86.6 MiB      0.0 MiB           1       MessageRecord = Base.classes.nonebot_plugin_chatrecorder_messagerecord
    29     86.6 MiB      0.0 MiB           1       SessionModel = Base.classes.nonebot_plugin_session_sessionmodel
    30    150.2 MiB     -1.3 MiB           2       with Session(op.get_bind()) as session:
    31     86.6 MiB      0.0 MiB           1           logger.warning("正在迁移聊天记录数据,请不要关闭程序...")
    32
    33     86.6 MiB      0.0 MiB           1           session_key_id_map = {}
    34     87.5 MiB      0.9 MiB           1           count = session.query(MessageRecord).count()
    35    151.5 MiB     -0.5 MiB          26           for i in range(count // 10000 + 1):
    36    151.5 MiB     -0.4 MiB          25               statement = select(MessageRecord).limit(10000).offset(i * 10000)
    37    151.5 MiB     55.2 MiB          25               messages = session.scalars(statement).all()
    38
    39    151.5 MiB     -0.4 MiB          25               bulk_insert_sessions = {}
    40    151.5 MiB     -0.4 MiB          25               message_id_session_key_map = {}
    41    151.5 MiB  -4119.6 MiB      246377               for message in messages:
    42    151.5 MiB  -4119.2 MiB      246352                   bot_id = message.bot_id
    43    151.5 MiB  -4119.2 MiB      246352                   bot_type = message.bot_type
    44    151.5 MiB  -4119.2 MiB      246352                   platform = message.platform
    45    151.5 MiB  -4119.2 MiB      246352                   level = "LEVEL0"
    46    151.5 MiB  -4119.2 MiB      246352                   if message.detail_type == "private":
    47    151.5 MiB     -1.0 MiB          74                       level = "LEVEL1"
    48    151.5 MiB  -4118.2 MiB      246278                   elif message.detail_type == "group":
    49    151.5 MiB  -4090.9 MiB      245812                       level = "LEVEL2"
    50    151.5 MiB    -27.3 MiB         466                   elif message.detail_type == "channel":
    51    151.5 MiB      0.0 MiB         466                       level = "LEVEL3"
    52    151.5 MiB  -4091.9 MiB      246352                   id1 = message.user_id
    53    151.5 MiB  -4119.2 MiB      246352                   id2 = message.group_id or message.channel_id
    54    151.5 MiB  -4119.2 MiB      246352                   id3 = message.guild_id
    55
    56    151.5 MiB  -4118.7 MiB      246352                   session_key = (bot_id, bot_type, platform, level, id1, id2, id3)
    57                                                         # 保存 message id 和 session key 的对应关系
    58    151.5 MiB  -4118.7 MiB      246352                   message_id_session_key_map[message.id] = session_key
    59    151.5 MiB  -4119.2 MiB      246352                   if session_key not in bulk_insert_sessions:
    60    151.5 MiB    -12.7 MiB         699                       bulk_insert_sessions[session_key] = {
    61    151.5 MiB    -12.7 MiB         699                           "bot_id": bot_id,
    62    151.5 MiB    -12.7 MiB         699                           "bot_type": bot_type,
    63    151.5 MiB    -12.7 MiB         699                           "platform": platform,
    64    151.5 MiB    -12.7 MiB         699                           "level": level,
    65    151.5 MiB    -12.7 MiB         699                           "id1": id1,
    66    151.5 MiB    -12.7 MiB         699                           "id2": id2,
    67    151.5 MiB    -12.7 MiB         699                           "id3": id3,
    68                                                             }
    69
    70    151.5 MiB     -0.5 MiB          25               if bulk_insert_sessions:
    71                                                         # 读取已经存在的 session
    72    151.5 MiB    -22.2 MiB        1017                   for session_obj in session.scalars(select(SessionModel)).all():
    73    151.5 MiB    -21.8 MiB         992                       session_key = (
    74    151.5 MiB    -21.8 MiB         992                           session_obj.bot_id,
    75    151.5 MiB    -21.8 MiB         992                           session_obj.bot_type,
    76    151.5 MiB    -21.8 MiB         992                           session_obj.platform,
    77    151.5 MiB    -21.8 MiB         992                           session_obj.level,
    78    151.5 MiB    -21.8 MiB         992                           session_obj.id1,
    79    151.5 MiB    -21.8 MiB         992                           session_obj.id2,
    80    151.5 MiB    -21.8 MiB         992                           session_obj.id3,
    81                                                             )
    82    151.5 MiB    -21.8 MiB         992                       session_key_id_map[session_key] = session_obj.id
    83
    84                                                         # 更新新插入的 session
    85    151.5 MiB    -15.9 MiB         799                   new_sessions_dict = [
    86    151.5 MiB     -1.1 MiB          62                       session_dict
    87    151.5 MiB    -14.5 MiB         724                       for key, session_dict in bulk_insert_sessions.items()
    88    151.5 MiB    -15.5 MiB         786                       if key not in session_key_id_map  # 去重
    89                                                         ]
    90    151.5 MiB     -0.5 MiB          25                   if new_sessions_dict:
    91    151.5 MiB     -0.5 MiB          24                       session.execute(
    92    151.5 MiB     -0.2 MiB          12                           insert(SessionModel),
    93    151.5 MiB     -0.2 MiB          12                           new_sessions_dict,
    94                                                             )
    95    151.5 MiB    -23.3 MiB        1079                   for session_obj in session.scalars(select(SessionModel)).all():
    96    151.5 MiB    -22.9 MiB        1054                       session_key = (
    97    151.5 MiB    -22.9 MiB        1054                           session_obj.bot_id,
    98    151.5 MiB    -22.9 MiB        1054                           session_obj.bot_type,
    99    151.5 MiB    -22.9 MiB        1054                           session_obj.platform,
   100    151.5 MiB    -22.9 MiB        1054                           session_obj.level,
   101    151.5 MiB    -22.9 MiB        1054                           session_obj.id1,
   102    151.5 MiB    -22.9 MiB        1054                           session_obj.id2,
   103    151.5 MiB    -22.9 MiB        1054                           session_obj.id3,
   104                                                             )
   105    151.5 MiB    -22.9 MiB        1054                       session_key_id_map[session_key] = session_obj.id
   106
   107                                                     # 更新 message 的 session id
   108    151.5 MiB     -1.5 MiB          25               bulk_update_messages = []
   109    151.5 MiB  -4474.2 MiB      246377               for message_id, session_key in message_id_session_key_map.items():
   110    151.5 MiB  -8947.5 MiB      492704                   bulk_update_messages.append(
   111    151.5 MiB  -4472.5 MiB      246352                       {
   112    151.5 MiB  -4473.8 MiB      246352                           "id": message_id,
   113    151.5 MiB  -4473.7 MiB      246352                           "session_id": session_key_id_map[session_key],
   114                                                             }
   115                                                         )
   116    151.5 MiB     -0.5 MiB          25               if bulk_update_messages:
   117    151.5 MiB      6.5 MiB          25                   session.execute(update(MessageRecord), bulk_update_messages)
   118    151.5 MiB     -0.1 MiB           1           logger.warning("聊天记录数据迁移完成!")
   119
   120    150.2 MiB      0.0 MiB           3       with op.batch_alter_table(
   121    150.2 MiB      0.0 MiB           1           "nonebot_plugin_chatrecorder_messagerecord", schema=None
   122    150.2 MiB      0.0 MiB           1       ) as batch_op:
   123    150.2 MiB      0.0 MiB           1           batch_op.alter_column("session_id", existing_type=sa.Integer(), nullable=False)
   124
   125                                             # ### end Alembic commands ###
尝试每次只处理 1w 条记录,并且仅获取所需字段

Filename: C:\Users\hmy01\Works\Working\Bot\nonebot-plugin-chatrecorder\nonebot_plugin_chatrecorder\migrations\902a51ac4032_add_session.py

Line #    Mem usage    Increment  Occurrences   Line Contents
    23     82.5 MiB     82.5 MiB           1   @profile
    24                                         def upgrade() -> None:
    25                                             # ### commands auto generated by Alembic - please adjust! ###
    26     82.5 MiB      0.0 MiB           1       Base = automap_base()
    27     83.4 MiB      0.9 MiB           1       Base.prepare(autoload_with=op.get_bind())
    28     83.4 MiB      0.0 MiB           1       MessageRecord = Base.classes.nonebot_plugin_chatrecorder_messagerecord
    29     83.4 MiB      0.0 MiB           1       SessionModel = Base.classes.nonebot_plugin_session_sessionmodel
    30    105.5 MiB      0.0 MiB           2       with Session(op.get_bind()) as session:
    31     83.4 MiB      0.0 MiB           1           logger.warning("正在迁移聊天记录数据,请不要关闭程序...")
    32
    33     83.4 MiB      0.0 MiB           1           session_key_id_map = {}
    34     86.4 MiB      2.9 MiB           1           count = session.query(MessageRecord).count()
    35    105.5 MiB     -0.2 MiB          26           for i in range(count // 10000 + 1):
    36    105.5 MiB     -0.2 MiB          25               statement = (
    37    105.5 MiB     -0.3 MiB          50                   select(
    38    105.5 MiB     -0.2 MiB          25                       MessageRecord.bot_id,
    39    105.5 MiB     -0.2 MiB          25                       MessageRecord.bot_type,
    40    105.5 MiB     -0.2 MiB          25                       MessageRecord.platform,
    41    105.5 MiB     -0.2 MiB          25                       MessageRecord.detail_type,
    42    105.5 MiB     -0.2 MiB          25                       MessageRecord.user_id,
    43    105.5 MiB     -0.2 MiB          25                       MessageRecord.group_id,
    44    105.5 MiB     -0.2 MiB          25                       MessageRecord.channel_id,
    45    105.5 MiB     -0.2 MiB          25                       MessageRecord.guild_id,
    46    105.5 MiB     -0.2 MiB          25                       MessageRecord.id,
    47                                                         )
    48    105.5 MiB     -0.2 MiB          25                   .limit(10000)
    49    105.5 MiB     -0.2 MiB          25                   .offset(i * 10000)
    50                                                     )
    51    105.5 MiB      9.9 MiB          25               messages = session.execute(statement).all()
    52
    53    105.5 MiB     -0.2 MiB          25               bulk_insert_sessions = {}
    54    105.5 MiB     -0.2 MiB          25               message_id_session_key_map = {}
    55    105.5 MiB  -1481.4 MiB      246377               for message in messages:
    56    105.5 MiB  -1480.2 MiB      246352                   bot_id = message[0]
    57    105.5 MiB  -1480.2 MiB      246352                   bot_type = message[1]
    58    105.5 MiB  -1480.2 MiB      246352                   platform = message[2]
    59    105.5 MiB  -1480.2 MiB      246352                   level = "LEVEL0"
    60    105.5 MiB  -1480.2 MiB      246352                   if message[3] == "private":
    61    105.5 MiB     -0.6 MiB          74                       level = "LEVEL1"
    62    105.5 MiB  -1479.6 MiB      246278                   elif message[3] == "group":
    63    105.5 MiB  -1454.1 MiB      245812                       level = "LEVEL2"
    64    105.5 MiB    -25.5 MiB         466                   elif message[3] == "channel":
    65    105.5 MiB      0.0 MiB         466                       level = "LEVEL3"
    66    105.5 MiB  -1454.7 MiB      246352                   id1 = message[4]
    67    105.5 MiB  -1480.2 MiB      246352                   id2 = message[5] or message[6]
    68    105.5 MiB  -1480.2 MiB      246352                   id3 = message[7]
    69
    70    105.5 MiB  -1479.7 MiB      246352                   session_key = (bot_id, bot_type, platform, level, id1, id2, id3)
    71                                                         # 保存 message id 和 session key 的对应关系
    72    105.5 MiB  -1479.7 MiB      246352                   message_id_session_key_map[message[8]] = session_key
    73    105.5 MiB  -1480.2 MiB      246352                   if session_key not in bulk_insert_sessions:
    74    105.5 MiB     -5.7 MiB         699                       bulk_insert_sessions[session_key] = {
    75    105.5 MiB     -5.7 MiB         699                           "bot_id": bot_id,
    76    105.5 MiB     -5.7 MiB         699                           "bot_type": bot_type,
    77    105.5 MiB     -5.7 MiB         699                           "platform": platform,
    78    105.5 MiB     -5.7 MiB         699                           "level": level,
    79    105.5 MiB     -5.7 MiB         699                           "id1": id1,
    80    105.5 MiB     -5.7 MiB         699                           "id2": id2,
    81    105.5 MiB     -5.7 MiB         699                           "id3": id3,
    82                                                             }
    83
    84    105.5 MiB     -0.2 MiB          25               if bulk_insert_sessions:
    85                                                         # 读取已经存在的 session
    86    105.5 MiB     -8.4 MiB        1017                   for session_obj in session.scalars(select(SessionModel)).all():
    87    105.5 MiB     -8.2 MiB         992                       session_key = (
    88    105.5 MiB     -8.2 MiB         992                           session_obj.bot_id,
    89    105.5 MiB     -8.2 MiB         992                           session_obj.bot_type,
    90    105.5 MiB     -8.2 MiB         992                           session_obj.platform,
    91    105.5 MiB     -8.2 MiB         992                           session_obj.level,
    92    105.5 MiB     -8.2 MiB         992                           session_obj.id1,
    93    105.5 MiB     -8.2 MiB         992                           session_obj.id2,
    94    105.5 MiB     -8.2 MiB         992                           session_obj.id3,
    95                                                             )
    96    105.5 MiB     -8.2 MiB         992                       session_key_id_map[session_key] = session_obj.id
    97
    98                                                         # 更新新插入的 session
    99    105.5 MiB     -6.3 MiB         799                   new_sessions_dict = [
   100    105.5 MiB     -0.9 MiB          62                       session_dict
   101    105.5 MiB     -5.8 MiB         724                       for key, session_dict in bulk_insert_sessions.items()
   102    105.5 MiB     -6.6 MiB         786                       if key not in session_key_id_map  # 去重
   103                                                         ]
   104    105.5 MiB     -0.2 MiB          25                   if new_sessions_dict:
   105    105.5 MiB     -0.1 MiB          24                       session.execute(
   106    105.5 MiB     -0.1 MiB          12                           insert(SessionModel),
   107    105.5 MiB     -0.1 MiB          12                           new_sessions_dict,
   108                                                             )
   109    105.5 MiB     -9.0 MiB        1079                   for session_obj in session.scalars(select(SessionModel)).all():
   110    105.5 MiB     -8.9 MiB        1054                       session_key = (
   111    105.5 MiB     -8.9 MiB        1054                           session_obj.bot_id,
   112    105.5 MiB     -8.9 MiB        1054                           session_obj.bot_type,
   113    105.5 MiB     -8.9 MiB        1054                           session_obj.platform,
   114    105.5 MiB     -8.9 MiB        1054                           session_obj.level,
   115    105.5 MiB     -8.9 MiB        1054                           session_obj.id1,
   116    105.5 MiB     -8.9 MiB        1054                           session_obj.id2,
   117    105.5 MiB     -8.9 MiB        1054                           session_obj.id3,
   118                                                             )
   119    105.5 MiB     -8.9 MiB        1054                       session_key_id_map[session_key] = session_obj.id
   120
   121                                                     # 更新 message 的 session id
   122    105.5 MiB     -3.2 MiB          25               bulk_update_messages = []
   123    105.5 MiB  -1416.5 MiB      246377               for message_id, session_key in message_id_session_key_map.items():
   124    105.5 MiB  -2832.6 MiB      492704                   bulk_update_messages.append(
   125    105.5 MiB  -1414.6 MiB      246352                       {
   126    105.5 MiB  -1416.3 MiB      246352                           "id": message_id,
   127    105.5 MiB  -1416.3 MiB      246352                           "session_id": session_key_id_map[session_key],
   128                                                             }
   129                                                         )
   130    105.5 MiB     -0.2 MiB          25               if bulk_update_messages:
   131    105.5 MiB      9.9 MiB          25                   session.execute(update(MessageRecord), bulk_update_messages)
   132    105.5 MiB     -0.1 MiB           1           logger.warning("聊天记录数据迁移完成!")
   133
   134    105.5 MiB      0.1 MiB           3       with op.batch_alter_table(
   135    105.5 MiB      0.0 MiB           1           "nonebot_plugin_chatrecorder_messagerecord", schema=None
   136    105.5 MiB      0.0 MiB           1       ) as batch_op:
   137    105.5 MiB      0.0 MiB           1           batch_op.alter_column("session_id", existing_type=sa.Integer(), nullable=False)
   138
   139                                             # ### end Alembic commands ###
he0119 commented 1 year ago

试试这个分支?https://github.com/noneplugin/nonebot-plugin-chatrecorder/tree/fix/memory

你也可以自己调整一下每次迁移多少数据。

LambdaYH commented 1 year ago

试试这个分支?https://github.com/noneplugin/nonebot-plugin-chatrecorder/tree/fix/memory

你也可以自己调整一下每次迁移多少数据。

这个分支的内存占用是降低了很多,但是最后报错了。

08-27 13:16:28 [WARNING] 902a51ac4032_add_session_py | 正在迁移聊天记录数据,请不要关闭程序...
08-27 13:18:59 [WARNING] 902a51ac4032_add_session_py | 聊天记录数据迁移完成!
08-27 13:18:59 [ERROR] uvicorn | Traceback (most recent call last):
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 552, in _prepare_and_execute
    self._rows = await prepared_stmt.fetch(*parameters)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/migangbot/.venv/lib/python3.11/site-packages/asyncpg/prepared_stmt.py", line 176, in fetch
    data = await self.__bind_execute(args, 0, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/migangbot/.venv/lib/python3.11/site-packages/asyncpg/prepared_stmt.py", line 241, in __bind_execute
    data, status, _ = await self.__do_execute(
                      ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/migangbot/.venv/lib/python3.11/site-packages/asyncpg/prepared_stmt.py", line 230, in __do_execute
    return await executor(protocol)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "asyncpg/protocol/protocol.pyx", line 201, in bind_execute
asyncpg.exceptions.NotNullViolationError: column "session_id" of relation "nonebot_plugin_chatrecorder_messagerecord" contains null values

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1965, in _exec_single_context
    self.dialect.do_execute(
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 921, in do_execute
    cursor.execute(statement, parameters)
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 585, in execute
    self._adapt_connection.await_(
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 125, in await_only
    return current.driver.switch(awaitable)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 185, in greenlet_spawn
    value = await result
            ^^^^^^^^^^^^
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 564, in _prepare_and_execute
    self._handle_exception(error)
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 515, in _handle_exception
    self._adapt_connection._handle_exception(error)
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 802, in _handle_exception
    raise translated_error from error
sqlalchemy.dialects.postgresql.asyncpg.AsyncAdapt_asyncpg_dbapi.IntegrityError: <class 'asyncpg.exceptions.NotNullViolationError'>: column "session_id" of relation "nonebot_plugin_chatrecorder_messagerecord" contains null values

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/migangbot/.venv/lib/python3.11/site-packages/starlette/routing.py", line 677, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/usr/local/lib/python3.11/contextlib.py", line 204, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/migangbot/.venv/lib/python3.11/site-packages/nonebot/drivers/fastapi.py", line 166, in _lifespan_manager
    await self._lifespan.startup()
  File "/migangbot/.venv/lib/python3.11/site-packages/nonebot/drivers/_lifespan.py", line 36, in startup
    await self._run_lifespan_func(self._startup_funcs)
  File "/migangbot/.venv/lib/python3.11/site-packages/nonebot/drivers/_lifespan.py", line 30, in _run_lifespan_func
    await cast(ASYNC_LIFESPAN_FUNC, func)()
  File "/migangbot/.venv/lib/python3.11/site-packages/nonebot_plugin_datastore/db.py", line 104, in init_db
    await upgrade(config, "head")
  File "/migangbot/.venv/lib/python3.11/site-packages/nonebot_plugin_datastore/script/command.py", line 245, in upgrade
    await run_migration()
  File "/migangbot/.venv/lib/python3.11/site-packages/nonebot_plugin_datastore/script/utils.py", line 123, in run_migration
    await connection.run_sync(do_run_migrations, plugin_name)
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/ext/asyncio/engine.py", line 884, in run_sync
    return await greenlet_spawn(fn, self._proxied, *arg, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 192, in greenlet_spawn
    result = context.switch(value)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/migangbot/.venv/lib/python3.11/site-packages/nonebot_plugin_datastore/script/utils.py", line 115, in do_run_migrations
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/migangbot/.venv/lib/python3.11/site-packages/alembic/runtime/environment.py", line 922, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/migangbot/.venv/lib/python3.11/site-packages/alembic/runtime/migration.py", line 624, in run_migrations
    step.migration_fn(**kw)
  File "/migangbot/.venv/lib/python3.11/site-packages/nonebot_plugin_chatrecorder/migrations/902a51ac4032_add_session.py", line 133, in upgrade
    with op.batch_alter_table(
  File "/usr/local/lib/python3.11/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/migangbot/.venv/lib/python3.11/site-packages/alembic/operations/base.py", line 375, in batch_alter_table
    impl.flush()
  File "/migangbot/.venv/lib/python3.11/site-packages/alembic/operations/batch.py", line 113, in flush
    fn(*arg, **kw)
  File "/migangbot/.venv/lib/python3.11/site-packages/alembic/ddl/postgresql.py", line 184, in alter_column
    super().alter_column(
  File "/migangbot/.venv/lib/python3.11/site-packages/alembic/ddl/impl.py", line 229, in alter_column
    self._exec(
  File "/migangbot/.venv/lib/python3.11/site-packages/alembic/ddl/impl.py", line 193, in _exec
    return conn.execute(  # type: ignore[call-overload]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1412, in execute
    return meth(
           ^^^^^
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/sql/ddl.py", line 181, in _execute_on_connection
    return connection._execute_ddl(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1524, in _execute_ddl
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1844, in _execute_context
    return self._exec_single_context(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1984, in _exec_single_context
    self._handle_dbapi_exception(
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2339, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1965, in _exec_single_context
    self.dialect.do_execute(
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 921, in do_execute
    cursor.execute(statement, parameters)
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 585, in execute
    self._adapt_connection.await_(
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 125, in await_only
    return current.driver.switch(awaitable)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 185, in greenlet_spawn
    value = await result
            ^^^^^^^^^^^^
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 564, in _prepare_and_execute
    self._handle_exception(error)
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 515, in _handle_exception
    self._adapt_connection._handle_exception(error)
  File "/migangbot/.venv/lib/python3.11/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 802, in _handle_exception
    raise translated_error from error
sqlalchemy.exc.IntegrityError: (sqlalchemy.dialects.postgresql.asyncpg.IntegrityError) <class 'asyncpg.exceptions.NotNullViolationError'>: column "session_id" of relation "nonebot_plugin_chatrecorder_messagerecord" contains null values
[SQL: ALTER TABLE nonebot_plugin_chatrecorder_messagerecord ALTER COLUMN session_id SET NOT NULL]
(Background on this error at: https://sqlalche.me/e/20/gkpj)
he0119 commented 1 year ago

为什么有记录没有设置到 session_id 😭

你方便打开数据库看看具体数据吗?(对了,你要不加群来看看?)

he0119 commented 1 year ago

试试这个分支 https://github.com/noneplugin/nonebot-plugin-chatrecorder/tree/fix/memory 最新的 commit 呢?

LambdaYH commented 1 year ago

试试这个分支 https://github.com/noneplugin/nonebot-plugin-chatrecorder/tree/fix/memory 最新的 commit 呢?

这个又把内存撑爆了,但是能够迁移成功,先前那个commit我加了2行

            for message_id, session_key in message_id_session_key_map.items():
                bulk_update_messages.append(
                    {
                        "id": message_id,
                        "session_id": session_key_id_map[session_key],
                    }
                )
+                if session_key_id_map[session_key] is None:
+                   logger.error(message_id)
            if bulk_update_messages:
                session.execute(update(MessageRecord), bulk_update_messages)

好像也没有发现有None的(是哪个群

he0119 commented 1 year ago

点击链接加入群聊【插 件 售 后】:http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=gvrQ2aDaIp3zu9HedsDzyPRSrhCK_Ggk&authKey=pmOjqRD4NpYYHgeQxJKm7JShoOsXQjbGvzwNo81%2FPMFO3h%2FLyLxoJPMRhbCC%2Bo3s&noverify=0&group_code=730374631