Closed jchristgit closed 1 day ago
Looks like it's currently written to set all to not in the guild, then set those in the guild to true after https://github.com/python-discord/metricity/blob/e68d83de9bb3027085c4f72e3da4699eefdc1546/metricity/exts/event_listeners/startup_sync.py#L39
ahhh, I missed that.
Could we maybe do the whole update operation in a transaction or would that bust some limits with regards to PostgreSQL's transaction memory?
Maybe alternatively we could collect the IDs of all users where we set
it to true and then do a single (batch of) updates after the initial
sync to set in_guild=False
for anyone who's not in that set.
https://github.com/python-discord/metricity/pull/159 seems like a way we could fix this. I haven't been able to test that postgres is ok about serialising that many strings in a single query though.
sqlalchemy.exc.InterfaceError: (sqlalchemy.dialects.postgresql.asyncpg.InterfaceError) <class 'asyncpg.exceptions._base.InterfaceError'>: the number of query arguments cannot exceed 32767
will need to relook at this approach
My only thoughts on how to make the above PR work would be to create a TEMPORARY
table with ON COMMIT DROP
and somehow bulk insert all in-guild users into it.
Can't look into how to achieve that with SQLAlchemy right now though.
It might be easiest to select all in guild users from the database and then do db_in_guild - guild.members
and mark them as not in guild with another query.
It's not quite as nice as doing it all on the database but in practice I don't think it makes a difference for our case.
It might be easiest to select all in guild users from the database and then do
db_in_guild - guild.members
and mark them as not in guild with another query.
Won't this run into the same parameter issue?
I have an alternative idea, although it might be a bit crazy:
First we sort the list of user IDs to update ascending. Then we move the data in chunks, with special handling for the first and last chunk to incorporate an update for anything outside our known range as well.
Pseudocode (unsure of the exact SQLAlchemy syntax):
async with async_session() as sess:
total = len(in_guild_user_id_chunks)
for idx, chunk in enumerate(in_guild_user_id_chunks):
not_user_in_chunk = ~models.User.id.in_(chunk)
user_below_upper_bound = models.User.id <= max(chunk)
user_above_lower_bound = models.User.id >= min(chunk)
if idx == 0:
# Update the complete lower end we are aware of
condition = (user_below_upper_bound & not_user_in_chunk)
elif idx == total - 1:
# Update the complete upper end we are aware of
condition = (user_above_lower_bound & not_user_in_chunk)
else:
# Just update this batch
condition = (user_below_upper_bound & user_above_lower_bound & not_user_in_chunk)
await sess.execute(update(models.User).where(condition).values(in_guild=False))
await sess.commit()
Things to consider:
in_guild
value might be incorrectly set to False
in_guild
value won't be set to
False
, which is a no-op.My suggestion was to do the comparison against guild.members
on the Python side rather than the database site, so basically what we currently do but only change the users we need to. Something like this:
async with async_session() as sess:
in_guild_users = await sess.execute(select(models.User).filter_by(in_guild=True)).scalars().all()
guild_member_ids = {member.id for member in guild.members}
for user in in_guild_users:
if user.id not in guild_member_ids:
user.in_guild = False
await sess.commit()
Can confirm it works as expected now, no more dips are visible in statistics.
Thanks for the fix!
When metricity starts up and synchronizes the users, all users temporarily have their
in_guild
field set tofalse
.From looking at the code I am currently unsure how this can happen. As far as I understand, the complete upsert is run in a transaction that is committed at the end of each individual insert.
Demonstration as taken during a metricity restart (whilst
User upsert:
messages can be seen in the logs):