Closed Sirfrancis18 closed 3 years ago
Same here. I have a chat dialog containing 39,000 messages. I've changed the following line so that search_limit
is 10000 and delete_chunk_size
is 1000, yet it still only finds and deletes 100 messages.
It's worth noting that I'm trying to delete messages from a bot owned group as seen here: https://github.com/gurland/telegram-delete-all-messages/issues/30#issuecomment-796623934
@hannesveit I see you were able to solve some search limitations with groups and supergroups. Do you think there may be a similar issue with type: bots?
I'm no Python programmer so don't know how to fix this. I've been trying to debug the code and can see that I'm passing a value of 1000 within the limit parameter: https://github.com/gurland/telegram-delete-all-messages/blob/9142174ec762ada0bc20dd2f4885ab7dfd26a88a/cleaner.py#L110 However, when I inspect the q variable at the point shown below, messages only return 100 items:
I've no idea how I can fix this. :(
Hmm weird. So it looks like it's ignoring the 1000 search limit and just using 100 instead? I can look into it once I find some time...
In the meantime, can you try setting the search_limit
to 100 here? It sounds like it terminates in the first iteration because the search returns only 100 messages (which is < search_limit
). Let me know if that fixes it.
Now that was interesting. Changing the search limit to 100 resulted in the following that I had to manually exit from. I'll just post the results for now and will debug to see why it keeps increasing the offset.
Searching messages. OFFSET: 10
Found 100 of your messages in "my_bot"
Searching messages. OFFSET: 110
Found 100 of your messages in "my_bot"
Searching messages. OFFSET: 210
Found 100 of your messages in "my_bot"
Searching messages. OFFSET: 310
Found 100 of your messages in "my_bot"
Searching messages. OFFSET: 410
Found 100 of your messages in "my_bot"
Searching messages. OFFSET: 510
Found 100 of your messages in "my_bot"
Searching messages. OFFSET: 610
Found 100 of your messages in "my_bot"
Searching messages. OFFSET: 710
@jarrah31 no need to manually exit from that! That's actually what it's supposed to look like (almost, see below). It's searching through your messages (in chunks of search_limit == 100
, each time increasing the offset to fetch the next batch) until it has them all. Once it has all messages, it's going to delete them in chunks of 100.
The one thing that looks weird is that it's starting with an offset of 10 instead of 0. I'm assuming this is something you changed also in the code? Please revert that change and try again.
Maybe those search status updates could be improved to just say something like "Fetched 700 / 39000" messages or so. Also, if changing the search_limit
back to 100 fixes your issue that's weird and I don't understand it, but maybe until we figure out why it's ignoring search limits > 100, we could just set the default search_limit
back to 100 to repair the script for now (I have definitely tested smaller ones, like 10, and they behaved as expected; I thought I had also tried to delete > 100 messages with a search limit of 1000 and confirmed that it worked in a single search iteration, but maybe I didn't...).
(Apologies, I didn't see your other comments whilst writing this)
From what I can see, it will repeat this while loop 100 messages at a time (having set search_limit
to 100), until there are no more messages to find:
while True:
q = self.search_messages(peer, add_offset)
message_ids.extend(msg.id for msg in q['messages'])
messages_count = len(q['messages'])
print(f'Found {messages_count} of your messages in "{chat.username}"')
if messages_count < self.search_limit:
break
add_offset += self.search_limit
I can see that message_ids
increases by 100 new messages on each loop, and I've verified that message timestamps within each new search (q variable) shows that it keeps finding older messages due to add_offset += self.search_limit
. I guess this would keep building up and eventually pass a list of messages to the delete_messages
function.
First of all I think the messages_count
needs to count the number of items within message_ids
. @hannesveit if you could help here please it would be really appreciated - I just don't know Python without lots of googling, but I can usually walk through code to debug it.
Once messages_count is accurate, a search_limit
would be a more practical. It would then break out of the loop after say 1000 messages have been found and then goes ahead to delete them.
Given that we seem to have a hard limit of 100 messages when calling app.send(Search(
, we should probably hard code those limits into the script. In the above loop, add_offset += self.search_limit
would become add_offset += 100
, and the following line:
https://github.com/gurland/telegram-delete-all-messages/blob/9142174ec762ada0bc20dd2f4885ab7dfd26a88a/cleaner.py#L110
Would be limit=100,
(if direct integers actually work)
It still doesn't seem right that we are limited to 100 searches when there's an option within the search call to specify the limit. 100 seems too small! I need to do more testing and post an issue on Pyrogram GitHub.
The one thing that looks weird is that it's starting with an offset of 10 instead of 0. I'm assuming this is something you changed also in the code? Please revert that change and try again.
Yes that was my doing - I want to keep the last 10 messages. :)
Yes that was my doing - I want to keep the last 10 messages. :)
OK. Note that IIRC this will keep only your 10 oldest messages.
First of all I think the
messages_count
needs to count the number of items withinmessage_ids
.Once messages_count is accurate
Not sure I understand what you mean by these comments. How is messages_count
not accurate? It is the number of messages returned by the current search call. If we ignore the real bug for a second (that apparently the search seems to ignore limit
> 100), then in each iteration, either messages_count == self.search_limit
(so it managed to fetch another full batch of search_limit
messages and will do another iteration to see if there's more) or messages_count < self.search_limit
, in which case we know that we're done searching, because there were fewer than search_limit
messages left.
I see only one real issue here: the pyrogram search seems to ignore limit
values > 100. As you suggested, if this is really a hard limit (I couldn't find it in their documentation though), we might want to hardcode this value in the search call (and add a comment explaining why, so the next contributor doesn't undo it!)
OK. Note that IIRC this will keep only your 10 oldest messages.
It looks like it keeps the 10 latest or newest messages. I've just double checked by pausing after the first search. The first message datestamp is indeed the 10th latest message, and the last one is from yesterday. Thanks for making me check though as you never know with these things. :)
First of all I think the
messages_count
needs to count the number of items withinmessage_ids
. Once messages_count is accurateNot sure I understand what you mean by these comments. How is
messages_count
not accurate? It is the number of messages returned by the current search call. If we ignore the real bug for a second (that apparently the search seems to ignorelimit
> 100), then in each iteration, eithermessages_count == self.search_limit
(so it managed to fetch another full batch ofsearch_limit
messages and will do another iteration to see if there's more) ormessages_count < self.search_limit
, in which case we know that we're done searching, because there were fewer thansearch_limit
messages left.I see only one real issue here: the pyrogram search seems to ignore
limit
values > 100. As you suggested, if this is really a hard limit (I couldn't find it in their documentation though), we might want to hardcode this value in the search call (and add a comment explaining why, so the next contributor doesn't undo it!)
I see what you mean, as you said it's in there for when messages_count < self.search_limit
which is a valid use case. I guess I was thinking for unusual situations like mine where I have 39,000 messages (it's a bot used to report camera, motion, and door sensor activity, so it can be quite chatty). I'd like to specify a limit on how many it deletes on each script run, say 1000, just to make sure it's working and doesn't break the script by having too many messages to process.
Regarding messages_count
, I was thinking along the same lines as your earlier suggestion to indicate current progress: "Fetched 700 / 39000".
Yeah, search_limit
is actually an unfortunate name for this attribute. A better name would be search_chunk_size
or something like that. It is passed as the limit
to the search, so in that sense it's a "limit" in one single search iteration, but in terms of your entire search, it specifies the chunk size rather than a "limit"
What you suggest re: specifying a global limit on the total number of messages to delete sounds reasonable as well. Also making the initial offset configurable in the constructor, to support your other use case where you want to keep a certain number of recent messages sounds like a good idea.
I'm happy to work on a PR for these issues but I don't have time these days. It's pretty straightforward though, so if you want to practice some python, I'd say just go for it :)
Posted a bug report here: https://github.com/pyrogram/pyrogram/issues/637
I'll open a stopgap PR to fix the urgent issue that the script doesn't work at the moment, due to the search_limit = 1000
default setting, which isn't compatible with the Search pagination which returns chunks of up to 100 messages only.
A nicer solution that we could work on next would be to use search_messages instead of Search
, which handles pagination transparently.
Resolved by #33, thanks everyone involved in this discussion!
running the script only shows that you have deleted 100 messages. When looking at the telegram group, all the messages disappear but after a while all the messages appear again except 100 messages