gurland / telegram-delete-all-messages

Delete all your messages in groups / supergroups using this python script
GNU General Public License v3.0
372 stars 88 forks source link

only delete 100 msg #31

Closed Sirfrancis18 closed 3 years ago

Sirfrancis18 commented 3 years ago

running the script only shows that you have deleted 100 messages. When looking at the telegram group, all the messages disappear but after a while all the messages appear again except 100 messages

jarrah31 commented 3 years ago

Same here. I have a chat dialog containing 39,000 messages. I've changed the following line so that search_limit is 10000 and delete_chunk_size is 1000, yet it still only finds and deletes 100 messages.

https://github.com/gurland/telegram-delete-all-messages/blob/9142174ec762ada0bc20dd2f4885ab7dfd26a88a/cleaner.py#L19

jarrah31 commented 3 years ago

It's worth noting that I'm trying to delete messages from a bot owned group as seen here: https://github.com/gurland/telegram-delete-all-messages/issues/30#issuecomment-796623934

jarrah31 commented 3 years ago

@hannesveit I see you were able to solve some search limitations with groups and supergroups. Do you think there may be a similar issue with type: bots?

I'm no Python programmer so don't know how to fix this. I've been trying to debug the code and can see that I'm passing a value of 1000 within the limit parameter: https://github.com/gurland/telegram-delete-all-messages/blob/9142174ec762ada0bc20dd2f4885ab7dfd26a88a/cleaner.py#L110 However, when I inspect the q variable at the point shown below, messages only return 100 items:

image image

https://github.com/gurland/telegram-delete-all-messages/blob/9142174ec762ada0bc20dd2f4885ab7dfd26a88a/cleaner.py#L82

I've no idea how I can fix this. :(

hannesveit commented 3 years ago

Hmm weird. So it looks like it's ignoring the 1000 search limit and just using 100 instead? I can look into it once I find some time...

In the meantime, can you try setting the search_limit to 100 here? It sounds like it terminates in the first iteration because the search returns only 100 messages (which is < search_limit). Let me know if that fixes it.

jarrah31 commented 3 years ago

Now that was interesting. Changing the search limit to 100 resulted in the following that I had to manually exit from. I'll just post the results for now and will debug to see why it keeps increasing the offset.

Searching messages. OFFSET: 10
Found 100 of your messages in "my_bot"
Searching messages. OFFSET: 110
Found 100 of your messages in "my_bot"
Searching messages. OFFSET: 210
Found 100 of your messages in "my_bot"
Searching messages. OFFSET: 310
Found 100 of your messages in "my_bot"
Searching messages. OFFSET: 410
Found 100 of your messages in "my_bot"
Searching messages. OFFSET: 510
Found 100 of your messages in "my_bot"
Searching messages. OFFSET: 610
Found 100 of your messages in "my_bot"
Searching messages. OFFSET: 710
hannesveit commented 3 years ago

@jarrah31 no need to manually exit from that! That's actually what it's supposed to look like (almost, see below). It's searching through your messages (in chunks of search_limit == 100, each time increasing the offset to fetch the next batch) until it has them all. Once it has all messages, it's going to delete them in chunks of 100.

The one thing that looks weird is that it's starting with an offset of 10 instead of 0. I'm assuming this is something you changed also in the code? Please revert that change and try again.

hannesveit commented 3 years ago

Maybe those search status updates could be improved to just say something like "Fetched 700 / 39000" messages or so. Also, if changing the search_limit back to 100 fixes your issue that's weird and I don't understand it, but maybe until we figure out why it's ignoring search limits > 100, we could just set the default search_limit back to 100 to repair the script for now (I have definitely tested smaller ones, like 10, and they behaved as expected; I thought I had also tried to delete > 100 messages with a search limit of 1000 and confirmed that it worked in a single search iteration, but maybe I didn't...).

jarrah31 commented 3 years ago

(Apologies, I didn't see your other comments whilst writing this)

From what I can see, it will repeat this while loop 100 messages at a time (having set search_limit to 100), until there are no more messages to find:

            while True:
                q = self.search_messages(peer, add_offset)
                message_ids.extend(msg.id for msg in q['messages'])
                messages_count = len(q['messages'])
                print(f'Found {messages_count} of your messages in "{chat.username}"')
                if messages_count < self.search_limit:
                    break
                add_offset += self.search_limit

I can see that message_ids increases by 100 new messages on each loop, and I've verified that message timestamps within each new search (q variable) shows that it keeps finding older messages due to add_offset += self.search_limit. I guess this would keep building up and eventually pass a list of messages to the delete_messages function.

First of all I think the messages_count needs to count the number of items within message_ids. @hannesveit if you could help here please it would be really appreciated - I just don't know Python without lots of googling, but I can usually walk through code to debug it.

Once messages_count is accurate, a search_limit would be a more practical. It would then break out of the loop after say 1000 messages have been found and then goes ahead to delete them.

Given that we seem to have a hard limit of 100 messages when calling app.send(Search(, we should probably hard code those limits into the script. In the above loop, add_offset += self.search_limit would become add_offset += 100, and the following line: https://github.com/gurland/telegram-delete-all-messages/blob/9142174ec762ada0bc20dd2f4885ab7dfd26a88a/cleaner.py#L110 Would be limit=100, (if direct integers actually work)

It still doesn't seem right that we are limited to 100 searches when there's an option within the search call to specify the limit. 100 seems too small! I need to do more testing and post an issue on Pyrogram GitHub.

jarrah31 commented 3 years ago

The one thing that looks weird is that it's starting with an offset of 10 instead of 0. I'm assuming this is something you changed also in the code? Please revert that change and try again.

Yes that was my doing - I want to keep the last 10 messages. :)

hannesveit commented 3 years ago

Yes that was my doing - I want to keep the last 10 messages. :)

OK. Note that IIRC this will keep only your 10 oldest messages.

hannesveit commented 3 years ago

First of all I think the messages_count needs to count the number of items within message_ids.

Once messages_count is accurate

Not sure I understand what you mean by these comments. How is messages_count not accurate? It is the number of messages returned by the current search call. If we ignore the real bug for a second (that apparently the search seems to ignore limit > 100), then in each iteration, either messages_count == self.search_limit (so it managed to fetch another full batch of search_limit messages and will do another iteration to see if there's more) or messages_count < self.search_limit, in which case we know that we're done searching, because there were fewer than search_limit messages left.

I see only one real issue here: the pyrogram search seems to ignore limit values > 100. As you suggested, if this is really a hard limit (I couldn't find it in their documentation though), we might want to hardcode this value in the search call (and add a comment explaining why, so the next contributor doesn't undo it!)

jarrah31 commented 3 years ago

OK. Note that IIRC this will keep only your 10 oldest messages.

It looks like it keeps the 10 latest or newest messages. I've just double checked by pausing after the first search. The first message datestamp is indeed the 10th latest message, and the last one is from yesterday. Thanks for making me check though as you never know with these things. :)

jarrah31 commented 3 years ago

First of all I think the messages_count needs to count the number of items within message_ids. Once messages_count is accurate

Not sure I understand what you mean by these comments. How is messages_count not accurate? It is the number of messages returned by the current search call. If we ignore the real bug for a second (that apparently the search seems to ignore limit > 100), then in each iteration, either messages_count == self.search_limit (so it managed to fetch another full batch of search_limit messages and will do another iteration to see if there's more) or messages_count < self.search_limit, in which case we know that we're done searching, because there were fewer than search_limit messages left.

I see only one real issue here: the pyrogram search seems to ignore limit values > 100. As you suggested, if this is really a hard limit (I couldn't find it in their documentation though), we might want to hardcode this value in the search call (and add a comment explaining why, so the next contributor doesn't undo it!)

I see what you mean, as you said it's in there for when messages_count < self.search_limit which is a valid use case. I guess I was thinking for unusual situations like mine where I have 39,000 messages (it's a bot used to report camera, motion, and door sensor activity, so it can be quite chatty). I'd like to specify a limit on how many it deletes on each script run, say 1000, just to make sure it's working and doesn't break the script by having too many messages to process.

Regarding messages_count, I was thinking along the same lines as your earlier suggestion to indicate current progress: "Fetched 700 / 39000".

hannesveit commented 3 years ago

Yeah, search_limit is actually an unfortunate name for this attribute. A better name would be search_chunk_size or something like that. It is passed as the limit to the search, so in that sense it's a "limit" in one single search iteration, but in terms of your entire search, it specifies the chunk size rather than a "limit"

What you suggest re: specifying a global limit on the total number of messages to delete sounds reasonable as well. Also making the initial offset configurable in the constructor, to support your other use case where you want to keep a certain number of recent messages sounds like a good idea.

I'm happy to work on a PR for these issues but I don't have time these days. It's pretty straightforward though, so if you want to practice some python, I'd say just go for it :)

jarrah31 commented 3 years ago

Posted a bug report here: https://github.com/pyrogram/pyrogram/issues/637

hannesveit commented 3 years ago

I'll open a stopgap PR to fix the urgent issue that the script doesn't work at the moment, due to the search_limit = 1000 default setting, which isn't compatible with the Search pagination which returns chunks of up to 100 messages only.

A nicer solution that we could work on next would be to use search_messages instead of Search, which handles pagination transparently.

gurland commented 3 years ago

Resolved by #33, thanks everyone involved in this discussion!