knadh / tg-archive

A tool for exporting Telegram group chats into static websites like mailing list archives.
MIT License
834 stars, 124 forks

Messages from foreign channel #40

Closed: Farzat07 closed this issue 2 years ago

Farzat07 commented 2 years ago

While archiving messages from a public channel, messages from another channel, @DebugSchool, were included in the built website. Is this behaviour expected? And how can these messages be removed?

With the introduction of sponsored messages, I'm afraid that these might be sponsored messages. If that is the case, I believe an option to exclude them when building the website, or at least the RSS feed, would be a good idea.

Farzat07 commented 2 years ago

OK, it seems the problem can be avoided by deleting data.sqlite and starting over, but I still don't get why this is happening.

knadh commented 2 years ago

Ah, no clue how this could've happened. If the Telegram API returned those messages, these may be legitimate (sponsor?) like you said.

Will add an --exclude-users=[] flag to filter out messages from certain user IDs.
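A minimal sketch of what such a filter might look like, assuming the flag takes a comma-separated list of numeric user IDs. The `Message` record and both function names are hypothetical and are not taken from tg-archive's code:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Set


@dataclass
class Message:
    # Hypothetical minimal message record; tg-archive's real model may differ.
    id: int
    user_id: int
    content: str


def parse_exclude_users(raw: str) -> Set[int]:
    """Parse a comma-separated list of user IDs, e.g. "123,456"."""
    return {int(part) for part in raw.split(",") if part.strip()}


def filter_messages(messages: Iterable[Message], excluded: Set[int]) -> Iterator[Message]:
    """Yield only the messages whose sender is not in the excluded set."""
    for message in messages:
        if message.user_id not in excluded:
            yield message
```

Filtering at build time rather than at sync time would keep the stored data intact, so an excluded sender could be restored later by rebuilding without the flag.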

faraazb commented 2 years ago

It is most probably a sponsored message. Excluding messages by user ID would be a nice addition for many other purposes, but excluding sponsored messages this way would still be a good amount of work, as these messages could come from different senders at different times and one would have to figure out who they are. Also, I am not sure how the new sponsored messages terms of service work, or whether they even apply in the context of tg-archive. From what the TOS and the sponsored messages page say, not "displaying" these messages is considered a violation of the Telegram API's terms of service.

knadh commented 2 years ago

> this way would still be a good amount of work as these messages could be from different senders at different times and one would have to figure out who they are.

Yep, fair point.

> Also, I am not sure how the new sponsored messages terms of service works or if it is even applicable in the context of tg-archive. Not "displaying" these messages is considered as a violation of the Telegram API's terms of service from what they have on the TOS and sponsored messages page.

https://core.telegram.org/api/terms

> 3.3. If your app allows accessing content from Telegram channels, you must include support for official sponsored messages in Telegram channels and may not interfere with this functionality.

This is very interesting. For live Telegram clients, it might make sense, but what about in the context of export and archival? Tricky.

Farzat07 commented 2 years ago

Actually, I figured out the source of the problem.

Sponsored messages will probably appear at some point and will have to be dealt with, but this case is unrelated.

It seems that the pip installation on my Ubuntu server had a data.sqlite file in the example/ directory, which is copied when a new archive is created. For some reason, this data.sqlite contained these messages from @DebugSchool.

> Ok seems the problem can be avoided by deleting data.sqlite and starting over, but still I don't get why is this happening.

This explains the case. The data.sqlite was the initial culprit, so deleting it solves the issue.

This problem only happened on my Ubuntu server for some reason. On my local Arch machine, there is no data.sqlite file in the example/ directory, possibly because I am installing from git.

I will try uninstalling and installing the package again to see if the file appears again.
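One way to check an installation for this is to scan its example/ directory for leftover data files. The following is only a sketch: the file patterns are assumptions (SQLite databases and Telegram client session files), the function name is hypothetical, and the directory to scan depends on where pip installed the package.

```python
from pathlib import Path
from typing import List

# Assumed patterns for files that should not ship in an example directory.
SUSPECT_PATTERNS = ("*.sqlite", "*.session")


def find_stray_files(example_dir: Path) -> List[Path]:
    """Return files under example_dir that match any suspect pattern."""
    found: List[Path] = []
    for pattern in SUSPECT_PATTERNS:
        # rglob also searches subdirectories.
        found.extend(example_dir.rglob(pattern))
    return sorted(found)
```

An empty result for the installed package's example/ directory would suggest the install is clean; any hit is a file that gets copied into every newly created archive.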

Farzat07 commented 2 years ago

Ok yup, same problem persists.

knadh commented 2 years ago

Whoa, I've been accidentally packaging test data/session files in the PyPI releases from my machine! They must've ended up in the example directory while testing, and I missed them because they're untracked in git.

Just pushed a release, v0.5.3, that doesn't have it. Will try to move PyPI publishing to a GitHub workflow soon.
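A pre-publish check could catch this kind of leak regardless of where the build runs. The sketch below inspects a built wheel or sdist for unexpected data files. The patterns and function names are assumptions for illustration and are not part of tg-archive's release process.

```python
import fnmatch
import tarfile
import zipfile
from pathlib import Path
from typing import List

# Assumed patterns for files that should never be published.
SUSPECT_PATTERNS = ("*.sqlite", "*.session")


def list_archive_members(archive: Path) -> List[str]:
    """List member names of a wheel (zip) or an sdist (tar archive)."""
    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            return zf.namelist()
    with tarfile.open(archive) as tf:
        return tf.getnames()


def suspect_members(archive: Path) -> List[str]:
    """Return archive members whose names match any suspect pattern."""
    return [
        name
        for name in list_archive_members(archive)
        if any(fnmatch.fnmatch(name, pattern) for pattern in SUSPECT_PATTERNS)
    ]
```

Running this over every file in dist/ before uploading, and aborting on a non-empty result, would flag untracked files that a local working tree contributed to the build.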