kienmarkdo / Telegram-OSINT-for-Cyber-Threat-Intelligence-Analysis

An OSINT tool tailored for comprehensive collection, analysis, and interpretation of cyber threat intelligence from Telegram channels and groups.

Research get_participants() in public/private groups #6

Open kienmarkdo opened 9 months ago

kienmarkdo commented 9 months ago

Are there issues such as

kienmarkdo commented 9 months ago

Potentially useful. Cannot collect 100% of users https://github.com/LonamiWebs/Telethon/issues/580 https://github.com/LonamiWebs/Telethon/issues/3820

The API limits get_participants() to 10,000 users, but using search retrieves almost 90% of users. Sources from 2018: https://github.com/LonamiWebs/Telethon/issues/580 https://github.com/LonamiWebs/Telethon/issues/573

Potential future issue: Telegram caps participant collection at 10,000 users. https://github.com/LonamiWebs/Telethon/issues/604
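
A minimal sketch of the search workaround mentioned above: run one participant search per query string and merge the results by user id, so that users found by several queries are counted once. The `fetch` callable, `collect_by_search`, and the fake member list are hypothetical names used only for illustration. With Telethon 1.x, `fetch` could presumably wrap `client.get_participants(group, search=query)`, but that wiring is an assumption and is not tested here.

```python
import asyncio
import string
from types import SimpleNamespace


async def collect_by_search(fetch, queries=string.ascii_lowercase):
    """Union the results of one participant search per query, keyed by user id.

    `fetch` is a hypothetical async callable: fetch(query) -> list of objects
    that each have an `id` attribute.
    """
    seen = {}
    for query in queries:
        for user in await fetch(query):
            seen.setdefault(user.id, user)  # keep the first copy of each user
    return list(seen.values())


# Stand-in for the real API so the sketch runs offline (hypothetical data).
_FAKE_MEMBERS = [
    SimpleNamespace(id=1, first_name="harry"),
    SimpleNamespace(id=2, first_name="alice hart"),
    SimpleNamespace(id=3, first_name="bob"),
]


async def fake_fetch(query):
    # Pretend the server returns every member whose name contains the query.
    return [u for u in _FAKE_MEMBERS if query in u.first_name]


members = asyncio.run(collect_by_search(fake_fetch, queries="ah"))
print(len(members))
```

Both queries return the same two users, and deduplicating on `id` keeps the merged list from double counting them. Users whose names contain none of the query strings are still missed, which is consistent with search not reaching 100%.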

kienmarkdo commented 9 months ago

Testing... Actual result:

$ python scrape.py
[+] Collection in progress...
public_group         - 1012147388 whalepoolbtc Whalepool

[+] Participants collection in progress...
6561 participants exported to output_public_group_1012147388/participants_1012147388.json

Expected result: image

kienmarkdo commented 9 months ago

Testing 2... Actual result:

==========================================================================
[+] Collection in progress...
public_group         - 1012147388 whalepoolbtc Whalepool

[+] Participants collection in progress...
Rotating to new socks5 proxy at 167.71.189.116:1080
6559 participants exported to output_public_group_1012147388/participants_1012147388.json
------------------------------------------------------
[+] Collection in progress...
broadcast_channel    - 1974811824 bloombergcrypto Bloomberg Crypto

[+] Participants collection in progress...
Cannot collect participants in broadcast_channel. Skipping participants collection...
------------------------------------------------------
[+] Collection in progress...
public_group         - 1180298540 PocoPhonePhotography Pocophone F1 | PHOTOGRAPHY

[+] Participants collection in progress...
Rotating to new socks5 proxy at 167.71.189.116:1080
6977 participants exported to output_public_group_1180298540/participants_1180298540.json
------------------------------------------------------

Expected:

6760 (whalepoolbtc)
None (bloombergcrypto)
7207 (PocoPhonePhotography)

kienmarkdo commented 9 months ago

Testing... Actual: 21345 users collected. Expected: 28299 users.
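
To compare the runs so far, the collected and expected counts reported above can be turned into coverage percentages. This is only arithmetic on the numbers already in this thread; the `coverage` helper and the labels in `runs` are made up for illustration.

```python
def coverage(collected, expected):
    """Percentage of the expected member count that was actually collected."""
    return 100.0 * collected / expected


# (collected, expected) pairs taken from the test output in this thread.
runs = {
    "whalepoolbtc": (6559, 6760),
    "PocoPhonePhotography": (6977, 7207),
    "larger group (not named above)": (21345, 28299),
}

for name, (collected, expected) in runs.items():
    missing = expected - collected
    print(f"{name}: {coverage(collected, expected):.1f}% collected, {missing} missing")
```

The two smaller groups come out at roughly 97% and the larger one at roughly 75%, so the gap appears to widen as the group grows.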

kienmarkdo commented 9 months ago

https://github.com/LonamiWebs/Telethon/issues/1174

https://stackoverflow.com/questions/75590168/is-any-way-to-get-all-participants-of-channel-in-telegram-via-telethon

kienmarkdo commented 9 months ago

Whalepool group: the UI says 6758 members. get_participants() returns 6562 (and also caps out at 10,000, with no way of using offsets). GetParticipantsRequest() returns 6615, and offsets are allowed.
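
Since GetParticipantsRequest() accepts an offset, the collection can be paged. Below is a rough sketch of that loop, written against a hypothetical `fetch_page(offset, limit)` callable so it runs without a Telegram session. In Telethon, that callable would presumably wrap the raw `functions.channels.GetParticipantsRequest(channel, filter, offset, limit, hash)` request and return its `users` list; that wiring, the page size of 200, and the `max_pages` guard are assumptions, not tested behaviour.

```python
import asyncio
from types import SimpleNamespace


async def collect_by_offset(fetch_page, page_size=200, max_pages=500):
    """Page through participants by offset until an empty page comes back.

    `fetch_page` is a hypothetical async callable:
    fetch_page(offset, limit) -> list of objects with an `id` attribute.
    """
    users = {}
    offset = 0
    for _ in range(max_pages):  # hard stop so a misbehaving server cannot loop forever
        page = await fetch_page(offset, page_size)
        if not page:
            break
        for user in page:
            users[user.id] = user  # deduplicate in case pages overlap
        offset += len(page)
    return list(users.values())


# Offline stand-in for the API: 450 fake users served in slices.
_FAKE_USERS = [SimpleNamespace(id=i) for i in range(450)]


async def fake_page(offset, limit):
    return _FAKE_USERS[offset:offset + limit]


collected = asyncio.run(collect_by_offset(fake_page))
print(len(collected))
```

Advancing the offset by the number of users actually returned, rather than by the requested page size, avoids skipping users when the server returns short pages.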

kienmarkdo commented 9 months ago

This Pyrogram example does not work either. It collected 6500 users in whalepool, and 5 in the big Telegram Scraper group: https://github.com/pyrogram/pyrogram/blob/142a27f52ac910d8a5afed247c6da493c64c3993/examples/get_participants2.py

kienmarkdo commented 9 months ago

Actually, no matter what method I use now, the Telegram Scraper group only returns 5 users...

Never mind. It's just a coincidence that today that group enabled a feature that hides the members of the group. I can only view the owners of the group now, not other members, even though this is a public group, not a broadcast channel.

kienmarkdo commented 8 months ago

Telethon participants API restriction and ban: https://docs.telethon.dev/en/stable/quick-references/faq.html?highlight=participants#my-account-was-deleted-limited-when-using-the-library

kienmarkdo commented 8 months ago

Emojis are stored as UTF-16 as we have discovered.
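
A small illustration of what UTF-16 storage means for one emoji. It assumes the observation refers to UTF-16 surrogate pairs, for example the `\ud83d\ude02` escapes that Python's `json` module writes by default; the comment above does not say exactly where the UTF-16 form was seen, so treat this as one plausible reading.

```python
import json

emoji = "\U0001F602"  # the "face with tears of joy" emoji

# One Python code point, but two UTF-16 code units (a surrogate pair).
utf16_units = len(emoji.encode("utf-16-le")) // 2

# json.dumps escapes non-ASCII by default, which exposes the surrogate pair.
escaped = json.dumps(emoji)

# ensure_ascii=False keeps the emoji readable in the exported file.
readable = json.dumps(emoji, ensure_ascii=False)

print(len(emoji), utf16_units, escaped, readable)
```

Loading either form back with `json.loads` gives the same string, so the escaped output is not data loss, only a less readable encoding.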

kienmarkdo commented 8 months ago

# Looks at the user's first name and checks whether the first English letter occurring right after a word boundary matches the key.
#   Example: "harry" yields ["h"], "h a r r y" yields ["h", "a", "r", "r", "y"]
#   "😂 harry_ أهلاً вет " yields ["h"]
#   "😂 أهلاً вет " yields [] (no match), so check for a match before indexing [0]
initials = re.findall(r"\b[a-zA-Z]", user.first_name or "")
if initials and initials[0].lower() == key:
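
The check above can be wrapped in a small helper that is safe for names with no English letters. This is a sketch, not the project's code: `first_latin_initial` is a made-up name, and returning `None` for "no match" is a choice made for this example.

```python
import re

# First ASCII letter that directly follows a word boundary.
_INITIAL = re.compile(r"\b[a-zA-Z]")


def first_latin_initial(first_name):
    """Return the lowercased first English initial in a name, or None if there is none."""
    match = _INITIAL.search(first_name or "")  # tolerate a missing first name
    return match.group(0).lower() if match else None


for name in ["harry", "H a r r y", "😂 harry_ أهلاً вет ", "😂 أهلاً вет ", None]:
    print(repr(name), "->", first_latin_initial(name))
```

Using `search` instead of `findall(...)[0]` stops at the first match and avoids an `IndexError` when a name contains only emoji or non-Latin script.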

kienmarkdo commented 6 months ago

Add feature that collects participants