Open kienmarkdo opened 9 months ago
Potentially useful: it is not possible to collect 100% of users. See https://github.com/LonamiWebs/Telethon/issues/580 and https://github.com/LonamiWebs/Telethon/issues/3820
The API limits participant retrieval to 10,000 users, but using search retrieves almost 90% of users. Sources from 2018: https://github.com/LonamiWebs/Telethon/issues/580 https://github.com/LonamiWebs/Telethon/issues/573
Potential future issue: Telegram caps participant collection at 10,000 users. See https://github.com/LonamiWebs/Telethon/issues/604
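The search-based workaround mentioned in these notes amounts to running many narrow searches and merging the results by user ID. Below is a minimal sketch of only the merge step. `collect_by_search`, `fake_fetch`, and the sample members are hypothetical names and data made up for illustration. In a real scraper the fetch function would presumably wrap a Telethon call such as `GetParticipantsRequest` with a `ChannelParticipantsSearch` filter, which is not exercised here.

```python
import string
from typing import Callable, Dict, Iterable, List

# Hypothetical signature: takes a search string, returns user dicts with an "id" key.
FetchBySearch = Callable[[str], List[dict]]


def collect_by_search(
    fetch: FetchBySearch,
    keys: Iterable[str] = string.ascii_lowercase,
) -> Dict[int, dict]:
    """Run one search per key and merge the results, deduplicating by user id."""
    users: Dict[int, dict] = {}
    for key in keys:
        for user in fetch(key):
            # A user can match several keys; keep the first copy seen.
            users.setdefault(user["id"], user)
    return users


# Stand-in data for illustration only; a real fetch would call the Telegram API.
_FAKE_MEMBERS = [
    {"id": 1, "first_name": "harry"},
    {"id": 2, "first_name": "anna"},
    {"id": 3, "first_name": "Hanna"},
]


def fake_fetch(query: str) -> List[dict]:
    """Return every fake member whose first name contains the query."""
    return [m for m in _FAKE_MEMBERS if query.lower() in m["first_name"].lower()]


merged = collect_by_search(fake_fetch, keys="ahn")
print(len(merged))
```

Each member here matches more than one key, yet appears only once in `merged`. Users whose names contain none of the search keys are never returned, which would be consistent with search reaching most but not all of a group.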
Testing... Actual result:
$ python scrape.py
[+] Collection in progress...
public_group - 1012147388 whalepoolbtc Whalepool
[+] Participants collection in progress...
6561 participants exported to output_public_group_1012147388/participants_1012147388.json
Expected result:
Testing 2... Actual result:
==========================================================================
[+] Collection in progress...
public_group - 1012147388 whalepoolbtc Whalepool
[+] Participants collection in progress...
Rotating to new socks5 proxy at 167.71.189.116:1080
6559 participants exported to output_public_group_1012147388/participants_1012147388.json
------------------------------------------------------
[+] Collection in progress...
broadcast_channel - 1974811824 bloombergcrypto Bloomberg Crypto
[+] Participants collection in progress...
Cannot collect participants in broadcast_channel. Skipping participants collection...
------------------------------------------------------
[+] Collection in progress...
public_group - 1180298540 PocoPhonePhotography Pocophone F1 | PHOTOGRAPHY
[+] Participants collection in progress...
Rotating to new socks5 proxy at 167.71.189.116:1080
6977 participants exported to output_public_group_1180298540/participants_1180298540.json
------------------------------------------------------
Expected:
6760
None
7207
Testing Actual: 21345 users collected... Expected: 28299 users
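To compare these runs more easily, the collected counts can be turned into a coverage percentage against the expected counts. This is a small sketch using only the numbers reported above. The `coverage` helper and the run labels are made up for illustration.

```python
def coverage(collected: int, expected: int) -> float:
    """Return the share of expected users that were actually collected, in percent."""
    return 100.0 * collected / expected


# (collected, expected) pairs copied from the test output above.
runs = {
    "whalepoolbtc, Testing 2": (6559, 6760),
    "PocoPhonePhotography, Testing 2": (6977, 7207),
    "last test (group not named)": (21345, 28299),
}

for label, (collected, expected) in runs.items():
    print(f"{label}: {coverage(collected, expected):.1f}% of expected users")
```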
Whalepool group: the UI says 6758. get_participants() returns 6562 (and also caps out at 10,000 with no way of using offsets). GetParticipantsRequest() returns 6615, and offsets are allowed.
The Pyrogram example does not work either. It collected 6500 users in whalepool, and 5 in the big Telegram Scraper group: https://github.com/pyrogram/pyrogram/blob/142a27f52ac910d8a5afed247c6da493c64c3993/examples/get_participants2.py
Actually, no matter what method I use now, the Telegram Scraper group only returns 5 users.
Never mind. It is just a coincidence that today that group enabled a feature that hides its members. I can now only view the owners of the group, not the other members, even though this is a public group, not a broadcast channel.
Telethon participants API restriction and ban: https://docs.telethon.dev/en/stable/quick-references/faq.html?highlight=participants#my-account-was-deleted-limited-when-using-the-library
Emojis are stored as escaped UTF-16 surrogate pairs in the JSON output, as we have discovered.
"first_name": "\ud83d\ude02 harry_", # returned by Telethon
first_name='😂 harry_', # get me
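The two forms above are the same string. `\ud83d\ude02` is the JSON escape for U+1F602 written as a UTF-16 surrogate pair, which is what Python's `json` module emits by default for non-ASCII characters. Below is a small sketch, assuming the export is written with the standard `json` module (the thread does not confirm how the file is produced).

```python
import json

# The escaped form as it appears in the exported file.
raw = '{"first_name": "\\ud83d\\ude02 harry_"}'

user = json.loads(raw)
decoded = user["first_name"]

# Loading the JSON joins the surrogate pair back into one code point.
print(decoded == "\U0001F602 harry_")
print(len(decoded))

# Default dumps() re-escapes non-ASCII characters; ensure_ascii=False keeps them readable.
escaped = json.dumps(user)
readable = json.dumps(user, ensure_ascii=False)
print(escaped == raw)
```

If the readable form is preferred on disk, the file also has to be opened with `encoding="utf-8"` when writing.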
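# Looks at the user's first name and checks whether the first English letter that follows a word boundary matches the key.
# re.findall examples: "harry" returns ["h"], "h a r r y" returns ["h", "a", "r", "r", "y"]
# "😂 harry_ ุฃููุงู ะฒะตั " returns ["h"]
# "😂 ุฃููุงู ะฒะตั " returns [], so indexing [0] would raise IndexError
# re.search returns None when nothing matches, which avoids that error; "or ''" guards against a missing first name.
match = re.search(r"\b[a-zA-Z]", user.first_name or "")
if match and match.group().lower() == key: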
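The check above can be pulled into a small helper so that names without any English letter are handled explicitly instead of raising an error. This is a sketch only. `first_latin_initial` is a hypothetical name, and the sample strings are made up for illustration.

```python
import re
from typing import Optional


def first_latin_initial(first_name: Optional[str]) -> Optional[str]:
    """Return the first a-z letter that follows a word boundary, lowercased, or None."""
    match = re.search(r"\b[a-zA-Z]", first_name or "")
    return match.group().lower() if match else None


# Sample names: plain ASCII, emoji plus Arabic text, and a missing name.
samples = ["harry", "\U0001F602 harry_ \u0623\u0647\u0644\u0627", "\U0001F602 \u0623\u0647\u0644\u0627", None]
initials = [first_latin_initial(name) for name in samples]
print(initials)
```

A caller can then compare the result with the search key, for example `first_latin_initial(user.first_name) == key`, and a `None` result simply never matches.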
Add feature that collects participants
Are there issues such as