Open Rafagd opened 8 months ago
This was causing some kind of issue with the output text, I remember, but not exactly what it was. There have been some changes to pandas and telegram since, so it may be fine now, or I may have intended to only filter emoji or something. When I have a chance, I will check the behaviour, but if you're in a rush, it should be safe to remove that line, though the output might be broken.
I have already commented it out in our clone of the repo, and it seems to work fine so far. I haven't tested the emoji case, though.
(In reply to mkdryden's comment above, sent Tue, 12 Mar 2024 at 05:27.)
Hi, I have started using this bot in a personal group chat with friends, and we've noticed it completely mangles the name of one of them. His name happens to contain a ç, and that character was removed entirely from his logged entry.
I have investigated the code and I've stumbled upon the following line:
Which basically states it's dropping emoji and the @ symbol. I'm not sure why that's even necessary, but it's doing far more than dropping emoji: it's dropping everything outside the ASCII range. So no Latin-alphabet extensions like é ü ø æ, and no support at all for non-Latin scripts like Cyrillic, Greek, Arabic, Chinese, etc.
Is there a particular reason for this line to exist? Python and Postgres should support UTF-8 just fine.
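To make the difference concrete, here is a minimal sketch. It is not the bot's actual code: the function names are hypothetical, and the emoji ranges are my own rough assumption, not a complete emoji definition. It compares an ASCII-only filter with one that removes only the @ symbol and characters in a few common emoji blocks.

```python
import re


def strip_non_ascii(text: str) -> str:
    """Hypothetical stand-in for an ASCII-only filter: every non-ASCII
    character is silently dropped, accented letters included."""
    return text.encode("ascii", errors="ignore").decode("ascii")


# Assumed alternative: match only '@' and a few common emoji blocks.
# These ranges are illustrative and do not cover every emoji.
EMOJI_OR_AT = re.compile(
    "[@"
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # variation selector-16 (emoji presentation)
    "]"
)


def strip_emoji_and_at(text: str) -> str:
    """Remove '@' and the emoji ranges above, leaving other scripts intact."""
    return EMOJI_OR_AT.sub("", text)


if __name__ == "__main__":
    for name in ["Gon\u00e7alves\U0001F600", "@\u041f\u0440\u0438\u0432\u0435\u0442"]:
        print(repr(strip_non_ascii(name)), repr(strip_emoji_and_at(name)))
```

With something along these lines, names written in Latin-extended, Cyrillic, or other scripts would survive, while the characters the original line was apparently meant to target would still be filtered.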