Turkish department names aren't fully sorted

hacettepeoyt / hu-announcement-bot

Get the latest from Hacettepe with this amazing Telegram Bot!

https://t.me/HacettepeDuyurucuBot

GNU General Public License v3.0

8 stars 6 forks

Turkish department names aren't fully sorted #19

furkansimsekli closed 1 year ago

furkansimsekli commented 2 years ago

Turkish characters such as "İ", "Ç", and "Ş" break the sort() function. In the department list, names starting with them end up at the end of the list, although they don't belong there.

We need to implement a new sorting function. It could be done with sorted(list, key=...).
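To make the problem concrete, here is a minimal sketch. The department names are made-up samples, not the bot's actual list. Python compares strings by Unicode code point, and "Ç", "İ", and "Ş" all have code points above "Z", which is why they land at the end.

```python
# Sample department names (hypothetical, for illustration only).
departments = [
    "Tarih",
    "İstatistik",
    "Çocuk Gelişimi",
    "Biyoloji",
    "Şehir Planlama",
    "Fizik",
]

# Default sorting compares code points: "Ç" (U+00C7), "İ" (U+0130) and
# "Ş" (U+015E) are all greater than "Z" (U+005A), so they sort last.
default_order = sorted(departments)
print(default_order)
# ['Biyoloji', 'Fizik', 'Tarih', 'Çocuk Gelişimi', 'İstatistik', 'Şehir Planlama']
```

A correct Turkish ordering would instead put "Çocuk Gelişimi" right after "Biyoloji".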

melikechan commented 2 years ago

Maybe another library, PyICU, could be used for multilingual sorting. It implements the Unicode Collation Algorithm; here is an overview of how that works: http://unicode.org/reports/tr10/
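For reference, a PyICU-based sort might look roughly like the sketch below. It assumes the PyICU package (imported as icu) is installed. The helper name sort_with_icu and the "tr_TR" default are illustrative choices, not something from this repository.

```python
try:
    import icu  # provided by the PyICU package
except ImportError:
    icu = None  # PyICU is not available in this environment


def sort_with_icu(names, locale_name="tr_TR"):
    """Sort names using ICU collation rules for the given locale."""
    if icu is None:
        raise RuntimeError("PyICU is not installed")
    # A Collator applies the locale-specific ordering rules.
    collator = icu.Collator.createInstance(icu.Locale(locale_name))
    # getSortKey returns a byte key that compares in collation order.
    return sorted(names, key=collator.getSortKey)
```

Supporting another language would then only need a different locale name rather than a new sorting routine.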

furkansimsekli commented 2 years ago

I tested #20 and explained what went wrong there, so the built-in locale module isn't an option anymore.

I also tested PyICU on my local machine, and it was very easy to implement. However, I struggled to build PyICU on Heroku. I've tried many solutions, including this and this.

If anyone has built PyICU on Heroku before, I'd appreciate the help!

div72 commented 2 years ago

What about doing some text normalization as an easy workaround? e.g.

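# Map Turkish-specific letters to their closest ASCII counterparts (Ç/ç included).
NORMAL_MAP: dict[int, int] = str.maketrans("ÇçĞğıİÖöŞşÜü", "CcGgiIOoSsUu")

def normalize(inp: str) -> str:
    # Return a copy of inp that can be used as an ASCII-friendly sort key.
    return inp.translate(NORMAL_MAP)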

You could pass the names as a tuple (normalized, real) and use the real name afterwards.
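The tuple idea could be sketched as follows. The helper name sort_by_normalized and the sample names are hypothetical, and the translation table here also maps "Ç"/"ç", which is an assumption on top of the snippet above.

```python
# Turkish-specific letters mapped to ASCII look-alikes.
NORMAL_MAP = str.maketrans("ÇçĞğıİÖöŞşÜü", "CcGgiIOoSsUu")


def normalize(name: str) -> str:
    return name.translate(NORMAL_MAP)


def sort_by_normalized(names):
    # Build (normalized, real) pairs, sort them, then keep only the real name.
    pairs = sorted((normalize(name), name) for name in names)
    return [real for _, real in pairs]


sample = ["Tarih", "İstatistik", "Çocuk Gelişimi", "Biyoloji", "Şehir Planlama", "Fizik"]
print(sort_by_normalized(sample))
# ['Biyoloji', 'Çocuk Gelişimi', 'Fizik', 'İstatistik', 'Şehir Planlama', 'Tarih']
```

Note that this treats "Ç" as equal to "C" when comparing, so it is an approximation: names starting with "Ç" are mixed in with the "C" names instead of following them.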

div72 commented 2 years ago

> If anyone has built PyICU on Heroku before, I'd appreciate the help!

Would you like help with hosting the bot in a proper VPS? I can lend you some space in mine if you want.

furkansimsekli commented 2 years ago

@div72 Actually, I was looking for a permanent solution. The title says "Turkish ..", but it's a problem for other languages too. If we take that approach, we'd need to tweak it whenever a new language is introduced. I also saw a nice solution on Stack Overflow.

To be honest, it's silly to expect tens of languages for this app. So, using a temporary solution might not be a bad idea after all :)

> Would you like help with hosting the bot in a proper VPS? I can lend you some space in mine if you want.

Thanks for the VPS offer. However, since all my Telegram bots (there are four of them) live on Heroku, I'd like to keep them together. Working with different environments can sometimes be exhausting. If I change my mind, you're the first person I'll bother :)

ikolomiko commented 2 years ago

Another possible solution might be to include each language's alphabet in its translation file. There we can store the correct letter order for each language, and then a custom sort key can handle the rest.
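A rough sketch of that idea, assuming the translation file exposes the alphabet as two strings (upper and lower case, in order). The names TR_UPPER, TR_LOWER, and make_sort_key are hypothetical, and how the strings would actually be loaded is left out.

```python
# Turkish alphabet in its correct order; in practice these strings would
# come from the language's translation file (assumption).
TR_UPPER = "ABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZ"
TR_LOWER = "abcçdefgğhıijklmnoöprsştuüvyz"


def make_sort_key(upper: str, lower: str):
    """Build a sort key function from an ordered alphabet."""
    # Upper and lower case forms of a letter share the same rank.
    rank = {ch: i for i, ch in enumerate(upper)}
    rank.update({ch: i for i, ch in enumerate(lower)})

    def key(text: str):
        # Alphabet letters sort by rank; anything else (spaces, digits,
        # foreign letters) sorts before them, ordered by code point.
        return [(1, rank[ch]) if ch in rank else (0, ord(ch)) for ch in text]

    return key


turkish_key = make_sort_key(TR_UPPER, TR_LOWER)
sample = ["Tarih", "İstatistik", "Çocuk Gelişimi", "Coğrafya", "Biyoloji", "Şehir Planlama"]
print(sorted(sample, key=turkish_key))
# ['Biyoloji', 'Coğrafya', 'Çocuk Gelişimi', 'İstatistik', 'Şehir Planlama', 'Tarih']
```

The case pairs are listed explicitly because str.lower() does not follow Turkish rules for "I" and "İ". Unlike the normalization workaround, this keeps "Ç" strictly after "C".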

furkansimsekli commented 2 years ago

Wrong issue number in the commit. It should have been #9 instead of #19.