pimutils / todoman

✅ A simple, standards-based, cli todo (aka: task) manager.
https://todoman.readthedocs.io
ISC License

categories with diacritics not handled well #540

Closed balejk closed 1 year ago

balejk commented 1 year ago

Steps to reproduce:

  1. Create a new task via interactive mode and set its category to ý.
  2. Run todo list -c ý
  3. The task is not listed.

I have tried setting 🤷 as a category as well to check whether it could be some more general encoding error, but this works.

I am using version 4.3.1 from the Void Linux repository.

hrueschwein commented 1 year ago

I ran into this issue with Cyrillic categories, for example:

> todo new -c Дом Покормить кошку
[ ] 10  05.09.2023 19:43 Покормить кошку @- [Дом]
> todo
/* ... */
[ ] 10  05.09.2023 19:43 Покормить кошку  [Дом]
/* ... */
> todo list -c Дом

The last command just prints an empty line.

WhyNotHugo commented 1 year ago

The way todoman handles categories, it inserts them into the cache database, and then queries that. In order to make this search case insensitive, the category is converted to uppercase when searching.

I checked the cache, and the categories ba and ý are inserted like so:

> sqlite3 ~/.cache/todoman/cache.sqlite3
SQLite version 3.43.0 2023-08-24 12:36:59
Enter ".help" for usage hints.
sqlite> select * from categories ;
1|599|ý
2|600|ba

I also checked the uppercase version of each one:

> sqlite3 ~/.cache/todoman/cache.sqlite3
sqlite> select category, upper(category) from categories;
ý|ý
ba|BA

Apparently sqlite thinks that ý in uppercase is ý. Edit: Apparently we can tell sqlite that the content is Cyrillic, but that would break anything non-Cyrillic.

However, Python thinks that ý in uppercase is Ý (I'm pretty confident that this is correct. Edit: This is definitely correct).

>>> "ý".upper()
'Ý'
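
The two results can be compared side by side outside todoman with Python's built-in sqlite3 module. This is only a sketch: the one-column table is a simplified stand-in for the real cache schema, and what SQLite returns for ý depends on whether the linked SQLite library was built with ICU support, so no output is shown here.

```python
import sqlite3

# In-memory database with a hypothetical, simplified categories table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (category TEXT)")
conn.executemany("INSERT INTO categories VALUES (?)", [("ý",), ("ba",)])


def compare_upper(conn):
    """Return (original, SQLite upper(), Python str.upper()) for each category."""
    rows = conn.execute(
        "SELECT category, upper(category) FROM categories ORDER BY rowid"
    ).fetchall()
    return [(cat, sql_up, cat.upper()) for cat, sql_up in rows]


for original, sqlite_upper, python_upper in compare_upper(conn):
    # The last column is False wherever SQLite and Python disagree.
    print(original, sqlite_upper, python_upper, sqlite_upper == python_upper)
```

Any row where the last column is False is a category that a lookup comparing SQLite's upper() against a Python-uppercased string would miss.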

One fix is to potentially tell sqlite how to properly resolve uppercase. I'm not sure how to do that.

Another fix is to convert categories to uppercase in python when inserting them. The problem here is that we lose the original case when we read the cache (although the original information is still in the todo file).

balejk commented 1 year ago

> Apparently sqlite thinks that ý in uppercase is ý.

> However, Python thinks that ý in uppercase is Ý (I'm pretty confident that this is correct).

Yes, the latter is indeed correct.

> One fix is to potentially tell sqlite how to properly resolve uppercase. I'm not sure how to do that.

If I understand correctly, this is not possible with bare sqlite as upper can only handle ASCII [1]. The suggestion at the linked page is to use the ICU extension, but I don't know what that would require (there, however, seems to be a Python package for this [2]).
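
For comparison, a route that needs no extension is to register a Python function under the name upper on the connection, which replaces the built-in for that connection only. The sketch below uses the standard library's sqlite3 create_function; the helper name unicode_upper is made up for the illustration, and nothing here is claimed to be what todoman does.

```python
import sqlite3


def unicode_upper(value):
    """Uppercase with Python's Unicode-aware str.upper(); pass NULL and non-text through."""
    return value.upper() if isinstance(value, str) else value


conn = sqlite3.connect(":memory:")
# A one-argument function named "upper" overrides SQLite's built-in upper()
# for queries run on this connection.
conn.create_function("upper", 1, unicode_upper)

result = conn.execute("SELECT upper('ý'), upper('Дом'), upper('ba')").fetchone()
print(result)  # ('Ý', 'ДОМ', 'BA')
```

The override has to be registered on every connection that runs the query, so other tools opening the same cache file (such as the sqlite3 shell) would still see the ASCII-only behaviour.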

> Another fix is to convert categories to uppercase in python when inserting them. The problem here is that we lose the original case when we read the cache (although the original information is still in the todo file).

How about just creating a new column at the categories table with the uppercase name converted by Python and using that for case-insensitive filtering, keeping the original name as it is now?
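
A minimal sketch of that idea, assuming a simplified table: the names todo_id, category_upper, add_category and todos_in_category are hypothetical and not taken from todoman's actual schema or code. Python does the case conversion both when storing and when searching, so SQLite only compares plain strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the original name is kept for display,
# category_upper exists only for case-insensitive lookups.
conn.execute(
    "CREATE TABLE categories (todo_id INTEGER, category TEXT, category_upper TEXT)"
)


def add_category(conn, todo_id, category):
    """Store the category as written plus a Python-uppercased copy."""
    conn.execute(
        "INSERT INTO categories (todo_id, category, category_upper) VALUES (?, ?, ?)",
        (todo_id, category, category.upper()),
    )


def todos_in_category(conn, category):
    """Find todos by category, ignoring case, without using SQLite's upper()."""
    rows = conn.execute(
        "SELECT todo_id, category FROM categories WHERE category_upper = ?",
        (category.upper(),),
    )
    return rows.fetchall()


add_category(conn, 599, "ý")
add_category(conn, 600, "ba")
add_category(conn, 601, "Дом")

print(todos_in_category(conn, "Ý"))    # [(599, 'ý')]
print(todos_in_category(conn, "дом"))  # [(601, 'Дом')]
```

Using str.casefold() instead of str.upper() on both sides would be a stricter choice for caseless matching, at the cost of the stored key no longer being the uppercase form.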

[1] https://www.sqlitetutorial.net/sqlite-functions/sqlite-upper/ [2] https://pypi.org/project/sqlite-icu/

hrueschwein commented 1 year ago

Hello! Will v4.3.2 be released on PyPI soon?

WhyNotHugo commented 1 year ago

v4.3.2 is out