jtauber / pyuca

a Python implementation of the Unicode Collation Algorithm
MIT License

Possible usage of itemgetter(key) #26

Open filak opened 1 year ago

filak commented 1 year ago

Thank you for a great library!

I would like to sort a list of dicts by multiple keys, some of which may contain non-ASCII Unicode strings. Sample program:

from pyuca import Collator
from operator import itemgetter
coll = Collator()

def multisort_list_of_dicts(xs, specs):
    for key, reverse, col in reversed(specs):   
        if col:
            xs.sort(key=coll.sort_key(itemgetter(key)), reverse=reverse)
        else:    
            xs.sort(key=itemgetter(key), reverse=reverse)
    return xs

data = [{'k1': 'd', 'k2': 10},{'k1': 'č', 'k2': 10},{'k1': 'a', 'k2': 20},{'k1': 'a', 'k2': 10}]

Standard sorting:

sort_spec = (('k2', False, False), ('k1', False, False))

for item in multisort_list_of_dicts(data, sort_spec):
    print(item)
{'k1': 'a', 'k2': 10}
{'k1': 'd', 'k2': 10}
{'k1': 'č', 'k2': 10}
{'k1': 'a', 'k2': 20}
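Standard sorting puts 'č' after 'd' because Python compares `str` values code point by code point, and 'č' (U+010D) is numerically far above 'd' (U+0064). A quick check of that default ordering:

```python
words = ['d', 'č', 'a']

# Default str comparison uses Unicode code points, not language-aware collation.
for ch in words:
    print(ch, f"U+{ord(ch):04X}")

default_order = sorted(words)
print(default_order)  # ['a', 'd', 'č']
```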

Unicode sorting for k1:

sort_spec = (('k2', False, False), ('k1', False, True))

for item in multisort_list_of_dicts(data, sort_spec):
    print(item)
  File "C:\Programs\Python\Python311\Lib\site-packages\pyuca\collator.py", line 119, in sort_key
    normalized_string = unicodedata.normalize("NFD", string)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: normalize() argument 2 must be str, not operator.itemgetter

How can I get the following desired output with pyuca?

{'k1': 'a', 'k2': 10}
{'k1': 'č', 'k2': 10}
{'k1': 'd', 'k2': 10}
{'k1': 'a', 'k2': 20}

Is there a way for pyuca to handle the itemgetter input?
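The TypeError comes from evaluation order: in `coll.sort_key(itemgetter(key))`, `sort_key` is called immediately with the `itemgetter` object as its argument, before `sort()` ever receives a key function, and `sort_key` passes its argument to `unicodedata.normalize`, which only accepts a `str`. A minimal sketch of one way around this, assuming only pyuca's `Collator().sort_key(string)` call, is to compose the two steps into a new key function that first extracts the value and then collates it. `collated_itemgetter` is a hypothetical helper name, not part of pyuca or `operator`. So the example also runs where pyuca is not installed, it falls back to a crude NFD-based stand-in, which happens to order this sample correctly but is not an implementation of the UCA.

```python
from operator import itemgetter
import unicodedata

try:
    from pyuca import Collator
    uca_sort_key = Collator().sort_key
except ImportError:
    # Crude stand-in (NOT the UCA), used only if pyuca is missing:
    # NFD splits 'č' into 'c' + combining caron, so it sorts next to 'c'.
    def uca_sort_key(s):
        return unicodedata.normalize("NFD", s)


def collated_itemgetter(key, sort_key=uca_sort_key):
    """Hypothetical helper: return a key function that extracts item[key]
    and passes the extracted string (not the itemgetter) to sort_key."""
    get = itemgetter(key)
    return lambda item: sort_key(get(item))


def multisort_list_of_dicts(xs, specs):
    # Sort by the least significant key first; list.sort is stable,
    # so the more significant keys applied later take precedence.
    for key, reverse, col in reversed(specs):
        keyfunc = collated_itemgetter(key) if col else itemgetter(key)
        xs.sort(key=keyfunc, reverse=reverse)
    return xs


data = [{'k1': 'd', 'k2': 10}, {'k1': 'č', 'k2': 10},
        {'k1': 'a', 'k2': 20}, {'k1': 'a', 'k2': 10}]
sort_spec = (('k2', False, False), ('k1', False, True))

for item in multisort_list_of_dicts(data, sort_spec):
    print(item)
```

The same composition can be written inline as `key=lambda d: coll.sort_key(d[key])`; the helper only preserves the `itemgetter` style of the original function.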

filak commented 1 year ago

Sure, there is a workaround: pre-populate the data with sort_key() values. But this might not always be optimal or feasible.

from operator import itemgetter
from pyuca import Collator
coll = Collator()

def multisort_list_of_dicts(xs, specs):
    for key, reverse in reversed(specs):
        xs.sort(key=itemgetter(key), reverse=reverse)
    return xs

data = [{'k1': 'd', 'k2': 10},{'k1': 'č', 'k2': 10},{'k1': 'a', 'k2': 20},{'k1': 'a', 'k2': 10}]

data_sortable = []

for d in data:
    d['ks'] = coll.sort_key(d['k1'])
    data_sortable.append(d)

sort_spec = (('k2', False), ('ks', False))

for item in multisort_list_of_dicts(data_sortable, sort_spec):
    print(item)

Output:

{'k1': 'a', 'k2': 10, 'ks': (7239, 0, 32, 0, 2, 0)}
{'k1': 'č', 'k2': 10, 'ks': (7290, 0, 32, 40, 0, 2, 2, 0)}
{'k1': 'd', 'k2': 10, 'ks': (7311, 0, 32, 0, 2, 0)}
{'k1': 'a', 'k2': 20, 'ks': (7239, 0, 32, 0, 2, 0)}
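The workaround above writes a 'ks' field into every dict, and because `data_sortable` holds the same dict objects as `data`, the original records are modified too. When every key sorts ascending, a possible alternative is a single `sorted()` call on a tuple key that computes the collation key on the fly. This is a sketch assuming pyuca's `Collator().sort_key`, not a pyuca feature; it falls back to a crude NFD-based stand-in (not the UCA) if pyuca is missing.

```python
import unicodedata

try:
    from pyuca import Collator
    uca_sort_key = Collator().sort_key
except ImportError:
    # Crude stand-in (NOT the UCA), used only when pyuca is not installed.
    def uca_sort_key(s):
        return unicodedata.normalize("NFD", s)

data = [{'k1': 'd', 'k2': 10}, {'k1': 'č', 'k2': 10},
        {'k1': 'a', 'k2': 20}, {'k1': 'a', 'k2': 10}]

# One pass: compare k2 first, then the collation key of k1.
# sorted() calls the key function once per element and returns a new list,
# so no 'ks' field is stored and the dicts in `data` are left unchanged.
result = sorted(data, key=lambda d: (d['k2'], uca_sort_key(d['k1'])))

for item in result:
    print(item)
```

Mixing directions is less straightforward with a tuple key: negating a numeric field such as k2 works, but a collation key has no equally simple reversed form, so the multi-pass stable sort remains the easier route for descending collated keys.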