harfbuzz / youseedy

Python module to access all of the Unicode Character Database

Are you planning on doing anything with this? #1

Open simoncozens opened 4 years ago

simoncozens commented 4 years ago

I recently started a similar thing, with almost the same name, too! I looked around and found this. In my version I use the flat files instead of the XML, which turns out to be much faster to parse:

$ PYTHONPATH=Lib python3 -m youseedee ద
Downloading Unicode Character Database...
[==================================================]
{'Age': '1.1',
 'Block': 'Telugu',
 'Canonical_Combining_Class': '0',
 'East_Asian_Width': 'N',
 'General_Category': 'Lo',
 'Indic_Syllabic_Category': 'Consonant',
 'Line_Break': 'AL',
 'Name': 'TELUGU LETTER DA',
 ...

$ time PYTHONPATH=Lib python3 -m youseedee ద > /dev/null
PYTHONPATH=Lib python3 -m youseedee ద > /dev/null  0.67s user 0.10s system 96% cpu 0.801 total

$ time PYTHONPATH=Lib python3 -m youseedy ucd.nounihan.grouped.xml ద > /dev/null
PYTHONPATH=Lib python3 -m youseedy ucd.nounihan.grouped.xml ద > /dev/null  6.61s user 0.56s system 98% cpu 7.287 total

I don't know whether it's better to keep working on mine or contribute to this.
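
For readers unfamiliar with the flat files mentioned above: most of them are plain text with semicolon-separated fields, `#` comments, and code points given either singly or as `start..end` ranges. The following is only a minimal sketch of reading that format; it is not youseedee's actual code. The function names `parse_ucd_lines` and `lookup` are hypothetical, and the sample lines merely follow the layout of `Blocks.txt`.

```python
from bisect import bisect_right


def parse_ucd_lines(lines):
    """Parse UCD-style flat-file lines into sorted (start, end, value) ranges."""
    ranges = []
    for line in lines:
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        codepoints, value = fields[0], fields[1]
        # A first field is either a single code point or "start..end".
        if ".." in codepoints:
            start, end = codepoints.split("..")
        else:
            start = end = codepoints
        ranges.append((int(start, 16), int(end, 16), value))
    ranges.sort()
    return ranges


def lookup(ranges, char, default=None):
    """Return the value of the range containing `char`, or `default`."""
    cp = ord(char)
    # Find the last range whose start is <= cp.
    i = bisect_right(ranges, (cp, float("inf"), "")) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return default


# Sample lines in the layout of Blocks.txt (illustrative excerpt only).
SAMPLE = """\
# start..end; block name
0000..007F; Basic Latin
0C00..0C7F; Telugu
"""

ranges = parse_ucd_lines(SAMPLE.splitlines())
print(lookup(ranges, "\u0c26"))  # U+0C26 is TELUGU LETTER DA
```

Because each line is handled with plain string splitting and no XML tree is built, this style of parsing tends to be cheap, which is consistent with the timings shown above.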

ebraminio commented 4 years ago

I guess this is related: https://github.com/harfbuzz/harfbuzz/commit/771712b3ca97035ba5690e65bd7e63a852286159

behdad commented 4 years ago

Hey...

I read the XML because it contains all the data. My goal with this package was to read it and use packTab and other tricks to generate fast, optimized libraries for C, with Python bindings, to access the data quickly. In that model it doesn't matter if loading the XML is slow.

But I don't see myself working on it, so do what you need to. I'd be happy if you took over this module, discussed your needs, and did your work here. Otherwise it doesn't matter; we can kill or archive this.

packTab is nice, though. The code is horrible, but it works for now.
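
To illustrate the general idea of turning per-code-point data into a compact lookup table, here is a rough sketch of a generic two-stage table. This is not packTab's API or its actual algorithm, which is not described in this thread. The names `build_two_stage` and `get`, the block size, and the toy data are all assumptions made for the example.

```python
def build_two_stage(values, block_size=256):
    """Split a flat per-code-point table into an index plus deduplicated blocks."""
    blocks = []      # unique blocks, each a tuple of up to block_size values
    block_ids = {}   # block contents -> position in `blocks`
    index = []       # one entry per block_size code points
    for start in range(0, len(values), block_size):
        block = tuple(values[start:start + block_size])
        if block not in block_ids:
            block_ids[block] = len(blocks)
            blocks.append(block)
        index.append(block_ids[block])
    return index, blocks


def get(index, blocks, cp, block_size=256):
    """Look up the value for code point `cp` through the two stages."""
    return blocks[index[cp // block_size]][cp % block_size]


# Toy property (hypothetical): 1 for U+0C00..U+0C7F, 0 elsewhere below U+1000.
flat = [0] * 0x1000
for cp in range(0x0C00, 0x0C80):
    flat[cp] = 1

index, blocks = build_two_stage(flat)
stored = len(index) + sum(len(block) for block in blocks)
print(len(flat), "entries stored as", stored)
```

Identical blocks are stored once, so long uniform runs cost almost nothing, and a lookup is two array reads. Generated C code could use the same layout with static arrays, which is why a slow XML load at generation time would not affect runtime speed.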

simoncozens commented 4 years ago

Here's my thing: https://github.com/simoncozens/youseedee