jamadden / mrab-regex-hg

Automatically exported from code.google.com/p/mrab-regex-hg

property database - aliases (question) #23

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Well, it doesn't really count as normal usage of the regex module, but I tried
(again) to play with its data structures, e.g. to get some Unicode properties
that are not currently available in unicodedata.
(It is not particularly practical, as I can just grab the Unicode data files for
searching, but I also wanted to dig into the (Python) source of regex a bit...)

So far I have managed (hopefully :-) to get a check for a Unicode property
working. Is it really as simple as this?

>>> _regex.has_property_value((_regex.get_properties()["SCRIPT"][0] << 16) |
...     _regex.get_properties()["SCRIPT"][1]["GREEK"], ord(u"Σ"))
1
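The packing step in the session can be factored into a small helper. This is only a sketch inferred from the session itself: it assumes that the mapping returned by the private `_regex.get_properties()` has the shape `{property_name: (prop_id, {value_name: value_id})}` and that a value id fits in the low 16 bits. The name `property_code` is hypothetical and is not part of the regex module.

```python
def property_code(properties, prop_name, value_name):
    """Pack a property id and a value id into one integer.

    Assumption (inferred from the session, not documented): `properties`
    maps a normalised property name to (prop_id, {value_name: value_id}).
    """
    prop_id, values = properties[prop_name]
    value_id = values[value_name]
    if value_id >= 1 << 16:
        # The session's packing leaves only 16 bits for the value id.
        raise ValueError("value id does not fit in 16 bits")
    return (prop_id << 16) | value_id
```

Under the same assumptions, the session's call would read `_regex.has_property_value(property_code(_regex.get_properties(), "SCRIPT", "GREEK"), ord(u"Σ"))`.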

Is it the case that there is no other access path, i.e. to get the properties
of a given character, one has to check every possible value of each property
and collect the successful matches? (Actually, it works surprisingly fast,
given how clumsy this approach is.)
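The brute-force approach described above can be sketched as a loop over every value of every property. This is a sketch under stated assumptions, not the module's API: `properties` is assumed to have the `{name: (prop_id, {value_name: value_id})}` shape seen in the session, `has_property_value(code, codepoint)` is assumed to return a truthy result on a match, and `properties_of` is a hypothetical name.

```python
def properties_of(char, properties, has_property_value):
    """Return {property_name: [matching value names]} for one character.

    Brute force: every value of every property is tested. Aliases share
    a value id, so several names may match for the same property.
    """
    codepoint = ord(char)
    result = {}
    for prop_name, (prop_id, values) in properties.items():
        matches = [
            value_name
            for value_name, value_id in values.items()
            # Same packing as in the session: property id in the high bits.
            if has_property_value((prop_id << 16) | value_id, codepoint)
        ]
        if matches:
            result[prop_name] = sorted(matches)
    return result
```

With the real module this would presumably be called as `properties_of(u"Σ", _regex.get_properties(), _regex.has_property_value)`; that call is untested here.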

(It is really just an exercise. I wouldn't want to ask for the more convenient
access to this data that you already offered, as it belongs in unicodedata.)

On a related note, is it somehow possible to programmatically access the
original, non-normalised property names and values, as listed at:
http://code.google.com/p/mrab-regex-hg/wiki/UnicodeProperties

It is possible to group the aliases that belong together and take the
longest one as the full form, but the original casing and spacing probably
can't be recovered, can they?
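The alias-grouping heuristic from the question can be sketched as follows. It assumes, as in the session, that a property's value table maps normalised value names to ids and that aliases share an id; "longest name wins" is the poster's heuristic, not a rule the module guarantees. The function name `full_forms` is hypothetical.

```python
def full_forms(values):
    """Guess the full form of each property value from its aliases.

    `values` maps normalised value names to ids; aliases share an id.
    Names are grouped by id and the longest is kept (ties broken
    alphabetically). The result is still normalised: original casing
    and spacing are not recoverable from these names.
    """
    groups = {}
    for name, value_id in values.items():
        groups.setdefault(value_id, []).append(name)
    # max() returns the first maximal element, so sorting first makes
    # ties resolve to the alphabetically smallest name.
    return {value_id: max(sorted(names), key=len) for value_id, names in groups.items()}
```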

Sorry for this possibly irrelevant "issue" (having an issue type "question"
would likely be silly...).

And, of course, many thanks for the recent enhancements and fixes!

regards,
 vbr

Original issue reported on code.google.com by Vlastimil.Brom@gmail.com on 29 Sep 2011 at 3:56

GoogleCodeExporter commented 9 years ago
As you've already discovered, the property names and values are normalised and
looked up, and then a function is called to check whether there's a match. The
regex module contains only the normalised names.
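Why the original spellings cannot be recovered follows from normalisation being many-to-one. The rule below is an assumption for illustration, loosely modelled on the Unicode loose-matching rule UAX44-LM3 (ignore case, whitespace, underscores and hyphens); the regex module's exact rule may differ, and `normalise` is a hypothetical name.

```python
import re

def normalise(name):
    """Hypothetical normaliser: drop whitespace, underscores and hyphens,
    then upper-case. Loosely follows UAX44-LM3; not necessarily the exact
    rule used by the regex module."""
    return re.sub(r"[\s_-]", "", name).upper()
```

Because `"Old_Italic"`, `"old italic"` and `"OLD-ITALIC"` all normalise to `"OLDITALIC"`, a table that stores only the normalised key has no record of which spelling was the original.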

Original comment by re...@mrabarnett.plus.com on 29 Sep 2011 at 4:23

GoogleCodeExporter commented 9 years ago

Original comment by re...@mrabarnett.plus.com on 29 Sep 2011 at 4:23

GoogleCodeExporter commented 9 years ago
Thanks for the prompt answer.
 vbr

Original comment by Vlastimil.Brom@gmail.com on 29 Sep 2011 at 5:25