Open macchiati opened 22 hours ago
I don't want to stress the system, given the pending deadlines. I think the important thing is to have a clause stressing that the current algorithm doesn't have to be followed exactly; the only requirement is that an implementation produce the same results as the current text.
On Thu, Oct 3, 2024, 22:00 Eemeli Aro @.***> wrote:
Based on a first look and consideration, this formulation of the selection algorithm should give the same results as the current one, but with a few caveats (in no particular order):
- The inclusion of a best result for selector.match(key) is an unnecessary complication to the spec algorithm. It would be valid for an implementation to provide that optimization, but we don't need to care about early results in the spec text.
- The `*` keys need to be handled directly within the selector-list.match(key-list) and selector-list.compare(key-list1, key-list2) methods rather than being passed to the user-defined methods. That's the only way we can guarantee their behaviour, as well as simplifying the inputs to user code to always be only strings.
- The bit about parsing key values as NFC needs to be retained.
- We don't need to amend the ABNF to account for these changes. The selector-list and key-list values contain resolved values rather than syntax values, so they'll need to be constructed as a part of the algorithm in any case.
I'd be very happy to review a PR replacing our current text with this, provided that the above concerns are accounted for.
I think the algorithm for variant selection is much too complicated. That is, I think we can structure it in a way that gets the same results, but is not as complicated to explain — and matches a simpler and more efficient implementation that (a) doesn’t involve sorting, (b) is single-pass, and (c) has a fast exit.
This was sparked by the discussion around "resolved value" being needed in pattern selection. The 'dot' notation is used for convenience here, but needn't be in the fleshed-out text.
Restructure the BNF to add a couple of useful terms.
selector-list = 1*(s selector)
and so we'd have match-statement = match selector-list
key-list = key *(s key)
and so we'd have variant = key-list quoted-pattern
The Pattern Selection process depends on two capabilities of selectors:
Using these, list versions are defined in a natural way (see below for details):
Determining which of a message's patterns is formatted
(where there are selectors)
Determining selector-list.match(key-list)
In other words, the result is fail if any selector.match(key) value = fail, else best if every selector.match(key) value = best, else ok.
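The combination rule above can be sketched in Python. This is only an illustration under stated assumptions: `MatchResult`, `match_list`, and `toy_selector` are hypothetical names introduced here, and the handling of `*` keys is left out.

```python
from enum import Enum
from typing import Callable, Sequence


class MatchResult(Enum):
    FAIL = "fail"
    OK = "ok"
    BEST = "best"


# Hypothetical stand-in for the per-selector capability selector.match(key).
SelectorMatch = Callable[[str], MatchResult]


def match_list(selectors: Sequence[SelectorMatch], keys: Sequence[str]) -> MatchResult:
    """fail if any selector fails, best if every selector says best, else ok."""
    if len(selectors) != len(keys):
        raise ValueError("selector-list and key-list must have the same length")
    all_best = True
    for match_key, key in zip(selectors, keys):
        result = match_key(key)
        if result is MatchResult.FAIL:
            return MatchResult.FAIL  # fast exit on the first failing selector
        if result is not MatchResult.BEST:
            all_best = False
    return MatchResult.BEST if all_best else MatchResult.OK


def toy_selector(exact: str, category: str) -> SelectorMatch:
    """Toy selector (an assumption): the exact value is best, the category is ok."""
    def match_key(key: str) -> MatchResult:
        if key == exact:
            return MatchResult.BEST
        if key == category:
            return MatchResult.OK
        return MatchResult.FAIL
    return match_key
```

With two toy selectors, a key-list that hits both exact values yields best, one that hits a category yields ok, and any miss yields fail without consulting the remaining selectors.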
Determining selector-list.compare(key-list1, key-list2)
Discussion
CURRENT TEXT
https://github.com/unicode-org/message-format-wg/blob/main/spec/formatting.md#pattern-selection ...
To determine which variant best matches a given set of inputs, each selector is used in turn to order and filter the list of variants.
Each variant with a key that does not match its corresponding selector is omitted from the list of variants. The remaining variants are sorted according to the selector's key-ordering preference. Earlier selectors in the matcher's list of selectors have a higher priority than later ones.
When all of the selectors have been processed, the earliest-sorted variant in the remaining list of variants is selected.
This selection method is defined in more detail below. An implementation MAY use any pattern selection method, as long as its observable behavior matches the results of the method defined here.
Resolve Selectors
First, resolve the values of each selector:
1. Let `res` be a new empty list of resolved values that support selection.
2. For each selector `sel`, in source order,
   1. Let `rv` be the resolved value of `sel`.
   2. If selection is supported for `rv`:
      1. Append `rv` as the last element of the list `res`.
   3. Else:
      1. Let `nomatch` be a resolved value for which selection always fails.
      2. Append `nomatch` as the last element of the list `res`.
      3. Emit a Bad Selector error.

The form of the resolved values is determined by each implementation, along with the manner of determining their support for selection.
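A minimal Python sketch of this step, assuming a hypothetical `ResolvedValue` shape in which `select_keys` is `None` when selection is unsupported. The `resolve` callback and the `errors` list are also assumptions made for illustration, since the spec leaves the form of resolved values to each implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class ResolvedValue:
    """Hypothetical resolved value; select_keys is None if selection is unsupported."""
    source: str
    select_keys: Optional[Callable[[Sequence[str]], List[str]]] = None


def no_match(source: str) -> ResolvedValue:
    # A resolved value for which selection always fails: it never prefers any key.
    return ResolvedValue(source, select_keys=lambda keys: [])


def resolve_selectors(
    selectors: Sequence[str],
    resolve: Callable[[str], ResolvedValue],
    errors: List[str],
) -> List[ResolvedValue]:
    res: List[ResolvedValue] = []
    for sel in selectors:  # source order
        rv = resolve(sel)
        if rv.select_keys is not None:
            res.append(rv)
        else:
            res.append(no_match(rv.source))
            errors.append(f"Bad Selector: {sel}")  # reported, not raised
    return res
```

Keeping a placeholder in `res` for an unsupported selector preserves the positional correspondence between selectors and variant keys in the later steps.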
Resolve Preferences
Next, using `res`, resolve the preferential order for all message keys:
1. Let `pref` be a new empty list of lists of strings.
2. For each index `i` in `res`:
   1. Let `keys` be a new empty list of strings.
   2. For each variant `var` of the message:
      1. Let `key` be the `var` key at position `i`.
      2. If `key` is not the catch-all key `'*'`:
         1. Assert that `key` is a literal.
         2. Let `ks` be the resolved value of `key` in Unicode Normalization Form C.
         3. Append `ks` as the last element of the list `keys`.
   3. Let `rv` be the resolved value at index `i` of `res`.
   4. Let `matches` be the result of calling the method MatchSelectorKeys(`rv`, `keys`)
   5. Append `matches` as the last element of the list `pref`.

The method MatchSelectorKeys is determined by the implementation. It takes as arguments a resolved selector value `rv` and a list of string keys `keys`, and returns a list of string keys in preferential order. The returned list MUST contain only unique elements of the input list `keys`. The returned list MAY be empty. The most-preferred key is first, with each successive key appearing in order by decreasing preference.

The resolved value of each key MUST be in Unicode Normalization Form C ("NFC"), even if the literal for the key is not.

If calling MatchSelectorKeys encounters any error, a Bad Selector error is emitted and an empty list is returned.
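The preference step might look roughly like this in Python. The sketch assumes each variant is represented only by its tuple of keys, models the catch-all as the plain string `"*"` (so a quoted literal `*` is not distinguished from it), and takes the implementation-defined MatchSelectorKeys as a callback; `resolve_preferences` and `errors` are hypothetical names.

```python
import unicodedata
from typing import Any, Callable, List, Sequence, Tuple

CATCH_ALL = "*"  # simplification: the catch-all key is modelled as a plain string


def resolve_preferences(
    res: Sequence[Any],
    variant_keys: Sequence[Tuple[str, ...]],
    match_selector_keys: Callable[[Any, List[str]], List[str]],
    errors: List[str],
) -> List[List[str]]:
    pref: List[List[str]] = []
    for i, rv in enumerate(res):
        keys: List[str] = []
        for var_keys in variant_keys:
            key = var_keys[i]
            if key != CATCH_ALL:
                # Key values are compared in Unicode Normalization Form C.
                keys.append(unicodedata.normalize("NFC", key))
        try:
            matches = match_selector_keys(rv, keys)
        except Exception:
            # Any failure becomes a Bad Selector error and an empty preference list.
            errors.append("Bad Selector")
            matches = []
        pref.append(matches)
    return pref
```

Only string keys ever reach the callback here, which is the property the comment above about handling `*` keys outside user-defined code is asking for.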
Filter Variants
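Then, using the preferential key orders `pref`, filter the list of variants to the ones that match with some preference:
1. Let `vars` be a new empty list of variants.
2. For each variant `var` of the message:
   1. For each index `i` in `pref`:
      1. Let `key` be the `var` key at position `i`.
      2. If `key` is the catch-all key `'*'`:
         1. Continue the inner loop on `pref`.
      3. Assert that `key` is a literal.
      4. Let `ks` be the resolved value of `key`.
      5. Let `matches` be the list of strings at index `i` of `pref`.
      6. If `matches` includes `ks`:
         1. Continue the inner loop on `pref`.
      7. Else:
         1. Continue the outer loop on message variants.
   2. Append `var` as the last element of the list `vars`.

Sort Variants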
Finally, sort the list of variants `vars` and select the pattern:
1. Let `sortable` be a new empty list of (integer, variant) tuples.
2. For each variant `var` of `vars`:
   1. Let `tuple` be a new tuple (-1, `var`).
   2. Append `tuple` as the last element of the list `sortable`.
3. Let `len` be the integer count of items in `pref`.
4. Let `i` be `len` - 1.
5. While `i` >= 0:
   1. Let `matches` be the list of strings at index `i` of `pref`.
   2. Let `minpref` be the integer count of items in `matches`.
   3. For each tuple `tuple` of `sortable`:
      1. Let `matchpref` be an integer with the value `minpref`.
      2. Let `key` be the `tuple` variant key at position `i`.
      3. If `key` is not the catch-all key `'*'`:
         1. Assert that `key` is a literal.
         2. Let `ks` be the resolved value of `key`.
         3. Let `matchpref` be the integer position of `ks` in `matches`.
      4. Set the `tuple` integer value as `matchpref`.
   4. Set `sortable` to be the result of calling the method `SortVariants(sortable)`.
   5. Set `i` to be `i` - 1.
6. Let `var` be the variant element of the first element of `sortable`.
7. Select the pattern of `var`.

`SortVariants` is a method whose single argument is a list of (integer, variant) tuples. It returns a list of (integer, variant) tuples. Any implementation of `SortVariants` is acceptable as long as it satisfies the following requirements:
1. Let `sortable` be an arbitrary list of (integer, variant) tuples.
2. Let `sorted` be `SortVariants(sortable)`.
3. `sorted` is the result of sorting `sortable` using the following comparator:
   1. `(i1, v1)` <= `(i2, v2)` if and only if `i1 <= i2`.
4. The sort is stable (pairs of tuples from `sortable` that are equal in their first element have the same relative order in `sorted`).
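The filter and sort steps of the current text can be sketched together in Python. The assumptions are labelled in the code: a variant is a hypothetical `(keys, pattern)` pair, keys are already NFC-normalized strings, the catch-all is modelled as the string `"*"`, and Python's stable `sorted` stands in for `SortVariants`.

```python
from typing import List, Sequence, Tuple

CATCH_ALL = "*"  # simplification: the catch-all key is modelled as a plain string
Variant = Tuple[Tuple[str, ...], str]  # hypothetical shape: (keys, pattern)


def filter_variants(variants: Sequence[Variant], pref: Sequence[List[str]]) -> List[Variant]:
    """Keep variants whose every key is the catch-all or appears in pref[i]."""
    kept: List[Variant] = []
    for var in variants:
        keys = var[0]
        if all(keys[i] == CATCH_ALL or keys[i] in matches
               for i, matches in enumerate(pref)):
            kept.append(var)
    return kept


def select_pattern(variants: Sequence[Variant], pref: Sequence[List[str]]) -> str:
    """Assumes at least one variant survives filtering, e.g. an all-catch-all one."""
    sortable = [(-1, var) for var in filter_variants(variants, pref)]
    # Sort by the last selector first, so earlier selectors end up dominating.
    for i in range(len(pref) - 1, -1, -1):
        matches = pref[i]
        minpref = len(matches)  # the catch-all ranks after every listed key
        rescored = []
        for _, var in sortable:
            key = var[0][i]
            matchpref = minpref if key == CATCH_ALL else matches.index(key)
            rescored.append((matchpref, var))
        # sorted() is stable, which is the property SortVariants requires.
        sortable = sorted(rescored, key=lambda item: item[0])
    return sortable[0][1][1]
```

Because each pass is a stable sort on a single selector position, running the passes from last selector to first gives the same order as one lexicographic comparison of the per-selector ranks. That equivalence is what would allow a single-pass implementation to return the same result.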