Closed bjhargrave closed 15 years ago
Comment author: @bjhargrave
While trying to implement the filter changes from bug BZ#372, I found a number of problems with the new operators introduced in bug BZ#372.
The grammar from the recent 4.2 draft spec section 3.2.7:
filter ::= ’(’ filter-comp ’)’
filter-comp ::= and | or | not | operation
and ::= ’&’ filter-list
or ::= ’|’ filter-list
not ::= ’!’ filter
filter-list ::= filter | filter filter-list
operation ::= simple | present | substring
simple ::= attr filter-type value
filter-type ::= equal | approx | greater | less
| greater-eq | less-eq | not-eq
equal ::= ’=’
approx ::= ’~=’
greater-eq ::= ’>=’
less-eq ::= ’<=’
greater ::= ’>’
less ::= ’<’
not-eq ::= ’!=’
present ::= attr ’=*’
substring ::= attr ’=’ initial any final
initial ::= () | value
any ::= ’*’ star-value
star-value ::= () | value ’*’ star-value
final ::= () | value
value ::=
does not handle the case of the "not-substring":
foo != xyz
The following production needs to be added:
not-substring ::= attr '!=' initial any final
and not-substring needs to be added to the operation production.
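A minimal sketch of how the proposed not-substring production could be evaluated, assuming it is simply the negation of the existing substring match on a single String value. The function names are hypothetical, and backslash escapes inside the pattern are deliberately not handled:

```python
import re

def substring_match(actual, pattern):
    """Match one string against 'initial any final', where '*' is the wildcard.

    Assumption: the pattern contains no backslash escapes.
    """
    # Each literal piece is matched verbatim; every '*' matches any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, actual, flags=re.DOTALL) is not None

def not_substring_match(actual, pattern):
    """Assumed semantics of attr '!=' initial any final: plain negation."""
    return not substring_match(actual, pattern)
```

For multi-valued attributes (Collection of String or String[]) the meaning of the negation would still have to be defined; this sketch does not take a position on that.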
Also the following sentence:
"attr must not contain the characters '=', '>', '<', '~', '(' or ')'."
needs to be updated to include '!' as a character that may not be in attr, since '!' is now a start character for the != operator. This raises a backward compatibility concern: attr may no longer contain '!', because it is now reserved as the start of an operator.
The sentence "The substring production only works for attributes that are of type String, Collection of String or String[]." needs to also mention the not-substring production.
The sentence "If value must contain one of the characters ’\’, '*', ’(’ or ')', then these characters should be preceded with the backslash (’\’) character." needs updating, since
(foo>=bar)
is now ambiguous. It could mean foo > "=bar" or foo >= "bar". A greedy parser can prefer foo >= "bar" but if the user wants to say foo > "=bar" then they need a way to escape the = so that it is not part of the operator. But = only needs escaping if it is the first char of value. Requiring it to always be escaped in the value is a backward compatibility problem. I suppose one answer would be to just require a space: (foo> =bar) But that is also a problem, since spaces after the operator and before the ')' are part of the value. So the above would be foo > " =bar".
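The ambiguity can be illustrated with a small greedy (longest-match) operator scan. This is only a sketch under the draft grammar quoted above; `split_operation` and the operator table are hypothetical names, not part of any framework API:

```python
# Operators from the draft grammar, two-character forms first so the scan is greedy.
OPERATORS = ["~=", ">=", "<=", "!=", "=", ">", "<"]
# Characters that end attr (with '!' added, as proposed in this bug).
ATTR_END = set("=><~!")

def split_operation(body):
    """Split the text between '(' and ')' into (attr, operator, value)."""
    i = 0
    while i < len(body) and body[i] not in ATTR_END:
        i += 1
    attr, rest = body[:i], body[i:]
    for op in OPERATORS:
        if rest.startswith(op):
            return attr, op, rest[len(op):]
    raise ValueError("no operator in " + repr(body))

print(split_operation("foo>=bar"))   # ('foo', '>=', 'bar')
print(split_operation("foo> =bar"))  # ('foo', '>', ' =bar')
```

With a greedy scan there is no unescaped input that yields `('foo', '>', '=bar')`, and the space in the second call ends up inside the value, which is the problem described above.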
All of these issues now make me see that the LDAP guys must have already been down this road which is why the LDAP filter syntax does not support >, < and !=. They are just too awkward to fit into the simple grammar and avoid ambiguity issues.
I would like to propose we reverse the decision in bug BZ#372 and just leave things as they were.
Comment author: @pkriens
Good catch, that is why we need people to implement this ...
However, I do not agree. These are all rather simple problems and they seriously simplify filters.
'~!' not
'~>' greater or equals
'~<' less or equals
'~=' approximate
The tilde is used in '~=' (approximate) and is therefore already among the reserved characters. Because a '=' must follow it today, we can decide what should follow it in the next spec. I thought of making '~' itself the not operator, but that would rule out adding more operators in the future. As far as I can see, the '~' does not cause any backward compatibility issues.
I do not think this is an issue that was foreseen by LDAP. The problem is that value is defined to have too much (unnecessary) freedom.
Comment author: @bjhargrave
The proposal is to add 3 new operators: less than (normally written "<"), greater than (normally written ">") and not equals (commonly written as "!=").
The following are completely unintuitive:
'~!' not equal '~>' greater than '~<' less than
In reading a filter string, people will not understand these operators. They are OSGi inventions and not like any operator
While the idea of using '~' to start these new operators solves the ambiguity problem, it leaves us with completely unintuitive operators.
(!(foo=bar)) is much more readily understandable than (foo~!bar) to someone not deeply familiar with these new OSGi filter operators. I just don't see that this design "seriously simplif[ies] filters".
Comment author: @pkriens
Nothing is intuitive until you have used it a number of times ... The question is whether there is enough logic in there to remember it after learning it (with the reason why). I think there is. Though I do not have such a big problem with the backward compatibility issue: someone using a>=3 and then meaning a > '=3' must be quite bizarre. The chance that they really meant a >= '3' seems much, much, much higher. So tightening up the spec in this regard would be my preference.
However, if we need to maintain this backward compatibility constraint, the ~ would be a solution. I disagree with you, (!(a>3)) is not more "intuitive" than (a~<3) after you learned what the ~< means once. The set of operators does not end with what more or less randomly happened to be in C forty years ago ...
Comment author: @bjhargrave
CPEG call: We discussed this and the issues with ambiguity, backwards compatibility and non-intuitiveness. BJ campaigned to forget adding these operators as they do not add any power and they have complications. Richard supported not adding them if they bring the discussed complications. No one objected to having the new operators if they were free. :-) Peter still supports adding them and requested an opportunity to rework the proposal to improve it.
Comment author: @pkriens
not <> (both greater and less, i.e. not)
greater >> (more than greater or equal)
less << (less than less or equal)
I would again like to make a case for the super/subset filter operation because they seem to be required for OBR to implement mandatory attributes and also for the NTT use case.
superset >*
subset <*
a = [ 1, 2 ]
a >* 1,2,3 false
a >* 1,2 true
a >* 1 true
a <* 1,2,3 true
a <* 1,2 true
a <* 1 false
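The intended semantics of the superset and subset operators in the examples above can be sketched as follows. The helper names are hypothetical, and the operand is assumed to be a plain comma-separated list compared as strings:

```python
def parse_values(text):
    """Parse a comma-separated operand such as '1,2,3' into a set of strings."""
    return {item.strip() for item in text.split(",")}

def superset(attr_values, operand):
    """True if the attribute contains every value in the operand."""
    return set(attr_values) >= parse_values(operand)

def subset(attr_values, operand):
    """True if every attribute value appears in the operand."""
    return set(attr_values) <= parse_values(operand)

a = ["1", "2"]
print(superset(a, "1,2,3"), superset(a, "1,2"), superset(a, "1"))  # False True True
print(subset(a, "1,2,3"), subset(a, "1,2"), subset(a, "1"))        # True True False
```

Both operators here are non-strict, so an operand equal to the attribute's value set satisfies both.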
Yes, >> means shift right for old geezers, but then again, neither ~= nor =* is a very "intuitive" operator. And in the context of a filter, shift has no meaning. I think that the proposed operators have a very high mnemonic value.
Take a look at ASCII to APL mapping to see what people can do with operators ...
Comment author: Ikuo Yamasaki <yamasaki.ikuo@lab.ntt.co.jp>
not <> (both greater and less, i.e. not)
greater >> (more than greater or equal)
less << (less than less or equal)
I would again like to make a case for the super/subset filter operation because they seem to be required for OBR to implement mandatory attributes and also for the NTT use case.
superset >*
subset <*
Yes, it would help our (NTT) use cases in RFC131: checkPermissions for registering a service under multiple service interfaces, and checkPermissions using some service properties.
Comment author: glyn.normington@springsource.com
not <> (both greater and less, i.e. not)
greater >> (more than greater or equal)
less << (less than less or equal)
Sorry, but I find these three particularly unintuitive from a mathematical standpoint. The motivation for <> seems a bit nonsensical and in mathematics, >> and << mean "much greater than" and "much less than", respectively. (Also, if the syntax can cope with >> and <<, why can't it cope with > and <?)
I'm more comfortable with tokens that are not part of conventional mathematics if we can't manage the standard mathematical tokens.
On:
superset >*
subset <*
a = [ 1, 2 ]
a >* 1,2,3 false
again the mathematician in me feels a bit queasy. Remember that there are strict superset (i.e. superset but not equal) and superset (i.e. strict superset or equal) operators in mathematics. Plus sets are normally written with curly braces. Rather than >* and <* (or < ?), why not overload the other comparison operators and use curly braces to disambiguate literals, thus:
a = {1, 2}
a >= {1, 2, 3} false
Just a thought...
Comment author: @bjhargrave
if the syntax can cope with >> and <<, why can't it cope with > and <?)
This is discussed in comment 0. When parsing, if you encounter a > and then an =, is the operator >= or is the = part of the operand?
>> can be parsed unambiguously. As can <> and <<, since <= is already an operator. As unusual as these operators are, they can be parsed unambiguously. But I still don't see the value add. They add no more expressiveness since they are just alternate forms of things you can already express. It hardly seems worth adding them since we now end up with filter strings which can't be parsed by older frameworks and no way to version filter strings.
a = {1, 2}
a >= {1, 2, 3} false
Just a thought...
The problem with a >= {1, 2, 3} is that it is ambiguous whether I mean a string compare or a superset operation. At least with the >* operator, the operator performs the disambiguation. Perhaps we need a combination of the operator and curly braces to enforce the notion that the operators are set operators.
superset >* {
a = [ 1, 2 ]
a >* {1,2,3} false
a >* {1,2} true
a >* {1} true
a <* {1,2,3} true
a <* {1,2} true
a <* {1} false
But then don't we also need a set equals operation? Or perhaps I just do this:
(&(a >* {1,2})(a <* {1,2}))
:-)
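The conjunction above does express set equality: a set that is both a superset and a subset of the operand must be equal to it. A rough sketch, assuming a curly-brace literal syntax and string comparison of elements (both function names are hypothetical):

```python
def parse_set_literal(text):
    """Parse a curly-brace literal such as '{1,2}' into a set of strings."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("expected a curly-brace set literal: " + repr(text))
    inner = text[1:-1].strip()
    return {item.strip() for item in inner.split(",")} if inner else set()

def set_equals(attr_values, literal):
    """Evaluate the conjunction of the superset and subset tests."""
    operand = parse_set_literal(literal)
    values = set(attr_values)
    return values >= operand and values <= operand

print(set_equals(["1", "2"], "{1,2}"))    # True
print(set_equals(["1", "2"], "{1,2,3}"))  # False
```

Requiring the braces lets a parser reject a bare string operand for a set operator, which is the disambiguation discussed above.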
Comment author: @bjhargrave
CPEG call: After a lengthy debate, we agreed to remove the new operators from 4.2, reverting to the 4.1 level of filters and thus backing out the changes from bug BZ#372.
Instead of tweaking the current filter language, Peter will start a new RFC for R5 which will completely update the filter language with many new operators. The framework will then be able to handle the original filter format or the new format. No new APIs will be required but the string format of the filter must be easily distinguishable by the parser.
Assigning to Peter to back the changes out of the 4.2 spec.
Comment author: @pkriens
Reverted
Original bug ID: BZ#762 From: @bjhargrave Reported version: R4 V4.2
Depends on: BZ#372