Problems with the new filter operators: >, <, !=

bjhargrave commented 16 years ago

Original bug ID: BZ#762
From: @bjhargrave
Reported version: R4 V4.2

Depends on: BZ#372

bjhargrave commented 16 years ago

Comment author: @bjhargrave

While trying to implement the filter changes from bug BZ#372, I found a number of problems with the new operators introduced in bug BZ#372.

The grammar from the recent 4.2 draft spec section 3.2.7:

does not handle the case of the "not-substring":

foo != xyz

The following production needs to be added:

not-substring ::= attr '!=' initial any final

and not-substring needs to be added to the operation production.

Also the following sentence:

"attr must not contain the characters '=', '>', '<', '~', '(' or ')'."

needs to be updated to include '!' as a character that may not appear in attr, since '!' is now the start character of the != operator. This raises a backward compatibility concern: attr may no longer contain '!', because it is now reserved for an operator.

The sentence "The substring production only works for attributes that are of type String, Collection of String or String[]." needs to also mention the not-substring production.

The sentence "If value must contain one of the characters ’\’, '*', ’(’ or ')', then these characters should be preceded with the backslash (’\’) character." needs updating. Since

(foo>=bar)

is now ambiguous. It could mean foo > "=bar" or foo >= "bar". A greedy parser can prefer foo >= "bar" but if the user wants to say foo > "=bar" then they need a way to escape the = so that it is not part of the operator. But = only needs escaping if it is the first char of value. Requiring it to always be escaped in the value is a backward compatibility problem. I suppose one answer would be to just require a space: (foo> =bar) But that is also a problem, since spaces after the operator and before the ')' are part of the value. So the above would be foo > " =bar".
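To make the ambiguity concrete, here is a minimal sketch of a greedy operator scan. It is not the framework's parser: the class name `GreedyOperatorScan` and its `split` method are made up for illustration, and it only handles a single unnested item. It assumes the operator set proposed in bug BZ#372 (including `>`, `<` and `!=`).

```java
import java.util.List;

/** Sketch only: splits "(attr OP value)" using greedy operator matching. */
final class GreedyOperatorScan {
    // Two-character operators are listed first so that ">=" wins over ">".
    // ">", "<" and "!=" are the operators proposed in this bug, not 4.1 syntax.
    private static final List<String> OPERATORS =
            List.of(">=", "<=", "~=", "!=", ">", "<", "=");

    /** Returns {attr, operator, value} for a simple, unnested item. */
    static String[] split(String item) {
        String body = item.substring(1, item.length() - 1); // drop '(' and ')'
        for (int i = 0; i < body.length(); i++) {
            for (String op : OPERATORS) {
                if (body.startsWith(op, i)) {
                    return new String[] {
                        body.substring(0, i), op, body.substring(i + op.length())
                    };
                }
            }
        }
        throw new IllegalArgumentException("no operator in " + item);
    }

    public static void main(String[] args) {
        for (String item : List.of("(foo>=bar)", "(foo> =bar)")) {
            String[] p = split(item);
            System.out.println(item + " -> attr=[" + p[0] + "] op=[" + p[1]
                    + "] value=[" + p[2] + "]");
        }
    }
}
```

With this scan, `(foo>=bar)` always comes out as operator `>=` with value `bar`, so there is no way to write foo > "=bar". The spaced form `(foo> =bar)` yields operator `>`, but the value keeps its leading space.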

All of these issues now make me see that the LDAP guys must have already been down this road which is why the LDAP filter syntax does not support >, < and !=. They are just too awkward to fit into the simple grammar and avoid ambiguity issues.

I would like to propose we reverse the decision in bug BZ#372 and just leave things as they were.

bjhargrave commented 16 years ago

Comment author: @pkriens

Good catch, that is why we need people to implement this ...

However, I do not agree. These are all rather simple problems and they seriously simplify filters.

 '~!'    not
 '~>'    greater or equals 
 '~<'    less or equals
 '~='    approximate

The tilde is used in '~=' (approximate) and is therefore already in the reserved characters. Because a '=' must follow it today, we can decide what should follow it in the next spec. I thought of making not the '~', but that would rule out the possibility to add more operators in the future. As far as I can see, the '~' does not cause any backward compatibility issues.

I do not think this is an issue that was foreseen by LDAP. The problem is that value is defined to have too much (unnecessary) freedom.

bjhargrave commented 16 years ago

Comment author: @bjhargrave

The proposal is to add 3 new operators: less than (normally written "<"), greater than (normally written ">") and not equals (commonly written as "!=").

The following are completely unintuitive:

 '~!'    not equal
 '~>'    greater than 
 '~<'    less than

In reading a filter string, people will not understand these operators. They are OSGi inventions and not like any operator

While the idea of using '~' to start these new operators solves the ambiguity problem, it leaves us with completely unintuitive operators.

(!(foo=bar)) is much more readily understandable than (foo~!bar) to someone not deeply familiar with these new OSGi filter operators. I just don't see that this design "seriously simplif[ies] filters".
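For comparison, the existing 4.1 operators can already express these conditions through negation. The sketch below shows that rewrite; the class `FilterRewrites` and its methods are hypothetical helpers, not an OSGi API. It assumes a single-valued attribute that is present and a value that is already escaped. A negated filter also matches when the attribute is absent, and the thread does not say how the new operators would behave in that case.

```java
/** Sketch only: expresses the proposed operators with 4.1 filter syntax. */
final class FilterRewrites {
    /** foo != bar, written as (!(foo=bar)). */
    static String notEquals(String attr, String value) {
        return "(!(" + attr + "=" + value + "))";
    }

    /** a > 3, written as (!(a<=3)); assumes a single-valued attribute. */
    static String greaterThan(String attr, String value) {
        return "(!(" + attr + "<=" + value + "))";
    }

    /** a < 3, written as (!(a>=3)); assumes a single-valued attribute. */
    static String lessThan(String attr, String value) {
        return "(!(" + attr + ">=" + value + "))";
    }

    public static void main(String[] args) {
        System.out.println(notEquals("foo", "bar"));
        System.out.println(greaterThan("a", "3"));
        System.out.println(lessThan("a", "3"));
    }
}
```

For multi-valued attributes the negated forms are not exact equivalents, since a filter item matches if any element matches.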

bjhargrave commented 16 years ago

Comment author: @pkriens

Nothing is intuitive until you have used it a number of times ... The question is whether there is enough logic in there to remember it after learning it (with the reason why). I think there is. Though I do not have such a big problem with accepting the backward compatibility issue: someone writing a>=3 and meaning a > '=3' must be quite bizarre. The chance that they really meant a >= '3' seems much, much, much higher. So tightening up the spec in this regard would be my preference.

However, if we need to maintain this backward compatibility constraint, the ~ would be a solution. I disagree with you, (!(a>3)) is not more "intuitive" than (a~<3) after you learned what the ~< means once. The set of operators does not end with what more or less randomly happened to be in C forty years ago ...

bjhargrave commented 16 years ago

Comment author: @bjhargrave

CPEG call: We discussed this and the issues with ambiguity, backwards compatibility and non-intuitiveness. BJ campaigned to forget adding these operators as they do not add any power and they have complications. Richard supported not adding them if they bring the discussed complications. No one objected to having the new operators if they were free. :-) Peter still supports adding them and requested an opportunity to rework the proposal to improve it.

bjhargrave commented 15 years ago

Comment author: @pkriens

not      <>    (both greater and less, i.e. not)
greater  >>    (more than greater or equal)
less     <<    (less than less or equal)

I would again like to make a case for the super/subset filter operation because they seem to be required for OBR to implement mandatory attributes and also for the NTT use case.

superset > (',' )
subset < (',' )

a = [ 1, 2 ]

a > 1,2,3    false
a > 1,2      true
a >* 1       true

a < 1,2,3    true
a < 1,2      true
a <* 1       false
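The truth table above is consistent with plain set containment. A minimal sketch, assuming the attribute and the listed values are both treated as sets of strings; the class `SetOperators` and its methods are illustrative names only. The thread does not spell out how `>` and `>*` differ, so the sketch treats both spellings the same.

```java
import java.util.Set;

/** Sketch only: superset/subset checks matching the examples for a = [ 1, 2 ]. */
final class SetOperators {
    /** "a > v1,v2,...": true when every listed value occurs in the attribute. */
    static boolean superset(Set<String> attr, Set<String> listed) {
        return attr.containsAll(listed);
    }

    /** "a < v1,v2,...": true when every attribute value occurs in the list. */
    static boolean subset(Set<String> attr, Set<String> listed) {
        return listed.containsAll(attr);
    }

    public static void main(String[] args) {
        Set<String> a = Set.of("1", "2");
        System.out.println(superset(a, Set.of("1", "2", "3"))); // false
        System.out.println(superset(a, Set.of("1", "2")));      // true
        System.out.println(superset(a, Set.of("1")));           // true
        System.out.println(subset(a, Set.of("1", "2", "3")));   // true
        System.out.println(subset(a, Set.of("1", "2")));        // true
        System.out.println(subset(a, Set.of("1")));             // false
    }
}
```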

Yes, >> means shift right for old geezers, but then again, ~= and =* are not very "intuitive" operators either. And in the context of a filter, shift has no meaning. I think that the proposed operators have a very high mnemonic value.

Take a look at ASCII to APL mapping to see what people can do with operators ...

bjhargrave commented 15 years ago

Comment author: Ikuo Yamasaki <yamasaki.ikuo@lab.ntt.co.jp>

not      <>    (both greater and less, i.e. not)
greater  >>    (more than greater or equal)
less     <<    (less than less or equal)

I would again like to make a case for the super/subset filter operation because they seem to be required for OBR to implement mandatory attributes and also for the NTT use case.

superset > (',' )
subset < (',' )

Yes, it would help our (NTT) use cases in RFC131: checkPermissions for registering a service under multiple service interfaces, and checkPermissions using some service properties.

bjhargrave commented 15 years ago

Comment author: glyn.normington@springsource.com

not      <>    (both greater and less, i.e. not)
greater  >>    (more than greater or equal)
less     <<    (less than less or equal)

Sorry, but I find these three particularly unintuitive from a mathematical standpoint. The motivation for <> seems a bit nonsensical and in mathematics, >> and << mean "much greater than" and "much less than", respectively. (Also, if the syntax can cope with >> and <<, why can't it cope with > and <?)

I'm more comfortable with tokens that are not part of conventional mathematics if we can't manage the standard mathematical tokens.

On:

superset > (',' )
subset < (',' )

a = [ 1, 2 ]

a > 1,2,3 false

Again, the mathematician in me feels a bit queasy. Remember that there are strict superset (i.e. superset but not equal) and superset (i.e. strict superset or equal) operators in mathematics. Plus sets are normally written with curly braces. Rather than > and < (or < ?), why not overload the other comparison operators and use curly braces to disambiguate literals, thus:

a = {1, 2}

a >= {1, 2, 3} false

Just a thought...

bjhargrave commented 15 years ago

Comment author: @bjhargrave

(Also, if the syntax can cope with >> and <<, why can't it cope with > and <?)

This is discussed in comment 0. When parsing, if you encounter a > and then an =, is the operator >= or is the = part of the operand?

>> can be parsed unambiguously. As can <> and << since <= is already an operator. As unusual as these operators are, they can be parsed unambiguously. But I still don't see the value add. They add no more expressiveness since they are just alternate forms of things you can already express. It hardly seems worth adding them since we now end up with filter strings which can't be parsed by older frameworks and no way to version filter strings.

a = {1, 2}

a >= {1, 2, 3} false

Just a thought...

The problem with a >= {1, 2, 3} is that it is ambiguous whether I mean a string compare or a superset operation. At least with the >* operator, the operator performs the disambiguation. Perhaps we need a combination of the operator and curly braces to enforce the notion that the operators are set operators.

superset > { (',' ) }
subset < { (',' ) }

a = [ 1, 2 ]

a > {1,2,3}    false
a > {1,2}      true
a >* {1}       true

a < {1,2,3}    true
a < {1,2}      true
a <* {1}       false

But then don't we also need a set equals operation? Or perhaps I just do this:

(&(a > {1,2})(a < {1,2}))

:-)

bjhargrave commented 15 years ago

Comment author: @bjhargrave

CPEG call: After a lengthy debate, we agreed to remove the new operators from 4.2, reverting to the 4.1 level of filters and thus backing out the changes from bug BZ#372.

Instead of tweaking the current filter language, Peter will start a new RFC for R5 which will completely update the filter language with many new operators. The framework will then be able to handle the original filter format or the new format. No new APIs will be required but the string format of the filter must be easily distinguishable by the parser.

Assigning to Peter to back the changes out of the 4.2 spec.

bjhargrave commented 15 years ago

Comment author: @pkriens

Reverted
