Configuration design: Reflect n:m relation between stations and applications in the configuration

porst17 commented 7 years ago

Having a list of applications for each station/profile is not enough and having a list of stations/profiles for each application is not enough either. Having both seems to be a bad design decision.

Note: this is a part of super-issue "Configuration Design": hilbert/hilbert-cli#13

porst17 commented 7 years ago

Proposal: Add groups to the mix. Groups are just sets of stations, but they can be hierarchically created. I don't think we need to preserve the hierarchy after initialization, but this is open to discussion.

# define station profiles
profiles: [ profile1: { ... }, profile2: { ... } ]

# define stations
# relation between profiles and stations are 1:n
stations:
    - station1: { profile: profile1 }
    - station2: { profile: profile1 }
    - station3: { profile: profile2 }
    - station4: { profile: profile2 }
    - station5: { profile: { ... inline profile definition ... } }

# relations between groups and stations are n:m
# profile and station names are predefined groups names
# i.e. group "profile1" would be [ station1, station2 ]
groups:
    - group1: [ station2, profile2 ]
    - group2: [ station3 ]
    - group3: [ group1, station4 ]

# now the following groups would be defined:
# station1 = [ station1 ]
# station2 = [ station2 ]
# station3 = [ station3 ]
# station4 = [ station4 ]
# station5 = [ station5 ]
# profile1 = [ station1, station2 ]
# profile2 = [ station3, station4 ]
# group1 = [ station2, profile2 ] = [ station2, station1, station2 ] = [ station1, station2 ] 
# group2 = [ station3 ]
# group3 = [ group1, station4 ] = [ station1, station2, station4 ]

During parsing, the information if a group is a predefined one based on a station name or profile name can be kept. This way it makes it easy to keep only the user defined groups and the profile based groups for filtering in the ui.

Now you could define applications and the groups of stations it may run on.

applications:
    - application1: { supportedGroups: [ station1 ] }
    - application2: { supportedGroups: [ group3 ] }
    - application3: { supportedGroups: [ group3, station5 ] }
    - application4: { supportedGroups: [ group3 ] }

This way, it is easy to support applications that run on different sets of stations and stations that supports many different applications without being to verbose.

Note that my post may not contain perfectly valid YAML ...

porst17 commented 7 years ago

Definition: groups are sets of stations.

Proposed syntax for group definition in EBNF:

disjunction = disjunction , or , disjunction | conjunction;
conjunction = term , and , term;
term = not , id | id;
and = ( " A" | " a" ) , ( "N" | "n" ) , ( "D " | "d " );
or = ( "," | ( " O" | " o" ) , ( "R " | "r " ) );
not = ( " N" | " n" ) , ( "O" | "o" ) , ( "T " | "t " );
id = ( character | digit ) , { character | digit };
character = "A" | "B" | "C" | "D" | "E" | "F" | "G"
          | "H" | "I" | "J" | "K" | "L" | "M" | "N"
          | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
          | "V" | "W" | "X" | "Y" | "Z" | "a" | "b"
          | "c" | "d" | "e" | "f" | "g" | "h" | "i"
          | "j" | "k" | "l" | "m" | "n" | "o" | "p"
          | "q" | "r" | "s" | "t" | "u" | "v" | "w"
          | "x" | "y" | "z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;

Reserved words: OR, AND,NOT(case insensitive) Theid`s are group names again. Predefined groups might exist for each station and profile (containing all station that use this profile).

We could also ignore whitespace, .e.g ␣␣␣␣a␣␣and␣b␣␣ is equal to a␣and␣b.

Example:

group1: station1, station2 or station3
group2: station4, group1 and not station1 or station5
group3: group1 and group2

is equivalent to

group1: station1, station2, station3
group2: station4, station2, station3, station5
group3: station2, station3

elondaits commented 7 years ago

Comments:

Why not

conjunction = term , and , term | conjunction; ?

It would make sense to ask for something such as (WINDOWS and LEAP and GOOD_GPU), plus since term can be a group we are going to have to calculate it anyway.

We have to be careful implementing this in that YAML does not keep the order of map keys. So we can't depend on the order of declarations for computing the sets. So these would be equivalent:

group1: station1, station2
group2: station3, station4
group3: group1, group2

group3: group1, group2
group1: station1, station2
group2: station3, station4

So the way to implement this, I think, would be to read all the group declarations into a map and resolve them recursively (via dynamic programming = https://en.wikipedia.org/wiki/Dynamic_programming) using a flag to avoid cycles / infinite recursion.

porst17 commented 7 years ago

Comments:

Why not

conjunction = term , and , term | conjunction; ?

It would make sense to ask for something such as (WINDOWS and LEAP and GOOD_GPU), plus since term can be a group we are going to have to calculate it anyway.

Absolutely. Fixed grammar:

disjunction = disjunction , or , disjunction | conjunction;
conjunction = term , and , term | conjunction;
term = not , id | id;
and = ( " A" | " a" ) , ( "N" | "n" ) , ( "D " | "d " );
or = ( "," | ( " O" | " o" ) , ( "R " | "r " ) );
not = ( " N" | " n" ) , ( "O" | "o" ) , ( "T " | "t " );
id = ( character | digit ) , { character | digit };
character = "A" | "B" | "C" | "D" | "E" | "F" | "G"
          | "H" | "I" | "J" | "K" | "L" | "M" | "N"
          | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
          | "V" | "W" | "X" | "Y" | "Z" | "a" | "b"
          | "c" | "d" | "e" | "f" | "g" | "h" | "i"
          | "j" | "k" | "l" | "m" | "n" | "o" | "p"
          | "q" | "r" | "s" | "t" | "u" | "v" | "w"
          | "x" | "y" | "z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;

Regarding dynamic programming: I think, simple recursion and a flag to indicate that a node in the recursion tree has already been visited but has not been resolved should be enough. Normally, dynamic programming is used to reduce computation time by reusing values that are computed in different recursion trees. But in our case, once the set of station for a group A is known, the string representation of group A is replaced by this set (i.e. the node becomes terminal). Hence, when the set of stations is computed for another group B, that uses group A in its definition, the set for group A is already known. If the same group is used twice in the call stack of a recursive resolution, we consider it an error.

malex984 commented 7 years ago

I am not sure about how to understand "not": e.g. what not station1 or not station1 and not station2 would mean?

Also i do not like absence of underscores in IDs... is that intentional? Somehow i assumed IDs to be just valid variable identifiers...

porst17 commented 7 years ago

Allow -_ characters in IDs and fix white space in front of not:

disjunction = disjunction , or , disjunction | conjunction ;
conjunction = term , and , term | conjunction ;
term = not , id | id ;
and = ( " A" | " a" ) , ( "N" | "n" ) , ( "D " | "d " ) ;
or = ( "," | ( " O" | " o" ) , ( "R " | "r " ) ) ;
not = ( "N" | "n" ) , ( "O" | "o" ) , ( "T " | "t " ) ;
id = ( character | digit | special ) , { character | digit | special } ;
character = "A" | "B" | "C" | "D" | "E" | "F" | "G"
          | "H" | "I" | "J" | "K" | "L" | "M" | "N"
          | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
          | "V" | "W" | "X" | "Y" | "Z" | "a" | "b"
          | "c" | "d" | "e" | "f" | "g" | "h" | "i"
          | "j" | "k" | "l" | "m" | "n" | "o" | "p"
          | "q" | "r" | "s" | "t" | "u" | "v" | "w"
          | "x" | "y" | "z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
special = "-" | "_" ;

porst17 commented 7 years ago

Regarding not: Before groups are specified, you already know about all available stations. If there exist station1, station2 and station3, then not station1 would be station2, station3 and not station1 and not station2 would be (station2 or station3) and (station1 or station3)=station3.

elondaits commented 7 years ago

I would say that

a,b,c = a ∪ b ∪ c

and

a, b, not c, d = ((a ∪ b) - c) ∪ d

... so "not c" is equivalent to -. It is true that

a, b, not c, c = a ∪ b ∪ c

and

a, b, c, not c = a ∪ b

but that's the kind of problem that include/exclude schemes in commands such as rsync have.

Likewise

a and b and c = a ∩ b ∩ c

a and b and not c = (a ∩ b) - c

porst17 commented 7 years ago

Definitions

∪: set union
∩: set intersection
-: set difference
complement(a): set complement with respect to the universe (aka all known stations)
∅: empty set

I would say that
a,b,c = a ∪ b ∪ c

true

and

a, b, not c, d = ((a ∪ b) - c) ∪ d

That's not true. We defined , to be equivalent to or. Also, the usual operator precedence is not, and, then or. Therefore,

a, b, not c, d = a ∪ b ∪ not c ∪ d

I would also say, that not is the absolute set complement (not c = complement( c ) = all_station - c). Relative set complement aka set difference (a minus c = a - c) is currently not defined in our grammar. I'd rather prefer to avoid a - c in favor of a minus c (because - is used in YAML).

In my definition, your example would yield

a, b, not c, d = a or b or not c or d
               = a ∪ b ∪ complement( c ) ∪ d
               = a ∪ b ∪ all_stations - c ∪ d

Assume your definition of of not as a set difference. If I add parentheses to your example, a, b, not c, d = (((a, b), ) not c), d = would not be a legal expression. The second , would be missing its second argument.

... so "not c" is equivalent to -.

What do you mean by -? -c (set difference)? Empty set?

It is true that

a, b, not c, c = a ∪ b ∪ c

and

a, b, c, not c = a ∪ b

I'd rather say

a, b, not c, c = a ∪ b ∪ complement( c ) ∪ c
               = a ∪ b ∪ c ∪ complement( c )
               = a, b, c, not c

i.e. ,(=OR) is associative as in set theory.

but that's the kind of problem that include/exclude schemes in commands such as rsync have.

Likewise
a and b and c = a ∩ b ∩ c
a and b and not c = (a ∩ b) - c

Again,

a and b and not c = a ∩ b ∩ complement( c )
                  = a ∩ b ∩ (all_stations - c)
                  = ( a ∩ b ∩ all_stations ) - c
                  = ( a ∩ b ) - c

i.e. it's consistent with the above definition.

Also, in your definition

a, not a = a minus a = ∅
a and not a = a minus a = ∅

which shouldn't happen. a, not a = a or not a should yield all possible stations and indeed, it does in my interpretation:

a, not a = a or complement( a )
         = a ∪ ( all_stations - a )
         = ( all_stations ∪ a ) - ( a - a )
         = all_stations - ∅
         = all_stations

PS.: Please use the quote feature (>) only for quotes, not for code or math. Otherwise, if I want to quote you and add comments, it is not clear anymore who wrote what (ambiguous).

PPS.: This post was written in a hurry. I hope I made no mistake in my (simple) math.

malex984 commented 7 years ago

Thank You Christian! Frankly it was also my understanding from our discussion that NOT is a set subtraction rather than a set complement.

Now i can see that they are mostly equivalent: A \setMinus( B ) == A ∩ \setComplement( B ) except for the case of standalone "NOT A" which only makes sense if NOT stays for a set complement.

@porst17 Note that adding a new station changes the total set (all_stations) and thus changes all standalone complements, which means that many group definitions will have to be manually checked/updated by user after adding new stations if we choose to allow standalone set complements.

I propose to forbid terms like NOT groupId but allow complements as a part of any intersection (e.g. A NOT B & NOT A AND B). Maybe even replace NOT with something like EXCLUDE (or WITHOUT or MINUS).

elondaits commented 7 years ago

@porst17 : I don't disagree with your math, I also agree your interpretation is reasonable and perhaps more reasonable when you consider "or" instead of "," (although I know and never disagreed with them being equivalente). We were simply thinking of different semantics.

... but I disagree with the usefulness of "not" if it's meant to be "complement".

... because then

a, b, not c, d = a or b or not c or d
               = a ∪ b ∪ complement( c ) ∪ d
               = a ∪ b ∪ all_stations - c ∪ d
               = all_stations - c

And any use of "not" basically adds "all stations"... so it's only useful for lists such as:

not a, not b, not c

which we shouldn't support, because you should NEVER indicate your app is compatible with "all stations except some" because you don't know which stations might be added in the future and can't be sure they'll be compatible. So it's dangerous and bad practice.

So we can simply disallow "not" except in conjunctions

a ∩ not b 
= a - b

or change the syntax or whatever is needed so that individual stations and groups can be excluded from an enumeration (i.e. conjunction). More like the original intention of being able to write things such as:

TOUCH, MULTITOUCH, !LEAP, !TEST

without having to do something like

TOUCH and not LEAP, TOUCH and not TEST, MULTITOUCH and not LEAP, MULTITOUCH and not TEST

or writing more than one group definition.

elondaits commented 7 years ago

PS: If the app compatibility is:

KIOSK and WIDESCREEN, STATIC, PROJECTOR

then:

It should be possible to exclude a singe station (or group) without the need to know in which of those groups its included.
It should be possible to exclude it with a single instruction (not rewriting many definitions)
If excluding more than one station, the same, it shouldn't require the complexity of the expressions to increase

An alternative if you want pure set operations instead of enumeration semantics is to do like rsync and change the compatibleStations to two keys:

compatibleStations incompatibleStations

so the app can be run on any station that is in the compatible list and not in the incompatible list. This, I think, allows us to remove the "not".

porst17 commented 7 years ago

My point was to show that our current definition has certain problems. If we rely on actual set operations, everything is clearly defined as long as we choose operator names wisely to avoid confusion.

I am not saying that this is the route to go. But @elondaits' examples revealed some of the ambiguities and inconsistencies of the current approach.

And like @elondaits said many times: If we don't understand it, the user will neither.

porst17 commented 7 years ago

To be constructive again: What about defining groups like in my very first post?

groups:
    - group1: [ station2, profile2 ] # = [ station1, station2 ]
    - group2: [ station3 ] # = [ station3 ]
    - group3: [ group1, station4 ] # [ station1, station2, station4 ]

This will be the most likely use case in my opinion.

And then allow another syntax:

    - group4: { include: [ group3, station5 ], exclude: station1 } # = [ station2, station4, station5 ]

In each include: and exclude:, you could use any group definition scheme again, e.g.

{ include: { include: {...}, exclude: [ ..., ... ] }, exclude: {...} }

would be valid. The YAML parser takes care of preserving the hierarchy and distinguishes between {} and [] notation (i.e. in your code you would just need to check if a list or a map has been used).

I think, it makes sense to define that include: always applies first, then exclude:.

This feels rather intuitive to me and is very similar to what people (sys admins and devs) are used to. In contrast, the usual set theory just feels weird in the context of configuration files.

porst17 commented 7 years ago

BTW: [ a, b, c, d ] would be the shorthand for { include: [ a, b, c, d ] } resp. { include: [ a, b, c, d ], exclude: [] }

malex984 commented 7 years ago

What about the following YAML: [ a, +aa, + aaa, ~m, ~ mm, [A], +: [AA], + : [AAA], ~: [M], ~ : [MM] ], it is equivalent to the following python:

['a',
 '+aa',
 '+ aaa',
 ['A'],
 {'+': ['AA']},
 {'+': ['AAA']},
 '~m',
 '~ mm',
 {None: ['M']},
 {None: ['MM']}]

which means that we can interpret it and turn into { include: [a, aa, aaa, A, AA, AAA], exclude: [m, mm, M, MM] }.

malex984 commented 7 years ago

In my sample above:

[A, ... ] is a shortcut for +: [A, ...] and means a union of items in the list A, ...
[A, ~: [ B ], ~ C ] may mean subtraction A \minus B \minus C
additionally ^: [ A, B, ... ] may represent an intersection of the items in the list A, B, ...

Update: one can also emulate intersection due to the usual trick ^:[A, B] == [ A, B, ~:[A, ~B], ~:[B, ~A] ], but i imagine already [A, B, C, ~:[A,B,~C], ~:[B,C,~A], ~:[A,C,~B] ] is a valid but not a very good replacement for ^:[A,B,C]...

Oh, looks like i came to something like reverse polish notation and tree representation again :-/ Sorry but I still would prefer any YAML structure over strings with custom grammar to parse...

porst17 commented 7 years ago

@malex984 The reason why I used include and exclude was, that everybody understands it. I doubt this is the case with +, ~ and ^ without reading the manual.

If you really want to support intersection explicitly, then I would prefer a human readable keyword like intersectWith.

What you posted is also not reverse polish notation. Because in [A1, ... , An, ~:[B1, ..., Bm]], the first argument to ~ is [A1, ... , An] and the second is [B1, ..., Bm].

Also, your tree is not obvious, because you actually have a list you interpret as a tree using certain additional knowledge. In my {} notation, the tree is explicitly defined by the curly braces.

Next, your notation doesn't look complete or consistent to me: What does [A, ~:[B,C]] mean? And what about [A, ^:[B,C]]? Your operator notation for ^ seams to be reverse polish, but for ~ it is not.

In my opinion, your new notation has the same problems our previous and, or, not notation had, but you added hierarchies to the mix, which doesn't solve anything that couldn't be done before (by defining several groups one after the other), but rather leads to even more confusing notation.

porst17 commented 7 years ago

In order to come to an end, I would say we stick to

[ G1, ..., Gn ] equals G1 ∪ ... ∪ Gn (set union)
{ include: G_include, exclude: G_exclude, intersectWith: G_intersect } is equivalent to (G_include - G_exclude) ∩ G_intersect = (G_include ∩ G_intersect ) - G_exclude, i.e. the order in with exclude and intersectWith are executed doesn't matter, but include always comes first

The tree is explicit, you don't need to parse strings except for comparing against the three keywords. You do not even need to trim whitespace. The YAML parser will do this for you. Determining the content of each groups if a trivial recursive algorithm that also checks for cycles in the definition.

Of course, there are different ways in YAML to write down [...] and {...}, but the YAML parser with turn that into lists and maps and that is the only thing you need to deal with in your code.

malex984 commented 7 years ago

In my opinion, your new notation has the same problems ... but you added hierarchies to the mix, which doesn't solve anything ...

@porst17 I only tried to extend https://github.com/hilbert/hilbert-cli/issues/15#issuecomment-255829052 and provide shortcuts for all operations. Evolution was something like: [a, b] -> [a, b, ~d, ~e] -> [a, [b, c], ~d, ~: [e,f]] -> +: [+a, ~d, +: [b, c], ~: [e, f] ]. Insertion was added by analogy to the union operation.

Sorry for confusion! I actually meant the usual polish notation with 3 operations as follows:

Intersection '^:[a,,,]', where arguments are either groups or unions,
Union '+:[a,,,]`, where arguments are either groups or unions or the following:
- Exclusion from within a union: '+:[,,,~:[a,,, ],,, ~:[b,,,],,,]', which indicates the subtraction from the enveloping union of all items (e.g. groups, unions or intersections) marked for exclusion (e.g. a,,, b,,,)

Note: + and +: are assumed if nothing given, e.g. [a, [b, c], d] == +: [+a, +:[b, c], +d].

What does [A, ~:[B,C]] mean?

[A, ~:[B,C]] = { Include(A), Exclude(B, C)}. More generally: [A, [B, C], ~D, E, ~: [F, G], H ] == Include(A, B, C, E, H), Exclude(D, F, G)

And what about [A, ^:[B,C]]?

[A, ^:[B,C]] == Include(A, IntersectionOfAllArguments(B, C))

porst17 commented 7 years ago

[A, ^:[B,C]] == Include(A, IntersectionOfAllArguments(B, C))

makes no sense to me. It looks like ^ takes a group as its argument again. This would mean, that in

A: [station1]
B: [station2,station3]
C: [station3,station4]
group1: [B,C] # = [station2, station3, station4]
group2: [A, ^: [B,C]] # = [station1, station3]
group3: [A, ^: group1] # = [ station1, ^: [station2, station3, station4]]
                       # = [station1, []] = [station1]

the evaluation of the groups groups2 and group3 would yield different results which can't be our intention. If ^ does not take a group argument, then we are using the [] notation differently in the same context, which is equally confusing.

@malex984 @elondaits Are there any solid arguments against my proposal in https://github.com/hilbert/hilbert-cli/issues/15#issuecomment-255960886? If not, please close this issue and move forward.

elondaits commented 7 years ago

I hadn't commented on the proposal because I wanted to dedicate some time to thing it through. But my thoughts so far:

I wouldn't add "intersect" yet. I'm not sure it's useful and it's still too complicated conceptually. Include and exclude are very clear as far as you know in which order they're applied (that's the main problem when you specify patterns in include and exclude options in rsync and similar). Intersect can always be solved by the user by explicitly defining a new group and we could add it in the future in a backwards compatible manner if we find it's necessary or if we find a simpler way to express it.
I would keep the idea (I don't know if this was your idea as well) of being able to list the stations or groups in the app cfg key without needing to create a named group. E.g:

  my_app:
    compatibleStations: [s1, s2, s3]

or

  stationGroups:
    G1: {include: [s1,s2,s3]}
    G2: [s1, s4, s5]
  my_app:
    compatibleStations: {include: [G1, G2], exclude: [s2]}

malex984 commented 7 years ago

@porst17 I see, yes my shortcut for intersections seems misleading :-/

I agree with your proposal https://github.com/hilbert/hilbert-cli/issues/15#issuecomment-255960886, it just seemed to me that your proposed syntax is a bit heavy. I hope your shortcut notation may be used everywhere as @elondaits elaborated at the end of https://github.com/hilbert/hilbert-cli/issues/15#issuecomment-256085106.

@porst17 What about the recursive usage? Consider for example:

stationGroups:
  G1: [s1, [s2, s3], s4] # Or should one create some temporary Group ID for `[s2, s3]` and use `[s1, GroupID, s4]` instead?
  G2: [GroupID] # same as `GroupID`?
  G3: [[GroupID]] # same as `GroupID`?
  G4: {include: GroupID, exclude: [[[GroupID]]]} # same for include/exclude fields?
  G5: [ { include: ..., exclude: ... }, { include: ..., exclude: ...} ] # list of mappings?

porst17 commented 7 years ago

@elondaits Intersection is useful because it reflects your idea of capabilities. I know, intersection can be emulated by the other operations, but this is too tedious. I would like to leave it in. We could argue about giving it a better name, but if no one of us comes up with a better name until tomorrow (that also received two up-votes by the other team members), we keep intersectWith and move on. We can always deprecate it later and give it another name (while still supporting the old name).

@malex984 The proposed notation does everything we need, is kind of self-explaining except for the order of evaluation (include always comes first), is extremely easy to parse (YAML parser does most of the job + 1 recursive function), allows inline definition of groups (as @elondaits and you demonstrated). It suffices for now and a shorthand notation can always be added later. If we have time later we can implement a fancy parser that parses

groups:
  - ....
  - myGroup: (A ∪ B) ∩ (C - D) ∪ [E, {include: station3, exclude: [F,G]}]

with parentheses (i.e. non regular grammar) and unicode symbols ∪∩. But for now, my proposed notation does the job just fine and we should not spend more time on this relatively minor issue (compared to the whole project).

Let's wait for the replacement proposals for intersectWith and then close this discussion. Please add your upvote or downvote reaction (do not abstain from the voting).

porst17 commented 7 years ago

I propose to keep intersectWith.

elondaits commented 7 years ago

I don't like intersectWith because it suddenly makes the user start thinking of set operations and it's not too clear how it works... I had to read the semantics to understand what it does and how it interacts with the other two.

From above:

{ include: G_include, exclude: G_exclude, intersectWith: G_intersect } 
is equivalent to 
(G_include - G_exclude) ∩ G_intersect = (G_include ∩ G_intersect ) - G_exclude

so if I want to limit the program to test windows stations that have a Leap and Sound:

WINDOWS ∩ TEST ∩ LEAP ∩ SOUND

{include: WINDOWS, 
  intersectWith: {
    include: TEST, 
    intersectWith: {
      include: LEAP, 
      intersectWith: SOUND
    }
  }
}

but it gets a bit out of hand and is not very readable. And also, I can't help thinking some people will expect this to be equivalent (and then wonder why it doesn't work):

{include: WINDOWS, intersectWith: [TEST, LEAP, SOUND]}

because I though this might be a possibility before reading the semantics.

So a more complex idea that makes the semantics clearer and the syntax more friendly is:

{include: <group>
 exclude: <group>
 ifIn: <group> // same as intersectWith
 ifInAll: <array of groups>
}

so the case above would simply be:

{include: WINDOWS, ifInAll: [TEST, LEAP, SOUND]}

BUT since this is taking too much time and since intersectWith is just a way to write things shorter and not a change in the final result (which is just a set of stations in the end) I'm giving a thumbs up to the previous proposal so you're free to move ahead and we can check if this idea has any problems at the semantic level, since it's simple to implement but harder to think about).

malex984 commented 7 years ago

Wow! Good catch @elondaits! It seems to be exactly like with my intersection shortcut noted by @porst17 in https://github.com/hilbert/hilbert-cli/issues/15#issuecomment-256082434

@elondaits Sorry, i do not see how ifInAll: [TEST, LEAP, SOUND] solved the problem of intersectWith: [TEST, LEAP, SOUND] with [TEST, LEAP, SOUND] being a union due to shortcut notation?

I would say intersectWith can only allow a single string (group_ID) as an argument (instead of arbitrary group) to avoid confusion...

ps: that limitation could may solve the problem with my intersection shortcut as one can repeat items (for intersection) within a list: [A, ^: groupID1, B, ^: groupID2, C ] == [ include: A, intersect: groupID1, include: B, intersect: groupID2, include: C ] == (A + B + C) ∩ (groupID1) ∩(groupID2)

porst17 commented 7 years ago

Thanks, @elondaits, for your comments. While reading it, I finally understood why our current approaches are all misleading in one way or the other: We are abusing YAML sequences ([]) as sets and we expect them to work differently in similar contexts, which is very confusing, of course.

I though about it and looked at the YAML specification again (actually, examples). It turned out that YAML has a data type for sets and it is obvious and I don't know why we missed it so far: The key sets of maps!

Very long:

set:
  ? element1
  : null
  ? element2
  : null
  ? element3
  : null

Still long:

set:
  element1: null
  element2: null
  element3: null

Shorter (this is the official notation for sets):

set:
  ? element1
  ? element2
  ? element3

Very short (and best for our purpose):

set: { element1, element2, element3 }

The YAML parser will turn each of the above into this canonical YAML:

!!str "set"
  : !!map {
    ? !!str "element1"
    : !!null "null",
    ? !!str "element2"
    : !!null "null",
    ? !!str "element3"
    : !!null "null",
  }

Set properties (in YAML and in mathematics):

each element occurs only once and if an element is listed more than once, all occurrences except for one will be ignored
the order of elements is ignored.

These things are not true for YAML sequences []. Also, {} is the mathematical notation for sets by the way.

By switching from [] to {}, @elondaits's example would finally make sense:

- {include: WINDOWS, intersectWith: [TEST, LEAP, SOUND]}
- {WINDOWS, intersectWith: [TEST, LEAP, SOUND]} # same thing

would be

WINDOWS ∩ TEST ∩ LEAP ∩ SOUND

Inline definition of groups would stay mostly the same:

- {a,b,d}
- {a,b,c,d,exclude: c}
- {a,b,c,d,e,exclude: [c,e]}
- {include: {a,b,c,d,e},exclude: [c,d]}

would all be valid to define {a,b,d} (for disjoint a, b, c, d, e). Since [] and {} have different meanings, these two things would give different results (as intended):

- {a,b,c,intersectWith: {a,b}} # = {a,b,c} ∩ ( a ∪ b )
- {a,b,c,intersectWith: [a,b]} # = {a,b,c} ∩ ( a ∩ b )

The only drawback of the {} notation I see is that it can't be nested, so {a,b,c,{d,e}} is not valid YAML, but {a,b,c,include:{d,e}} is. But why would you use {a,b,c,{d,e}} if you could just write {a,b,c,d,e}? So not being able to nest is not a big deal.

I highly recommend to implement this (hopefully last) change, i.e. using the right ({}) instead of the wrong ([]) notation for YAML sets.

Side note: Reserved words (include et. al. ) can also be used as group names, because their associated value would be null. I wouldn't recommend that to users, though. It's easy to mess up.

malex984 commented 7 years ago

@porst17 Did i get it right that the shortcut for union is a set notation {a, b, ...} instead of sequence [a, b, .... ]? Moreover a sequence ([...]) is to be only used as a value of intersectWith and exclude keys to indicate individual arguments for intersection & subtraction?

But using sequence of arguments for include seems to be also consistent with your definition: {d, include: [a, b], c} may be equal to {d, include: {a, b}, c} == {a, b, c, d} (for valid IDs a, b, c, d)

Side note: if include or exclude or intersectWith key appears multiple times within the same mapping then the later entry will always overwrite the former one, but due to (#27) we can warn about such cases.

elondaits commented 7 years ago

I'm OK with using the set notation, but not with this:

{a,b,c,intersectWith: {a,b}} # = {a,b,c} ∩ ( a ∪ b )

{a,b,c,intersectWith: [a,b]} # = {a,b,c} ∩ ( a ∩ b )

... not because the concept is wrong, but because if you write [ instead of { (or viceversa) your set definition will not be right and you'll never understand why. It's a very small difference, visually. Here I'm just thinking of the "usability" / "friendliness" of the notation.

Also my proposal used two different keywords (ifIn and ifInAll, which would be like "intersectWith" and "intersectWithEach") on purpose, because some modern languages / authors recommend not overloading a method for basically different types (in this case, a set and a collection of sets). This is not a method but in a way it's like a function / constructor. Once again, this is a design thing to improve readability / comprehension, not a problem with the underlying mechanism.

... Aaaalso... in the "intersectWith: [a,b]" variant we would be using a list of sets to differentiate the cases, but the order of the sets is not really important so conceptually it would take a set of sets, not a list/array of sets. So here [] is used for syntactic convenience, unlike {} which was proposed for conceptual reasons.

malex984 commented 7 years ago

I propose to

disable recursive definitions: only string GroupIDs may be used to define a new group
extend the set union shortcut { a, b, ... } to enable subtraction and intersection.

Possible extension may go as follows:

recall that in YAML {a, b, c} == {?a, ? b, ? c} but ? with spaces will be removed by parser
recall that the order of items within a set is ARBITRARY (YAML parser may rearrange keys at will)
note that some symbols (e.g. . , _ , + , ^ , = , $ , ~ , \ , /) or Unicode symbols may be used as prefixes to indicate that the following string (GroupID) is not part of include section.

For example: {a, ?b, \S, ? c, \ T, d, ∩I, e, ∩ J, f} would mean ((a ∪ b ∪ c ∪ d ∪ e ∪ f) ∩ I ∩ J ) \ S \ T, if \ indicates set subtraction (which is also the standard notation) and for instance ∩ may mark intersection items. Note that order of items must be irrelevant!

Corresponding YAML dump (note difference in the order of items):

{
  "a": null,
  "c": null,
  "b": null,
  "e": null,
  "\\S": null,
  "f": null,
  "∩I": null,
  "∩ J": null,
  "\\ T": null,
  "d": null
}

Note that above will not let one to define an intersection with a set, like a∩(I ∪ J), but if it is really necessary: we may additionally choose another indicator for items of intersection union, e.g. {a, \b, ∩c, ∩d, ∩∪E, ∩f, ∩∪G, h} = ((a ∪ h) ∩ c ∩ d ∩ (E ∪ G) ∩ f) \ b. Corresponding dump:

{
  "a": null,
  "h": null,
  "\\b": null,
  "\u2229\u222aE": null,
  "\u2229 d": null,
  "\u2229c": null,
  "\u2229\u222a G": null,
  "\u2229f": null
}

malex984 commented 7 years ago

BTW, my proposal above is just a shortcut for { include: {a, h}, exclude: {b}, ifIn: {c, d}, ifInAll: {E, G} } if we eliminate all sequences [...] from a group definition.

malex984 commented 7 years ago

elondaits To me using the set shortcut {a, c, d} (for a union) in parallel with { include: {a}, ifIn: {c, d} } for an intersection is still confusing.

Therefore we better choose only one way: either some extended shortcut notation OR always ask for mappings: { include: {a, ...}, exclude: {b, ...}, ifIn: {c, ...}, ifInAll: {d, ...} }. Users should NOT be able to mix them together!

IMO: user will need to read some manual for defining station groups anyway in any case!

porst17 commented 7 years ago

@elondaits

I'm OK with using the set notation, but not with this:

{a,b,c,intersectWith: {a,b}} # = {a,b,c} ∩ ( a ∪ b )

{a,b,c,intersectWith: [a,b]} # = {a,b,c} ∩ ( a ∩ b )

... not because the concept is wrong, but because if you write [ instead of { (or viceversa) your set definition will not be right and you'll never understand why. It's a very small difference, visually. Here I'm just thinking of the "usability" / "friendliness" of the notation.

People who write config files for programs should be able to distinguish [] and {}. To be on the safe side, we could require all operations to take only a sequence as argument, i.e. the above example would become:

- {a,b,c,intersectWith: [{a,b}]} # = {a,b,c} ∩ ( a ∪ b )
- {a,b,c,intersectWith: [a,b]} # = {a,b,c} ∩ ( a ∩ b )

It's a little longer, but the parser would be able to indicate intersectWith: {a,b} as an error.

Also my proposal used two different keywords (ifIn and ifInAll, which would be like "intersectWith" and "intersectWithEach") on purpose, because some modern languages / authors recommend not overloading a method for basically different types (in this case, a set and a collection of sets). This is not a method but in a way it's like a function / constructor. Once again, this is a design thing to improve readability / comprehension, not a problem with the underlying mechanism.

Solved by requiring sequence-type arguments for intersectWith.

... Aaaalso... in the "intersectWith: [a,b]" variant we would be using a list of sets to differentiate the cases, but the order of the sets is not really important so conceptually it would take a set of sets, not a list/array of sets. So here [] is used for syntactic convenience, unlike {} which was proposed for conceptual reasons.

The conceptional problem is with recursive set definitions. With

- group1: {a,b}
- group2: {b,c}

we interpret {group1,group2} = {a,b,c}, while in mathematics {group1,group2} = {{a,b},{b,c}}, i.e. the data types don't match. You do not want to introduce YAML types (!!type), so we have to distinguish the different cases else-wise. While technically, arguments to intersectWith et.al. are sets of sets, we can not distinguish between recursive group arguments (resolved into a flat set) and set of sets (hierarchy is preserved). We have to decide how to resolve the hierarchy in a group definition. It must be the same for all operations, being it include, exclude or intersectWith. Therefore, I used {} for sets that should be recursively flattened, while [] is used to denote hierarchy-preservation. Yes, this is differentiation by syntax.

porst17 commented 7 years ago

@malex984 I find {a, ?b, \S, ? c, \ T, d, ∩I, e, ∩ J, f} hardly readable and would rather focus on get the long version right before trying to add shorthands again. And the shorthands can be added much later in the project if they are backwards-compatible.

porst17 commented 7 years ago

@malex @elondaits I am deciding it now:

group_def: { group_def, # \
             ...,       #  > synonym for 'include: [group_def, ..., group_def]'
             group_def, # /
             include: group_def_or_sequence_of_group_def,
             exclude: group_def_or_sequence_of_group_def,
             intersectWith: group_def_or_sequence_of_group_def }
group_def_or_sequence_of_group_def: [group_def,...,group_def]

Each of the set/sequence elements is optional, i.e. {} and [] are perfectly valid. If you need it more formal, I can do it, but I hope it is clear. Being very formal takes a lot more time to write it down, which I don't really have right now.

The config file format requires a version tag (see https://github.com/hilbert/hilbert-cli/issues/17#issuecomment-255210272). If it really turns out that the above way of defining the groups is messy, cumbersome, confusing, inconsistent or whatever and we cannot fix it in a backwards-compatible manner, we can do config v2 and provide a converter from v1 to v2. Writing the converter probably requires less time than this discussion. 😉

elondaits commented 7 years ago

I think this issue should be closed with the commit of some documentation or commented sample config. Having the spec as issues is messy and if it changes or we find a problem when we start implementing we end up programming against several issues.

porst17 commented 7 years ago

@elondaits Valid point. @malex984 Please close once there is a commit to reference to.

malex984 commented 7 years ago

Current state: only initial parsing of Group's is in place (see tests/test_validate.py) but

the actual computation of the set of stations is not yet implemented,
unit tests are missing

malex984 commented 7 years ago

Note: missing functionality will be implemented within #14

hilbert / hilbert-cli

Configuration design: Reflect n:m relation between stations and applications in the configuration #15