pre proccessing policy for general conns

shireenf-ibm commented 8 months ago

issue : #236 sub-task:

- [ ] optimize policy pre-processing : store data on policy's general conns (to whole world/ entire cluster)more info

shireenf-ibm commented 8 months ago

Some notes:

1. main changes:

- step1 : while upserting a new network-policy, scan its ingress and egress rules and store :

- all destinations (whole world) connections
- entire cluster connections

notes on step1:

can not do the pre-process only for exposure analysis; since policy-engine's exposure analysis flag is updated only after upserting the objects (the pe.exposureAnalysisFlag is updated after pe.NewPolicyEngineWithObjects which calls UpsertObject funcs)
currently, I stored both entire world and entire cluster conns (entire cluster contains also entire world) although I use only entire cluster;

in the mean while kept the entire-world connection cause we may use it in next optimizations; if we don't will eliminate it.
a rule may contain namedPort; in pre-process we can not convert the namedPort into portNum (we don't have all peers yet, neither know what peers are selected); so :
1. on pre-process; I store the new connection with the namedPort (however; here I found a bug in PortSet(detailed below))
2. while computing allowed connections between peers (scanning the policy rules for finding selected peers;), each time a namedPort was converted, saved it in the policy struct; (a policy object has a map from protocol to map from namedPorts to port numbers)

- step2: while scaning the policies selecting a pod for allowed connections computations; updated the pod's general exposure data from the policy's stored data.

(when updating the pod's data, checked and replaced namedPort in the poliy's entire cluster conns using the map; (worked with local test with namedPort))

2. notes on PortSet :

although PortSet has a
```
NamedPorts         map[string]bool
ExcludedNamedPorts map[string]bool
```
they were never used before and never initialized; so while running a test with a namedPort I got : panic: assignment to entry in nil map error; I updated the MakePortSet with initialized empty maps; but still got the error; since almost always a new PortSet is initialized by PortSet{}; so currently in the PortSet funcs that are used for storing the namedPort; I check and initialize the maps if are nil

shireenf-ibm commented 8 months ago

TODO: (for next commit?) if new data of the Policy is accepted :

we can benefit from it for computing allowed conns; I suggest that the netpol func ruleSelectsPeer checks only if there is a specified rule selecting the specific peer; (specified podSelector/ NamespaceSelector ...) and then for-example : the allowed conns between peers is the entireCluster.Union(this-rules-conn) for IPblock the policy's entireWorldConn.Union(specific IPBlock rule conn)

adisos commented 8 months ago

some questions/comments:

can not do the pre-process only for exposure analysis; since policy-engine's exposure analysis flag is updated only after upserting the objects

I think we can find solutions for this, such as adding more api funcs to generate policy engine as desired.

while computing allowed connections between peers (scanning the policy rules for finding selected peers;), each time a namedPort was converted, saved it in the policy struct

why does it need to be stored in the policy?

what if the connection from actual pod to a "fake" (representative) pod depends on a named port? how will this be handled by exposure analysis?

I suggest that the netpol func ruleSelectsPeer checks only ...

not sure I understand the entire sentence, though I think ruleSelectsPeer could avoid iterating all rules in case we already know from pre-processing (when available, from exposure analysis activation) that at least one of the rules is matching all namespaces or any pod/ip-block .. (for this case we do need the "all destinations (whole world) connections").

shireenf-ibm commented 8 months ago

some questions/comments:

can not do the pre-process only for exposure analysis; since policy-engine's exposure analysis flag is updated only after upserting the objects

I think we can find solutions for this, such as adding more api funcs to generate policy engine as desired.

True, so do you suggest performing this pre-process only for exposure analysis ?

while computing allowed connections between peers (scanning the policy rules for finding selected peers;), each time a namedPort was converted, saved it in the policy struct

why does it need to be stored in the policy?

True, actually I was very unsure when and how to store these; additionally, if we want to benefit from the general conns in ruleSelectsPeer ; we might not get the converted named port for the general rules.

I think also that I don't fully understand when a named port should be converted..

but if it is a "src" (egress conn); should we convert? or keep it as it? (a named port) (I would like to discuss this with you)

what if the connection from actual pod to a "fake" (representative) pod depends on a named port? how will this be handled by exposure analysis?

in this PR; focused on "general" (entire cluster) connections; my plan for optimizing PR: is that if the "dst" is representative and there is a conn with named port is to store the named port in the connection data (portSet.NamedPorts) I can also save potential namedPorts in the RepresentativePeer struct so we can use it in the output if we want

I suggest that the netpol func ruleSelectsPeer checks only ...

not sure I understand the entire sentence, though I think ruleSelectsPeer could avoid iterating all rules in case we already know from pre-processing (when available, from exposure analysis activation) that at least one of the rules is matching all namespaces or any pod/ip-block .. (for this case we do need the "all destinations (whole world) connections").

yes that is what I meant

shireenf-ibm commented 8 months ago

next tasks to be done are:

adding new api func to the policy-engine; so all pre-process (and storing general conns for pods) will be computed only for exposure-analysis.
in case of exposure-analysis; updating some computation that can benefit from the stored data of the policy

since this PR is already labeled L ; I think should commit these to a new PR ? (sub-task : * - [ ] adding new api func for policy-engine for running exposure-analysis individually)

adisos commented 8 months ago

next tasks to be done are:
* adding new api func to the policy-engine; so all pre-process (and storing general conns for pods) will be computed only for exposure-analysis.

* in case of exposure-analysis; updating some computation that can benefit from the stored data of the policy
since this PR is already labeled L ; I think should commit these to a new PR ? (sub-task : * - [ ] adding new api func for policy-engine for running exposure-analysis individually)

ok, let's have these in a new PR

np-guard / netpol-analyzer

pre proccessing policy for general conns #306