openservicemesh / osm

Open Service Mesh (OSM) is a lightweight, extensible, cloud native service mesh that allows users to uniformly manage, secure, and get out-of-the-box observability features for highly dynamic microservice environments.
https://openservicemesh.io/
Apache License 2.0
2.59k stars 277 forks source link

Multiple TrafficTarget on the same destination #5297

Closed stephaneey closed 1 year ago

stephaneey commented 1 year ago

Bug description: I have a very simple scenario (at least I thought it was easy), which is:

I cannot get this to work. OSM policy check-pods reports it is working but it's not.

Affected area (please mark with X where applicable):

Expected behavior: Be able to assign different permissions to client 1 and client 2 Steps to reproduce the bug (as precisely as possible): First deploy the service accounts:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: simpleapi  
  namespace: osmdemo
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: bbox1  
  namespace: osmdemo
---  
apiVersion: v1
kind: ServiceAccount
metadata:
  name: bbox2 
  namespace: osmdemo
---

next, deploy the two clients and the API. Each one refers to their corresponding SA:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bbox1  
  namespace: osmdemo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bbox1
  template:
    metadata:
      labels:
        app: bbox1
    spec:          
      containers:
      - name: bbox1
        image: governmentpaas/curl-ssl
        command: ["/bin/sleep", "30d"]
        imagePullPolicy: IfNotPresent        
      serviceAccountName: bbox1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bbox2  
  namespace: osmdemo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bbox2
  template:
    metadata:
      labels:
        app: bbox2
    spec:          
      containers:
      - name: bbox2
        image: governmentpaas/curl-ssl
        command: ["/bin/sleep", "30d"]
        imagePullPolicy: IfNotPresent        
      serviceAccountName: bbox2
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simpleapi  
  namespace: osmdemo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simpleapi
  template:
    metadata:
      labels:
        app: simpleapi
    spec:      
      serviceAccountName: simpleapi     
      containers:
      - name: simpleapi
        image: stephaneey/osmapi:dev       
        imagePullPolicy: IfNotPresent
---

If permissive mode is enabled, all calls work fine. By all calls, I mean:

curl -X DELETE -v http://simpleapisvc.osmdemo.svc.cluster.local/osm/1
curl -v http://simpleapisvc.osmdemo.svc.cluster.local/osm

Next disable enablePermissiveTrafficPolicyMode in the global mesh config. Calls don't work anymore, which is expected. Next, apply the following TrafficTarget and HTTPRouteGroup:

apiVersion: specs.smi-spec.io/v1alpha4
kind: HTTPRouteGroup
metadata:
  name: simpleapiroutes
  namespace: osmdemo
spec:
  matches:
  - name: everything    
    methods: ["*"]
  - name: getonly
    methods: ["GET"]     
---
apiVersion: access.smi-spec.io/v1alpha3
kind: TrafficTarget
metadata:
  name: simpleapisvcbbox1
  namespace: osmdemo
spec:
  destination:
    kind: ServiceAccount
    name: simpleapi
    namespace: osmdemo
  rules:
  - kind: HTTPRouteGroup
    name: simpleapiroutes
    matches:
    - everything
  sources:
  - kind: ServiceAccount
    name: bbox1
    namespace: osmdemo  
---
apiVersion: access.smi-spec.io/v1alpha3
kind: TrafficTarget
metadata:
  name: simpleapisvcbbox2
  namespace: osmdemo
spec:
  destination:
    kind: ServiceAccount
    name: simpleapi
    namespace: osmdemo
  rules:
  - kind: HTTPRouteGroup
    name: simpleapiroutes
    matches:
    - getonly
  sources:
  - kind: ServiceAccount
    name: bbox2
    namespace: osmdemo  
---  

From now on, only calls from bbox1 work. None of the calls work from bbox2. Looking at the envoy logs from the target API, I clearly see this:

{"time_to_first_byte":null,"method":"GET","start_time":"2023-03-20T16:41:47.181Z","response_code_details":"rbac_access_denied_matched_policy[none]","duration":0,"authority":"simpleapisvc.osmdemo.svc.cluster.local","user_agent":"curl/7.83.1","requested_server_name":"simpleapisvc.osmdemo.svc.cluster.local","protocol":"HTTP/1.1","upstream_cluster":"osmdemo/simpleapisvc|80|local","response_flags":"-","path":"/osm","bytes_sent":19,"request_id":"7a8d8ca4-192d-49f3-b70f-8de29cbc46c3","bytes_received":0,"response_code":403,"upstream_host":null,"upstream_service_time":null,"x_forwarded_for":null} but if I simply remove TrafficTarget simpleapisvcbbox1, then everything works...it finds my other policy that applied to bbox2. Long story short: it seems that only one of the TrafficTarget is really applied.

How was OSM installed?: AKS add-on

Bug report archive:

Environment:

OSM v1.2.3 is now available. Please see https://github.com/openservicemesh/osm/releases/latest. WARNING: upgrading could introduce breaking changes. Please review the release notes.

keithmattix commented 1 year ago

Hi, thanks for submitting this issue. If you're seeing this issue on AKS, could you please submit a support case so we can best prioritize your case and gather information securely?

phillipgibson commented 1 year ago

@stephaneey looking over your HTTPRouteGroup, the first thing that was obvious to me is the omission of the pathRegex attribute. Let's start there and include it. If you do not want to implement fine grain control of the layer 7 path to the api, then you can simply add pathRegex: ".*" to your manifest. An example for your simpleapiroutes HTTPRouteGroup would look like this

apiVersion: specs.smi-spec.io/v1alpha4
kind: HTTPRouteGroup
metadata:
  name: simpleapiroutes
  namespace: osmdemo
spec:
  matches:
  - name: everything
    pathRegex: ".*"    
    methods: ["*"]
  - name: getonly
    pathRegex: ".*"
    methods: ["GET"]

Please try to add the pathRegex attribute to your HTTPRouteGroups and test again. Please also checkout the traffic spec for HTTPRouteGroups here.

stephaneey commented 1 year ago

Hi @phillipgibson,

Thanks for your answer but it doesn't change anything. Pathregex is optional as I don't get any validation issue while not providing it. If it's required, then it should be marked as such in the CRD. In any case, I get the same result:

Now, I managed to narrow down the issue as it seems that the problem is only with the "matches" attribute in the TrafficTargetRules. Indeed, if I specify "everything" in both traffic targets, both bbox1 and 2 can do all operations, else bbox2 cannot do anything. I've tried to split into two different groups but it doesn't change anything. It seems that the issue is linked to the "matches" attribute of the rule. I looked into the code but I'd need to build my own version and put a breakpoint as there is too much code to spot it just like that.

stephaneey commented 1 year ago

@keithmattix for this issue, I'm just testing something on my own but I know this will be the type of scenarios that will be required in customer environments because it just makes sense. So, creating a ticket for my own environment is probably overkill.

keithmattix commented 1 year ago

@stephaneey Could you turn on debug logs for the osm controller (osmLogLevel in the meshConfig) and upload a copy of the logs here? We should be able to see the traffic rules via this log line: https://github.com/openservicemesh/osm/blob/v1.2.3/pkg/catalog/inbound_traffic_policies.go#L261

stephaneey commented 1 year ago

Hi @keithmattix ,

I've enabled the debug mode and this is what I can find when applying the policies:

{"level":"debug","component":"envoy/lds","time":"2023-03-31T06:37:08Z","file":"rbac.go:56","message":"RBAC policy for proxy with identity simpleapi.osmdemo: map[osmdemo/simpleapisvcbbox1:permissions:{any:true}
principals:{authenticated:{principal_name:{exact:\"bbox1.osmdemo.cluster.local\"}}} osmdemo/simpleapisvcbbox2:permissions:{any:true}
principals:{authenticated:{principal_name:{exact:\"bbox2.osmdemo.cluster.local\"}}}]"}

and here are the adjusted policies:

apiVersion: specs.smi-spec.io/v1alpha4
kind: HTTPRouteGroup
metadata:
  name: simpleapiroutes
  namespace: osmdemo
spec:
  matches:
  - name: everything    
    pathRegex: '.*'
    methods: ["GET","DELETE"]
  - name: getonly
    pathRegex: '.*'
    methods: ["GET"]     
---
apiVersion: access.smi-spec.io/v1alpha3
kind: TrafficTarget
metadata:
  name: simpleapisvcbbox1
  namespace: osmdemo
spec:
  destination:
    kind: ServiceAccount
    name: simpleapi
    namespace: osmdemo
  rules:
  - kind: HTTPRouteGroup
    name: simpleapiroutes
    matches:
    - everything
  sources:
  - kind: ServiceAccount
    name: bbox1
    namespace: osmdemo  
---
apiVersion: access.smi-spec.io/v1alpha3
kind: TrafficTarget
metadata:
  name: simpleapisvcbbox2
  namespace: osmdemo
spec:
  destination:
    kind: ServiceAccount
    name: simpleapi
    namespace: osmdemo
  rules:
  - kind: HTTPRouteGroup
    name: simpleapiroutes
    matches:
    - getonly
  sources:
  - kind: ServiceAccount
    name: bbox2
    namespace: osmdemo  
---  

Reading the logs, I would assume that both bbox1 and bbox2 are authorized to do everything...I could not find any error. Note that I wanted to send you the entire logs so I decided to stop and restart the OSM components, to restart from fresh logs. Since then, I can't get them log in "debug" mode anymore. They stick to log level "info".

Best Regards

keithmattix commented 1 year ago

To change the log level, you probably need to restart osm-controller. If you could attach the full debug-level logs, that would be super helpful. Thanks!

stephaneey commented 1 year ago

Hi Keith,

Thanks for your answer. I restarted the OSM controller multiple times but no luck, it's stuck to "info". In any case, is my scenario possible? Can I define multiple "clients" for the same API but with different rules? If yes, could this be documented somewhere,

Thanks Best Regards

keithmattix commented 1 year ago

Looking at the code, it should be possible; it's not apparent to me why it isn't working as expected. I really need debug logs to look into this further...let me try and reproduce on my end

stephaneey commented 1 year ago

Hello,

I've explored a little further and my conclusion is simple: only a single route match is permitted whatever traffictarget might be defined later on. A few examples to illustrate this:

apiVersion: specs.smi-spec.io/v1alpha4
kind: HTTPRouteGroup
metadata:
  name: everything
  namespace: osmdemo
spec:
  matches:
  - name: everything    
    pathRegex: '/osm.*'
    methods: ["*"]  
---

The above route will take priority over the below one, whatever traffictarget could be defined later on:

apiVersion: specs.smi-spec.io/v1alpha4
kind: HTTPRouteGroup
metadata:
  name: getonly
  namespace: osmdemo
spec:
  matches:  
  - name: getonly
    pathRegex: '/osm'
    methods: ["GET"]     
---

If I use two different controllers in the same API, than I can get bbox1 and bbox2 working side by side:

apiVersion: specs.smi-spec.io/v1alpha4
kind: HTTPRouteGroup
metadata:
  name: everything
  namespace: osmdemo
spec:
  matches:
  - name: everything    
    pathRegex: '/osm.*'
    methods: ["*"]  
---
apiVersion: specs.smi-spec.io/v1alpha4
kind: HTTPRouteGroup
metadata:
  name: getonly
  namespace: osmdemo
spec:
  matches:  
  - name: getonly
    pathRegex: '/values.*'
    methods: ["GET"]     
---

the above works fine but as soon as you target the same route, only route is taken into account. Another example where bbox1 and bbox2 work as expected:

apiVersion: specs.smi-spec.io/v1alpha4
kind: HTTPRouteGroup
metadata:
  name: allafterosm
  namespace: osmdemo
spec:
  matches:
  - name: everything    
    pathRegex: '/osm/.*' => notice the "/" character here
    methods: ["*"]  
---
apiVersion: specs.smi-spec.io/v1alpha4
kind: HTTPRouteGroup
metadata:
  name: getonly
  namespace: osmdemo
spec:
  matches:  
  - name: getosmrootonly
    pathRegex: '/osm'
    methods: ["GET"]     
---

In this case, bbox1 bound to allafterosm is able to curl /osm/1 or /osm/ but not /osm

bbox2 can only curl /osm/

At last, these two routes also work:

apiVersion: specs.smi-spec.io/v1alpha4
kind: HTTPRouteGroup
metadata:
  name: everything
  namespace: osmdemo
spec:
  matches:
  - name: everything    
    pathRegex: '/osm/.*'
    methods: ["*"]  
---
apiVersion: specs.smi-spec.io/v1alpha4
kind: HTTPRouteGroup
metadata:
  name: getonly
  namespace: osmdemo
spec:
  matches:  
  - name: getonly
    pathRegex: '/osm$' => notice the "$" character here forcing "m" to be the last char
    methods: ["GET"]     
---

Because they are different. But setting different verbs on /osm.* doesn't work...although this scenario should be made possible because it's likely to happen in the real world.

keithmattix commented 1 year ago

Thanks for the in-depth analysis! It's extremely helpful when someone gives us this level of detail, so a heartfelt thanks for that ❤️ Since there's a workaround, we probably will push a patch release out not including the fix to this bug and add this fix to the next patch release. Let me know if I'm misunderstanding!

stephaneey commented 1 year ago

Hi @keithmattix ,

No problem! However, right now, there is no real workaround in case two different API clients need to talk to the same API endpoint with different rules or if you simply want to define broad rules like this:

The only possible workarounds that I can think of are both unacceptable hacks:

1) Defining dedicated API endpoints per "client" but this is ugly since an API should be agnostic to its consumers 2) Keeping a single API (better) but running it with two different service accounts and two different K8s services. This works fine. It is less ugly than 1 but still not great.

So, knowing that 1 is definitely a dirty hack, if I take 2, it means:

- name: everything    
    pathRegex: '/osm/.*'
    methods: ["*"]  
- name: getonly
    pathRegex: '/osm/.*'
    methods: ["GET"]  

One traffictarget per client, targeting different destinations (simpleapi1sa and simpleapi2sa). This works but is far from being ideal of course.

github-actions[bot] commented 1 year ago

This issue will be closed due to a long period of inactivity. If you would like this issue to remain open then please comment or update.

github-actions[bot] commented 1 year ago

Issue closed due to inactivity.