oxisto / owl2proto

Apache License 2.0

Generate repeated string for resource type #13

Closed anatheka closed 6 months ago

anatheka commented 6 months ago

I think we want to define the resource type string arrays (e.g. "VirtualMachine", "Compute", "CloudResource", "Resource") in the proto files, too. As far as I know, enums are the only way to do this, and as far as I can see, there are two options for using the enum in the resource message.

  1. We use the enum ResourceType directly, as follows:

    message Resource {
      ...
      // The resource type. It is an array, because a type can be derived from another
      ResourceType resource_type = 7;
      ...
    }

    enum ResourceType {
      UNSPECIFIED = 0;
      RESOURCE = 1;
      CLOUD_RESOURCE_RESOURCE = 2;
      COMPUTE_CLOUD_RESOURCE_RESOURCE = 3;
      VIRTUAL_MACHINE_COMPUTE_CLOUD_RESOURCE_RESOURCE = 4;
      ...
    }

  2. We use EnumValueOptions to be able to continue using the string arrays.

    extend google.protobuf.EnumValueOptions {
      optional string string_name = 123456789;
    }

    message Resource {
      ...
      // The resource type. It is an array, because a type can be derived from another
      repeated string resource_type = 7;
      ...
    }

    enum ResourceType {
      UNSPECIFIED = 0;
      RESOURCE = 1 [(string_name) = "Resource"];
      CLOUD_RESOURCE_RESOURCE = 2 [(string_name) = "CloudResource, Resource"];
      COMPUTE_CLOUD_RESOURCE_RESOURCE = 3 [(string_name) = "Compute, CloudResource, Resource"];
      VIRTUAL_MACHINE_COMPUTE_CLOUD_RESOURCE_RESOURCE = 4 [(string_name) = "VirtualMachine, Compute, CloudResource, Resource"];
      ...
    }


I would prefer the second option: the resource type remains readable and can be passed directly to the UI without converting the enums back into strings. This might also make it easier for the partners. However, they first have to find out how to read the enum string_name values, which is unfortunately not entirely trivial, depending on the programming language.
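
To illustrate the conversion step, here is a minimal Go sketch of turning one enum value into the string array. It is only a sketch under assumptions: the `ResourceType` constants and the `stringNames` map are hypothetical stand-ins for the generated protobuf code and the `(string_name)` option, which real code would read from the enum value descriptor instead.

```go
package main

import (
	"fmt"
	"strings"
)

// ResourceType mirrors the proposed proto enum. In real code this type would
// come from the generated protobuf package; it is re-declared here only to
// keep the sketch self-contained.
type ResourceType int32

const (
	ResourceTypeUnspecified ResourceType = iota
	ResourceTypeResource
	ResourceTypeCloudResource
	ResourceTypeCompute
	ResourceTypeVirtualMachine
)

// stringNames is a hypothetical stand-in for the (string_name) enum value
// option. Generated code would read it from the enum value descriptor.
var stringNames = map[ResourceType]string{
	ResourceTypeResource:       "Resource",
	ResourceTypeCloudResource:  "CloudResource, Resource",
	ResourceTypeCompute:        "Compute, CloudResource, Resource",
	ResourceTypeVirtualMachine: "VirtualMachine, Compute, CloudResource, Resource",
}

// TypeNames splits the comma-separated option value into the string array
// that would be stored in the repeated resource_type field.
// It returns nil for values without a string_name (e.g. UNSPECIFIED).
func TypeNames(t ResourceType) []string {
	raw, ok := stringNames[t]
	if !ok {
		return nil
	}
	parts := strings.Split(raw, ",")
	for i, p := range parts {
		parts[i] = strings.TrimSpace(p)
	}
	return parts
}

func main() {
	fmt.Println(TypeNames(ResourceTypeVirtualMachine))
}
```

The splitting itself is trivial; the language-dependent part is only the lookup of the option value, which the map replaces here.
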
@oxisto what do you think?

oxisto commented 6 months ago

I wonder if we still need this array at all. Where exactly was it used? Can't we get this information some other way? I think it was always a major hack when we originally added it.

anatheka commented 6 months ago

As far as I know, the resource types are used in the Rego policies, and I also started to use them in the metrics. For example, until now it was not possible to know whether the field "enabled" in accessRestriction refers to the WebApplicationFirewall or the L3Firewall. Maybe we are now able to see which kind of firewall the field "enabled" belongs to, but I'm not sure.

JSON resource info:

...
  "properties": {
    "accessRestriction": {
      "enabled": true,
      "inbound": false,
      "restrictedPorts": ""
    },
...

Metric for L3FirewallEnabled

enabled := input.accessRestriction.enabled

applicable {
    enabled != null
    compare("isIn",  "NetworkInterface", input.type)
}

compliant {
    compare(data.operator, data.target_value, enabled)
}

anatheka commented 6 months ago

I checked it for the new ontology proto file, and the "firewall type" information is now included:

"securityFeature": [
            {
              "authorization": {
                "accessRestriction": {
                  "firewall": {
                    "l3Firewall": {
                      "enabled": true
                    }}}}}]

If you don't need it anymore for the policy/rego stuff we can skip it.

oxisto commented 6 months ago

I think we don’t need it for the Rego policy anymore. It could be that it is used for policy caching. I need to have a look at this.

oxisto commented 6 months ago

Unfortunately, we use the array of resource types to derive a cache key. The cache records whether a given metric is applicable to a given resource type, so that we do not have to re-check this for every evidence. This is needed because otherwise we would have to recompile the Rego query for each evidence, which is slow :(

https://github.com/clouditor/clouditor/blob/253ac1deee4e483dacf24b6398ce4ad8c2f8e0c1/policies/rego.go#L102-L107
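
The caching idea described above can be sketched roughly as follows. This is not the code behind the link; all names (`applicabilityCache`, `cacheKey`, `isApplicable`) and the key format are hypothetical, and the `check` callback merely stands in for compiling and evaluating the Rego query. The sketch is also not safe for concurrent use.

```go
package main

import (
	"fmt"
	"strings"
)

// applicabilityCache remembers, per (metric, resource type array), whether a
// metric applies. All names here are hypothetical, not Clouditor's API.
type applicabilityCache struct {
	entries map[string]bool
	misses  int // counts how often the expensive check actually ran
}

func newApplicabilityCache() *applicabilityCache {
	return &applicabilityCache{entries: make(map[string]bool)}
}

// cacheKey joins the metric ID and the resource type array into one string.
func cacheKey(metricID string, resourceTypes []string) string {
	return metricID + "|" + strings.Join(resourceTypes, "-")
}

// isApplicable returns the cached answer if present; otherwise it runs check
// (standing in for the slow Rego compilation and evaluation) and stores the
// result under the derived key.
func (c *applicabilityCache) isApplicable(metricID string, resourceTypes []string, check func() bool) bool {
	key := cacheKey(metricID, resourceTypes)
	if v, ok := c.entries[key]; ok {
		return v
	}
	c.misses++
	v := check()
	c.entries[key] = v
	return v
}

func main() {
	cache := newApplicabilityCache()
	vm := []string{"VirtualMachine", "Compute", "CloudResource", "Resource"}
	expensive := func() bool { return true }
	// Three evidences of the same resource type trigger only one real check.
	for i := 0; i < 3; i++ {
		cache.isApplicable("L3FirewallEnabled", vm, expensive)
	}
	fmt.Println("expensive checks run:", cache.misses)
}
```

Any stable discriminator would work as the second half of the key; the string array is just what is available on the resource today.
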

anatheka commented 6 months ago

OK, then the question is whether we want to continue using the string array or use the enum directly. I would prefer the enum, but I'm open to both.

oxisto commented 6 months ago

In the end, I only need a discriminator within the resource that specifies the type; whether it's an enum or a string should not matter.
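
As a rough illustration of why a single discriminator can be enough: assuming a lookup from each type to its direct super-type is available (the `parents` map below is hypothetical, with names taken from the example in this thread), the full type array can be rebuilt on demand. A minimal Go sketch:

```go
package main

import "fmt"

// parents is a hypothetical lookup from a resource type to its direct
// super-type. It must not contain cycles, otherwise TypeChain never returns.
var parents = map[string]string{
	"VirtualMachine": "Compute",
	"Compute":        "CloudResource",
	"CloudResource":  "Resource",
}

// TypeChain expands a single discriminator into the full type array,
// most specific type first.
func TypeChain(discriminator string) []string {
	chain := []string{discriminator}
	for {
		p, ok := parents[chain[len(chain)-1]]
		if !ok {
			return chain
		}
		chain = append(chain, p)
	}
}

func main() {
	fmt.Println(TypeChain("VirtualMachine"))
}
```

The same walk would work with enum values as map keys instead of strings.
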