project-oak / oak

Meaningful control of data in distributed systems.
Apache License 2.0

Simple policy for Oak Functions #1987

Closed: rbehjati closed 3 years ago

rbehjati commented 3 years ago

Implement support for policies that avoid side-channel leaks from the response. The policies can be specified using protobuf as suggested by @tiziano88:

```protobuf
message Policy {
  // A fixed size for responses returned by the trusted runtime.
  // If the response computed by the feature code is smaller than this amount,
  // it is padded with additional data before encryption in order to make the
  // payload size exactly this size. If the response is larger than this amount,
  // the trusted runtime discards the response and instead sends a message of
  // exactly this size to the client, containing an error code indicating the
  // failure.
  int32 constant_response_size_bytes = 1;

  // Similar to the previous one, but controls the amount of time the function is
  // allowed to run for. If the function finishes before this time, the response is
  // not sent back until the time is elapsed. If the function does not finish within
  // this deadline, the trusted runtime sends a message to the client containing
  // an error code indicating the failure, of the size specified by the previous
  // parameter.
  int32 constant_process_time_millis = 2;
}
```
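
As a rough illustration of the `constant_response_size_bytes` rule, here is a minimal Rust sketch. The names `pad_to_constant_size`, `unpad`, and `SizePolicyError`, and the 4-byte length prefix used to make the padding reversible, are assumptions made for this example and are not taken from the Oak codebase. An oversized response is reported as an error here; the caller would still need to turn that error into a message of exactly the configured size.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum SizePolicyError {
    /// The function's response does not fit in the configured size.
    ResponseTooLarge,
    /// The configured size cannot even hold the length prefix.
    PolicyTooSmall,
}

/// Assumed framing: a 4-byte big-endian body length, then the body, then zeros.
const LENGTH_PREFIX_BYTES: usize = 4;

/// Pads `body` so that the returned buffer is exactly
/// `constant_response_size_bytes` long, or reports why it cannot.
fn pad_to_constant_size(
    body: &[u8],
    constant_response_size_bytes: u32,
) -> Result<Vec<u8>, SizePolicyError> {
    let total = constant_response_size_bytes as usize;
    let capacity = total
        .checked_sub(LENGTH_PREFIX_BYTES)
        .ok_or(SizePolicyError::PolicyTooSmall)?;
    if body.len() > capacity {
        return Err(SizePolicyError::ResponseTooLarge);
    }
    let mut out = Vec::with_capacity(total);
    // The cast is lossless: body.len() <= capacity < u32::MAX.
    out.extend_from_slice(&(body.len() as u32).to_be_bytes());
    out.extend_from_slice(body);
    out.resize(total, 0); // zero padding up to the fixed size
    Ok(out)
}

/// Recovers the original body from a padded buffer, if it is well formed.
fn unpad(padded: &[u8]) -> Option<&[u8]> {
    let prefix = padded.get(..LENGTH_PREFIX_BYTES)?;
    let len = u32::from_be_bytes([prefix[0], prefix[1], prefix[2], prefix[3]]) as usize;
    padded.get(LENGTH_PREFIX_BYTES..LENGTH_PREFIX_BYTES.checked_add(len)?)
}
```

With a configured size of 16, `pad_to_constant_size(b"hello", 16)` yields a 16-byte buffer that `unpad` maps back to `hello`, while a 13-byte body is rejected because only 12 bytes remain after the prefix.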

More properties, for example related to differential privacy, can be added later.

For Oak Functions v0, we'll implement this as part of the server configuration. In the future, it will be changed to a client-provided policy.
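
The `constant_process_time_millis` behaviour could be sketched along the following lines. This is only an illustration using standard-library threads and channels: `run_with_constant_time` and `Outcome` are hypothetical names, and the policy value is passed as a `Duration` rather than the `int32` from the proto. The point is that the caller gets control back at the deadline whether the function finished early, ran too long, or failed.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical result type for this sketch.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The function finished within the deadline.
    Response(Vec<u8>),
    /// The function timed out, or its thread ended without a result.
    Failure,
}

/// Runs `function` on a separate thread and returns only once
/// `constant_process_time` has elapsed, regardless of the outcome.
fn run_with_constant_time<F>(function: F, constant_process_time: Duration) -> Outcome
where
    F: FnOnce() -> Vec<u8> + Send + 'static,
{
    let deadline = Instant::now() + constant_process_time;
    let (sender, receiver) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if the deadline passed; ignore that.
        let _ = sender.send(function());
    });
    let outcome = match receiver.recv_timeout(constant_process_time) {
        Ok(response) => Outcome::Response(response),
        // Covers both a timeout and a worker that ended without sending.
        Err(_) => Outcome::Failure,
    };
    // Hold the result until the deadline so early completion is not observable.
    thread::sleep(deadline.saturating_duration_since(Instant::now()));
    outcome
}
```

Note that this sketch does not stop a function that overruns; the spawned thread keeps running in the background. A real runtime would also need a way to terminate the Wasm invocation.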

rbehjati commented 3 years ago

I wonder what should be returned as the status code in case a valid response conforming to the policy is not available. Can we define our own HTTP status code? Also, do we need different status codes to distinguish between the case where we received a response larger than constant_response_size_bytes and the case where a response was not available within the expected processing time?

WDYT @tiziano88, @conradgrobler?

tiziano88 commented 3 years ago

I would not suggest creating custom HTTP codes.

In fact perhaps there are some that could fit our use case already, such as 413 (Payload Too Large) and 408 (Request Timeout).

Alternatively, we could just use a generic 400 (https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400) and put the actual reason in the body of the response. We would of course need to take the size of this message into account, to make sure it does not itself cause a policy violation.

rbehjati commented 3 years ago

I think 413 and 408 are not applicable, as they indicate issues with the request. Perhaps 400 is the most applicable one. I agree with including the reason in the response.

conradgrobler commented 3 years ago

I think the 500 range is more appropriate. The 400 range is typically for errors with the client request, and 400 itself usually indicates a malformed request. The 500 range of codes is for server-related errors, so 500 is probably a good generic error code to use.

tiziano88 commented 3 years ago

Good point. I think it also depends on how we consider these errors. Usually 500 errors are considered actionable issues that need to be solved on the server. In our case, if a policy is violated, do we think it is because there is something to fix on the server, or because the request is somehow incorrect / inappropriate?

conradgrobler commented 3 years ago

I think a response that is too big and a timeout both indicate server issues. A malformed request should not be able to cause either if the lookup data is ok.

rbehjati commented 3 years ago

I agree with 500 for this version, where the policy is part of the server configuration. But for later versions, when we move the policy to the request, 400 might be better. Perhaps in that case we should still require the server configuration to specify a lower bound for the response size and response time, to be able to distinguish between server errors and a bad policy in the request.
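
One possible reading of this proposal, sketched in Rust. The `Policy` struct mirrors the proto fields, while `PolicySource` and `status_for_violation` are hypothetical names invented for this example. The assumption is that a client policy stricter than the server's lower bound is the client's fault (400), and any other violation is the server's (500).

```rust
/// Mirrors the two fields of the proposed `Policy` message.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Policy {
    constant_response_size_bytes: u32,
    constant_process_time_millis: u32,
}

/// Hypothetical: where the policy in effect came from.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PolicySource {
    /// v0: the policy is part of the server configuration.
    ServerConfig,
    /// Later versions: the client sends the policy, and the server
    /// configuration supplies the smallest values it can support.
    ClientRequest { server_lower_bound: Policy },
}

/// Picks an HTTP status code for a response that violated `policy`.
fn status_for_violation(policy: Policy, source: PolicySource) -> u16 {
    match source {
        // The operator chose the policy, so a violation is a server issue.
        PolicySource::ServerConfig => 500,
        PolicySource::ClientRequest { server_lower_bound } => {
            let below_lower_bound = policy.constant_response_size_bytes
                < server_lower_bound.constant_response_size_bytes
                || policy.constant_process_time_millis
                    < server_lower_bound.constant_process_time_millis;
            // A policy below the supported bounds is a bad request;
            // otherwise the server failed to meet a reasonable policy.
            if below_lower_bound { 400 } else { 500 }
        }
    }
}
```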

rbehjati commented 3 years ago

As discussed, I will only use the headers and the body of the response to calculate its size. But I think in that case an attacker can easily distinguish between OK, BAD_REQUEST, and INTERNAL_SERVER_ERROR responses.

tiziano88 commented 3 years ago

How could a Wasm invocation return OK vs BAD_REQUEST vs INTERNAL_SERVER_ERROR? Or do you mean in case there is an error even before (or after) invoking the Wasm function? If the latter, then I think we do need to be careful not to expose different errors for different requests, as long as the requests are well formed. Or maybe we should have a top-level wrapper for a generic request handler that returns a specific result type, and then we unwrap and handle the error values and ensure that every possible outcome gets mapped to a fixed-size response with the appropriate error value in the body. That, combined with the fact that our code should never panic when handling user requests, should hopefully ensure that all responses look identical to an attacker. Also it may be a nice invariant to verify via fuzzing (#1992)
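
To make the wrapper idea concrete, here is a rough Rust sketch. Everything in it is hypothetical: `StatusCode`, `HandlerError`, `to_fixed_size_response`, and the body layout (1 status byte, a 4-byte big-endian length, the payload, then zero padding) are assumptions for illustration only. It also assumes the configured size has been validated to be at least `HEADER_BYTES` when the configuration is loaded.

```rust
/// Hypothetical status values carried inside the response body.
#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(u8)]
enum StatusCode {
    Success = 0,
    BadRequest = 1,
    PolicySizeViolation = 2,
    InternalError = 3,
}

/// Hypothetical errors a generic request handler might return.
#[derive(Debug)]
enum HandlerError {
    BadRequest,
    Internal,
}

/// Assumed layout: 1 status byte followed by a 4-byte big-endian body length.
const HEADER_BYTES: usize = 5;

/// Maps every handler outcome to a body of exactly `size_bytes` bytes.
fn to_fixed_size_response(result: Result<Vec<u8>, HandlerError>, size_bytes: u32) -> Vec<u8> {
    let size = size_bytes as usize;
    let (status, body) = match result {
        Ok(body) if body.len() + HEADER_BYTES <= size => (StatusCode::Success, body),
        // Too large for the policy: discard the body and report the failure.
        Ok(_) => (StatusCode::PolicySizeViolation, Vec::new()),
        Err(HandlerError::BadRequest) => (StatusCode::BadRequest, Vec::new()),
        Err(HandlerError::Internal) => (StatusCode::InternalError, Vec::new()),
    };
    let mut out = Vec::with_capacity(size.max(HEADER_BYTES));
    out.push(status as u8);
    // The cast is lossless: body.len() is below `size`, which came from a u32.
    out.extend_from_slice(&(body.len() as u32).to_be_bytes());
    out.extend_from_slice(&body);
    out.resize(size, 0); // pad so that all outcomes have the same length
    out
}
```

The invariant that a fuzzer could check is then simply that the returned length equals the configured size for every input, whatever the handler returned.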

rbehjati commented 3 years ago

> How could a Wasm invocation return OK vs BAD_REQUEST vs INTERNAL_SERVER_ERROR? Or do you mean in case there is an error even before (or after) invoking the Wasm function?

The latter. For instance, we return INTERNAL_SERVER_ERROR if the Wasm module cannot produce a response that complies with the policy.

> Or maybe we should have a top-level wrapper for a generic request handler that returns a specific result type, and then we unwrap and handle the error values and ensure that every possible outcome gets mapped to a fixed-size response with the appropriate error value in the body.

Are you suggesting that we always respond with OK, no headers in the response, and a richer body that, in addition to the actual response bytes, encapsulates the actual status of the operation, plus any necessary padding?

rbehjati commented 3 years ago

How do these policies work with streaming and remote attestation? Should the policies still hold for every response?

conradgrobler commented 3 years ago

> How do these policies work with streaming and remote attestation? Should the policies still hold for every response?

I think it should only apply to responses that are influenced by the Wasm module. The initial handshake and attestation should not depend in any way on the data that will be sent later, and are controlled completely by the trusted runtime, so they shouldn't pose a privacy risk.

rbehjati commented 3 years ago

Makes sense. Thanks.