riscv-non-isa / riscv-ap-tee

This repo holds the work area and revisions of the non-ISA specification created by the RISC-V AP-TEE TG. This specification defines the programming interfaces (ABI) to support the Confidential VM Extension (CoVE) confidential computing architecture for RISC-V application-processor platforms.
https://jira.riscv.org/browse/RVG-76
Creative Commons Attribution 4.0 International
49 stars 19 forks source link

Broaden the specification's scope to embedded systems with simpler hardware and reduced functionalities #75

Closed wojciechozga closed 4 months ago

wojciechozga commented 5 months ago

Guerney and I introduce in this PR changes to the CoVE spec that make the spec suitable for broader number of use cases, like embedded systems, simpler processors, systems designed under the formal verification regime, such as assured confidential execution (ACE) that we develop as open source project: https://github.com/IBM/ACE-RISCV

atishp04 commented 5 months ago

Few broader questions:

  1. As per the discussion in the RVI meeting: Keeping the TSM in M-mode helps in formal verification. As per my understanding it requires OpenSBI to be formally verified as well. If we run OpenSBI in M-mode and TSM in HS mode, the formal verification effort will be the same. Isn't it ?

Do we require more involved formal verification because of privilege mode changes ?

If deployment model 3 is aimed for embedded use case, TSM can just run HS mode without any M-mode involvement. This also reduces the TCB to be verified.

Note: I don't have formal verification background. Thus, I may be asking stupid questions :)

wojciechozga commented 5 months ago

Few broader questions:

  1. As per the discussion in the RVI meeting: Keeping the TSM in M-mode helps in formal verification. As per my understanding it requires OpenSBI to be formally verified as well. If we run OpenSBI in M-mode and TSM in HS mode, the formal verification effort will be the same. Isn't it ?

Do we require more involved formal verification because of privilege mode changes ?

If deployment model 3 is aimed for embedded use case, TSM can just run HS mode without any M-mode involvement. This also reduces the TCB to be verified.

Note: I don't have formal verification background. Thus, I may be asking stupid questions :)

Keeping TSM in M-mode helps with formal verification because: (1) it simplifies reasoning because we have a clear split of security domains among processor's privileged modes, (2) there are no context switches/switches between the M-mode and HS-mode that will be non-trivial to reason about, because one would have to reason about the ISA machine state for that transition, (3) it does not require to reason about resource ownership split between TSM running in HS-mode and a more privileged M-mode software (4) it reduces the number of TSM’s source line of code (SLoC) written in assembly; assembly is more difficult for formal reasoning.

You might run TSM in HS-mode in embedded scenario and try to formally verify some of its property. The deployment model 3 in Appendix D shows a variant of CoVE where TSM runs in M-mode because it is more practical for formally reasoning, as stated above.

The decision whether to verify OpenSBI or not depends on the goals of using formal methods for CoVE. These can vary. For example, for strong security claims you might have to prove all M-mode software but probably still have to axiomatize the hardware implementation. On the other hand, one might try to use formal methods for verifying functional correctness of specific parts of the implementation, so verifying only a subset of TSM functionality would be ok. Different certification levels (FIPS, Common Criteria) might have different requirements for what is expected to be proven.

gdhh commented 5 months ago

Few broader questions:

  1. As per the discussion in the RVI meeting: Keeping the TSM in M-mode helps in formal verification. As per my understanding it requires OpenSBI to be formally verified as well. If we run OpenSBI in M-mode and TSM in HS mode, the formal verification effort will be the same. Isn't it ?

Do we require more involved formal verification because of privilege mode changes ?

If deployment model 3 is aimed for embedded use case, TSM can just run HS mode without any M-mode involvement. This also reduces the TCB to be verified.

Note: I don't have formal verification background. Thus, I may be asking stupid questions :)

Wojciech also replied to this. Fundamentally it appears that are proposing another deployment model. On a processor with hypervisor mode but without smmtt there are two approaches to running the TSM in HS mode. A) running the TSM beside the hypervisor (sharing HS with the hypervisor), B) it replaces the hypervisor, or C) deprivilege the hypervisor. A or B imply mean more code in the TSM and therefore more complexity to verify. C is a simplification of B. By moving the Hypervisor to S mode (a form of deprivilege). This would provide hardware separation between the TSM and the hypervisor however, all request from all VMs would have to be routed via the TSM. Whereas in the current approach only request from confidential VM are routed through the TSM. (supporting confidential computation has minimal impact on normal VMs.) If this is a correct understanding of your proposal I agree that if we are successful in proving the TSM in our proposed model that proof could be leveraged for the model you have proposed. But I do not think that the current work we are doing would be sufficient to prove the new model. I believe that there would be more work, even though OpenSBI is running separately in M mode.

atishp04 commented 5 months ago

Few broader questions:

  1. As per the discussion in the RVI meeting: Keeping the TSM in M-mode helps in formal verification. As per my understanding it requires OpenSBI to be formally verified as well. If we run OpenSBI in M-mode and TSM in HS mode, the formal verification effort will be the same. Isn't it ?

Do we require more involved formal verification because of privilege mode changes ? If deployment model 3 is aimed for embedded use case, TSM can just run HS mode without any M-mode involvement. This also reduces the TCB to be verified. Note: I don't have formal verification background. Thus, I may be asking stupid questions :)

Wojciech also replied to this. Fundamentally it appears that are proposing another deployment model. On a processor with hypervisor mode but without smmtt there are two approaches to running the TSM in HS mode. A) running the TSM beside the hypervisor (sharing HS with the hypervisor), B) it replaces the hypervisor, or C) deprivilege the hypervisor. A or B imply mean more code in the TSM and therefore more complexity to verify. C is a simplification of B. By moving the Hypervisor to S mode (a form of deprivilege). This would provide hardware separation between the TSM and the hypervisor however, all request from all VMs would have to be routed via the TSM. Whereas in the current approach only

Not necessarily if the non-confidential VMs are run as a nested VM. But TSM does have to some additional H-mode state to enable that.

request from confidential VM are routed through the TSM. (supporting confidential computation has minimal impact on normal VMs.) If this is a correct understanding of your proposal I agree that if we are successful in proving the TSM in our proposed model that proof could be leveraged for the model you have proposed. But I do not think that the current work we are doing would be sufficient to prove the new model. I believe that there would be more work, even though OpenSBI is running separately in M mode.

Agreed. I was just trying to understand the challenges of formal verification between running different models. It seems it boils down to more complex TSM code/additional privilege state management vs more code in M-mode.

As per my understanding from your and @wojciechozga's response, the later is easier. But if we can axiomatize hardware implementation and verifying only a subset of TSM functionality is okay, it shouldn't matter. Isn't it ?

gdhh commented 5 months ago

Not necessarily if the non-confidential VMs are run as a nested VM. But TSM does have to some additional H-mode state to enable that.

request from confidential VM are routed through the TSM. (supporting confidential computation has minimal impact on normal VMs.) If this is a correct understanding of your proposal I agree that if we are successful in proving the TSM in our proposed model that proof could be leveraged for the model you have proposed. But I do not think that the current work we are doing would be sufficient to prove the new model. I believe that there would be more work, even though OpenSBI is running separately in M mode.

Agreed. I was just trying to understand the challenges of formal verification between running different models. It seems it boils down to more complex TSM code/additional privilege state management vs more code in M-mode.

I think this is a reasonable summary.

As per my understanding from your and @wojciechozga's response, the later is easier. But if we can axiomatize hardware implementation and verifying only a subset of TSM functionality is okay, it shouldn't matter. Isn't it ?

I believe that by verifying only a subset of TSM functionally you mean that only the functionality that affects TVM. I do not think that this is a good approach. Because the VMs and TVMs would be sharing the same TSM the side channel issue discussed below has to be considered.

If by nested VMs you mean running a second level hypervisor that services all non confidential VMs. That could possibly work (assuming the correct hardware support). However, it is more complex than what we have proposed. The other issue is that unless we change the hardware implementation, the second level hypervisor would possibly have to get services from the TSM.

In addition to proof complexity, once the TSM is servicing anyone other than TVMs we open up the possibility of side channels. In particular we would have to show that there is no way to leak information between the TVS and the VMs (or second level hypervisor). Essentially, you need to show that nothing the VMs do will influence the TVMs and that none of the VM request will be influenced by the TVMs. I believe that it is possible to build a system with this approach where this is true. But it puts an additional architecture and proof burden on the system that the approach we have proposed does not have. The TSM only services one call from the VMs. That call results in the VM either being terminated because it fails attestation or becoming a TVM. Properly chosen cryptographic algorithms and key length (and key material) means that try fail semantics to break into the system would take too long to be practical.

Our current proof approach is not without side channel issues. We have to show that TVMs do not leak information to one another because they share the same TSM. Our approach to this might also work for an approach where the TSM services both, but that is a complexity we are avoiding. My statement on the required additional proof work is only an observation of the amount of work to be done and not a judgement as to whether it can be done. We are still in the process of proving the current approach so we do not know whether we will be successful, but we suspect so. We want to add some additional capabilities but not until we have a proven base.

gdhh commented 4 months ago

I reviewed the changes an made a few minor corrections for clarity.