riscv / riscv-smmtt

This specification will define the RISC-V privilege ISA extensions required to support Supervisor Domain isolation for multi-tenant security use cases e.g. confidential-computing, trusted platform services, fault isolation and so on.
https://jira.riscv.org/browse/RVG-65
Creative Commons Attribution 4.0 International

Consider adding a separate control bit in msdcfg for external trace #21

Closed rsahita closed 8 months ago

rsahita commented 10 months ago

Alternatively, this could be controlled via msdcfg.SDEDBGALW for both external debug and trace.

ved-rivos commented 10 months ago

Using msdcfg.SDEDBGEN seems appropriate. There is no value in controlling each individually, as using either violates confidentiality. The specification should require the halted trace encoder sideband input signal to be implemented, and this control should be used to drive the halted signal. That allows the encoder to generate packets indicating that tracing has stopped and, when tracing restarts, to generate the appropriate synchronization packets.
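A minimal sketch of the gating described above, in Python for illustration. It assumes a single allow bit in msdcfg gates both external debug and external trace; the bit position, function names, and event labels are hypothetical and not taken from the specification.

```python
# Hypothetical bit position; the real msdcfg layout is defined by the specification.
SD_EXT_DEBUG_ALLOW = 1 << 0


def trace_halted(msdcfg: int) -> bool:
    """Return True when the encoder's 'halted' sideband input should be asserted."""
    return (msdcfg & SD_EXT_DEBUG_ALLOW) == 0


def encoder_events(msdcfg_values):
    """List the events an encoder would signal as msdcfg changes across SD switches.

    "stop" models the packet indicating tracing has stopped; "resync" models the
    synchronization packets generated when tracing restarts.
    """
    events = []
    previous = None
    for value in msdcfg_values:
        halted = trace_halted(value)
        if previous is not None and halted != previous:
            events.append("stop" if halted else "resync")
        previous = halted
    return events
```

For example, switching from a debug-allowed SD to a non-debuggable SD and back yields one stop event followed by one resync event.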

rsahita commented 10 months ago

@gokhankaplayan suggested to keep the controls separate

gokhankaplayan commented 10 months ago

We had a similar discussion in the external debug security TG. The consensus there was also to use the halted sideband signal for it (https://github.com/joxie/riscv-debug-security/issues/11)

ARM has a separate control for trace, the non-intrusive debug enable. I assume there are use cases for keeping them separate, though I do not have a strong opinion about those use cases. Both external debug and trace violate confidentiality, but external debug has additional intrusive capabilities compared to trace. If we decide to have a separate control for trace, I would propose having both metrcen and msdcfg.sdetrcalw.
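To make the proposal concrete, here is a rough sketch of what two independent allow bits in msdcfg could look like. The bit positions are invented for illustration, whether the two bits should be independent is an assumption, and metrcen is not modelled because its semantics are not described here.

```python
# Hypothetical bit positions; only the field names come from this discussion.
SDEDBGALW = 1 << 0  # external debug allowed for the supervisor domain
SDETRCALW = 1 << 1  # external trace allowed for the supervisor domain (proposed)


def external_access(msdcfg: int) -> dict:
    """Decode which external capabilities are allowed for the current SD."""
    return {
        "debug": bool(msdcfg & SDEDBGALW),
        "trace": bool(msdcfg & SDETRCALW),
    }
```

With this split, a trace-only configuration is expressed as `external_access(SDETRCALW)`, which allows trace while keeping external debug disallowed.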

AoteJin commented 10 months ago

AFAIK, Intel and ARM deal with these scopes separately to some extent. I think it is good practice to limit the capability if the user only requires trace.

rsahita commented 9 months ago

I don't see a use case to motivate adding a separate bit right now - we can always add one in the future. I will add descriptive text stating that the current bit controls allowance of both external debug and external trace.

AoteJin commented 9 months ago

Actually, there are use cases for trace-only scenarios:

  1. Crash dumping (probably in conjunction with self-hosted debug)
  2. Code profiling
  3. Code coverage collection

In those scenarios, it would be overkill to expose external debug access when only trace is desired.

ved-rivos commented 9 months ago

All of these conditions lead to side channels and break the confidentiality of the supervisor domain. Crash dumps will expose the memory and register state of the domain. Code profiling can be used to expose the value of the inputs being processed by the supervisor domain. Data trace can be used to expose the secrets being operated on by the supervisor domain. There is no compelling use case for external debuggers to enable trace for a non-debuggable supervisor domain. If a case is made in the future for exposing secrets through trace to an external debugger, then such a control may be introduced.

rsahita commented 9 months ago

@AoteJin all the scenarios you describe above are equally intrusive (as ext debug) on the confidentiality of the SD, so I think we err towards security and take the simpler approach of keeping one control to avoid causing confusion.

AoteJin commented 9 months ago

ARM defines non invasive as “In contrast to invasive debug, non-invasive debug does not change the processor state in any way. For example, generating and collecting trace data from a target will usually not affect the processor, so trace is usually classified as non-invasive debug.”

It is not about external debug enabling trace for another SD to snoop information etc. The actual use case is that the SD owner may leverage trace for debug or profiling. It doesn't necessarily lead to info leakage. For example, the SD enables trace inside the SD for an application. Before exiting the SD, the trace output could be encrypted for offline analysis. The point is that trace usage is non-invasive and can be decoupled from external debug.

ved-rivos commented 9 months ago

I think we need to separate non-invasive from benign. Trace is non-invasive but it is not benign i.e. it compromises the confidentiality of the entity under debug. That is also why it is a "debug" - even if it is non-invasive.

For the SD programming trace to RAM where the RAM is located in the SD's confidential memory, the RDSM provides the SBI to program the trace controls, and the RDSM will context switch the trace controls when switching SDs. This self-hosted debug does not require any controls in msdcfg and is similar to other self-hosted debug such as debug triggers and performance monitoring counters.

For the purposes of external debugging - what is being discussed in this issue - the authorization for debug should include all kinds of debug - invasive or non-invasive. There is no compelling use case for external debuggers to enable trace for a supervisor domain that is otherwise not under debug. If a case is made in the future for exposing secrets through trace to an external debugger, then such a control may be introduced.

AoteJin commented 9 months ago

> For the purposes of external debugging - what is being discussed in this issue - the authorization for debug should include all kinds of debug - invasive or non-invasive. There is no compelling use case for external debuggers to enable trace for a supervisor domain that is otherwise not under debug. If a case is made in the future for exposing secrets through trace to an external debugger, then such a control may be introduced.

No, I replied in my previous post that it is not about the external debugger enabling trace. There are two actors that could enable trace: the external debugger and the hart. This knob mixes the trace control for both. The external debugger has no use case to enable trace for another SD. But the SD itself has a use case to enable trace from the hart when it is not debuggable.

ved-rivos commented 9 months ago

For self-hosted trace it was unclear why this needs a control bit in msdcfg. The trace controls of one supervisor domain will be distinct from those of another supervisor domain. As part of the world switch of the supervisor domains, the RDSM will need to restore the controls associated with the incoming SD. This is similar to other self-hosted debug capabilities such as performance monitoring counters and debug triggers. Whether an SD is allowed to use self-hosted debug capabilities should be part of the policy of the SD, a software construct, and such policy enforcement is done by the RDSM. However, presently the RISC-V architecture does not have a good way to do self-hosted trace - this is a topic of discussion in the next DTPM SIG.
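A rough sketch of the world-switch behaviour described above, written as a toy Python model. The class names, fields, and the per-SD policy dictionary are all hypothetical; they only illustrate the RDSM saving the outgoing SD's trace controls and restoring the incoming SD's controls subject to a software policy.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TraceControls:
    """Illustrative stand-in for the trace control state of one SD."""
    enabled: bool = False
    sink_base: int = 0


@dataclass
class RdsmModel:
    """Toy RDSM that context switches trace controls per SD policy."""
    trace_allowed: Dict[str, bool]                 # hypothetical per-SD policy
    saved: Dict[str, TraceControls] = field(default_factory=dict)
    live: TraceControls = field(default_factory=TraceControls)
    current: Optional[str] = None

    def world_switch(self, incoming: str) -> None:
        # Save the outgoing SD's controls so they can be restored later.
        if self.current is not None:
            self.saved[self.current] = self.live
        # Restore the incoming SD's controls only if policy delegates trace to it.
        if self.trace_allowed.get(incoming, False):
            self.live = self.saved.get(incoming, TraceControls())
        else:
            self.live = TraceControls()            # trace stays disabled
        self.current = incoming
```

In this model no msdcfg bit is involved: the decision is purely the RDSM's policy lookup at switch time.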

AoteJin commented 9 months ago

In the RISC-V ecosystem, there is currently no so-called self-hosted trace; the trace is simply a peripheral that can be programmed by both the hart and the debugger. I think we are on the same page that the RDSM should context switch the trace controls. But the scheme breaks when there is only a single knob shared by external debug and trace, since different partitions can have different policies for external debug and trace.

ved-rivos commented 9 months ago

> In the RISC-V ecosystem, there is currently no so-called self-hosted trace; the trace is simply a peripheral that can be programmed by both the hart and the debugger.

This is the issue I was trying to call out. Consider a system like the one below: [image]

To keep the external debugger outside the TCB of the SD, the RDSM will need to prevent external debugger access to the memory mapped registers of the a) trace encoder in the hart b) trace funnel c) trace RAM sink. Otherwise the RDSM may have programmed the trace from a hart to private memory of a non debug SD but the external debugger can reprogram the RAM sink control registers to change the location where the trace is dumped.

With SD there can be only one master of the trace infrastructure - the RDSM or the external debugger. If the RDSM is the master then there is no need for a control, the RDSM can context switch the trace controls. But this may not be always possible - e.g. in systems built with trace funnels. If the external debugger is the master then we cannot provide self hosted trace which is kept confidential from the external debugger - again no need for a separate control as it does not help.
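The "only one master" point can be pictured with a toy model of a RAM sink whose control registers accept writes from a single configured master. This is purely illustrative: the class, the master labels, and the idea of tagging writes with a requester are assumptions, not a description of any existing trace hardware.

```python
class TraceSinkModel:
    """Toy RAM sink whose control registers accept writes from one master only."""

    def __init__(self, master: str):
        self.master = master      # e.g. "rdsm" or "external_debugger" (illustrative labels)
        self.buffer_base = 0

    def write_buffer_base(self, requester: str, base: int) -> bool:
        """Apply the write only if it comes from the configured master."""
        if requester != self.master:
            return False          # write dropped; trace destination unchanged
        self.buffer_base = base
        return True
```

If the RDSM is the master, a later write from the external debugger cannot redirect where the trace is dumped; if the external debugger is the master, the RDSM cannot keep the destination confidential.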

Hence the need for good support of self hosted trace. We may want self hosted trace to look like below: [image]

This allows tracing virtual addresses. Each hart would have a hart-specific trace buffer, and the trace output can be contained to memory of the SD or even to the memory of a virtual machine in the SD. With this in place, the RDSM can control the mux shown, which sends trace through the memory pipe of the hart - under control of the hart MMU - or sends it to off-hart sinks. The off-hart sink can continue to be under control of the external debugger, while the self hosted trace can be constrained by the RDSM. This does not need a control in msdcfg either.

rsahita commented 9 months ago

Is this form of self-hosted trace in the charter of the trace WG?

AoteJin commented 9 months ago

@ved-rivos I think you elaborated on it from two perspectives: the hart and the external debugger.

> To keep the external debugger outside the TCB of the SD, the RDSM will need to prevent external debugger access to the memory mapped registers of the a) trace encoder in the hart b) trace funnel c) trace RAM sink. Otherwise the RDSM may have programmed the trace from a hart to private memory of a non debug SD but the external debugger can reprogram the RAM sink control registers to change the location where the trace is dumped.

It is true that if external debug is allowed for an SD, the RDSM could leverage IOPMP, MMU, PMP etc. to block external debugger access to trace for another SD. However, that relies on those protections (IOPMP, MMU, PMP etc.), and there are designs without them or with very limited resources for them. The protection scheme you elaborated is valid but specific to certain HW designs. It is not generally true.

> If the RDSM is the master then there is no need for a control, the RDSM can context switch the trace controls.

From the hart perspective, the RDSM context switching the trace controls takes effect only when msdcfg.SDEDBGEN is true; otherwise it is meaningless. However, the use cases require that the RDSM can enable trace for an SD that is not enabled for external debug. Thus, your statement holds only if trace control is not subject to msdcfg.SDEDBGEN.

Based on your post I think it implies that we are aligned at least on one point that external debug control should not affect trace control. Is it right?

ved-rivos commented 9 months ago

> It is true that if external debug is allowed for an SD, the RDSM could leverage IOPMP, MMU, PMP etc. to block external debugger access to trace for another SD. However, that relies on those protections (IOPMP, MMU, PMP etc.), and there are designs without them or with very limited resources for them. The protection scheme you elaborated is valid but specific to certain HW designs. It is not generally true.

Consider an SoC that runs two SDs - one with external debug allowed and one without external debug allowed. Using IOPMP/MMU/PMP to prevent the external debugger from accessing trace controls is not a good idea, as IOPMPs are global and not per-hart. MMU and PMP cannot be used to control debug module access to the trace controls. Further, in the example I showed, where trace is routed through a funnel to a sink, the funnel serves two harts, one of which may be running a debug-allowed SD and the other a debug-disabled SD.

> Based on your post I think it implies that we are aligned at least on one point that external debug control should not affect trace control. Is it right?

No. We do not have a self hosted trace architecture. The trace architecture is heavily oriented to use with external debuggers. We should continue to use the external debugger based trace while self hosted trace architecture is developed. The single SDEDBGALW bit will be used to control whether external debuggers can trace SDs. When we have a self hosted trace architecture it will not need such a control - it will be more like other self hosted debug capabilities such as debug triggers and HPMs.

> From the hart perspective, the RDSM context switching the trace controls takes effect only when msdcfg.SDEDBGEN is true; otherwise it is meaningless. However, the use cases require that the RDSM can enable trace for an SD that is not enabled for external debug. Thus, your statement holds only if trace control is not subject to msdcfg.SDEDBGEN.

SDEDBGALW is for external debugging. For self hosted debugging, the RDSM will use its policies, as expressed in the manifest of the SD, to determine whether self hosted trace should be delegated to an SD or not. When delegated to an SD, it will be context switched like all other self hosted debug facilities.

AoteJin commented 9 months ago

> The trace architecture is heavily oriented to use with external debuggers. We should continue to use the external debugger based trace while self hosted trace architecture is developed.

I don't think that is a valid hypothesis. The Efficient Trace for RISC-V specification has the statement: "It is not always possible to use a debugger to observe behavior of a running system as this is intrusive. Providing visibility of program execution is important. This needs to be done without swamping the system with vast amounts of data. One method of achieving this is via a Processor Branch Trace"

Moreover, in the ARM architecture, the trace modules are enabled by the processor itself, while the external tracer is responsible for capturing the trace output.

Both of them suggest that the external debugger is not the main actor configuring the trace function. Besides, there are a bunch of existing use cases that should not include external debug capability, for example, branch coverage, statement coverage, performance profiling etc.

> When we have a self hosted trace architecture it will not need such a control - it will be more like other self hosted debug capabilities such as debug triggers and HPMs.

The so-called self-hosted trace is not analogous to self-hosted debug. Could you share any existing examples of self-hosted trace in other architectures?

ved-rivos commented 9 months ago

> The Efficient Trace for RISC-V specification has the statement:

The eTrace specification is indicating that, to obtain visibility into program execution, one could single step that program, but that is too intrusive. The alternative is to use Branch Trace: in the case of branch trace, the processor emits taken/not-taken conditional branch history and a PC delta for other types of branches. Given the start PC, this allows reconstructing program execution using the much lower amount of data provided by eTrace and without affecting the behavior of the DUT itself. This is the motivation for using eTrace vs. single step.
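The reconstruction idea can be sketched with a toy model. This is not the eTrace packet format: the program encoding, the fixed 4-byte instruction size, and the function name are assumptions made for illustration, and indirect jumps (which would need the PC delta information) are left out.

```python
def reconstruct_pcs(program, start_pc, branch_history, max_steps=1000):
    """Rebuild the executed PC sequence from a start PC and taken/not-taken bits.

    program maps pc -> ("op",), ("branch", target) or ("halt",).
    branch_history is an iterable of booleans, one per conditional branch executed.
    """
    pcs = []
    pc = start_pc
    bits = iter(branch_history)
    for _ in range(max_steps):
        pcs.append(pc)
        kind = program[pc][0]
        if kind == "halt":
            break
        if kind == "branch":
            taken = next(bits, None)
            if taken is None:
                break                      # history exhausted; stop reconstructing
            pc = program[pc][1] if taken else pc + 4
        else:
            pc += 4                        # straight-line code needs no trace data
    return pcs


# Toy program: one instruction, a backward branch to 0, then a halt.
toy_program = {0: ("op",), 4: ("branch", 0), 8: ("halt",)}
```

Here two history bits (taken, then not taken) are enough to recover the full five-step execution of `toy_program`, which is the data reduction the paragraph above refers to.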

> The so-called self-hosted trace is not analogous to self-hosted debug. Could you share any existing examples of self-hosted trace in other architectures?

Sure. Please see:

  1. ARM: FEAT_TRBE and FEAT_TRF
  2. x86: Intel Processor Trace

AoteJin commented 9 months ago

Thanks for sharing the material! It is insightful.

Let's first address the following questions to get on the same page:

Q1: How does ARM enable trace?
A1: ARM can enable trace from an external debugger via CoreSight, or internally through the processor. This holds true for architectures both with and without FEAT_TRF.

Q2: Why does ARM introduce FEAT_TRF (Self-hosted Trace Extensions)?
A2: FEAT_TRF enables in-processor trace analysis. It provides additional privilege-based control over trace to start/stop trace output, and a barrier instruction to synchronize trace output. It enables an OS to get the trace of an application.

[image: trace control hierarchy]

It turns out there are 3 ways to use trace:

  1. The trace could be enabled by the external debugger, and the trace is consumed by the debugger.
  2. The SW on the processor enables the trace, and the output is analyzed by an offline host.
  3. The OS or a trace tool on the processor enables trace for an application using self-hosted trace, and the trace is consumed on the processor.

If there were only options 1 and 3, then I would agree that the trace control could be coupled with the external debug control, since self-hosted trace is not yet developed for RISC-V and could be designed with its own separate access control.

However, the fact is that option 2 is common for bare-metal use cases where in-processor trace analysis is impossible. The use cases I mentioned in my previous post are widely desired by such users, and it would be overkill to expose external debug capability for those scenarios.

> Besides, there are a bunch of existing use cases that should not include external debug capability, for example, branch coverage, statement coverage, performance profiling etc.

If the trace control is coupled with the external debug control, option 2 is unexpectedly forbidden when external debug is disallowed.

That's why I suggest honoring those real use cases and adding a separate control for trace.

rsahita commented 9 months ago

@AoteJin both options 2 and 3 are capabilities enabled by the processor (per your diagram above) - I equate those to self-hosted trace - whether the trace is analyzed offline or online is orthogonal IMO. The controls are owned by the SD software and may be context switched by the RDSM safely. Hence the only case we need to enforce via msdcfg is case 1 - trace enabled by the external debugger - so IMO the common control suffices.

AoteJin commented 9 months ago

My point is that we should be practical about the use cases:

> Besides, there are a bunch of existing use cases that should not include external debug capability, for example, branch coverage, statement coverage, performance profiling etc.

If we propose an architecture at the cost of sacrificing existing use cases, it will be less welcome.

Before ARM introduced FEAT_TRF, those use cases could be satisfied by separate invasive/non-invasive knobs (DBGEN, NIDEN, SPIDEN, and SPNIDEN). FEAT_TRF further enriches the tracing feature to enable an OS to analyze the trace of an application on the same processor.
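For readers unfamiliar with those four signals, the following sketch shows how they are commonly described as combining into invasive and non-invasive permissions for Non-secure and Secure state. The combination rules here are written from general knowledge of the Arm debug authentication interface and should be treated as an assumption to check against the Arm documentation; the function and key names are made up for illustration.

```python
def arm_debug_permissions(dbgen: bool, niden: bool, spiden: bool, spniden: bool) -> dict:
    """Combine the four authentication signals into four permissions (assumed rules)."""
    invasive = bool(dbgen)
    non_invasive = bool(dbgen or niden)       # invasive enable also permits non-invasive
    return {
        "nonsecure_invasive": invasive,
        "nonsecure_noninvasive": non_invasive,
        "secure_invasive": bool(invasive and spiden),
        "secure_noninvasive": bool(non_invasive and (spiden or spniden)),
    }
```

Under these assumed rules, asserting only NIDEN and SPNIDEN gives a trace-only configuration: non-invasive debug is permitted in both states while invasive debug stays disabled.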

Adding a trace control knob to enable current use cases, and letting self-hosted trace extend the trace feature set later, is more realistic than discarding existing use cases until the RISC-V ecosystem finally develops a self-hosted trace architecture.

rsahita commented 9 months ago

@AoteJin - Self-hosted trace or self-hosted debug is not of concern here at all and is explicitly orthogonal, as we discussed above. We primarily care about external debug and trace. We are not sacrificing the use case at all - in fact, we are suggesting that both external debug and trace for a supervisor domain are signaled as allowed via the same control bit. This is the same as what you describe above:

> The trace could be enabled by the external debugger, and the trace is consumed by the debugger.

Whenever self-hosted trace is supported, it will work normally within the supervisor domain (if a supervisor domain chooses to enable it), and will be context switch managed by the RDSM.

ved-rivos commented 9 months ago

Further, for branch coverage and related profiling there is also CTR which is self hosted.

joxie commented 9 months ago

Hi @rsahita, @ved-rivos, sorry, I wasn't aware of this issue until recently. There are actually use cases at NVIDIA where we need to profile the performance of the supervisor partition, or do simple debug by examining the trace, on a board where external debug is disabled. We would therefore need to enable trace (with a system-level authentication mechanism to ensure that only NVIDIA engineers can access the trace) while external debug is disabled, either via the M-mode monitor (internal silicon bring-up debug/profile) or via a life-cycle fuse (customer debug support). The current proposal does not meet this use case requirement. We discussed this in yesterday's external debug security TG meeting, and multiple parties (MIPS, SiFive, NXP) acknowledged that this is a common use case in real practice, so we do see use cases that require separate control of debug and trace.

rsahita commented 8 months ago

@joxie please see PR #35. If it looks OK, we can close this issue and merge that PR.

rsahita commented 8 months ago

Closing this issue since a separate control bit has been proposed per PR #35.