lowRISC / opentitan

OpenTitan: Open source silicon root of trust
https://www.opentitan.org
Apache License 2.0
2.51k stars 745 forks source link

OpenOCD hits timeout when setting pmpcfg3 register #14978

Closed dmcardle closed 1 year ago

dmcardle commented 2 years ago

For #14343, we'd like to be able to configure the PMP via GDB. At a high level, GDB talks to OpenOCD, which talks to the debug module via JTAG.

For some reason, Attempting to set pmpcfg[0-3] generates "waiting for busy to go low" warnings.

This is OpenOCD's debug output corresponding to reg pmpcfg3 0x9f000000, which attempts to set pmpcfg3.

Debug: 931 320 command.c:146 script_debug(): command - reg pmpcfg3 0x9f000000
Debug: 932 320 riscv.c:1921 riscv_openocd_poll(): polling all harts
Debug: 933 320 riscv.c:2916 riscv_set_current_hartid(): setting hartid to 0, was 0
Debug: 934 320 riscv.c:1873 riscv_poll_hart(): polling hart 0, target->state=2
Debug: 939 320 target.c:3054 handle_reg_command(): -
Debug: 940 320 riscv.c:3469 register_set(): [0]{0} write 0x9f000000 to pmpcfg3 (valid=0)
Debug: 941 320 riscv.c:3038 riscv_set_register_on_hart(): {0} csr931 <- 9f000000
Debug: 942 320 riscv-013.c:3641 riscv013_set_register(): [0] writing 0x9f000000 to register csr931 on hart 0
Debug: 943 320 riscv.c:2916 riscv_set_current_hartid(): setting hartid to 0, was 0
Debug: 944 320 riscv-013.c:1306 register_write_direct(): {0} csr931 <- 0x9f000000
Debug: 947 321 riscv-013.c:782 execute_abstract_command(): command=0x2303a3; access register, size=32, postexec=0, transfer=1, write=1, regno=0x3a3
Error: 18396 3239 riscv-013.c:767 wait_for_idle(): Timed out after 2s waiting for busy to go low (abstractcs=0x8001002). Increase the timeout with riscv set_command_timeout_sec.
Debug: 18397 3239 riscv-013.c:805 execute_abstract_command(): command 0x2303a3 failed; abstractcs=0x8001002
Debug: 18405 3240 riscv-013.c:782 execute_abstract_command(): command=0x221008; access register, size=32, postexec=0, transfer=1, write=0, regno=0x1008
Error: 36550 6243 riscv-013.c:763 wait_for_idle(): Abstract command ended in error 'busy' (abstractcs=0x8001102)
Error: 36551 6243 riscv-013.c:767 wait_for_idle(): Timed out after 2s waiting for busy to go low (abstractcs=0x8001102). Increase the timeout with riscv set_command_timeout_sec.
Debug: 36552 6243 riscv-013.c:805 execute_abstract_command(): command 0x221008 failed; abstractcs=0x8001102
Debug: 36557 6244 riscv.c:3054 riscv_set_register_on_hart(): [riscv.tap.0]{0} wrote 0x9f000000 to pmpcfg3 valid=0
User : 36558 6244 gdb_server.c:723 gdb_output(): pmpcfg3 (/32): 0x9f000000
User : 36559 6244 gdb_server.c:723 gdb_output():

For contrast, this is the output from reg a0 0x1234.

Debug: 894 316 command.c:146 script_debug(): command - reg a0 0x1234
Debug: 895 316 riscv.c:1921 riscv_openocd_poll(): polling all harts
Debug: 896 316 riscv.c:2916 riscv_set_current_hartid(): setting hartid to 0, was 0
Debug: 897 316 riscv.c:1873 riscv_poll_hart(): polling hart 0, target->state=2
Debug: 902 317 target.c:3054 handle_reg_command(): -
Debug: 903 317 riscv.c:3469 register_set(): [0]{0} write 0x00001234 to a0 (valid=0)
Debug: 904 317 riscv.c:3038 riscv_set_register_on_hart(): {0} a0 <- 1234
Debug: 905 317 riscv-013.c:3641 riscv013_set_register(): [0] writing 0x1234 to register a0 on hart 0
Debug: 906 317 riscv.c:2916 riscv_set_current_hartid(): setting hartid to 0, was 0
Debug: 907 317 riscv-013.c:1306 register_write_direct(): {0} a0 <- 0x1234
Debug: 910 317 riscv-013.c:782 execute_abstract_command(): command=0x23100a; access register, size=32, postexec=0, transfer=1, write=1, regno=0x100a
Debug: 915 318 riscv.c:3054 riscv_set_register_on_hart(): [riscv.tap.0]{0} wrote 0x1234 to a0 valid=1
User : 916 318 gdb_server.c:723 gdb_output(): a0 (/32): 0x00001234
User : 917 318 gdb_server.c:723 gdb_output():

Our debug module should support the "Access Register" abstract command (defined in section 3.6.1 of RISC-V External Debug Support^0. Here's a fully-exploded view of the the OpenOCD-generated abstract command 0x2303a3:

           cmdtype: 0x0 = 0b0 = 0 [8 bits]
                 0: 0x0 = 0b0 = 0 [1 bits]
           aarsize: 0x2 = 0b10 = 2 [3 bits]
  aarpostincrement: 0x0 = 0b0 = 0 [1 bits]
          postexec: 0x0 = 0b0 = 0 [1 bits]
          transfer: 0x1 = 0b1 = 1 [1 bits]
             write: 0x1 = 0b1 = 1 [1 bits]
             regno: 0x3a3 = 0b1110100011 = 931 [16 bits]

Perhaps these OpenOCD warnings are ignorable, but they may indicate a bug in our debug module implementation. If I do try to blunder on, I hit fatal errors later. This attempt to read ra happens when I issue load sram_program.elf in GDB.

Debug: 176973 29646 riscv-013.c:3615 riscv013_get_register(): [0] reading register ra on hart 0
Debug: 176974 29646 riscv.c:2916 riscv_set_current_hartid(): setting hartid to 0, was 0
Debug: 176975 29646 riscv-013.c:782 execute_abstract_command(): command=0x221001; access register, size=32, postexec=0, transfer=1, write=0, regno=0x1001
Error: 194892 32635 riscv-013.c:763 wait_for_idle(): Abstract command ended in error 'busy' (abstractcs=0x8001102)
Error: 194893 32635 riscv-013.c:767 wait_for_idle(): Timed out after 2s waiting for busy to go low (abstractcs=0x8001102). Increase the timeout with riscv set_command_timeout_sec.
Debug: 194894 32635 riscv-013.c:805 execute_abstract_command(): command 0x221001 failed; abstractcs=0x8001102
Debug: 194899 32635 riscv.c:3097 riscv_get_register_on_hart(): {0} ra: ffffffffffffffff
Debug: 194900 32635 gdb_server.c:1422 gdb_error(): Reporting -4 to GDB as generic error

Configuration:

openocd-filtered.log

tjaychen commented 2 years ago

huh... i have a dumb question..going to need to examine the code a bit.. but is register...931 actually correct?

dmcardle commented 2 years ago

huh... i have a dumb question..going to need to examine the code a bit.. but is register...931 actually correct?

Yeah, it's 0x3a3 (= 931) according to RISC-V Privileged Architectures V1.10.

tjaychen commented 2 years ago

sounds good, let me poke around.

tjaychen commented 2 years ago

does accesses to a1 work for you? it looks like the dm_mem special cases writes to a0, but most of the others should be the same.. actually i'm probably wondering more what registers do and don't work heh. wait..we might have enough infra where i can sim this. let me give that a try.

dmcardle commented 2 years ago

Yeah, setting a1 works fine.

Debug: 894 309 command.c:146 script_debug(): command - reg a1 0x1234
Debug: 895 309 riscv.c:1921 riscv_openocd_poll(): polling all harts
Debug: 896 309 riscv.c:2916 riscv_set_current_hartid(): setting hartid to 0, was 0
Debug: 897 309 riscv.c:1873 riscv_poll_hart(): polling hart 0, target->state=2
Debug: 902 309 target.c:3054 handle_reg_command(): -
Debug: 903 309 riscv.c:3469 register_set(): [0]{0} write 0x00001234 to a1 (valid=0)
Debug: 904 309 riscv.c:3038 riscv_set_register_on_hart(): {0} a1 <- 1234
Debug: 905 309 riscv-013.c:3641 riscv013_set_register(): [0] writing 0x1234 to register a1 on hart 0
Debug: 906 309 riscv.c:2916 riscv_set_current_hartid(): setting hartid to 0, was 0
Debug: 907 309 riscv-013.c:1306 register_write_direct(): {0} a1 <- 0x1234
Debug: 910 310 riscv-013.c:782 execute_abstract_command(): command=0x23100b; access register, size=32, postexec=0, transfer=1, write=1, regno=0x100b
Debug: 915 311 riscv.c:3054 riscv_set_register_on_hart(): [riscv.tap.0]{0} wrote 0x1234 to a1 valid=1
User : 916 311 gdb_server.c:723 gdb_output(): a1 (/32): 0x00001234
User : 917 311 gdb_server.c:723 gdb_output():

I've been cross-referencing the codegen in dm_mem.sv hoping to find a CSR-specific mistake, but no luck so far.

dmcardle commented 2 years ago

Bizarrely, I've found that setting pmpcfg0, pmpcfg1, and pmpcfg2 just works. I tried setting pmpcfg0 to both zero and non-zero values and it does not result in timeout.

[GDB] pmpcfg0 (/32): 0x9f000000
[GDB]
[GDB] pmpcfg0 (/32): 0x00000000
[GDB]
[GDB] pmpcfg1 (/32): 0x00000000
[GDB]
[GDB] pmpcfg2 (/32): 0x00000000
[GDB]
[GDB] Timed out after 2s waiting for busy to go low (abstractcs=0x8001002). Increase the timeout with riscv set_command_timeout_sec.
[GDB] Abstract command ended in error 'busy' (abstractcs=0x8001102)
[GDB] Timed out after 2s waiting for busy to go low (abstractcs=0x8001102). Increase the timeout with riscv set_command_timeout_sec.
[GDB] pmpcfg3 (/32): 0x00000000
[GDB]
[GDB] Abstract command ended in error 'busy' (abstractcs=0x8001102)
[GDB] Timed out after 2s waiting for busy to go low (abstractcs=0x8001102). Increase the timeout with riscv set_command_timeout_sec.
[GDB] Abstract command ended in error 'busy' (abstractcs=0x8001102)
[GDB] Timed out after 2s waiting for busy to go low (abstractcs=0x8001102). Increase the timeout with riscv set_command_timeout_sec.
[GDB] pmpcfg3 (/32): 0x9f000000
tjaychen commented 2 years ago

are we sure pmpcfg3 works from the perspective of ibex in general? btw, do you happen to have the full openocd log for pmpcfg0~2? Just wanted to see the full command.

tjaychen commented 2 years ago

alright! i think i've finally reproduced this in simulation! time for a deep dive! please hold :)

tjaychen commented 2 years ago

just to double check, i'm just kind of blindly copying here, but is the value 0x9f000000 what we want?

tjaychen commented 2 years ago

okay coming back to this, i think i might need some more details. I was able to reproduce it, but that was more of a fluke. What happened was that I programmed some random value into pmpcfg3, and that caused an instruction fetch error, which killed everything because the debug rom doesn't have the ability to handle exceptions very cleanly.

but when i switched it to your value, everything worked fine. However, in my test, I am ONLY doing this value and nothing else. Is there anything else in your setup before this that might change the context? For example was the address change?

tjaychen commented 2 years ago

ah i should also mention my sim is done using test_rom, which prior to this point, actually goes in and makes region 15 basically the entire address space and fully executable. So there's very little I could do to actually "break" that.

so using this as an example..it looks like we're just setting region 15, but are we also setting region 15 address prior? Because if we are not, it looks like 0x9f000000 would actually kill region 13, which enables the rv_rom access space. So that would render any future handling of abstract commands completely dead. Let me take out our test rom's "open all access", and just do the pmpcfg3 change, it might show similar results as what you're seeing.

If it's not clear, the way rv_dm handles abstract commands is essentially through the program buffer. So if ibex cannot fetch instructions from rv_dm, it will never be able to finish the transaction and that is probably why you see timeouts.

alphan commented 2 years ago

ah i should also mention my sim is done using test_rom, which prior to this point, actually goes in and makes region 15 basically the entire address space and fully executable. So there's very little I could do to actually "break" that.

so using this as an example..it looks like we're just setting region 15, but are we also setting region 15 address prior? Because if we are not, it looks like 0x9f000000 would actually kill region 13, which enables the rv_rom access space. So that would render any future handling of abstract commands completely dead. Let me take out our test rom's "open all access", and just do the pmpcfg3 change, it might show similar results as what you're seeing.

Yeah, I couldn't look into this in detail yet, but could be an ordering issue like @tjaychen suggests depending on when this command is executed. The sequence here should work -- but assuming that the ePMP config is at its reset value, i.e. we do this at the wfi in rom_start.S. IIRC, this should match what we do in test_rom_start.S. @dmcardle can you check the value of pc when you are attempting to configure the ePMP?

dmcardle commented 2 years ago

are we sure pmpcfg3 works from the perspective of ibex in general?

I think so, because we use it in test_rom_start.S.

btw, do you happen to have the full openocd log for pmpcfg0~2? Just wanted to see the full command.

I attached an OpenOCD log to the first comment, but I just realized it's nearly invisible due to formatting. Here's the link. https://github.com/lowRISC/opentitan/files/9585900/openocd-filtered.log

just to double check, i'm just kind of blindly copying here, but is the value 0x9f000000 what we want?

I really just stole it from test_rom_start.S :)

okay coming back to this, i think i might need some more details. I was able to reproduce it, but that was more of a fluke. What happened was that I programmed some random value into pmpcfg3, and that caused an instruction fetch error, which killed everything because the debug rom doesn't have the ability to handle exceptions very cleanly.

but when i switched it to your value, everything worked fine. However, in my test, I am ONLY doing this value and nothing else. Is there anything else in your setup before this that might change the context? For example was the address change?

Huh, I don't really follow how that caused an instruction fetch error — shouldn't execution be halted?

I'll try to describe the context in more detail. I'm running the production ROM, but with a slight modification where the first instruction of _rom_start_boot is wfi. This has the same effect as zeroing out CREATOR_SW_CFG_ROM_EXEC_EN. Without this change, it's difficult to halt the device for debugging because it bootloops when it cannot find ROM_EXT.

Once the bitstream is loaded on the device, I start OpenOCD and connect GDB. The GDB script tells OpenOCD to reset and halt the device with monitor reset halt.

At this point, pc (and dpc) are 0x00008080. According to //sw/device/silicon_creator/rom:rom_fpga_cw310_dis, that is the address of a jump into _rom_start_boot.

_rom_interrupt_vector_asm():
    8000:       3b80006f                j       83b8 <_asm_exception_handler>
    8004:       3b40006f                j       83b8 <_asm_exception_handler>
...
    807c:       33c0006f                j       83b8 <_asm_exception_handler>
    8080:       12c0006f                j       81ac <_rom_start_boot>

ah i should also mention my sim is done using test_rom, which prior to this point, actually goes in and makes region 15 basically the entire address space and fully executable. So there's very little I could do to actually "break" that.

so using this as an example..it looks like we're just setting region 15, but are we also setting region 15 address prior? Because if we are not, it looks like 0x9f000000 would actually kill region 13, which enables the rv_rom access space. So that would render any future handling of abstract commands completely dead. Let me take out our test rom's "open all access", and just do the pmpcfg3 change, it might show similar results as what you're seeing.

Well, we are setting region 15, but (in theory) the PMP should have been reset when we told OpenOCD to reset halt.

If I print out the pmpcfg registers after halt reset, but before trying to set them, this is what I see.

[GDB] pmpcfg0 (/32): 0x009d0000
[GDB]
[GDB] pmpcfg1 (/32): 0x00000000
[GDB]
[GDB] pmpcfg2 (/32): 0x8b000000
[GDB]
[GDB] pmpcfg3 (/32): 0x00009f00

If it's not clear, the way rv_dm handles abstract commands is essentially through the program buffer. So if ibex cannot fetch instructions from rv_dm, it will never be able to finish the transaction and that is probably why you see timeouts.

Yeah, it is neat how the debug module generates code for abstract commands! I figured that out while looking through the implementation of the Access Register abstract command in dm_mem.sv.

Wow, I think you may have nailed it. It sounds like you're saying that reconfiguring the PMP may disallow access to the program buffer, causing an instruction fetch exception midway through executing the abstract command that sets the pmpcfg register. Did I get that right?

dmcardle commented 2 years ago

I don't see anything in the RISC-V debug spec^0 mentioning this particular footgun. Would it be reasonable to disable the PMP when we're in Debug Mode?

dmcardle commented 2 years ago

I suppose as a workaround we should try reusing region 13 rather than obliterating it and trying to non-atomically set up region 15.

dmcardle commented 2 years ago

Success!

I was able to execute the example SRAM program by preserving the existing values of the pmpcfg registers, only bitwise-oring values in.

The default values of pmpcfg registers are:

[GDB] pmpcfg0 (/32): 0x009d0000
[GDB]
[GDB] pmpcfg1 (/32): 0x00000000
[GDB]
[GDB] pmpcfg2 (/32): 0x8b000000
[GDB]
[GDB] pmpcfg3 (/32): 0x00009f00
[GDB]
[GDB] pmpaddr0 (/32): 0x00000000

And I'm changing them to:

[GDB] pmpcfg0 (/32): 0x009d009f
[GDB]
[GDB] pmpcfg1 (/32): 0x00000000
[GDB]
[GDB] pmpcfg2 (/32): 0x8b000000
[GDB]
[GDB] pmpcfg3 (/32): 0x00009f00
[GDB]
[GDB] pmpaddr0 (/32): 0x7fffffff

The effect is that I'm setting pmp0cfg to 0x9f = 0b10011111 (L=1, A=3=NAPOT, X=1, W=1, R=1). This is the lowest-numbered entry, so it happily has the highest priority.

tjaychen commented 2 years ago

sorry for the late reply! time zone differences, but yep! and to your point, i think we should probably file an issue with the ibex folks to ask this point be clarified.

I'm not sure making abstract command handling through the program buffer is actually something that's generally intended. That might have been an implementation specific choice to serialize things all into one place.

Glad to hear the problem was resolved though! Could you or Alphan help leave a giant comment somewhere in the code to clarify this? I am positive you're not the first to see this, nor will you be the last.

tjaychen commented 2 years ago

it's also worth considering as you say, disabling pmp during debug mode. But the only reason why you "might" not want to do that by default is because then the normal execution behavior and the debug execution behavior will diverge. So if you are using debug mode to track down an error in software due to misconfigured pmps, it might be hard to do so.

I would definitely raise this in an issue with the ibex folks also for further discussion.

dmcardle commented 2 years ago

Could you or Alphan help leave a giant comment somewhere in the code to clarify this? I am positive you're not the first to see this, nor will you be the last.

Absolutely, I will make sure to explain this in depth near the part of the GDB script that configures the PMP. (Edit: Done in #15024.)

I'm not sure making abstract command handling through the program buffer is actually something that's generally intended. That might have been an implementation specific choice to serialize things all into one place.

Agreed, this feels like a leaky abstraction.

Thanks for all your help, @tjaychen!

GregAC commented 1 year ago

it's also worth considering as you say, disabling pmp during debug mode

It'd be simple enough to add in, though may not be strictly RISC-V spec compliant. The debug module has direct system bus access so any GDB/OpenOCD memory access would likely use that and bypass PMP anyway.

I'm not sure making abstract command handling through the program buffer is actually something that's generally intended. That might have been an implementation specific choice to serialize things all into one place.

It is implementation specific for the PULP RISC-V debug module but it one of the possible implementations listed in the debug spec (See A.2 in https://riscv.org/wp-content/uploads/2019/03/riscv-debug-release.pdf).

I think I'll email an appropriate RISC-V list about this and see what other people's thoughts are. PMP doesn't apply to debug mode execution does sound pretty reasonable.

GregAC commented 1 year ago

There was some discussion on this issue back in 2020 on the tech-debug list: https://lists.riscv.org/g/tech-debug/topic/access_memory_abstract/75549024

The summary from that seems to be PMP bypass within debug mode might not be spec compliant but probably should be and implementing it shouldn't be an issue.

The latest (non yet ratified) build of the debug spec (https://raw.githubusercontent.com/riscv/riscv-debug-spec/14a8d628e1fb736043eb54e0596adddb9717f0de/riscv-debug-stable.pdf) does have some references to PMP

It introduces a new 'relaxedpriv' bit to the abstract control and status DM register. When set PMP checks are relaxed for program buffer execution and abstract memory accesses.

It also notes in A.2, the section describing the execution based implementation that we use

Note that for debug to be possible, the PMP must not disallow fetches, loads, or stores in the address range associated with the Debug Module when the hart is in Debug Mode

My feeling is doing an implementation that ignores PMP in Debug mode would be reasonable for us. Strictly it's a spec violation for v0.13.2 and would also not comply with the v1.0 draft I linked (the relaxedpriv bit would be hard-wired to 0, where it should be 1) but it is important the debug module address range isn't subject to PMP checks in debug mode and rather than having that as a hard-coded special region (or relying on software to carefully not close it off) simply disabling all PMP checks in debug mode would be a reasonable choice.

Perhaps we assume the v1.0 will continue to have the relaxedpriv bit where it is and tie that to 1 in Ibex even though it's not part of the ratified v0.13.2 spec?

We do need to consider the security implications though note that anything that has access to debug has access to the system bus access which isn't subject to PMP anyway. It could introduce a new fault injection vulnerability into Ibex (glitch debug enable going into the PMP check to disable PMP checks) but lockstep will protect against it. Plus there are existing fault injection targets in the PMP checking that will allow much the same thing.

@cdgori please note the above the paragraph and let us know of any comments you may have.

dmcardle commented 1 year ago

Neat, thanks for digging that up, Greg! Pragmatically, I'll point out that we found a workaround for our use case in #15024, so this particular sharp edge of the debug module is not currently blocking tests that run in SRAM.

alphan commented 1 year ago

@dmcardle let's talk about this offline today. I need to understand this better to see if this would affect our ability to debug ROM. We should make sure that ePMP changes in ROM do not cause the same kind of behavior.

GregAC commented 1 year ago

I'll point out that we found a workaround for our use case in https://github.com/lowRISC/opentitan/pull/15024, so this particular sharp edge of the debug module is not currently blocking tests that run in SRAM.

I could imagine some ROM ext bug that messes up the PMP configuration (e.g. in the final stages before handing over to tock or in early tock boot) that would nuke the ability to debug. You could live without the debugger (and boot stages will hopefully be pretty reproducible in sim) but it would make life more awkward.

tjaychen commented 1 year ago

@GregAC would it be a lot of design / verification overhead to add that feature now? My feeling (we should ask @alphan to confirm) is that JTAG debug can't be used most of the time in the field anyways (we will be in PROD state).

So the area where this is useful is early during bring up / validation and early stages of manufacturing. During this phase, if we happen to screw up the pmp in software stages after ROM, we can always bootstrap to get something "right" in. By the time we are no longer able to easily bootstrap, we also won't be able to use the JTAG anyways, so the concern of accidentally nuking ourselves becomes less significant.

The early stages of manufacturing also prevent ROM from executing very far, further reducing the chance that our software might accidentally shoot itself in the foot.

GregAC commented 1 year ago

@GregAC would it be a lot of design / verification overhead to add that feature now?

The RTL change itself is pretty trivial, DV wise I guess we'd need to introduce some debug mode entry into tests that also test PMP to ensure there's no issue where PMP checks end up remained disabled a cycle or two after the dret. Though due to the triviality of the implementation (feed debug mode signal into the PMP block and factor that into the final check to suppress a failure) I'd be fairly confident of no issues. Plus in practical terms I can't imagine an attack scenario where the attacker has control over instructions executed immediately out of debug mode and can cause debug mode entry/exit but doesn't have access to debug functionality.

Still given timescale perhaps we're best off leaving things here, though I think is is a change I'd like in Ibex long term.

alphan commented 1 year ago

@tjaychen yes, JTAG debug can only be used when debugging is enabled but it's a very valuable tool for ES.

@GregAC I agree with holding off the HW change for now. I think we should understand the underlying issue better first: Was it because of the way we clobbered other pmp regions and ended up breaking the 13th region that opens up the rv_dm region or something else, e.g. rv_dm always uses a specific pmp region? I was under the impression that something more mysterious was going on but this was before reading this comment from @tjaychen. The reason I'm being cautious is because we should make sure that subsequent ePMP configuration changes in ROM should not hinder our debugging ability.

@dmcardle can you confirm that we can debug as long as we preserve the value of the 13th region and the exact region that we open up is not relevant? If that's the case, I think we can conclude that this was just due to us clobbering the 13th region. Otherwise we might have to revisit our ePMP configuration.

tjaychen commented 1 year ago

is this one okay to close this now since we've identified the cause?

alphan commented 1 year ago

is this one okay to close this now since we've identified the cause?

I think so but it would be good to hear from @dmcardle. @dmcardle can you read my previous comment and confirm that this was just due to the order in which we configured ePMP?

dmcardle commented 1 year ago

@dmcardle can you confirm that we can debug as long as we preserve the value of the 13th region and the exact region that we open up is not relevant? If that's the case, I think we can conclude that this was just due to us clobbering the 13th region. Otherwise we might have to revisit our ePMP configuration.

Confirmed! I inserted a few steps into the GDB test sram_program_fpga_cw310_test before the existing PMP setup. This variant of the test sets up the unused 12th region with values from the 13th region, then zeroes out pmp13cfg. This does not prevent the GDB script from making further writes to pmpcfg registers, which involves the debug module's program buffer. I also tried a variation where I also clobbered the pmp12cfg register, and unsurprisingly, the test failed.

diff --git a/sw/device/examples/sram_program/BUILD b/sw/device/examples/sram_program/BUILD
index bbe7f68a1..0cfba18f5 100644
--- a/sw/device/examples/sram_program/BUILD
+++ b/sw/device/examples/sram_program/BUILD
@@ -40,7 +40,7 @@ opentitan_ram_binary(
 opentitan_gdb_fpga_cw310_test(
     name = "sram_program_fpga_cw310_test",
     timeout = "short",
-    exit_success_pattern = "sram_program\\.c:47\\] PC: 0x1000208c, SRAM: \\[0x10000000, 0x10020000\\)",
+    exit_success_pattern = "sram_program\\.c:47\\] PC: 0x100020e0, SRAM: \\[0x10000000, 0x10020000\\)",
     gdb_script = """
         target extended-remote :3333

@@ -101,6 +101,20 @@ opentitan_gdb_fpga_cw310_test(

         echo :::: Configure the PMP unit.\\n
         monitor reg pmpcfg0
+
+        # pmpcfg3 should be 0x00009f00
+        # pmpaddr13 should be 0x000041ff
+        monitor reg pmpcfg3
+        monitor reg pmpaddr13
+
+        # Set up region 12 with values from 13.
+        monitor reg pmpcfg3 0x00009f9f
+        monitor reg pmpaddr12 0x000041ff
+
+        # Clobber region 13
+        monitor reg pmpcfg3 0x0000009f
+        monitor reg pmpaddr13 0
+
         # Write "L NAPOT X W R" to pmp{0,1,2,3}cfg in pmpcfg0. Crucially, this
         # value is no less permissive than whatever the current value is.
         monitor reg pmpcfg0 0x9f9f9f9f

(The change to exit_success_pattern is irrelevant to this discussion, but deserves a separate PR.)

alphan commented 1 year ago

Thanks @dmcardle!