Nano glitch too long - GitHub Issues

alex-dewar commented 1 year ago

https://forum.newae.com/t/chipwhisperer-nano-glitch-issue/3485/4

pod2g commented 1 year ago

Hi Alex.

I have taken a look at https://github.com/newaetech/chipwhisperer/blob/develop/hardware/capture/chipwhisperer-nano/firmware/cwnano-firmware/src/cwnano_glitch.c and it seems like it could be optimised: there is C code (the delay for loops) interleaved with the inline assembly.

Wouldn't it be possible to generate the assembly code when setting up the glitch offset/length, replacing the for loops with generated NOPs?

That way there would be no instruction at all between the pin state changes when scope.glitch.repeat is 1, a single NOP when it is 2, and so on.
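A minimal host-side sketch of that idea, assuming the Thumb-16 encodings of the `str r6, [r5, #48]` (pin high) and `str r6, [r5, #52]` (pin low) instructions already used in cwnano_glitch.c; `emit_width_sequence` is a hypothetical name. It only builds the halfwords between the two pin writes; copying them somewhere executable and calling them is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Thumb-16 encodings (halfwords) of instructions already used in cwnano_glitch.c. */
#define THUMB_STR_R6_R5_48 0x632Eu /* str r6, [r5, #48]: drive glitch pin high */
#define THUMB_STR_R6_R5_52 0x636Eu /* str r6, [r5, #52]: drive glitch pin low  */
#define THUMB_NOP          0xBF00u /* nop */

/* Hypothetical helper: write "pin high, (repeat - 1) NOPs, pin low" into out.
 * Returns the number of halfwords written, or 0 if repeat is 0 or the
 * sequence would not fit in out_len halfwords. */
size_t emit_width_sequence(unsigned int repeat, uint16_t *out, size_t out_len)
{
    size_t needed;
    size_t n = 0;

    if (repeat == 0)
        return 0;
    needed = (size_t)repeat + 1u; /* 2 stores + (repeat - 1) NOPs */
    if (needed > out_len)
        return 0;

    out[n++] = THUMB_STR_R6_R5_48;
    for (unsigned int i = 1; i < repeat; i++)
        out[n++] = THUMB_NOP;
    out[n++] = THUMB_STR_R6_R5_52;
    return n;
}
```

With a cap such as repeat <= 128, the width part never needs more than 129 halfwords.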

Maybe limit scope.glitch.repeat to a maximum of, say, 128 so that the generated payload does not get too large. I don't think there's a real use case for a glitch that long anyway.

This is just a proposal, but looking at this code, it's understandable that we end up with a >150ns glitch.

pod2g commented 1 year ago

I have commented out the for(unsigned int i = glitch_width_cnt; i != 0; i--); loop, basically hardcoding the glitch length to the minimum possible.

On the scope, it now lasts 40ns instead of the original 150ns.

Posting a picture on the forum for reference.

Now SOLN_Fault 2_1B - Introduction to Voltage Glitching with CWNano still doesn't succeed, but at least I now have a "normal count" > 0.

alex-dewar commented 1 year ago

Yeah, this should be possible. We actually do something similar with the ChipSHOUTER, so I can take a look at that.

EDIT: After talking to Colin a bit, the code generation might not be easily achievable. You'd want to run the code from SRAM, but code runs much slower from SRAM because the hardware optimizations target flash.

rlangoy commented 1 year ago

Hi @alex-dewar, on 5 Aug 2020 it seems that you successfully ran the voltage glitching example Courses_fault101_SOLN_Fault 2_1B - Introduction to Voltage Glitching with CWNano using CWNANO firmware HW version 0.11.0. Is there any way I could reproduce this result using an older release such as V5.4?

pod2g commented 1 year ago

For reference, here is what the for-loop for(unsigned int i = glitch_width_cnt; i != 0; i--); translates to:

.text:004004D4                 LDR     R3, =glitch_width_cnt
.text:004004D6                 LDR     R3, [R3]
.text:004004D8 i = R3                                  ; unsigned int
.text:004004D8                 CBZ     R3, loop.end
.text:004004DA
.text:004004DA loop                                    ; CODE XREF: cwnano_glitch_insert+A4↓j
.text:004004DA                 SUBS    R3, #1
.text:004004DC i = R3                                  ; unsigned int
.text:004004DC                 BNE     loop
.text:004004DE loop.end                                ; CODE XREF: cwnano_glitch_insert+A0↑j

This could at least be optimized by moving the two LDRs earlier, into the initialization part, so that they are no longer executed while the glitch pin is high.

pod2g commented 1 year ago

Here is a patch proposal. It made SOLN_Fault 2_1B - Introduction to Voltage Glitching with CWNano succeed!

I got 14 successes with these parameters:

gc.set_range("repeat", 1, 1)
gc.set_range("ext_offset", 1, 500)

See some scope captures of the voltage glitch in the forum:

scope.glitch.repeat = 1: ~60ns
scope.glitch.repeat = 2: ~60ns
scope.glitch.repeat = 3: ~80ns
scope.glitch.repeat = 4: ~120ns
scope.glitch.repeat = 5: ~180ns

I initially tried keeping the original glitch_width_case system, but the results were all over the place; for example, scope.glitch.repeat = 3 gave a far shorter glitch than scope.glitch.repeat = 1.

diff --git a/hardware/capture/chipwhisperer-nano/firmware/cwnano-firmware/src/cwnano_glitch.c b/hardware/capture/chipwhisperer-nano/firmware/cwnano-firmware/src/cwnano_glitch.c
index d9bac12d..b73067b0 100644
--- a/hardware/capture/chipwhisperer-nano/firmware/cwnano-firmware/src/cwnano_glitch.c
+++ b/hardware/capture/chipwhisperer-nano/firmware/cwnano-firmware/src/cwnano_glitch.c
@@ -86,8 +86,7 @@ static uint32_t glitch_offset_cnt;
 /* Configure the glitch code, must be called before calling insert */
 void cwnano_setup_glitch(unsigned int offset, unsigned int length)
 {
-   glitch_width_cnt = length / 3;
-   glitch_width_case = length % 3;
+   glitch_width_cnt = length;

    glitch_offset_cnt = offset / 3;
    glitch_offset_case = offset % 3;
@@ -102,13 +101,16 @@ void cwnano_glitch_insert(void)
 {
    if (glitch_width_case | glitch_width_cnt){
        __disable_irq();
-       asm("push {r5-r6}");
+       asm("push {r4-r6}");
        asm volatile(
         "mov   r5,     #0x40000000\n\t"
         "orr   r5, r5, #0x000e0000\n\t"
         "orr   r5, r5, #0x00000e00\n\t"
         "movs  r6,     #1\n\t"
-        );
+        "mov   r4,     %0\n\t"
+        : : "r" (glitch_width_cnt-1)
+        :
+                );

        /* The following is very hacky, but works for now (TM). Basically:
@@ -153,37 +155,19 @@ void cwnano_glitch_insert(void)
                asm volatile("isb");
                break;
        }
-        
-        switch(glitch_width_case){
-            case 0:    
-                asm volatile("isb");
-                asm volatile("str  r6, [r5, #48]");
-                for(unsigned int i = glitch_width_cnt; i != 0; i--);
-                asm volatile("str  r6, [r5, #52]");
-                asm volatile("isb");
-                break;
-                
-           case 1:
-               asm volatile("isb");
-               asm volatile("str   r6, [r5, #48]");
-               for(unsigned int i = glitch_width_cnt; i != 0; i--);
-               asm volatile("dsb");
-               asm volatile("str   r6, [r5, #52]");
-               asm volatile("isb");
-               break;

-           case 2:
-               asm volatile("isb");
-               asm volatile("str   r6, [r5, #48]");
-               for(unsigned int i = glitch_width_cnt; i != 0; i--);
-               asm volatile("str   r6, [r5, #48]");
-               asm volatile("str   r6, [r5, #52]");
-               asm volatile("isb");
-               break;
-        }       
-       asm("pop {r5-r6}");
+       asm volatile("isb");
+       asm volatile("str   r6, [r5, #48]");
+       asm volatile("loop0:    cbz  r4, loop0.end");
+       asm volatile("          subs r4, #1");
+       asm volatile("          bne  loop0");
+       asm volatile("loop0.end:");
+       asm volatile("str   r6, [r5, #52]");
+       asm volatile("isb");
+ 
+       asm("pop {r4-r6}");
        __enable_irq();
    }
 }

-#pragma GCC pop_options
\ No newline at end of file
+#pragma GCC pop_options

colinoflynn commented 1 year ago

Very cool, thanks! I'll check whether @alex-dewar has any concerns, but if it works it looks much saner (no more crazy switch cases). I do remember there being an issue with the glitch widths being "out of order", though from memory I had tuned that at one point (I wonder whether the compiler or something similar changed; because it was done in C and not low-level asm, it wasn't stable).

We can apply this, but if you want to do a "real" PR then you'll get that sweet github credit ;-)

alex-dewar commented 1 year ago

@pod2g I ended up working on this at the same time as you, and wrote the whole function in asm, removing the offset case as well.

I'm wondering if either the glitch transistor got swapped out at some point, or if there's a difference between batches, as I don't see any effect until I reach a repeat of 3. I can see the gate of the transistor changing, so I'm guessing the glitch is just too short to have any effect.

@rlangoy I'm guessing this is a hardware difference, not anything to do with firmware, as the glitch code hasn't changed since the Nano was released.

pod2g commented 1 year ago

Did you also remove the division by 3 in the cwnano_setup_glitch function? Sorry to ask, you probably did, but simple issues are sometimes overlooked.

alex-dewar commented 1 year ago

Yeah, I removed the divide by 3. I can see the gate of the transistor going high (for ~20ns IIRC at a repeat of 1).

pod2g commented 1 year ago

The MOSFET on my board is marked "337"; I don't think it is a DMN3200U-7 (the part referenced on the schematic). A DMN3200U-7 should be marked 32N, N being the fabrication year, AFAIK.

According to the documentation, timings are pretty high for this particular MOSFET:

Turn-On Delay Time: 40.2ns
Turn-On Rise Time: 43ns
Turn-Off Delay Time: 471ns!! (time for the drain current to drop below 90% of the load current)

That could totally explain what you are experiencing.

What is the MOSFET on your board?

colinoflynn commented 1 year ago

The 337 is the FDV337N, one of the substitutions we've had to make. I checked, and the latest batch uses it too (we try to keep substitutions consistent once we make them, if possible). The simple N-channel MOSFET has been a problem for the past couple of years, though, so more substitutions are plausible. It looks like that one is faster than the DMN3200U-7 we were originally using, which would align with what you're seeing!

pod2g commented 1 year ago

Hi Colin, thanks a lot for the details on the substitutions!

According to the FDV337N documentation:

Turn-On Delay Time: 4ns
Turn-On Rise Time: 10ns
Turn-Off Delay Time: 17ns

That's quite a big improvement over the DMN3200U-7, and could totally explain the difference in behaviour. The original code also makes more sense now: with the DMN3200U-7, the two loads before the loop probably made no difference to the effective glitch, or rather, they gave the slow MOSFET the extra time it needed to switch.
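To put those datasheet numbers side by side, here is a crude first-order check (an assumption, not a real switching model): the MOSFET counts as fully on only after td(on) + tr has elapsed, ignoring gate drive strength, the load, and the differing datasheet test conditions. `mosfet_timing` and both functions are hypothetical names.

```c
#include <stdbool.h>

/* Datasheet switching times in ns, as quoted in this thread. */
struct mosfet_timing {
    const char *part;
    double td_on_ns; /* turn-on delay */
    double tr_ns;    /* turn-on rise time */
};

/* Crude model (assumption): the MOSFET is fully on only once td_on + tr has
 * elapsed after the gate edge. Turn-off delay and drive strength are ignored. */
bool gate_pulse_fully_turns_on(const struct mosfet_timing *m, double pulse_ns)
{
    return pulse_ns > m->td_on_ns + m->tr_ns;
}

/* Time (ns) spent fully on during the pulse under the same model; 0 if never. */
double fully_on_time_ns(const struct mosfet_timing *m, double pulse_ns)
{
    double t = pulse_ns - (m->td_on_ns + m->tr_ns);
    return t > 0.0 ? t : 0.0;
}
```

With these figures a 60ns gate pulse never fully turns on the DMN3200U-7 (40.2 + 43 = 83.2ns) but leaves the FDV337N fully on for about 46ns, which is at least consistent with the different behaviour between boards.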

That @alex-dewar's variant doesn't have time to glitch at repeat=1 is probably not a big deal, considering the original Jupyter notebook loops repeat from 1 to 3. You could bump the upper bound to 5, just to be sure.

rlangoy commented 1 year ago

I did a simple simulation of the VDD glitching in LTspice to compare the DMN3200U and the FDV337N (I am not sure about the quality of the DMN3200U model). These were my results:

Result of a single 60ns pulse:

Result of multiple 60ns pulses with a 50% duty cycle:

The models and simulations can be found at: https://github.com/rlangoy/cw_nano_glitch_sim

Hope this input is helpful.

alex-dewar commented 1 year ago

The new firmware for the Nano has been pushed (0.63). Let me know if it works for you.

rlangoy commented 1 year ago

Hi @alex-dewar, unfortunately the new firmware did not work for me. The target on my CW-Nano is an STM32F042F6P6, not the original STM32F03xxx.

pod2g commented 1 year ago

Hi @alex-dewar,

I tried my best as well, with no luck using your latest code: I never get a success. However, if I revert to my loop proposal, it works.

Here is an in-between proposal that also works:

diff --git a/hardware/capture/chipwhisperer-nano/firmware/cwnano-firmware/src/cwnano_glitch.c b/hardware/capture/chipwhisperer-nano/firmware/cwnano-firmware/src/cwnano_glitch.c
index 7050128c..c468793b 100644
--- a/hardware/capture/chipwhisperer-nano/firmware/cwnano-firmware/src/cwnano_glitch.c
+++ b/hardware/capture/chipwhisperer-nano/firmware/cwnano-firmware/src/cwnano_glitch.c
@@ -108,8 +108,7 @@ void cwnano_glitch_insert(void)
                                "movs   r6,     #1\n\t"

                                "ldr r3, %[offset_cnt]\n\t"
                                "ldr r4, %[width_cnt]\n\t"
                                "isb\n\t"
                        "OFFLOOP:\n\t"
                                "subs r3, #1\n\t"
@@ -117,9 +116,11 @@ void cwnano_glitch_insert(void)
                                "isb\n\t"

                                "str r6, [r5, #48]\n\t" //gpio high now
+                               "cbz  r4, WID0LOOP.end\n\t"
                        "WID0LOOP:\n\t"
-                               "subs r4, #0x01\n\t" //1 is the minimum for width_cnt
+                               "subs r4, #1\n\t"
                                "bne WID0LOOP\n\t"
+                       "WID0LOOP.end:\n\t"
                                "str r6, [r5, #52]\n\t"
                                "isb\n\t"

@@ -228,4 +229,4 @@ void cwnano_glitch_insert(void)
 }

 #pragma GCC pop_options
-#endif
\ No newline at end of file
+#endif

@rlangoy: I also have the STM32F042F6P6.

Note: I used these parameters for testing:

gc.set_range("repeat", 1, 2)
gc.set_range("ext_offset", 1, 500)

@colinoflynn, @alex-dewar: with all the testing and time lost on this, I hope you guys will send us a free kit ;-) j/k

~pod

alex-dewar commented 1 year ago

Interesting. I'd guess our loops differ by a clock cycle and that's making the difference. Out of curiosity, do you see successes with both a width of 1 and 2 with your code? I'm guessing my original loop with width 1 is ~2 cycles and both width 1 and 2 for you are ~3 cycles.
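A rough way to compare the two loops is to count cycles with Cortex-M4 timings as given in ARM's TRM (assumption): SUBS takes 1 cycle, a not-taken BNE or CBZ takes 1 cycle, and a taken branch takes 1 + P cycles, where P (pipeline refill, 1 to 3) is left as a parameter. Flash wait states and prefetch are ignored, so real numbers on the Nano are likely higher. Both function names are hypothetical.

```c
/* "subs r4, #1; bne WID0LOOP" entered with r4 = n. */
unsigned int cycles_subs_bne(unsigned int n, unsigned int p)
{
    if (n == 0)
        return 0; /* not modelled: r4 = 0 would wrap and loop ~2^32 times */
    /* n SUBS + (n - 1) taken BNE + 1 final not-taken BNE */
    return n * 1u + (n - 1u) * (1u + p) + 1u;
}

/* Same loop guarded by "cbz r4, WID0LOOP.end" (not taken when n >= 1). */
unsigned int cycles_cbz_subs_bne(unsigned int n, unsigned int p)
{
    if (n == 0)
        return 1u + p; /* cbz taken: the loop is skipped entirely */
    return 1u + cycles_subs_bne(n, p);
}
```

Under this model the cbz guard adds exactly one cycle at every width (2 vs 3 cycles at width 1 with P = 1), which fits the guess that the loops differ by a clock cycle.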

You can make a pull request with the changes if you'd like, or I can patch your changes in.

Thanks for your help with this. We do generally like to send out some thank-yous in situations like these, but parts shortages are still rough, so I can't guarantee anything.

pod2g commented 1 year ago

Yeah, the issue is not easy to understand. Maybe your version is either too fast or too slow at width=1 and width=2.

I have another proposal: the dynamically generated assembly version, with limits on the size of the offset and width. It seems to be the most precise one in terms of possible adjustments (the time step between offset=n and offset=n+1, as well as between width=m and width=m+1). There's still a large offset jitter, up to about 500ns, in all versions, probably because of the interrupt timing itself.

I think this version is the fastest one as well, even though SRAM is not cached. For example, if I compare the glitch position on the scope from the GPIO4 trigger to the glitch (offset=500), the time is shorter (~20%) for the generated assembly version.

Edit: this ^^ was wrong; see the measurements in my next post below.

I can't really tell the glitch length by scoping the transistor gate, as it is affected by the transistor delay and rise time and by the target. It looks the same for width=1, but I am pretty sure there's a difference. I could probably check using a different GPIO.

What I can tell, though, is that I get successes with width=1, width=2 and, to a lesser extent, width=3 with this version.

Diff of this version compared to current:

diff --git a/hardware/capture/chipwhisperer-nano/firmware/cwnano-firmware/src/cwnano_glitch.c b/hardware/capture/chipwhisperer-nano/firmware/cwnano-firmware/src/cwnano_glitch.c
index 7050128c..b6b5966c 100644
--- a/hardware/capture/chipwhisperer-nano/firmware/cwnano-firmware/src/cwnano_glitch.c
+++ b/hardware/capture/chipwhisperer-nano/firmware/cwnano-firmware/src/cwnano_glitch.c
@@ -38,6 +38,7 @@
  */

 #include <asf.h>
+#include <string.h>
 #include "main.h"

 void delay_setup(void);
@@ -77,67 +78,102 @@ void cwnano_glitch_init(void)
    gpio_configure_pin(PIN_GLITCH_IDX, PIO_OUTPUT_0 | PIO_DEFAULT);
 }

-static uint32_t glitch_width_case;
-static uint32_t glitch_width_cnt;
+#define NANO_GLITCH_ASM
+#ifdef NANO_GLITCH_ASM

-static uint32_t glitch_offset_case;
-static uint32_t glitch_offset_cnt;
+#define GLITCH_WIDTH_MAX     128
+#define GLITCH_OFFSET_MAX   2048
+
+static uint8_t* glitch_payload = NULL;

 /* Configure the glitch code, must be called before calling insert */
 void cwnano_setup_glitch(unsigned int offset, unsigned int length)
 {
-   glitch_width_cnt = length;
-   
-   glitch_offset_cnt = offset;
-}

+/*
+72 B6                                 CPSID   I
+4F F0 80 40 40 F4 60 20 40 F4 60 60   MOV     R0, #0x400E0E00
+01 21                                 MOVS    R1, #1
+                                      <... nops ...>
+01 63                                 STR     R1, [R0,#0x30]
+                                      <... nops ...>
+41 63                                 STR     R1, [R0,#0x34]
+62 B6                                 CPSIE   I
+70 47                                 BX      LR
+*/

-// 0, 0 not possible
-#define NANO_GLITCH_ASM
-#ifdef NANO_GLITCH_ASM
-void cwnano_glitch_insert(void)
-{
-   if (glitch_width_cnt) {
-       __disable_irq();
+   const char asm_pre[]     = "\x72\xB6\x4F\xF0\x80\x40\x40\xF4\x60\x20\x40\xF4\x60\x60\x01\x21"; // no null bytes
+   const char asm_on[]      = "\x01\x63"; // no null bytes
+   const char asm_post[]    = "\x41\x63\x62\xB6\x70\x47"; // no null bytes
+   const size_t asm_pre_sz  = sizeof(asm_pre)-1;
+   const size_t asm_on_sz   = sizeof(asm_on)-1;
+   const size_t asm_post_sz = sizeof(asm_post)-1;

-       asm volatile(
-           // setup GPIO reg refs
-               "mov   r5,     #0x40000000\n\t"
-               "orr   r5, r5, #0x000e0000\n\t"
-               "orr   r5, r5, #0x00000e00\n\t"
-               "movs   r6,     #1\n\t"
-
-               "ldr r3, %[offset_cnt]\n\t"
-               "ldr r4, %[width_cnt]\n\t"
-
-               "isb\n\t"
-           "OFFLOOP:\n\t"
-               "subs r3, #1\n\t"
-               "bpl OFFLOOP\n\t" //branch on underflow
-               "isb\n\t"
-
-               "str r6, [r5, #48]\n\t" //gpio high now
-           "WID0LOOP:\n\t"
-               "subs r4, #0x01\n\t" //1 is the minimum for width_cnt
-               "bne WID0LOOP\n\t"
-               "str r6, [r5, #52]\n\t"
-               "isb\n\t"
-
-           : 
-           : [offset_cnt] "m" (glitch_offset_cnt), [width_cnt] "m" (glitch_width_cnt)
-           : "r3", "r4", "r5", "r6", "memory"
-       );
+   if (glitch_payload != NULL){
+       free(glitch_payload-1); // remove the thumb mode +1
+       glitch_payload = NULL;
+   }

+   if (length > 0 && length <= GLITCH_WIDTH_MAX
+       && offset >= 0 && offset <= GLITCH_OFFSET_MAX){

-       __enable_irq();
+       size_t poffsz = offset<<1;
+       size_t pglsz = length<<1;
+       size_t psz = asm_pre_sz + poffsz + asm_on_sz + pglsz + asm_post_sz;
+       uint8_t *pl = (uint8_t*) malloc(psz);
+
+       if (NULL != pl){
+           off_t poff = 0;
+
+           memcpy(&pl[poff], asm_pre, asm_pre_sz);
+           poff += asm_pre_sz;
+           for (uint32_t i = 0; i < offset; i++){
+               *(uint16_t*) (&pl[poff]) = 0xBF00; // nop
+               poff += 2;
+           }
+           memcpy(&pl[poff], asm_on, asm_on_sz);
+           poff += asm_on_sz;
+           for (uint32_t i = 0; i < length-1; i++){
+               *(uint16_t*) (&pl[poff]) = 0xBF00; // nop
+               poff += 2;
+           }
+           memcpy(&pl[poff], asm_post, asm_post_sz);
+
+           glitch_payload = pl+1; // thumb mode +1
+       }
    }
 }
+
+void cwnano_glitch_insert(void)
+{
+   if ((((uintptr_t)glitch_payload)&1) != 0) { // check for thumb mode +1
+       void (*glitch_payload_func)(void) = (void (*)(void)) (glitch_payload);
+       glitch_payload_func();
+   }
+}
+
 #else

+static uint32_t glitch_width_case;
+static uint32_t glitch_width_cnt;
+
+static uint32_t glitch_offset_case;
+static uint32_t glitch_offset_cnt;
+
+/* Configure the glitch code, must be called before calling insert */
+void cwnano_setup_glitch(unsigned int offset, unsigned int length)
+{
+   glitch_width_cnt = length / 3;
+   glitch_width_case = length % 3;
+   
+   glitch_offset_cnt = offset / 3;
+   glitch_offset_case = offset % 3;
+}
+
+
 #pragma GCC push_options
 #pragma GCC optimize ("O1")

-
 /* Insert the glitch by driving pin */
 void cwnano_glitch_insert(void)
 {
@@ -228,4 +264,4 @@ void cwnano_glitch_insert(void)
 }

 #pragma GCC pop_options
-#endif
\ No newline at end of file
+#endif

alex-dewar commented 1 year ago

Wow, amazing work with this! I'll give this a quick run just to make sure it works with the old transistor, but I doubt that'll cause any issues.

Two thoughts: first, it would be nice to keep the loop based offset, as very large offsets are still potentially useful. Second, I'd like to avoid malloc/free, as you'd avoid any issues that come with dynamic memory allocation and I don't think you're saving any memory because everything else is statically allocated anyway.
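Regarding malloc/free: a sketch of the same payload builder using a statically allocated buffer, assuming the byte sequences from the asm_pre/asm_on/asm_post arrays in the generated-asm patch and the same GLITCH_WIDTH_MAX/GLITCH_OFFSET_MAX limits. `build_glitch_payload` and `put_nops` are hypothetical names, and it still uses NOPs for the offset rather than a loop. Sizing the buffer for the worst case makes the RAM cost explicit (4374 bytes).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GLITCH_WIDTH_MAX    128u
#define GLITCH_OFFSET_MAX  2048u

/* Bytes from the patch: cpsid i; r0 = 0x400E0E00; movs r1, #1 /
 * str r1, [r0, #0x30] / str r1, [r0, #0x34]; cpsie i; bx lr */
static const uint8_t asm_pre[]  = {0x72, 0xB6, 0x4F, 0xF0, 0x80, 0x40, 0x40, 0xF4,
                                   0x60, 0x20, 0x40, 0xF4, 0x60, 0x60, 0x01, 0x21};
static const uint8_t asm_on[]   = {0x01, 0x63};
static const uint8_t asm_post[] = {0x41, 0x63, 0x62, 0xB6, 0x70, 0x47};

#define PAYLOAD_MAX (sizeof asm_pre + 2u * GLITCH_OFFSET_MAX + sizeof asm_on \
                     + 2u * (GLITCH_WIDTH_MAX - 1u) + sizeof asm_post)

/* Statically allocated, halfword-aligned payload buffer (no malloc/free). */
static uint16_t glitch_payload[(PAYLOAD_MAX + 1u) / 2u];
static size_t glitch_payload_len; /* bytes; 0 means "no valid payload" */

static void put_nops(uint8_t *p, size_t *pos, unsigned int count)
{
    for (unsigned int i = 0; i < count; i++) {
        p[(*pos)++] = 0x00; /* nop = 0xBF00, little-endian */
        p[(*pos)++] = 0xBF;
    }
}

/* Hypothetical static-buffer variant of cwnano_setup_glitch(). Returns the
 * payload length in bytes, or 0 if the parameters are out of range. */
size_t build_glitch_payload(unsigned int offset, unsigned int length)
{
    uint8_t *p = (uint8_t *)glitch_payload;
    size_t pos = 0;

    glitch_payload_len = 0;
    if (length == 0 || length > GLITCH_WIDTH_MAX || offset > GLITCH_OFFSET_MAX)
        return 0;

    memcpy(p + pos, asm_pre, sizeof asm_pre);
    pos += sizeof asm_pre;
    put_nops(p, &pos, offset);
    memcpy(p + pos, asm_on, sizeof asm_on);
    pos += sizeof asm_on;
    put_nops(p, &pos, length - 1u);
    memcpy(p + pos, asm_post, sizeof asm_post);
    pos += sizeof asm_post;

    glitch_payload_len = pos;
    return pos;
}
```

On target, the call would presumably still go through a function pointer with the Thumb bit set, `(void (*)(void))((uintptr_t)glitch_payload | 1u)`, and a `__DSB(); __ISB();` after writing the buffer would be a sensible precaution before executing freshly written code. This is untested on hardware.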

pod2g commented 1 year ago

Here are some measurements. This issue is absolutely not straightforward and I was wrong about the generated asm version being faster (at least for the width part).

I used GPIO3 connected to a 2.7K resistor to ground, with the oscilloscope in persistence mode to see the jittered traces.

Current version on GitHub:

width=1: traces of 33ns, 49ns
width=2: traces of 58ns, 66ns, 74ns
width=3: traces of 83ns, 91ns, 99ns
width=4: traces of 108ns, 116ns, 124ns

Generated asm version:

width=1: traces of 41ns, 49ns, 58ns
width=2: traces of 41ns, 49ns, 58ns
width=3: traces of 58ns, 66ns
width=4: traces of 58ns, 66ns
width=5: traces of 83ns, 91ns, 99ns
width=6: traces of 83ns, 91ns, 99ns
width=7: traces of 108ns, 116ns, 124ns

None of this makes a lot of sense. I don't understand why the current version doesn't give any successes. Maybe because it never produces the 41ns glitch?
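Converting those traces to core clock cycles makes the pattern easier to see, assuming the SAM4S runs at 120 MHz (8.33ns per cycle, the same resolution as the PWM module). `ns_to_cycles` is a hypothetical helper.

```c
#define MCK_HZ 120000000.0 /* assumed 120 MHz core clock (8.33 ns per cycle) */

/* Convert a measured pulse width in ns to the nearest whole number of core
 * clock cycles. ns is assumed to be >= 0. */
long ns_to_cycles(double ns)
{
    double cycles = ns * MCK_HZ / 1e9;
    return (long)(cycles + 0.5); /* round to nearest */
}
```

Under that assumption the current version's width=1 traces come out at 4 and 6 cycles and never 5, while the generated version's width=1 and width=2 traces include 5 cycles (41ns).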

pod2g commented 1 year ago

8.33ns resolution glitches with the PWM module:

diff --git a/hardware/capture/chipwhisperer-nano/firmware/cwnano-firmware/src/cwnano_glitch.c b/hardware/capture/chipwhisperer-nano/firmware/cwnano-firmware/src/cwnano_glitch.c
index 7050128c..be9f0f92 100644
--- a/hardware/capture/chipwhisperer-nano/firmware/cwnano-firmware/src/cwnano_glitch.c
+++ b/hardware/capture/chipwhisperer-nano/firmware/cwnano-firmware/src/cwnano_glitch.c
@@ -45,21 +45,9 @@ void pin_trigglitch_handler(const uint32_t id, const uint32_t mask);

 static volatile unsigned int glitch_enabled = 0;

-/* Handler for all PIOA events */
-void pin_trigglitch_handler(const uint32_t id, const uint32_t mask)
-{
-   if ((id == ID_PIOA) && (mask == PIN_TARGET_GPIO4_MSK)){
-       
-       /* Disable interrupt now */
-       pio_disable_interrupt(PIOA, PIN_TARGET_GPIO4_MSK);
-       
-       if(glitch_width && glitch_enabled){
-           cwnano_glitch_insert();
-           glitch_enabled = 0;
-       }       
-   }
-}
-
+static uint32_t glitch_width_cnt = 0;
+static uint32_t glitch_offset_cnt = 0;
+static uint32_t glitch_wait_cnt = 0;

 void cwnano_glitch_enable(void)
 {
@@ -69,163 +57,109 @@ void cwnano_glitch_enable(void)
    pio_handler_set(PIOA, ID_PIOA, PIN_TARGET_GPIO4_MSK, PIO_IT_RISE_EDGE, pin_trigglitch_handler);
    pio_enable_interrupt(PIOA, PIN_TARGET_GPIO4_MSK);
    glitch_enabled = 1;
+
+   PWM->PWM_DIS = PWM_DIS_CHID0;                                                // channel 0 must be disabled before cwnano_glitch_insert is invoked
 }

+#define PWM_GPIO0 IOPORT_CREATE_PIN(PIOA, 0)
+
 /* Init the glitch pin (drive low) */
 void cwnano_glitch_init(void)
 {
-   gpio_configure_pin(PIN_GLITCH_IDX, PIO_OUTPUT_0 | PIO_DEFAULT);
-}
+   PMC->PMC_PCER0               = (1<<ID_PWM);                                  // clock on the PWM module
+   PIOA->PIO_PDR                = PIO_PDR_P0;                                   // PA0 pin is not PIO anymore
+   PIOA->PIO_ABCDSR[1]         &= ~PIO_ABCDSR_P0;                               // PA0 pin should use peripheral A mode

-static uint32_t glitch_width_case;
-static uint32_t glitch_width_cnt;
-
-static uint32_t glitch_offset_case;
-static uint32_t glitch_offset_cnt;
+}

 /* Configure the glitch code, must be called before calling insert */
 void cwnano_setup_glitch(unsigned int offset, unsigned int length)
 {
-   glitch_width_cnt = length;
-   
-   glitch_offset_cnt = offset;
+   uint32_t glitch_prd;
+   uint32_t glitch_duty;
+
+   if ((length > 0 && length != glitch_width_cnt) || (offset > 0 && offset != glitch_offset_cnt)) {
+       glitch_width_cnt             = length;
+       glitch_offset_cnt            = offset;
+       glitch_duty                  = 16;                                   // we want to have the time to enable channel 0, start our wait loop, etc.
+                                                                            //  and switch off output after the glitch
+       glitch_prd                   = glitch_duty + length;
+       glitch_wait_cnt              = (glitch_prd + (glitch_prd>>1)) / 5;   // 1 full period +50% margin at 5 cycles per loop
+
+       PWM->PWM_CH_NUM[0].PWM_CMR   = PWM_CMR_CPRE_MCK;                     // use MCK, polarity=LOW
+       PWM->PWM_CH_NUM[0].PWM_CPRD  = glitch_prd;                           // period cycles
+       PWM->PWM_CH_NUM[0].PWM_CDTY  = glitch_duty;                          // duty cycles
+       PWM->PWM_OOV                 = ~PWM_OOV_OOVH0;                       // value to output (0) when override is selected
+       PWM->PWM_OS                  = PWM_OS_OSH0;                          // override: output 0, not taking care of the pwm output
+   }
 }

+#pragma GCC push_options
+#pragma GCC optimize ("O4")

-// 0, 0 not possible
-#define NANO_GLITCH_ASM
-#ifdef NANO_GLITCH_ASM
-void cwnano_glitch_insert(void)
+/* Handler for all PIOA events */
+void pin_trigglitch_handler(const uint32_t id, const uint32_t mask)
 {
-   if (glitch_width_cnt) {
-       __disable_irq();
-
-       asm volatile(
-           // setup GPIO reg refs
-               "mov   r5,     #0x40000000\n\t"
-               "orr   r5, r5, #0x000e0000\n\t"
-               "orr   r5, r5, #0x00000e00\n\t"
-               "movs   r6,     #1\n\t"
-
-               "ldr r3, %[offset_cnt]\n\t"
-               "ldr r4, %[width_cnt]\n\t"
-
-               "isb\n\t"
-           "OFFLOOP:\n\t"
-               "subs r3, #1\n\t"
-               "bpl OFFLOOP\n\t" //branch on underflow
-               "isb\n\t"
-
-               "str r6, [r5, #48]\n\t" //gpio high now
-           "WID0LOOP:\n\t"
-               "subs r4, #0x01\n\t" //1 is the minimum for width_cnt
-               "bne WID0LOOP\n\t"
-               "str r6, [r5, #52]\n\t"
-               "isb\n\t"
-
-           : 
-           : [offset_cnt] "m" (glitch_offset_cnt), [width_cnt] "m" (glitch_width_cnt)
-           : "r3", "r4", "r5", "r6", "memory"
-       );
-
-
-       __enable_irq();
-   }
-}
-#else
+   if ((id == ID_PIOA) && (mask == PIN_TARGET_GPIO4_MSK)){

-#pragma GCC push_options
-#pragma GCC optimize ("O1")
+       /* Disable interrupt now */
+       pio_disable_interrupt(PIOA, PIN_TARGET_GPIO4_MSK);

+       if(glitch_width && glitch_enabled){
+           cwnano_glitch_insert();
+           glitch_enabled = 0;
+       }
+   }
+}

-/* Insert the glitch by driving pin */
 void cwnano_glitch_insert(void)
 {
-   if (glitch_width_case | glitch_width_cnt){
-       __disable_irq();
-       asm("push {r5-r6}");
-       asm volatile(
-        "mov   r5,     #0x40000000\n\t"
-        "orr   r5, r5, #0x000e0000\n\t"
-        "orr   r5, r5, #0x00000e00\n\t"
-        "movs  r6,     #1\n\t"
-        );
-        
-        
-       /* The following is very hacky, but works for now (TM). Basically:
-       
-         1) You need the 'isb' to clear pipeline before each delay call. If you don't do that you'll see different delays
-            depending on state of pipeline. This makes the case 0/1/2 not work for example since the base delay amount
-            isn't what you expect.
-            
-         2) Due to pipeline etc you can't easily cycle count. The instruction choices have been tested on HW. Should have some
-            way of verifying this automatically still.
-            
-         3) Other attemps such as SRAM functions didn't work well.
-         
-         4) The choice of r5/r6 is just verified in debugger for now. Should actually do entire thing as assembly function
-            at some point.  
-       */
-        
-       switch(glitch_offset_case){
-           case 0:
-               asm volatile("isb");
-               asm volatile("str   r6, [r5, #52]");
-               for(unsigned int i = glitch_offset_cnt; i != 0; i--);
-               asm volatile("str   r6, [r5, #52]");
-               asm volatile("isb");
-               break;
-           
-           case 1:
-               asm volatile("isb");
-               asm volatile("str   r6, [r5, #52]");
-               for(unsigned int i = glitch_offset_cnt; i != 0; i--);
-               asm volatile("dsb");
-               asm volatile("str   r6, [r5, #52]");
-               asm volatile("isb");
-               break;
-
-           case 2:
-               asm volatile("isb");
-               asm volatile("str   r6, [r5, #52]");
-               for(unsigned int i = glitch_offset_cnt; i != 0; i--);
-               asm volatile("str   r6, [r5, #52]");
-               asm volatile("str   r6, [r5, #52]");
-               asm volatile("isb");
-               break;
-       }
-        
-        switch(glitch_width_case){
-            case 0:    
-                asm volatile("isb");
-                asm volatile("str  r6, [r5, #48]");
-                for(unsigned int i = glitch_width_cnt; i != 0; i--);
-                asm volatile("str  r6, [r5, #52]");
-                asm volatile("isb");
-                break;
-                
-           case 1:
-               asm volatile("isb");
-               asm volatile("str   r6, [r5, #48]");
-               for(unsigned int i = glitch_width_cnt; i != 0; i--);
-               asm volatile("dsb");
-               asm volatile("str   r6, [r5, #52]");
-               asm volatile("isb");
-               break;
-
-           case 2:
-               asm volatile("isb");
-               asm volatile("str   r6, [r5, #48]");
-               for(unsigned int i = glitch_width_cnt; i != 0; i--);
-               asm volatile("str   r6, [r5, #48]");
-               asm volatile("str   r6, [r5, #52]");
-               asm volatile("isb");
-               break;
-        }       
-       asm("pop {r5-r6}");
-       __enable_irq();
+   if (!glitch_wait_cnt) {
+       return;
    }
+
+   asm volatile(
+       "cpsid i                      \n\t"
+       "ldr   r0,     %[offset_cnt]  \n\t"
+       "cbz   r0,     end.offset.%=  \n\t"
+
+       "loop.offset.%=:              \n\t"
+       "subs  r0,     #1             \n\t"
+       "bne   loop.offset.%=         \n\t"
+       "end.offset.%=:               \n\t"
+       :
+       : [offset_cnt] "m" (glitch_offset_cnt)
+       : "r0", "memory"
+   );
+
+   PWM->PWM_OS                  = 0;                // clear override
+   PWM->PWM_ENA                 = PWM_ENA_CHID0;    // enable channel 0 => resets pwm counter
+
+   // this loop doesn't have to be quick, but we need to be able to predict its time in cycles
+   asm volatile(
+       "dsb                          \n\t"
+       "isb                          \n\t"
+       "ldr   r0,     %[wait_cnt]    \n\t"
+       "loop.wait.%=:                \n\t"
+       "subs  r0,     #1             \n\t"
+       "bne   loop.wait.%=           \n\t"
+       :
+       : [wait_cnt] "m" (glitch_wait_cnt)
+       : "r0", "memory"
+   );
+
+#if 0
+   // debug: output a blip after the glitch to fine tune the wait time with a scope
+   PWM->PWM_OOV                 = PWM_OOV_OOVH0;    // override: value 1
+   PWM->PWM_OS                  = PWM_OS_OSH0;      // override: output 1, not taking care of the pwm output
+   PWM->PWM_OOV                 = ~PWM_OOV_OOVH0;   // override: value 0
+#else
+   PWM->PWM_OS                  = PWM_OS_OSH0;      // override: output 0, not taking care of the pwm output
+#endif
+
+   asm volatile(
+       "cpsie i                      \n\t"
+   );
 }

 #pragma GCC pop_options
-#endif
\ No newline at end of file

pod2g commented 1 year ago

~~With the PWM, we can also change the frequency to be nearly anything between 0 and 120MHz so that the glitch length doesn't necessarily need to be a multiple of 8.33ns.~~

The reason I am saying this is that the PWM version looks great on the scope: it's incredible to see those perfect 8.33ns glitches. However, it does not lead to any success on my board, exactly like the current version on GitHub. Could the issue be more related to the synchronous nature of the glitch?

@alex-dewar : what do you think?

Edit: The frequency has to be the peripheral clock divided by an integer factor, so I don't think this will help with the synchronous nature of the glitch.

Edit2: However, we can send a 60MHz square wave (PWM_CDTY=1, PWM_CPRD=2) for n*8.33ns, and this leads to success. This is more in line with what the ChipWhisperer Lite does. The PWM has more settings to play with than a standard GPIO.

Edit3: The successful 60MHz glitch is 250ns long, but thanks to the PWM it only reaches 1.3V max, oscillating between 1V and 1.3V most of the time. I got 17 successes over offsets 1 to 512 (1 attempt per offset).

Pretty sure this PWM version could be interesting in a lot of scenarios.
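The Edit2/Edit3 numbers can be checked with a little arithmetic. This is a minimal sketch, assuming the PWM counter runs directly off a 120MHz peripheral clock (one tick = 8.33ns, no prescaler) in left-aligned mode, where the period is CPRD ticks and the output is active for CDTY of them. The helper names are hypothetical and not part of the firmware.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: PWM counter clocked straight from a 120 MHz peripheral
 * clock, so one tick lasts 1 / 120 MHz = 8.33 ns. */
#define PERIPH_CLK_HZ 120000000u

/* Hypothetical helper: output frequency in left-aligned mode, where the
 * period is CPRD ticks. Only integer divisors of the clock are possible. */
uint32_t pwm_freq_hz(uint32_t cprd)
{
    assert(cprd != 0);
    return PERIPH_CLK_HZ / cprd;
}

/* Hypothetical helper: duty cycle in percent (active for CDTY of CPRD ticks). */
uint32_t pwm_duty_pct(uint32_t cdty, uint32_t cprd)
{
    assert(cprd != 0 && cdty <= cprd);
    return (100u * cdty) / cprd;
}

/* Hypothetical helper: whole ticks needed to cover glitch_ns, rounded up.
 * ticks = ns * 120e6 / 1e9 = ns * 12 / 100. */
uint32_t ticks_for_ns(uint32_t glitch_ns)
{
    return (uint32_t)(((uint64_t)glitch_ns * 12u + 99u) / 100u);
}

/* Hypothetical helper: complete PWM periods that fit in a glitch window. */
uint32_t periods_in_window(uint32_t ticks, uint32_t cprd)
{
    assert(cprd != 0);
    return ticks / cprd;
}
```

With CPRD = 2 and CDTY = 1 this gives 60MHz at 50% duty, and a 250ns window is 30 ticks, i.e. 15 full PWM periods.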

rlangoy commented 1 year ago

@pod2g I really appreciate all your hard work :), but unfortunately it did not work for me. The problem seems to be that when the glitch drops below 600mV, the circuit would reset. Adding a resistor to provide some bias helped. [Image: mod_glitch]

By desoldering SJ6 and adding a 5 Ohm resistor between GLITCH and Measure, I did manage to get some successes (R12 was not 27 Ohm as in the schematic, but 20 Ohm on my board). [Image: Glitch_results] This is the scope output: [Image: mod_glitch_scope]

alex-dewar commented 1 year ago

Based on the results you've both seen, I think it's best to either do something similar to the original loop or use the generated asm version. Regarding the latter, do you see any difference if you use a different single-cycle operation, like an add? I'm not sure about other cores, but I believe a nop isn't guaranteed to take any time (i.e. the CPU can skip it).
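A minimal host-side sketch of the generated-assembly idea, assuming the padding is produced as text (for example by a build step) and pasted between the pin writes in an `asm volatile` block. `gen_glitch_padding` is a hypothetical helper, not firmware code. The filler is a parameter so `nop` can be swapped for another instruction such as an `add`, as asked above; whether a given filler really costs exactly one cycle on the Nano still has to be confirmed on a scope.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append src to buf (capacity cap bytes, *used bytes already filled).
 * Returns 0 on success, -1 if src plus the terminating NUL would not fit. */
static int append_str(char *buf, size_t cap, size_t *used, const char *src)
{
    size_t n = strlen(src);
    if (*used + n + 1 > cap)
        return -1;
    memcpy(buf + *used, src, n + 1); /* also copies the NUL terminator */
    *used += n;
    return 0;
}

/* Hypothetical helper: build the padding that would sit between the pin
 * writes for a given glitch repeat count. repeat == 1 yields no
 * instruction, repeat == 2 one filler, and so on. Returns the string
 * length, or -1 on bad arguments or insufficient buffer space. */
int gen_glitch_padding(char *buf, size_t cap, unsigned repeat, const char *filler)
{
    size_t used = 0;
    assert(buf != NULL && filler != NULL);
    if (cap == 0 || repeat == 0)
        return -1;
    buf[0] = '\0';
    for (unsigned i = 1; i < repeat; i++) {
        if (append_str(buf, cap, &used, filler) != 0)
            return -1;
    }
    return (int)used;
}
```

For example, `gen_glitch_padding(buf, sizeof buf, 3, "nop\n")` produces `"nop\nnop\n"`, while a repeat of 1 produces an empty string, so the two pin writes end up back to back.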

I'd agree that the PWM version definitely opens up some interesting glitch opportunities, so it'd be nice to at least keep it as an alternative. Maybe we could keep it #ifdef'd out/incomplete and mention it in the hardware docs?

pod2g commented 1 year ago

@rlangoy : absolutely great mod. Thanks for sharing! I think this adds another perspective to the research.

This, as well as the 60MHz PWM test, seems to show that the length and resolution of the glitch are not the only parameters that determine success.

What I find good about the PWM is that we can artificially control the voltage of the glitch through software.

It would be nice to have a PoC with all parameters (period, duty, length) accessible to the user from Python, to test what can be achieved with that. I'm not sure I'll have the time to work on this, though.

@alex-dewar: good point about the nop. I can try other instructions and see the result for the generated assembly version.

alex-dewar commented 1 year ago

If you'd like, I can do the USB/Python code up for the PWM glitch part.

pod2g commented 1 year ago

That would be great! Thanks for proposing.

alex-dewar commented 1 year ago

Also, quick question: do you want me to test the add version of the generated assembly as well? I should be able to get one of the Nanos with the new transistor to verify that the glitch works.

pod2g commented 1 year ago

Sure! I'll still report back if I get a chance to try it. Thanks again!

alex-dewar commented 1 year ago

Quick update: still working on this. It turns out a precise offset is also very important here, so I ended up implementing a faster version of the original with multiple offset cases. This seems to work for me with both the new and the old transistor.

I did find something interesting: writing to the IO pin (one or more times) and then doing a nop as a delay results in almost no jitter on the glitch width. Any other instruction sequence (two nops in a row, any branching, etc.) results in the same jitter as before. Also, I've found that increasing the target clock speed like we do with the Lite voltage glitching labs results in many more successful glitches.
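One way to get cycle-accurate offsets out of a delay loop, sketched under the assumption that each loop iteration costs a fixed `cycles_per_iter` cycles (the real figure depends on the core and flash wait states and has to be measured): run the loop for the quotient and pick one of several hand-padded offset cases for the remainder. `plan_offset` and `struct offset_plan` are hypothetical names, not the code on the branch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of how to spend an offset of N CPU cycles. */
struct offset_plan {
    uint32_t loop_iters;   /* iterations of the fixed-cost delay loop */
    uint32_t extra_cycles; /* 0 .. cycles_per_iter-1: selects an offset case */
};

/* Split offset_cycles into whole loop iterations plus a remainder that a
 * switch over hand-padded cases would absorb. cycles_per_iter is an
 * assumed, measured cost of one loop iteration. */
struct offset_plan plan_offset(uint32_t offset_cycles, uint32_t cycles_per_iter)
{
    struct offset_plan p;
    assert(cycles_per_iter != 0);
    p.loop_iters = offset_cycles / cycles_per_iter;
    p.extra_cycles = offset_cycles % cycles_per_iter;
    return p;
}
```

With an assumed 3 cycles per iteration, an offset of 10 cycles becomes 3 loop iterations plus the 1-extra-cycle case; offsets shorter than one iteration (e.g. 2 cycles) skip the loop entirely, much like the `cbz` guard in the diff above.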

pod2g commented 1 year ago

Hi @alex-dewar.

Is the version you're talking about already on the repo?

I'd love to test this out!

No rush though: I'm away at Hexacon for the next few days and getting back home on Sunday.

There's also the delay of approximately 4 microseconds before the glitch that would be nice to address, if possible.

Do you think there's room for improvement in the interrupt handler?

Maybe a new ticket for this other subject?

Awesome news, and thanks!

~pod

alex-dewar commented 1 year ago

Just pushed to a new branch: https://github.com/newaetech/chipwhisperer/tree/nano_glitch. It's still pretty WIP, but it should compile and work.

Re the interrupt delay, it might be possible to improve performance here, but it's hard to say. Arm specifies 12 cycles of interrupt latency, but that's based on a zero wait state flash model. It's also hard to say how much of the latency comes from unnecessary or unoptimized operations. It might be interesting to look at in the future, but there's currently a lot on my plate that's higher priority.
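For scale, a quick back-of-the-envelope comparison, assuming the core runs at the same 120MHz as the peripheral clock behind the 8.33ns PWM resolution mentioned earlier:

```math
4\,\mu\text{s} \times 120\,\text{MHz} = 480\ \text{cycles}, \qquad \frac{12\ \text{cycles}}{120\,\text{MHz}} = 100\,\text{ns}
```

Under that assumption the observed delay is about 40 times the architectural 12-cycle figure, so exception entry alone accounts for only a small fraction of it.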

alex-dewar commented 10 months ago

The above branch got merged in a while back, so closing this issue.

grasshoppper4 commented 4 months ago

I updated the firmware today and this issue occurs.

Linked: newaetech/chipwhisperer#419 (Nano glitch too long)

[Image: Result of a single 60ns pulse]

[Image: Result of multiple 60ns pulses with 50% duty cycle]