mirjanastojilovic / RDS

FPGA routing delay sensors for effective remote power analysis attacks
BSD 3-Clause "New" or "Revised" License

CPA attack key rank not converging to 0 #1

Open yswntht opened 1 month ago

yswntht commented 1 month ago

Thanks a lot for open-sourcing the Verilog source code and the corresponding bitstreams. I followed the instructions for Basys3 trace collection and ran the CPA using the provided scripts. However, the key rank after the CPA attack stays more or less constant, whereas I would expect it to decrease as the number of traces increases. I'm not sure what could cause this behavior. Can you please suggest some pointers for debugging it?

Sequence of the steps I followed and my observations for each step:

  1. Program the Basys3 board with the provided bitstream (temperature_experiments.bit). I was able to program the board and made sure that SW15 is set to the correct position.
  2. Start the trace collection. I used the following command to collect 40,000 traces: ./interface -k 0 -pt 1 -t 40000 -s -d traces_40k/ (Trace collection started after the sensor calibration. I saw the following prompt, suggesting that the sensor is calibrated and trace collection is about to start: 09C528D4FFFD1DB73BAB93DF5821B9F5max 109 best idc found 1 62 Trace : 0. Trace collection took about 4 hours without any errors. A snapshot of the generated files was attached as an image.)
  3. Launch the attack with the following command: python3 launch_attack.py -k e07f16bdb9e50346a2277cd382774270 -t ../basys3/sw/bin/traces_40k/sensor_traces_40k.csv -c ../basys3/sw/bin/traces_40k/ciphertexts.bin -nt 40000 -ns 256 -ss 1000 -o /home/ext/fpga-sensors/results-rds-40k (The attack was quite fast, thanks to your scripts leveraging the GPU.)
  4. Observations on the files generated post-attack: in keyrank_results.csv, the key rank does not improve with more traces. I am attaching the results file and the log files for reference.

keyrank_results.csv log.txt log_errors.txt

What do you think could be the issue? Any pointers would be helpful.

Thanks.

davidspielmann commented 1 month ago

Hello!

Thank you very much for looking into our code and opening that issue.

We suggest the following actions to pinpoint the exact error:

  1. Visualize one (or a few) traces that you have recorded (this would be one row in the CSV file). Have a look at Figure 8 in our paper, where we show how traces recording AES encryptions look. When you visualize your traces, you should see a similar pattern: spikes in the negative direction of the y-axis. If the sensor output stays constant all the time and does not change, we know that the issue lies in the acquisition of the traces. One potential issue could be that the sensor is not correctly calibrated.

  2. Each run of the main-CPA program appends the data to the final_kr/xxx.txt files (it does not delete the content of the previous runs but adds the content of the new run on top of it). Therefore, if the final_kr folder is not purged before every run (and the xxx.txt files deleted), the subsequent runs might process incorrect data.

  3. A typical mistake is to confuse the master key with the last round key. For trace acquisition, key K1 from our paper is used. For the attack on the GPU, one needs the corresponding last round key (because the correlation is done with the last round key). However, it seems that you used the correct keys when executing the commands.
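To make point 3 concrete: for AES-128, the last round key is the tenth round key produced by the standard key schedule. The following is a minimal sketch of that derivation, not code from this repository. The function name `aes128_last_round_key` is hypothetical, and the S-box is generated from its definition so that the block is self-contained.

```python
def _build_sbox():
    """Generate the AES S-box: GF(2^8) inverse followed by the affine map."""
    def gf_mul(a, b):
        # Multiply two field elements modulo x^8 + x^4 + x^3 + x + 1 (0x11B).
        result = 0
        while b:
            if b & 1:
                result ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11B
            b >>= 1
        return result

    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):          # x^254 is the inverse of x (0 maps to 0)
            inv = gf_mul(inv, x)
        s = inv
        for shift in range(1, 5):     # XOR with the four left rotations
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = _build_sbox()


def aes128_last_round_key(master_key_hex):
    """Return round key 10 (hex) for a 128-bit master key given as hex."""
    key = bytes.fromhex(master_key_hex)
    if len(key) != 16:
        raise ValueError("AES-128 needs a 16-byte key")
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]            # RotWord
            temp = [SBOX[b] for b in temp]        # SubWord
            temp[0] ^= rcon
            rcon = (rcon << 1) ^ (0x11B if rcon & 0x80 else 0)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return bytes(b for w in words[40:44] for b in w).hex()
```

With the FIPS-197 example key, `aes128_last_round_key("2b7e151628aed2a6abf7158809cf4f3c")` returns `d014f9a8c9ee2589e13f0cc8b6630ca6`. Whether the attack script expects the key in exactly this byte order is an assumption to check against the README.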

Please try the points above and get back to us so that we can guide you further!
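Before plotting, a quick numeric check can already reveal the flat-trace symptom described in point 1. The sketch below assumes each CSV row holds one trace as comma-separated numeric samples (the delimiter is an assumption); `summarize_traces` is a hypothetical helper, not part of the repository.

```python
import csv
import io


def summarize_traces(csv_text, max_rows=5):
    """Return (min, max, is_flat) for the first few traces in CSV text."""
    summaries = []
    for row_index, row in enumerate(csv.reader(io.StringIO(csv_text))):
        if row_index >= max_rows:
            break
        samples = [float(v) for v in row if v.strip()]
        if not samples:
            continue                      # skip empty lines
        lo, hi = min(samples), max(samples)
        summaries.append((lo, hi, lo == hi))   # flat means the sensor never moved
    return summaries
```

To use it on a recorded file, read the file contents and pass them in, for example `summarize_traces(open("sensor_traces_40k.csv").read())`. A trace flagged as flat points to the acquisition or calibration step rather than to the attack.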

davidspielmann commented 1 month ago

I just thought of another possible issue that you might want to check first. In your case, the TRACE_FILE argument is set to a CSV file, but according to our README, it should be a binary file. It seems you may need to convert the .csv file into a .bin file first.

Please let us know so we can improve our README!

yswntht commented 1 month ago

Thank you for the pointers. You're right about the TRACE_FILE format. I somehow missed it. launch_attack.py expects the format to be .bin. I will try these pointers and get back to you.

Also, I did a quick check in the Sakura-X folder: the SW scripts there implicitly handle creating traces in bin format. That step is missing from the Basys3 SW scripts.

davidspielmann commented 1 month ago

I apologize for the inconsistencies in the README. I will update it once we have fully identified and resolved all your issues.

One quick solution is to implement a Python script that handles the conversion from CSV to binary in the case of Basys. If you develop this script, I would greatly appreciate a pull request so that I can add it to the repository for the benefit of future users.
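A conversion along these lines could look like the sketch below. The binary layout expected by launch_attack.py is not stated in this thread, so the sketch assumes one unsigned byte per sample, with traces stored back to back in row order; that assumption must be checked against the README before use. `csv_traces_to_bytes` is a hypothetical name.

```python
import csv
import io
import struct


def csv_traces_to_bytes(csv_text, samples_per_trace):
    """Pack CSV traces into raw bytes, one unsigned byte per sample (assumed)."""
    out = bytearray()
    for line_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        values = [int(v) for v in row if v.strip()]
        if not values:
            continue                      # skip empty lines
        if len(values) != samples_per_trace:
            raise ValueError(
                f"line {line_no}: expected {samples_per_trace} samples, "
                f"got {len(values)}"
            )
        # struct raises struct.error if a value does not fit in one byte.
        out += struct.pack(f"{samples_per_trace}B", *values)
    return bytes(out)
```

The result can be written with `open(path, "wb").write(...)`. Checking the sample count per row catches truncated traces early, instead of silently shifting every later trace in the binary file.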

Looking forward to your feedback.

yswntht commented 1 month ago

Using the binary format fixed the issue. I could now see the key rank progressing, eventually converging to zero. Attaching the key rank results and the updated Python script that can handle both CSV and BIN formatted files. Regarding the pull request, please commit from your side, as these are minor changes.

keyrank_results.csv convert_traces.py.zip

Can you keep the issue open for a few more days? Next week, I will try with Sakura-X board.

davidspielmann commented 1 month ago

Great to hear that using the binary format resolved the issue! By the way, it is great to see that your keyrank results in the CSV align perfectly with our results in Figure 18 of the paper.

Thank you for sharing your Python script. I will add it to the repository soon and will update the README accordingly.

I will keep the issue open. Please do keep us updated on your experiments with the Sakura-X board next week. Your feedback is really valuable. I am looking forward to hearing from you again.

yswntht commented 1 month ago

Hi, I tried to collect traces from the Sakura-X board using one of the provided bitstreams (RDS128_M2_AES_M1.bit) with the RDS sensor. I see "Calibration command failed!" in the terminal after executing ./FTDexampleAES -k 2 -uk 7d266aecb153b4d5d6b171a58136605b -pm 1 -t 100000 -d traces/experiment_6_1 -c 4 -s 128 -r 128. More details are in the attached log.txt.

log.txt

Sequence of steps I followed:

  1. Power the board over USB from the host PC. I use a Digilent JTAG-HS3 to program the board (connected to another USB port on the host PC).
  2. First, program the control FPGA with the provided bitstream. Then, program the main FPGA with RDS128_M2_AES_M1.bit (so I skip step 6 of the GitHub instructions).
  3. Skip step 7 of the GitHub instructions, because I use the Digilent JTAG-HS3.
  4. Made sure I have (liblibftd2xx.so) in /usr/local/lib to meet the dependency. make succeeds.
  5. Executed the two .sh scripts in the Debug directory: first setupFTD.sh, then unload_sio.sh.
  6. Created the traces directory.
  7. Collect the traces with ./FTDexampleAES -k 2 -uk 7d266aecb153b4d5d6b171a58136605b -pm 1 -t 100000 -d traces -c 4 -s 128 -r 128 (here I repeatedly see "Calibration command failed!").

Can you please check whether I'm missing any steps in setting up the board? For example, is it OK to program the main FPGA with RDS128_M2_AES_M1.bit instead of generating the bitstream locally?

Thanks.

davidspielmann commented 3 weeks ago

Overall, your sequence of steps makes sense. It is perfectly fine to program the main FPGA with a provided bitstream (e.g., the RDS128_M2_AES_M1.bit). Therefore, I do not see any obvious mistake you made. Here are some general thoughts that may help:

I hope the points above help you in solving the issue. Please get back to us once you have tried them, so we can provide further assistance if needed.

yswntht commented 3 weeks ago

Thank you for the pointers. I tried them, but the issue persists. Then it occurred to me that setup_device() in Sasebogii.c might not be handling the device selection correctly. I see the JTAG-HS3 (210299AD0897) as the third device (Device 2) in the terminal output, but Sasebogii.c defines B as 1. Shouldn't it be 2 instead?

//terminal output
Available devices 3
Device 0 Serial Number - FT6AQDBPA
Device 1 Serial Number - FT6AQDBPB
Device 2 Serial Number - 210299AD0897
/// from Sasebogii.c, lines 14 to 25.
#define A 0
#define B 1

FT_HANDLE* sasebo_init() {
  FT_HANDLE* handle = calloc(1, sizeof(FT_HANDLE));

  if(setup_device(B, handle) == EXIT_FAILURE) {
    return NULL;
  }

  return handle;
}

I changed it to the following:

/// from Sasebogii.c, line 15

#define B 2

With the above update, ./FTDexampleAES -k 2 -uk 7d266aecb153b4d5d6b171a58136605b -pm 1 -t 100000 -d traces -c 4 -s 128 -r 128 exits with a "write partially done!" message.

Program configuration:
    - key mode: 2
    - key used: 0x7d266aecb153b4d5d6b171a58136605b
    - plaintext mode: 1
    - number of traces: 100000
    - osciloscope enabled: 0
    - calibration mode: 4
    - sensor enabled: 1
    - starting sensor sample: 0
    - number of sensor samples: 128
    - idc_idf array used: 0xffffffffffffc000000fc0000
    - output path: traces

Available devices 3
Device 0 Serial Number - FT6AQDBPA
Device 1 Serial Number - FT6AQDBPB
Device 2 Serial Number - 210299AD0897

Type: 6
ID: 67330064
LocId: 4161
Flags: 2
Serial Number: FT6AQDBPA
Description: USB <-> Serial Converter A
Handle: (nil)
Type: 6
ID: 67330064
LocId: 4162
Flags: 2
Serial Number: FT6AQDBPB
Description: USB <-> Serial Converter B
Handle: (nil)
Type: 8
ID: 67330068
LocId: 262
Flags: 2
Serial Number: 210299AD0897
Description: Digilent USB Device
Handle: (nil)
succesfully opened device 2, handle is 0x555e400794c0
Encryption mode set to 0
write partially done!

Can you please confirm if B is supposed to be ID 2?

Thanks.

davidspielmann commented 3 weeks ago

I doubt that the change solves the issue, because I have never seen "write partially done" during a successful run. While I am not entirely certain, I do not recall seeing three devices when I ran the experiments. Is there any other device connected? Maybe it is worth trying to remove all devices except the main FPGA.

Another thing that comes to my mind: maybe you need to change something in the setup files (unload_sio.sh & setupFTD.sh) to reflect the B you changed in the source code?

yswntht commented 3 weeks ago

You're right, I should not be seeing "write partially done". I made a couple of changes to the flow and managed to get the Sakura-X working. As a trial run, I collected 100 traces (attached below).

log100traces.txt

These are the steps I followed:

  1. Changed the value of B to 0.
//line 15, Sasebogii.c
#define B 0
  2. Made sure the FTDI drivers are correctly installed. The HS3 USB configuration is idVendor=0403, idProduct=6014. The Sakura-X USB configuration is idVendor=0403, idProduct=6010. After installing the drivers, I noticed that the rules in /etc/udev/rules.d/ were updated with the correct permissions.
  3. Followed steps 1 to 6 exactly as in your GitHub instructions.
  4. Closed Vivado's hw_server (just as a precaution, so that no ports are occupied after programming the control and main FPGAs).
  5. Disconnected the JTAG-HS3 from the main FPGA. At this point, the only connection between the host PC and the Sakura-X board is the power cable.
  6. Executed (1) setupFTD.sh and (2) unload_sio.sh.
  7. I think step 7 in your instructions has to be updated: either use setup_device(A, handle) or set #define B 0 when the HS3 is used as the programming cable.
  8. Created the traces directory and ran ./FTDexampleAES -k 2 -uk 7d266aecb153b4d5d6b171a58136605b -pm 1 -t 100000 -d traces -c 4 -s 128 -r 128

Based on this, I presume that JTAG-HS3 is used only for programming control and main FPGAs while Sakura-X's onboard FTDI chip (channel A) is being used for transferring sensor values to host pc. What do you think?
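Because the enumeration order changes with what is plugged in, a hard-coded index is fragile. As an illustration only, the sketch below looks up a device index by serial number in a listing formatted like the terminal output quoted earlier in this thread. `find_device_index` is a hypothetical helper; the actual selection happens in the C code of Sasebogii.c, which this does not replace.

```python
import re

# Matches lines such as "Device 0 Serial Number - FT6AQDBPA".
DEVICE_LINE = re.compile(r"Device\s+(\d+)\s+Serial Number\s+-\s+(\S+)")


def find_device_index(listing, serial):
    """Return the index printed for `serial`, or None if it is not listed."""
    for line in listing.splitlines():
        match = DEVICE_LINE.search(line)
        if match and match.group(2) == serial:
            return int(match.group(1))
    return None
```

For the three-device listing shown above, looking up `FT6AQDBPA` yields 0 and `210299AD0897` yields 2, which makes it visible at a glance which index belongs to the on-board FTDI channel and which to the JTAG-HS3.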

If this approach is correct, I will collect more traces and try the CPA attack.

davidspielmann commented 3 weeks ago

Great to hear that you managed to collect traces! I had a look at the file you attached, and it is exactly the same output I saw during a successful run. So, I suggest you go ahead and collect more traces and perform a CPA attack.

I will have a look at point 7, thanks. I think you are right, the JTAG-HS3 is used for programming the control/main FPGA, while the FTDI chip is used for transferring sensor values.

yswntht commented 2 weeks ago

Hi David,

For the last few days I have been working on the Alveo board. Unfortunately, my university does not have a U200, only a U250. I tried to generate the .xclbin for the U250, but the build fails at the impl stage with "ERROR: No valid objects found" when I run make impl_rds. Please see the log below. I updated the platform in the Makefile to xilinx_u250_gen3x16_xdma_4_1_202210_1:

WARNING: [Timing 38-3] User defined clock exists on pin level0_i/level1/level1_i/ulp/ss_ucs/inst/aclk_kernel_00_hierarchy/clkwiz_aclk_kernel_00/inst/CLK_CORE_DRP_I/clk_inst/mmcme4_adv_inst/CLKOUT0 [See /home/yaswanth/RDS/alveo/tcl/_x/link/vivado/vpl/output/_user_impl_clk.xdc:3] and will prevent any subsequent automatic derivation of generated clocks on that pin. If the user defined clock specifies '-add', any existing auto-derived clocks on that pin are retained.
WARNING: [Vivado 12-508] No pins matched 'level0_i/ulp/AES_SCA_kernel_1/U0/sensors/sensor_gen[*].sensor/tdc0/sensor_o_regs[*].obs_regs/D'.
ERROR: [VPL_TCL 101-2] ERROR: [Vivado 12-4739] set_false_path:No valid object(s) found for '-to [get_pins {level0_i/ulp/AES_SCA_kernel_1/U0/sensors/sensor_gen[*].sensor/tdc0/sensor_o_regs[*].obs_regs/D}]'.
Resolution: Check if the specified object(s) exists in the current design. If it does, ensure that the correct design hierarchy was specified for the object. If you are working with clocks, make sure create_clock was used to create the clock object before it is referenced.
ERROR: [VPL_TCL 101-3] sourcing script /home/yaswanth/RDS/alveo/tcl/_x/link/vivado/vpl/scripts/impl_1/_full_opt_pre.tcl failed
INFO: [Common 17-206] Exiting Vivado at Mon Jul  1 23:00:12 2024...

Have you seen this error with the U200? At first I thought it could be due to the xdc file (a mismatch in pin mappings between the U200 and the U250), but then I realised that there is no user-specified xdc here. I checked the synthesis logs and found no critical warnings or errors. There are a few warnings, which I think is typical for the tool, so I am ignoring them for now. There are also a few nets that the tool removes. I'm not sure whether this is OK; can you please check the attached warnings from implementation and synthesis? I'm using Vitis 2022.2 with XRT 2.14.384. There are a couple of things I want to try, but first I would like your input on this error. For example, the pblock constraints with X and Y coordinates are set for the U200, and I don't know whether they are still valid for the U250.

impl_warning.txt synth_warnings.txt

Thanks.

davidspielmann commented 2 weeks ago

We used a different board and a different Vitis version, which might be why the paths in the constraints no longer match your design.

First of all, please note that in the case of the Alveo board, we did not use xdc files for the constraints (in contrast to Sakura and Basys). Instead, we add constraints by sourcing tcl scripts, see https://github.com/mirjanastojilovic/RDS/tree/main/alveo/constraints. In that case, if the constraint fails, the entire flow stops (that means, the entire implementation process).

One idea would be the following: comment out the failing constraint, run the implementation, look up the new path of the affected pins, update it in the constraint tcl, and re-run the implementation.
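To find which constraints need attention, the failing commands can be pulled out of the Vivado log. The sketch below matches the "[Vivado 12-4739]" message format shown in the log quoted earlier in this thread; `failing_constraints` is a hypothetical helper, and other message IDs are not covered.

```python
import re

# Matches e.g. "[Vivado 12-4739] set_false_path:No valid object(s) found for '...'."
NO_OBJECT = re.compile(
    r"\[Vivado 12-4739\]\s+(\w+):\s*No valid object\(s\) found for '(.+)'\."
)


def failing_constraints(log_text):
    """Return (command, unmatched_argument) pairs found in a Vivado log."""
    return [(m.group(1), m.group(2)) for m in NO_OBJECT.finditer(log_text)]
```

Each returned pair names the constraint command and the object query that matched nothing, which is the pattern to look for in the constraint tcl scripts and to compare against the hierarchy of the implemented design.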

As I said, it is difficult to provide support for boards we did not use, but I hope my answer still gives you ideas where to start debugging.