pq-crypto / vpqc

processor for post-quantum cryptography

What is the critical path in the design? #1

Open rogpld opened 3 years ago

rogpld commented 3 years ago

Hello, thanks for making the code available.

I have noticed that when synthesizing the code generated for PQCCoprocessorNoMem, some paths are not properly constrained. Logically, the critical path should go through the arithmetic-heavy operations, such as the two multiplications and the additions/subtractions. However, the reported path goes through the samplers. I then noticed that the combinational (arithmetic) logic has no sequential path: there are no registers at its inputs or outputs, so it is not timed as a register-to-register path. The path through the samplers runs at about 300 MHz in 28 nm, as reported in the paper, but the path through the combinational logic is much slower.

Would you please elaborate on these issues?

Thank you.

pq-crypto commented 3 years ago

According to our experimental results, the critical path should go through the vector butterfly units rather than the samplers. The frequency can reach 300 MHz because the data width is only 16 bits and Synopsys DesignWare optimizes the multiplications quite well.
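For readers unfamiliar with the unit under discussion, here is a minimal Python model of a radix-2 Cooley-Tukey butterfly. It is an illustrative sketch, not the vpqc RTL: the modulus q = 12289 and the use of Barrett reduction are hypothetical choices (the thread only states that the data width is 16 bits). It shows why the path is arithmetic-heavy. A modular multiplication, which itself needs further multiplications for the reduction, feeds a modular addition and subtraction, with no register in between.

```python
# Hypothetical parameters: the thread only says the datapath is 16 bits wide.
# q = 12289 (the NewHope modulus) is an illustrative choice that fits in 14 bits.
Q = 12289
K = 28                   # shift chosen so that Q * Q < 2**K
M = (1 << K) // Q        # precomputed Barrett constant floor(2^K / Q)


def barrett_reduce(x: int) -> int:
    """Reduce 0 <= x < 2**K modulo Q with a multiply-and-shift plus one correction."""
    t = (x * M) >> K     # estimate of x // Q, low by at most 1
    r = x - t * Q        # so 0 <= r < 2*Q
    if r >= Q:           # single conditional subtraction
        r -= Q
    return r


def ct_butterfly(a: int, b: int, w: int) -> tuple[int, int]:
    """Cooley-Tukey butterfly: (a + w*b, a - w*b) mod Q for inputs in [0, Q)."""
    t = barrett_reduce(w * b)    # twiddle multiplication, then modular reduction
    s = a + t                    # modular addition
    if s >= Q:
        s -= Q
    d = a - t                    # modular subtraction
    if d < 0:
        d += Q
    return s, d


print(ct_butterfly(5, 7, 3))
```

In hardware, everything inside `ct_butterfly` is one combinational cone, so unless registers bound it (or I/O delays constrain it), a timing tool has no register-to-register path to report for it.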

rogpld commented 3 years ago

But there is no sequential path in the butterfly units; they are purely combinational arithmetic. How did you constrain the butterfly path for optimization? And how did you configure retiming to achieve 300 MHz? Would you be able to provide the scripts used in Design Compiler?

I'm not using Synopsys. However, I did the following in Genus (from Cadence). I generated the butterfly's Verilog. Because the butterfly has no sequential path, Genus does not optimize the arithmetic, so I registered the inputs and outputs. In this case, constraining the path to 1 GHz yields a maximum frequency of 225 MHz (also in 28 nm). When I enable retiming, the frequency is much higher, but the tool inserts several registers, breaking up the butterfly datapath and effectively creating an automatic pipeline.
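The trade-off between registering only the boundaries and letting retiming insert internal registers can be sketched with an idealized timing model. It assumes the combinational delay splits perfectly evenly across stages and ignores register overhead unless the hypothetical `reg_overhead_ns` parameter is set; real retiming results depend on the tool, the cell library, and the constraints.

```python
def fmax_mhz(comb_delay_ns: float, stages: int = 1, reg_overhead_ns: float = 0.0) -> float:
    """Idealized fmax (MHz) when a combinational path is split evenly into `stages`.

    reg_overhead_ns lumps clock-to-q plus setup per stage (hypothetical; 0 = ideal registers).
    """
    period_ns = comb_delay_ns / stages + reg_overhead_ns
    return 1000.0 / period_ns


def extra_latency_cycles(stages: int) -> int:
    """Each register inserted inside the datapath adds one cycle of latency."""
    return stages - 1


# Treat the whole 225 MHz period reported above as combinational delay (an idealization).
comb_ns = 1000.0 / 225          # about 4.44 ns
for k in (1, 2, 3):
    print(k, round(fmax_mhz(comb_ns, k), 1), extra_latency_cycles(k))
```

The higher fmax from retiming is paid for in extra latency cycles, which is why an automatically pipelined butterfly is not directly comparable with a single-cycle, unretimed figure.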

pq-crypto commented 3 years ago

I am sorry, I cannot supply the synthesis script due to our laboratory's rules. You are right, the output should be registered to optimize the circuit. As I said, we achieved 300 MHz because we adopted the multiplier IP provided by Synopsys DesignWare and chose high-speed transistors for our standard cells.

rogpld commented 3 years ago

Alright. So you got 300 MHz for the fastest corner? What about retiming: did you turn it on or off?

pq-crypto commented 3 years ago

No retiming.

rogpld commented 3 years ago

Ok.

So, since you have agreed that the inputs and outputs should be registered, what was the experimental setup in the paper? Would you be able to provide more details? Was the design registered or not? Are the reports under NDA as well?

pq-crypto commented 3 years ago

The SRAM itself registers the inputs and outputs: since the data are read from and written to the SRAM, they are registered by the SRAM read/write ports.

rogpld commented 3 years ago

Ok, got it. So the results are from the synthesis of the whole design.

I thought the results were obtained separately, by synthesizing this code:

PQCCoprocessorNoMemIO

The SCR1 has been reported to operate at 250 MHz in 90 nm. See this.

So, given that the critical path goes through the arithmetic, as you said, how much lower is the overall operating frequency?
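Using only the two figures quoted in this thread, the gap can at least be put into numbers. This is not a like-for-like comparison: the 300 MHz figure comes from the paper (Design Compiler, DesignWare multipliers, high-speed cells), while 225 MHz comes from the Genus experiment above (registered inputs and outputs, no retiming), so the sketch only frames the question rather than answering it.

```python
def period_ns(f_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / f_mhz


def freq_drop_pct(f_ref_mhz: float, f_new_mhz: float) -> float:
    """Relative frequency reduction from f_ref to f_new, in percent."""
    return 100.0 * (f_ref_mhz - f_new_mhz) / f_ref_mhz


paper_mhz = 300.0     # figure quoted from the paper in this thread
genus_mhz = 225.0     # registered butterfly in Genus, no retiming (experiment above)
print(f"{period_ns(paper_mhz):.2f} ns vs {period_ns(genus_mhz):.2f} ns")
print(f"frequency drop: {freq_drop_pct(paper_mhz, genus_mhz):.1f} %")
```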