Open rogpld opened 3 years ago
Hello, thanks for making the code available.
I have noticed that when synthesizing the code generated for PQCCoprocessorNoMem, some paths are not properly constrained. Logically, the critical path should go through the arithmetic-heavy operations, such as the two multiplications and the additions/subtractions. However, the reported path goes through the samplers. I then noticed that there is no sequential path on the combinational logic. The critical path through the samplers runs at about 300 MHz in 28 nm, as reported in the paper, but the one through the combinational logic is much slower.
Would you please elaborate on these issues?
Thank you.
According to our experimental results, the critical path should go through the vector butterfly units rather than the samplers. The frequency can reach 300 MHz, since the data width is only 16 bits and Synopsys DesignWare can optimize the multiplications quite well.
But there is no sequential path in the butterfly units; they are purely arithmetic. How did you constrain the butterfly path for optimization? And how did you configure retiming to achieve 300 MHz? Would you be able to provide the scripts used in Design Compiler?
I'm not using Synopsys. However, I did the following in Genus (from Cadence). I generated the butterfly's Verilog. Since the butterfly has no sequential path, Genus does not optimize the arithmetic, so I registered the inputs and outputs. In this case, constraining the path for 1 GHz gives 225 MHz as the maximum frequency (this is on 28 nm as well). When I enable retiming, the frequency is much higher, but the tool inserts several registers that break up the butterfly's datapath, effectively pipelining it automatically.
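For anyone comparing numbers, here is a rough sketch of how a maximum frequency such as the 225 MHz above is usually derived from a timing report: the achieved period is the target clock period minus the worst slack. The function names are hypothetical, and the slack value is not taken from any report in this thread; it is only back-computed from the 1 GHz target and the 225 MHz result, assuming clock uncertainty and setup time are already folded into the reported slack.

```python
def fmax_mhz(target_period_ns: float, worst_slack_ns: float) -> float:
    """Maximum frequency implied by a target period and the worst setup slack.

    A negative slack means the path is slower than the target period.
    """
    achieved_period_ns = target_period_ns - worst_slack_ns
    if achieved_period_ns <= 0:
        raise ValueError("achieved period must be positive")
    return 1000.0 / achieved_period_ns


def slack_for_fmax(target_period_ns: float, fmax_mhz_value: float) -> float:
    """Worst slack (ns) that corresponds to a given maximum frequency."""
    return target_period_ns - 1000.0 / fmax_mhz_value


# Assumption: 1 GHz target (1.0 ns period) and 225 MHz achieved, as quoted above.
implied_slack_ns = slack_for_fmax(1.0, 225.0)   # about -3.44 ns
round_trip_mhz = fmax_mhz(1.0, implied_slack_ns)

# Period that a 300 MHz result would correspond to, for comparison.
period_300_mhz_ns = 1000.0 / 300.0              # about 3.33 ns

print(f"implied worst slack: {implied_slack_ns:.3f} ns")
print(f"round-trip fmax:     {round_trip_mhz:.1f} MHz")
print(f"300 MHz period:      {period_300_mhz_ns:.3f} ns")
```

Under these assumptions, the registered butterfly path would have to shrink from roughly 4.44 ns to 3.33 ns to match the 300 MHz figure.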
I am sorry, but I cannot supply the synthesis script due to the rules of our laboratory. You are right, the output should be registered to optimize the circuit. As I said, we achieved 300 MHz because we adopted the multiplier IP provided by Synopsys DesignWare and chose high-speed transistors as our standard cells.
Alright. So you got 300 MHz for the fastest corner case? What about retiming: did you turn it on or off?
No retiming.
Ok.
So, since you agree that the inputs and outputs should be registered, what was the experimental setup in the paper? Would you be able to provide more details? Was it registered or not? Are the reports under NDA as well?
The SRAM itself registers the inputs and outputs: since the data are read from and written to the SRAM, they are registered by the SRAM read/write ports.
Ok, got it. So the results are from the synthesis of the whole thing.
I thought the results were taken separately, based on this code.
The SCR1 has been reported to operate at 250MHz@90nm. See this.
So when you said the critical path was through the arithmetic, how much is the overall reduction in frequency of operation?