Open sylefeb opened 1 year ago
I think a good way to save many cycles is as follows.
In the following code, there are many else
blocks that the only thing they do is to set next state:
module M_uart_tx (
in_data,
in_clock_counter,
out_tx_pin,
out_out_valid,
in_run,
out_done,
reset,
out_clock,
clock
);
input [7:0] in_data;
input signed [31:0] in_clock_counter;
output [0:0] out_tx_pin;
output [0:0] out_out_valid;
input in_run;
output out_done;
input reset;
output out_clock;
input clock;
assign out_clock = clock;
reg signed [31:0] _d_t1;
reg signed [31:0] _q_t1;
reg [7:0] _d___block_3_mask;
reg [7:0] _q___block_3_mask;
reg [0:0] _d_tx_pin;
reg [0:0] _q_tx_pin;
reg [0:0] _d_out_valid;
reg [0:0] _q_out_valid;
reg [3:0] _d__idx_fsm0,_q__idx_fsm0;
assign out_tx_pin = _q_tx_pin;
assign out_out_valid = _q_out_valid;
assign out_done = (_q__idx_fsm0 == 0)
;
`ifdef FORMAL
initial begin
assume(reset);
end
assume property($initstate || (in_run || out_done));
`endif
always @* begin
_d_t1 = _q_t1;
_d___block_3_mask = _q___block_3_mask;
_d_tx_pin = _q_tx_pin;
_d_out_valid = _q_out_valid;
_d__idx_fsm0 = _q__idx_fsm0;
// _always_pre
(* full_case *)
case (_q__idx_fsm0)
1: begin
// _top
_d_t1 = in_clock_counter+1;
_d_tx_pin = 0;
_d_out_valid = 1;
_d__idx_fsm0 = 2;
end
2: begin
// __while__block_1
if (1&&in_clock_counter-_q_t1<0) begin
// __block_2
// __block_4
_d_out_valid = 0;
// __block_5
_d__idx_fsm0 = 2;
end else begin
_d__idx_fsm0 = 3;
end
end
3: begin
// __block_3
_d_t1 = _q_t1+1;
_d___block_3_mask = 1;
_d__idx_fsm0 = 4;
end
4: begin
// __while__block_6
if (_q___block_3_mask!=0) begin
// __block_7
// __block_9
// __block_10
_d_tx_pin = (in_data&_q___block_3_mask)!=0;
_d_out_valid = 1;
_d__idx_fsm0 = 8;
end else begin
_d__idx_fsm0 = 5;
end
end
8: begin
// __while__block_11
if (1&&in_clock_counter-_q_t1<0) begin
// __block_12
// __block_14
_d_out_valid = 0;
// __block_15
_d__idx_fsm0 = 8;
end else begin
_d__idx_fsm0 = 9;
end
end
5: begin
// __block_8
_d_tx_pin = 1;
_d_out_valid = 1;
_d__idx_fsm0 = 6;
end
9: begin
// __block_13
_d_t1 = _q_t1+1;
// __block_16
_d___block_3_mask = _q___block_3_mask<<1;
// __block_17
_d__idx_fsm0 = 4;
end
6: begin
// __while__block_18
if (1&&in_clock_counter-_q_t1<0) begin
// __block_19
// __block_21
_d_out_valid = 0;
// __block_22
_d__idx_fsm0 = 6;
end else begin
_d__idx_fsm0 = 0;
end
end
7: begin
// __block_20
_d__idx_fsm0 = 0;
end
0: begin
end
default: begin
_d__idx_fsm0 = {4{1'bx}};
`ifdef FORMAL
assume(0);
`endif
end
endcase
// _always_post
// pipeline stage triggers
end
always @(posedge clock) begin
_q_t1 <= _d_t1;
_q___block_3_mask <= _d___block_3_mask;
_q_tx_pin <= _d_tx_pin;
_q_out_valid <= _d_out_valid;
_q__idx_fsm0 <= reset ? 0 : ( ~in_run ? 1 : _d__idx_fsm0);
end
endmodule
Any time that a block jumps to a next state (doing nothing else), the code block of the target state can be incorporated instead of just jumping, saving a clock cycle. Doing that incorporation may leave states that are not referenced anymore in other block, so they can be safely deleted.
In practice, this UART example is taking more cycles than really needed, and applying the above may get it to the optimum amount of cycles (i.e., producing up to a bit per cycle)
Original Silice code, for reference:
algorithm uart_tx(
input uint8 data,
output uint1 tx_pin,
output uint1 out_valid,
input int32 clock_counter
)
{
int32 t1 = uninitialized;
t1 = clock_counter + 1; //constant may change for other baudrate
tx_pin = 0;
out_valid = 1;
while(clock_counter - t1 < 0)
{ out_valid = 0; }
t1 = t1 + 1;
uint8 mask = uninitialized;
mask=1;
while(mask!=0)
{
tx_pin = (data & mask) != 0;
out_valid = 1;
while(clock_counter - t1 < 0)
{ out_valid = 0; }
t1 = t1 + 1;
mask=mask<<1;
}
tx_pin = 1;
out_valid = 1;
while(clock_counter - t1 < 0)
{ out_valid = 0; }
}
TL;DR good point, have an implementation, but this can potentially break existing code.
Details
I encountered this case in various designs and usually rewrote the loops from:
while (C) {
A
}
B
to:
while (1) {
if (C) {
A
} else {
B
break;
}
}
which avoids the additional cycle.
My initial thoughts when implementing the loops were that it would be easier for the designer to predict what happens with cycles by not 'collapsing' B into the loop.
However, I now think this would indeed have been a better approach, for the following reasons.
First, while loops are already chaining each others in cases like this:
while (C1) { A1 }
while (C2) { A2 }
But not in cases like that:
while (C1) { A1 }
B
while (C2) { A2 }
Collapsing B in the loop above elegantly solves this, allowing while loops to always chain when no state is needed in between, making things more consistent and avoiding any extra cycle.
Second, in designing hardware it does turn out to be problematic, as this extra cycle is often indeed undesired (e.g. main problem appears when needing to signal the end of a loop at the exact cycle it happens). This leads to these innelegant while (1)
loops.
Conversely, with the change in place one can simply go back to the previous behavior with:
while (C) { A }
++:
B
I have a prototype already in place (not pushed).
Unfortunately this change breaks some existing code. Cases are rare but might be hard to track down: For instance PAWSv2 needs added ++:
like above in at least a few places, a couple of my designs too.
That makes me very hesitant (I strive to avoid breaking changes). On the plus side, there will be a major draft to master merge soon with already a few (less problematic) syntactic changes. So this may be a good time to do this change...
Let's take a few days to think about it!
I think you have a dilemma between keeping backwards compatibility, and offering a way of producing faster code.
Why not both? You may pass a flag for retro compatibility (like in a C++ compiler, with -std=c++20
)
If the user base is not big (maybe PAWS is the larger third party project that could be affected) I bet the developer would be willing to adapt to the new syntax and add the ++ operator where it's needed, that also offers more readability and make clearer the intent.
The structure while (C) { A }
is quite common in my view, it's useful to wait for some signal to be active or condition to be met like while(C){}
, useful for example in a AXI device to determine if both valid and ready signals are asserted (another example I was trying). That way, the minimal wait time can be zero and not one or more, allowing bursts at the maximum rate.
The UART example from above can also be a good test case, I expect it to be able to reach a new output value per cycle, so the maximum possible frequency is like dividing the clock by one and not two or more as happens in the current state.
I'll try the suggested structure to see if clock cycles are saved
while (1) {
if (C) {
A
} else {
B
break;
}
}
but the it's less readable and clean than
while (C) { A }
B
May the PAWS developer @rob-ng15 give his opinion?
Hi Sylvian/Victor,
I am happy to go through the PAWSv2 code and make any changes that are required if a breaking change is made, I've done it before with guidance from Sylvian, when an improvement to Silice required it. As long as we can work out, as best we can, what changes are required I am happy to do them.
I've been looking through the PAWSv2 code base and can see quite a few of these style loops, mainly in the code that I haven't changed recently, the FPU64.si code had the while loops removed a while ago as I found that they gave a faster design.
I'm all for making the final builds "better", and yes I agree, for the UART, it should be possible to get 1 bit per cycle.
Rob.
On Wed, 15 Mar 2023 at 02:25, Victor Suarez Rovere @.***> wrote:
I think you have a dilemma between keeping backwards compatibility, and offering a way of producing faster code. Why not both? You may pass a flag for retro compatibility (like in a C++ compiler, with -std=c++20 If the user base is not big (maybe PAWS is the larger third project) I bet the developer would be willing to adapt to the new syntax and add the ++ operator that also offers more readability and clears intent.
The structure while (C) { A } is quite common in my view, it's useful to wait for some signal to be active or condition to be met like while(C){}, useful for example in a AXI device to determine if both valid and ready signals are asserted (another example I was trying). That way, the minimal wait time can be zero and not one or more, allowing burst at the maximum rate.
The UART example from above can also be a good test case, I expect it to be able to reach a new output value per cycle, so the maximum possible frequency is like dividing the clock by one and not two or more as happens in the current state.
I'll try the suggested structure to see if clock cycles are saved
while (1) { if (C) { A } else { B break; } }
but the it's less readable and clean than
while (C) { A } B
May the PAWS developer @rob-ng15 https://github.com/rob-ng15 give his opinion?
โ Reply to this email directly, view it on GitHub https://github.com/sylefeb/Silice/issues/248#issuecomment-1469200899, or unsubscribe https://github.com/notifications/unsubscribe-auth/AN4SYT23P76UMAKKG3GVTYDW4ESBVANCNFSM6AAAAAAVXWI6OU . You are receiving this because you were mentioned.Message ID: @.***>
Hello,
in many places I use the following construct:
algorithm
{
cs = 0; // Chip select active
++: // One cycle setup time to first clock pulse
while (data_must_be_clocked_out) {
clockout_data;
}
cs = 1; // Chip select inactive. Needs (at least) one cycle hold time after last clock pulse.
}
If I understand correctly the changes that are being considered, these would need to be reworked. In general, I would also be in favor of making the generated code faster and more compact.
Regards, Bernd.
Thanks for the feedback! Bernd, this is correct, such code would be impacted. Most often the change will have no detrimental effect and a cycle is saved.
Two bad cases that occur in my tests so far:
I started experimenting with a feature that tracks code potentially impacted and reports a detailed "change log warning" after compilation. This would be active between versions, and would mitigate the risk that someone would be left wondering why code is no longer working (we all know how hard debugging these things can be!). Overall, this calls for a rigorous and careful reporting of changes, which I did start to put in place and will prioritize. This, together with continuing to avoid such changes in the first place.
Hi Bernd et al,
I've been looking through PAWSv2, and the GPU.si is probably going to be the most affected, the CPU/ALU/FPU and cache/memory controllers probably not as much, or indeed at all.
The GPU code hasn't been touched for quite some time, apart from very simple changes, and is the oldest part of PAWSv2. I'm just trying out a few bits and pieces to see how it goes, and will feed back.
Rob.
On Wed, 15 Mar 2023 at 10:13, sylefeb @.***> wrote:
Thanks for the feedback! Bernd, this is correct, such code would be impacted. Most often the change will have no detrimental effect and a cycle is saved.
Two bad cases that occur in my tests so far:
- A combinational cycle appears due to the step being removed ; this one is not so bad because an error is issued and I'd add a message to indicate this change can cause it.
- The disappearing cycle causes a change to logic as a silent side effect, the design compiles and synthesizes but is now incorrect. This is the case that is a cause for concern.
I started experimenting with a feature that tracks code potentially impacted and reports a detailed "change log warning" after compilation. This would be active between versions, and would mitigate the risk that someone would be left wondering why code is no longer working (we all know how hard debugging these things can be!). Overall, this calls for a rigorous and careful reporting of changes, which I did start to put in place and will prioritize. This, together with continuing to avoid such changes in the first place.
โ Reply to this email directly, view it on GitHub https://github.com/sylefeb/Silice/issues/248#issuecomment-1469724845, or unsubscribe https://github.com/notifications/unsubscribe-auth/AN4SYT7L6NP7LR4FFEU2LADW4GI47ANCNFSM6AAAAAAVXWI6OU . You are receiving this because you were mentioned.Message ID: @.***>
In the GPU blit unit (blitter), there is:
algorithm <autorun> {
while(1) {
if( |start ) {
busy = start; PYS.offset = 0;
while( pyworking ) {
PXS.offset = 0;
while( pxworking ) {
y2 = 0;
while( y2working ) { x2 = 0; while( x2working ) {
dooutput = 1; } y2 = y2NEXT; } PXS.offset = pxNEXT; } PYS.offset = pyNEXT; } busy = 0; } else { busy = 0; } } }
Which I've just replaced with:
algorithm <autorun> {
while(1) {
if( |start ) {
busy = start; PYS.offset = 0;
while( 1 ) {
if( pyworking ) {
PXS.offset = 0;
while( 1 ) {
if( pxworking ) {
y2 = 0;
while( 1 ) {
if( y2working ) {
x2 = 0;
while( 1 ) {
if( x2working ) {
dooutput = 1;
} else {
y2 = y2NEXT;
break;
}
}
} else {
PXS.offset = pxNEXT;
break;
}
}
} else {
PYS.offset = pyNEXT;
break;
}
}
} else {
busy = 0;
break;
}
}
} else {
busy = 0;
}
}
}
This works, thankfully, but is nowhere as compact/readable as the former.
If I am reading Victor's proposal correctly, is for this transformation happening automatically? I'd definitely be for that, as the original code is far easier to understand.
Rob.
On Wed, 15 Mar 2023 at 10:29, Rob Shelton @.***> wrote:
Hi Bernd et al,
I've been looking through PAWSv2, and the GPU.si is probably going to be the most affected, the CPU/ALU/FPU and cache/memory controllers probably not as much, or indeed at all.
The GPU code hasn't been touched for quite some time, apart from very simple changes, and is the oldest part of PAWSv2. I'm just trying out a few bits and pieces to see how it goes, and will feed back.
Rob.
On Wed, 15 Mar 2023 at 10:13, sylefeb @.***> wrote:
Thanks for the feedback! Bernd, this is correct, such code would be impacted. Most often the change will have no detrimental effect and a cycle is saved.
Two bad cases that occur in my tests so far:
- A combinational cycle appears due to the step being removed ; this one is not so bad because an error is issued and I'd add a message to indicate this change can cause it.
- The disappearing cycle causes a change to logic as a silent side effect, the design compiles and synthesizes but is now incorrect. This is the case that is a cause for concern.
I started experimenting with a feature that tracks code potentially impacted and reports a detailed "change log warning" after compilation. This would be active between versions, and would mitigate the risk that someone would be left wondering why code is no longer working (we all know how hard debugging these things can be!). Overall, this calls for a rigorous and careful reporting of changes, which I did start to put in place and will prioritize. This, together with continuing to avoid such changes in the first place.
โ Reply to this email directly, view it on GitHub https://github.com/sylefeb/Silice/issues/248#issuecomment-1469724845, or unsubscribe https://github.com/notifications/unsubscribe-auth/AN4SYT7L6NP7LR4FFEU2LADW4GI47ANCNFSM6AAAAAAVXWI6OU . You are receiving this because you were mentioned.Message ID: @.***>
Hi Rob, Thanks for the tests!
Yes, this will be completely automated. I mentioned the rewrite as a current way to avoid the extra cycle. Last night I pushed in draft
a prototype that implements this cycle-skip and also outputs a change-log report in the console. The report points out everywhere the change possibly has an effect. This should help track potential issues.
Please consider it still preliminary and work in progress, I am hoping to do more thorough experiments in the coming evenings.
I am also looking into other interesting similar cases relating to if/else. Working on it but this requires a bit more subtlety.
Thanks everyone! Sylvain
Hi All,
Doing a few more tests, my main issue would appear to be around those loops that wait for memory, and then read a value into something. To find out the issues, I've been manually unwinding the loops, seeing what fails, and then seeing what can be done to sort the issue. I did try the updated Silice and got a blank screen, as expected.
Rob.
On Thu, 16 Mar 2023 at 17:14, sylefeb @.***> wrote:
Hi Rob, Thanks for the tests!
Yes, this will be completely automated. I mentioned the rewrite as a current way to avoid the extra cycle. Last night I pushed in draft a prototype that implements this cycle-skip and also outputs a change-log report in the console. The report points out everywhere the change possibly has an effect. This should help track potential issues.
Please consider it still preliminary and work in progress, I am hoping to do more thorough experiments in the coming evenings.
I am also looking into other interesting similar cases relating to if/else. Working on it but this requires a bit more subtlety.
Thanks everyone! Sylvain
โ Reply to this email directly, view it on GitHub https://github.com/sylefeb/Silice/issues/248#issuecomment-1472381475, or unsubscribe https://github.com/notifications/unsubscribe-auth/AN4SYT7WSKL2NQTQ2UEPK5TW4NC6TANCNFSM6AAAAAAVXWI6OU . You are receiving this because you were mentioned.Message ID: @.***>
I was thinking a bit about this and I think the example construct with and without 'break' (see below) should produce the exact same results, as users would expect, and that they should know that if any kind optimizations are enabled (for example) all constructs could remove unnecessary cycles. So the adding of extra cycles should be always explicit, as it is being reworked for PAWS. Assumed extra cycles are a bit confusing, at least for me.
while (C) { A } B
to:
while (1) { if (C) { A } else { B break; } }
I agree with Rob that the "wait" for some signal would be a very common construct, indeed. And the most clear syntax is while(cond) { ... }
, that shouldn't add any cycle if the condition is not met
Yes, agreed. This is indeed the new behavior with latest changes in draft
:
while (C) {
__display("in loop");
}
__display("after");
Now gives (formatted for clarity):
2: begin
// __while__block_1
if (_c_C) begin
// __block_2
// __block_4
$display("in loop");
// __block_5
_d__idx_fsm0 = 2;
end else begin
// __block_3
$display("after");
_d__idx_fsm0 = 0;
end
end
Note how __display("after");
has been collapsed into the loop conditional.
This is the same as:
while (1) {
if (C) {
__display("in loop");
} else {
__display("after");
break;
}
}
which gives (as before):
2: begin
// __while__block_1
if (1) begin
// __block_2
// __block_4
if (_c_C) begin
// __block_5
// __block_7
$display("in loop");
// __block_8
// __block_12
// __block_13
_d__idx_fsm0 = 2;
end else begin
// __block_6
// __block_9
$display("after");
_d__idx_fsm0 = 0;
end
end else begin
_d__idx_fsm0 = 0;
end
end
Both are now exactly the same result (ignoring the if (1)
that has no effect).
Regarding the waiting loop:
__display("before");
while (C) { __display("waiting"); }
__display("thing done after");
++:
__display("a later state");
Now gives (again formatted for clarity):
1: begin
// _top
$display("before");
_d__idx_fsm0 = 2;
end
2: begin
// __while__block_1
if (_c_C) begin
// __block_2
// __block_4
$display("waiting");
// __block_5
_d__idx_fsm0 = 2;
end else begin
// __block_3
$display("thing done after");
_d__idx_fsm0 = 3;
end
end
3: begin
// __block_6
$display("a later state");
_d__idx_fsm0 = 0;
end
Note how __display("thing done after");
is now in the else part of the loop conditional, it will be reached without any intermediate cycle when C
becomes false. I however do need at least one state for the while loop unless Silice starts to allow duplicating design parts -- which is possible but would be hard to predict for the user (and potentially a hefty hidden burden on fpga resources, keeping in mind the duplications might cascade).
Why the need to duplicate? Silice would have to automatically rewrite as such:
__display("before"); if (C) { while (1) { if (C) { __display("waiting"); } else { __display("thing done after"); /*copy1*/} } } else { __display("thing done after"); /*copy2*/ }
This is possible to do explicitly 'by hand' if the code duplication is acceptable (there is btw a related case in the documentation). Everything up the the next state has to be duplicated: there should be nothing left between after the if/else and the next state.
I am exploring other such improvements.
@rob-ng15 was the change log output useful? it should accurately tell which loops are being changed (and those not mentioned should behave the same as before).
In my view, the code duplication should be possible, maybe depending on optimization level. There are situations where the code block will just set a flag, and the synth tool will readily optimize it using combinatorial optimizations that should take into account the FSM state (as part of the logic inputs) with probably no noticeable effect on resource usage. Indeed, cascading would be useful in certain situations, and I think the UART example will require that for max throughput. To me, after the "before" statement, the condution evaluation should be inlined in, with a separete FSM state to handle execution of the while
body. In the example on previous message, the code can't be executed single-cycle when the condition is not met (as required for max speed), since there's a inserted cycle after the "before" example regardless on the condition
I mean something like that:
1: begin
// _top
$display("before");
_d__idx_fsm0 = 2;
// __while__block_1
if (_c_C) begin
_d__idx_fsm0 = 2; //jump to next state only if condition is met
end else begin
// __block_3
$display("thing done after");
_d__idx_fsm0 = 3;
end
end
2: begin
// __while__block_1
if (_c_C) begin
// __block_2
// __block_4
$display("waiting");
// __block_5
_d__idx_fsm0 = 2;
end else begin
// __block_3
$display("thing done after");
_d__idx_fsm0 = 3;
end
end
3: begin
// __block_6
$display("a later state");
_d__idx_fsm0 = 0;
end
Note how the body of the while statement is not copied on the 1st state
If code duplication is allowed, then indeed this would be the output. This is the actual output of the by-hand version I made above:
1: begin
// _top
$display("before");
if (~_c_C) begin
// __block_1
// __block_3
$display("thing done after");
// __block_4
_d__idx_fsm0 = 0;
end else begin
// __block_2
// __block_5
_d__idx_fsm0 = 3;
end
end
3: begin
// __while__block_6
if (_c_C) begin
// __block_7
// __block_9
$display("waiting");
// __block_10
_d__idx_fsm0 = 3;
end else begin
// __block_8
$display("thing done after");
// __block_11
_d__idx_fsm0 = 0;
end
end
On the one hand it is tempting to enter these optimizations (I actually just had to implement a mechanism to prevent natural code dup on some if/else optims :grinning:). On the other hand this starts to endanger a principle I set out initially: finding a balance between 1) leaving the user completely in charge (aka. being low-level) while 2) providing some syntactic conveniences which effect is predictable. There is of course a gray area -- making this fun to explore! -- but optimizations such as code dup. could quickly snowball into a full blown HLS tool, and this would change the expectations. (Also, there is the question of the optimizer being able to make good decisions based on the target hardware ... not simple).
Stay tuned, more cycles are being saved!
@sylefeb Yes I did have a look at the change log messages, they directed me to pretty much where I'd expect the affected while loops would be. There was one message pointing towards a unit with no while loops.
I'm happy for these changes to go ahead, and I'll adjust PAWSv2, probably just the RV64 to accommodate.
Rob
It's is clear for anyone that too much optimization could be dangerous, but to implement for example the UART core, I'd be willing to set optimization level to the sky in order to get one output bit per cycle, instead of having to cut max baurdate in half
As a tradeoff, would writing it with code duplication by hand be acceptable? It should then avoid unnecessary cycles, and you stay in full control of code duplication?
This is a matter of personal taste, but I'd prefer a dangerous optimization flag, than code that's hard to follow and understand I really think the 'break' version and the 'no break' version should produce the exact same results Don't you want different optimizations levels? I can live with the verbose structure since I'd rather generate the code, but what I suggest is just an idea to improve your tool. Based on what we exchanged, I'd change the current code generator, anyways, so I can have it working earlier ;-)
Break and no break do now produce the exact same result (see above). I was only talking about avoiding the one extra cycle at the start of a wait loop, e.g.:
__display("before");
while (C) { __display("waiting"); }
__display("thing done after");
If C
is false all along this currently (latest draft branch) will require one cycle. This one cycle cannot be avoided (afaict) without code duplication.
Suggestions are most welcome, keep them coming! ๐
Some good news this morning. Here is another case that saves cycles (many in some cases):
Some if/else constructs are such that when taking one branch what comes after the if/else is not reached.
For instance in a loop body:
if (C) {
break;
}
N
If C
is true, N
is never reached. Before this was introducing a cycle before N
, which can actually
be skipped by pulling N
in the else part:
if (C) {
break;
} else {
N
}
This is now done automatically. These optimizations cascade to successive if-s, for instance:
if (C1) { break; }
if (C2) { break; }
if (C3) { break; }
N
Takes now only one cycle as the if-s and N
are automatically nested (versus 3 cycles before).
Interestingly the optimizer would happily duplicate code to achieve more aggressive nesting (this is the case I was mentioning), so I explicitly disallow that. But of course this could be re-enabled ... food for thoughts.
In other news, I am also seeing benefits of these optimization in designs, where the LUT count is reduced thanks to fewer FSM states. Of course fmax could suffer, but in these cases it seemed only beneficial.
I am getting ready to push to draft
, so far all tests pass and look good.
This is great news and to me the best that can be done is to add options for optimization levels. If they can be passed by command line, it would be good. If they can be overridden per-function, it would be better.
I'm definitely keeping that in mind ; in fact I was thinking it could be specified per construct, and optionally overridden. For the case of the 'skip' loops -- which I really would like to address as they often appear indeed -- I thought there could be a if_while
which performs the re-write above automatically: if_while (C) { A } B
=> if (C) { while (1) { if (C) { A } else { B break; } } } else { B }
(B being the duplicated part).
This would make it clear the designer intends to pay the price of code duplication.
Then, when instantiating a unit, one could ask to aggressively change all while
for if_while
as an instantiation parameter (but I can see deep cascading cases that will basically generate all possible combinations of code paths, e.g. when B itself contains other such loops).
I'll experiment! (edited for typos)
I like the idea of a language extension and that kind of overriding. Happy of being part of this design changes!
Hi All,
Progress report for PAWSv2 RV64. I have it building and running with the latest draft of Silice, as of at commit:
[image: @sylefeb] https://github.com/sylefeb sylefeb https://github.com/sylefeb pushed to sylefeb/Silice https://github.com/sylefeb/Silice ยท 3 days ago 1 commit to draft https://github.com/sylefeb/Silice/tree/draft
[image: @sylefeb] https://github.com/sylefeb5d1b92e https://github.com/sylefeb/Silice/commit/5d1b92e87e89c8674884421ca34c0f910dd93c55
cleaning error test cases, fixing assert on write to read-only bound โฆ
This has been achieved as a mix of rewriting loops and/or a sprinkling of ++: around after while loops. Main aim was to get it to build, and I'll work out how best to optimise later. It is slightly slower, D-STONE gives 161 vs 162, and W-STONE gives 13.6 vs 16.7 MIPS, but everything is running as expected.
For now, I've got the hardship of driving from Whitebridge to Inverness, via some glorious scenery and Loch Ness, as I need to do some shopping. The problems with having retired early and moving to The Scottish Highlands and living quietly in the mountains!
Have a great day all,
Rob.
On Sat, 18 Mar 2023 at 13:03, Victor Suarez Rovere @.***> wrote:
I like the idea of a language extension and that kind of overriding. Happy of being part of this design changes!
โ Reply to this email directly, view it on GitHub https://github.com/sylefeb/Silice/issues/248#issuecomment-1474846252, or unsubscribe https://github.com/notifications/unsubscribe-auth/AN4SYT5UUAUKAY2B2HMABRLW4WXCFANCNFSM6AAAAAAVXWI6OU . You are receiving this because you were mentioned.Message ID: @.***>
Hi All,
Survived!
On a serious note, as this is a GitHub issue, any suggestions on where ++: may or may not be needed? PAWSv2, especially the RV64 version takes a while to build, and whilst I am happy to use trial and error, any helpful hints are greatly appreciated. Plus there is a lot of code to go through.
I am happy for these changes to be pushed through though, if they make Silice better, then I am all for it.
Rob.
On Sun, 19 Mar 2023, 11:19 Rob Shelton, @.***> wrote:
Hi All,
Progress report for PAWSv2 RV64. I have it building and running with the latest draft of Silice, as of at commit:
[image: @sylefeb] https://github.com/sylefeb sylefeb https://github.com/sylefeb pushed to sylefeb/Silice https://github.com/sylefeb/Silice ยท 3 days ago 1 commit to draft https://github.com/sylefeb/Silice/tree/draft
[image: @sylefeb] https://github.com/sylefeb5d1b92e https://github.com/sylefeb/Silice/commit/5d1b92e87e89c8674884421ca34c0f910dd93c55
cleaning error test cases, fixing assert on write to read-only bound โฆ
This has been achieved as a mix of rewriting loops and/or a sprinkling of ++: around after while loops. Main aim was to get it to build, and I'll work out how best to optimise later. It is slightly slower, D-STONE gives 161 vs 162, and W-STONE gives 13.6 vs 16.7 MIPS, but everything is running as expected.
For now, I've got the hardship of driving from Whitebridge to Inverness, via some glorious scenery and Loch Ness, as I need to do some shopping. The problems with having retired early and moving to The Scottish Highlands and living quietly in the mountains!
Have a great day all,
Rob.
On Sat, 18 Mar 2023 at 13:03, Victor Suarez Rovere < @.***> wrote:
I like the idea of a language extension and that kind of overriding. Happy of being part of this design changes!
โ Reply to this email directly, view it on GitHub https://github.com/sylefeb/Silice/issues/248#issuecomment-1474846252, or unsubscribe https://github.com/notifications/unsubscribe-auth/AN4SYT5UUAUKAY2B2HMABRLW4WXCFANCNFSM6AAAAAAVXWI6OU . You are receiving this because you were mentioned.Message ID: @.***>
Hi Rob, thanks a lot for your efforts in porting PAWSv2 to latest!! A drive through beautiful landscapes sounds like perfect to come back with a clear mind and focus!
There are two important things coming up:
I am planning to test these (and hopefully push) tonight. If PAWSv2 is pushed I am happy to give it a try too!
Regarding ++:
the easiest by far would be to test in simulation (e.g. verilator) if possible. A generic guideline would be that if the loops are not carefully synchronized with other parts of the design (like waiting on a busy, or BRAM cycles) then the change should not have detrimental side-effects.
Hi,
Yes it was nice, always is! Map attached. PAWSv2 devel branch, RV64 is up to date, pushed as of this morning.
Let me know if you get chance, I'll see what I can do tomorrow.
Rob.
On Sun, 19 Mar 2023, 18:30 sylefeb, @.***> wrote:
Hi Rob, thanks a lot for your efforts in porting PAWSv2 to latest!! A drive through beautiful landscapes sounds like perfect to come back with a clear mind and focus!
There are two important things coming up:
- the first introduces further cycles savings but also potential trouble (the new optimization for if/else) even though I expect this one to be much less problematic than the while loops
- the second should help a lot and is a tool to visualize how the original source code is decomposed into states/cycles.
I am planning to test these (and hopefully push) tonight. If PAWSv2 is pushed I am happy to give it a try too!
Regarding ++: the easiest by far would be to test in simulation (e.g. verilator) if possible. A generic guideline would be that if the loops are not carefully synchronized with other parts of the design (like waiting on a busy, or BRAM cycles) then the change should not have detrimental side-effects.
โ Reply to this email directly, view it on GitHub https://github.com/sylefeb/Silice/issues/248#issuecomment-1475354920, or unsubscribe https://github.com/notifications/unsubscribe-auth/AN4SYT6U4NC63UOM67AWZALW45GFDANCNFSM6AAAAAAVXWI6OU . You are receiving this because you were mentioned.Message ID: @.***>
Hi All,
A bit more progress, managed to streamline a few bits and pieces.
A simple rule so far seems to be if you are waiting for a flag and then reading a value a ++: will be required, but waiting for a flag and setting a value doesn't.
D-STONE now up to 170, an improvement, and W-STONE has "improved" to 14.3, up from 13.7, but still lower than the original 16.7.
Rob.
On Sun, 19 Mar 2023, 18:51 Rob Shelton, @.***> wrote:
Hi,
Yes it was nice, always is! Map attached. PAWSv2 devel branch, RV64 is up to date, pushed as of this morning.
Let me know if you get chance, I'll see what I can do tomorrow.
Rob.
On Sun, 19 Mar 2023, 18:30 sylefeb, @.***> wrote:
Hi Rob, thanks a lot for your efforts in porting PAWSv2 to latest!! A drive through beautiful landscapes sounds like perfect to come back with a clear mind and focus!
There are two important things coming up:
- the first introduces further cycles savings but also potential trouble (the new optimization for if/else) even though I expect this one to be much less problematic than the while loops
- the second should help a lot and is a tool to visualize how the original source code is decomposed into states/cycles.
I am planning to test these (and hopefully push) tonight. If PAWSv2 is pushed I am happy to give it a try too!
Regarding ++: the easiest by far would be to test in simulation (e.g. verilator) if possible. A generic guideline would be that if the loops are not carefully synchronized with other parts of the design (like waiting on a busy, or BRAM cycles) then the change should not have detrimental side-effects.
โ Reply to this email directly, view it on GitHub https://github.com/sylefeb/Silice/issues/248#issuecomment-1475354920, or unsubscribe https://github.com/notifications/unsubscribe-auth/AN4SYT6U4NC63UOM67AWZALW45GFDANCNFSM6AAAAAAVXWI6OU . You are receiving this because you were mentioned.Message ID: @.***>
Hi Rob, thanks for the update. I have been running late on my side (got side-tracked in optimizing the parser, good news on that front as well but different issue :) ). Hopefully it should be possible to get the same result, adding the ++:
right after the loop should recover the previous behavior (if not then there is something wrong in my implementation). [edit: typo]
I pushed additional changes to draft
, most notably the if/else cycle optimizations I mentioned above. The change log report will list everywhere this has an impact (as for while loops).
@rob-ng15 I synthesized PAWSv2 (devel) and the welcome screen shows up, however I seem to still have this issue that my SDcard is not recognized. Hopefully this is just due to my card and not due to these latest changes...
Hi @sylefeb
Many thanks. The SD CARD I am using is read ok, as of this morning.
I'll try the updated changes tomorrow and let you know.
Rob.
Hi,
Just built, and am up and running. It didn't report any extra issues in the changelog, so I am guessing that none of the if/else changes have affected PAWSv2.
When you get to the BIOS screen, does it hang waiting for the SDCARD, or does it flash an error regarding not being a valid FAT32 card?
Rob.
On Wed, 22 Mar 2023 at 19:37, sylefeb @.***> wrote:
I pushed additional changes to draft, most notably the if/else cycle optimizations I mentioned above. The change log report will list everywhere this has an impact (as for while loops).
@rob-ng15 https://github.com/rob-ng15 I synthesized PAWSv2 (devel) and the welcome screen shows up, however I seem to still have this issue that my SDcard is not recognized. Hopefully this is just due to my card and not due to these latest changes...
โ Reply to this email directly, view it on GitHub https://github.com/sylefeb/Silice/issues/248#issuecomment-1480149382, or unsubscribe https://github.com/notifications/unsubscribe-auth/AN4SYT7LVEBVCA5LW76CYT3W5NIHZANCNFSM6AAAAAAVXWI6OU . You are receiving this because you were mentioned.Message ID: @.***>
Hi Rob, it said "ERROR Insert SDCARD with FAT32 partition" (the screen was animated, the LEDs as well). However, great news: I tried another card and it works! (had to press reset a couple times, that's all). ๐
Hi,
That could be a fault with my FAT32 file system reader in the BIOS I guess. Although I've had issues with some SDCARDS with other things on the ULX3S.
Rob.
On Thu, 23 Mar 2023 at 21:57, sylefeb @.***> wrote:
Hi Rob, it said "ERROR Insert SDCARD with FAT32 partition" (the screen was animated, the LEDs as well). However, great news: I tried another card and it works! (had to press reset a couple times, that's all). ๐
โ Reply to this email directly, view it on GitHub https://github.com/sylefeb/Silice/issues/248#issuecomment-1481966441, or unsubscribe https://github.com/notifications/unsubscribe-auth/AN4SYT2DRWSKKGCZSFM7RG3W5TBLJANCNFSM6AAAAAAVXWI6OU . You are receiving this because you were mentioned.Message ID: @.***>
Hi All,
Still no issues using the DRAFT branch and PAWSv2 (only RV64 tested at present). Have been enabling half-precision support in the FPU. Only tested in VERILATOR as I don't presently have a GCC that compile with half-precision support, and certainly don't have support libraries with it either.
Rob.
On Fri, 24 Mar 2023 at 09:17, Rob Shelton @.***> wrote:
Hi,
That could be a fault with my FAT32 file system reader in the BIOS I guess. Although I've had issues with some SDCARDS with other things on the ULX3S.
Rob.
On Thu, 23 Mar 2023 at 21:57, sylefeb @.***> wrote:
Hi Rob, it said "ERROR Insert SDCARD with FAT32 partition" (the screen was animated, the LEDs as well). However, great news: I tried another card and it works! (had to press reset a couple times, that's all). ๐
โ Reply to this email directly, view it on GitHub https://github.com/sylefeb/Silice/issues/248#issuecomment-1481966441, or unsubscribe https://github.com/notifications/unsubscribe-auth/AN4SYT2DRWSKKGCZSFM7RG3W5TBLJANCNFSM6AAAAAAVXWI6OU . You are receiving this because you were mentioned.Message ID: @.***>
Ideally there would be no such states, in particular this needlessly increases the number of states in the FSM and the index width.
Here is an example:
Verilog (relevant part):
State 3 is left over (added but unused). This might only affect the case solved after #247 (jump to termination state), to be confirmed.