Closed GoogleCodeExporter closed 8 years ago
After looking into it a bit more it looks like you always have to use the
cli(); since you use a variable to set/clear bits. The compiler can not
optimize this to single sbi/cbi instructions even for lower registers in
sbi/cbi range.
Original comment by m...@stohn.de
on 30 May 2013 at 10:36
[deleted comment]
[deleted comment]
Thanks for reporting this.
I assume, that i need to update u8g_com_io.c:
Current implementation:
void u8g_SetPinLevel(uint8_t internal_pin_number, uint8_t level)
{
volatile uint8_t * tmp = u8g_get_avr_io_ptr(u8g_avr_port_P, internal_pin_number>>3);
if ( level == 0 )
*tmp &= ~_BV(internal_pin_number&7);
else
*tmp |= _BV(internal_pin_number&7);
}
This should be:
void u8g_SetPinLevel(uint8_t internal_pin_number, uint8_t level)
{
volatile uint8_t * tmp = u8g_get_avr_io_ptr(u8g_avr_port_P, internal_pin_number>>3);
uint8_t oldSREG;
cli();
if ( level == 0 )
*tmp &= ~_BV(internal_pin_number&7);
else
*tmp |= _BV(internal_pin_number&7);
SREG = oldSREG;
}
correct?
Original comment by olikr...@gmail.com
on 31 May 2013 at 5:19
Oh oh...
this seems to get a bigger problem than I thought.
I was getting the issues inside of "u8g_com_arduino_st7920_spi.c" :
...
static void u8g_com_arduino_do_shift_out_msb_first(uint8_t val)
{
...
do
{
if ( val & 128 )
*outData |= bitData;
else
*outData &= bitNotData;
...
But the location you mention is also affected (e.g. when setting the chip
select line).
The problem itself is not only related to AVR.
Whenever you set/reset a pin with an operation like:
PORTx |= b; or PORTx &= m;
the compile might expand this to several instructions like:
reg = PORTx;
reg |= b;
PORTx = reg;
If this is the case then an interrupt after the reg= operation might change
port bits and the write back later with PORTx = reg; corrupts the output.
On most platforms a good compiler like gcc will optimize this to a single bit
set / bit clear operation. Unfortunately AVR megas are having more IO than the
instruction set was designed for so it can not use the bit set/clear
instructions on several (high) PORTs. Also the instruction set does not support
setting/clearing multiple bits at the same time or variable bit location. So
the compiler must know the bit position at compile time (which is not possible
with your variable implementation).
==> your proposed SetPinLevel for AVR is a good.
(small mistake in your code): uint8_t oldSREG; should read: uint8_t oldSREG = SREG;
Unfortunately this will add a lot of latency / interrupt blocking to the
output. My first idea would be to add a #define to support "interrupt save"
operation. All the people using your great library would suffer slower I/O by
the changes regardless if they use interrupts or not.
Also a good hardware design can prevent the need for using the cli(); stuff.
One could use a dedicated port for all display lines and rest of port for some
inputs only.
---
My main goal was to speed up the software SPI in order to use a 7920 display
(HW-SPI is not available on the hardware I want to use it).
But now I see a slow down is required (cli() stuff) before a speedup can be
thought of.
I will experiment a bit with the SW-SPI implementation to find a good trade-off
for speedup / interrupt blocking.
Just a question at this time: Why do you set the shift out function
"U8G_NOINLINE". In my eyes this would be a perfect candidate for inlining.
Speed is much more important than saving code size for this function.
Original comment by m...@stohn.de
on 31 May 2013 at 7:49
ah, correct:
void u8g_SetPinLevel(uint8_t internal_pin_number, uint8_t level)
{
volatile uint8_t * tmp = u8g_get_avr_io_ptr(u8g_avr_port_P, internal_pin_number>>3);
uint8_t oldSREG = SREG;
cli();
if ( level == 0 )
*tmp &= ~_BV(internal_pin_number&7);
else
*tmp |= _BV(internal_pin_number&7);
SREG = oldSREG;
}
regarding this:
=== Quote Start ===
I was getting the issues inside of "u8g_com_arduino_st7920_spi.c" :
...
static void u8g_com_arduino_do_shift_out_msb_first(uint8_t val)
{
...
do
{
if ( val & 128 )
*outData |= bitData;
else
*outData &= bitNotData;
...
=== Quote End ===
I agree that there are more locations to prevent interrupts for some time.
My suggestion would be to introduce two macros
U8G_SAVE_IRQ_STATE_AND_DISABLE_IRQ()
U8G_RESTORE_IRQ_STATE()
These macros can be empty if interrupt safe code is not required.
Otherwise, these macros can be used whereever code requires such protection.
I agree that this is a larger issue. Funny, that i never realized this problem.
Original comment by olikr...@gmail.com
on 31 May 2013 at 8:43
Note: ATOMIC BLOCKS are probably not suitable, mainly because of not beeing
portable enough (AVR specific):
http://www.nongnu.org/avr-libc/user-manual/group__util__atomic.html
Original comment by olikr...@gmail.com
on 31 May 2013 at 8:49
Note: stdatomic.h is not yet available on Arduino.
_ATOMIC_MODIFY_(__a, __o, __m, __x) can not be used.
From that point of view the following macros might be more portable for the
future:
U8G_ATOMIC_OR(ptr,val)
and
U8G_ATOMIC_AND(ptr,val)
U8G_ATOMIC_OR(ptr,val)
could be as simple as
U8G_ATOMIC_OR(ptr,val) (*(ptr) |= (val))
but should be
U8G_ATOMIC_OR(ptr,val) (global_SREG = SREG, cli(), (*(ptr) |= (val)), SREG =
global_SREG)
Drawback: access time to global_SREG, not sure if cli() can be called within
the comma expression
U8G_ATOMIC_OR(ptr,val) do { uint8_t tmpSREG = SREG; cli(); (*(ptr) |= (val));
SREG = tmpSREG } while(0)
this need to be checked for optimization...
Original comment by olikr...@gmail.com
on 31 May 2013 at 9:04
current assembler sequence:
94a: 90 81 ld r25, Z
if ( level == 0 )
*tmp &= ~_BV(internal_pin_number&7);
94c: 21 e0 ldi r18, 0x01 ; 1
94e: 30 e0 ldi r19, 0x00 ; 0
950: 0c 2e mov r0, r28
952: 01 c0 rjmp .+2 ; 0x956 <u8g_SetPinLevel+0x26>
954: 22 0f add r18, r18
956: 0a 94 dec r0
958: ea f7 brpl .-6 ; 0x954 <u8g_SetPinLevel+0x24>
95a: d1 11 cpse r29, r1
void u8g_SetPinLevel(uint8_t internal_pin_number, uint8_t level)
{
volatile uint8_t * tmp = u8g_get_avr_io_ptr(u8g_avr_port_P, internal_pin_number>>3);
if ( level == 0 )
95c: 03 c0 rjmp .+6 ; 0x964 <u8g_SetPinLevel+0x34>
95e: 20 95 com r18
*tmp &= ~_BV(internal_pin_number&7);
960: 92 23 and r25, r18
962: 01 c0 rjmp .+2 ; 0x966 <u8g_SetPinLevel+0x36>
964: 92 2b or r25, r18
else
*tmp |= _BV(internal_pin_number&7);
966: 90 83 st Z, r25
Original comment by olikr...@gmail.com
on 31 May 2013 at 9:34
sequence with
#define U8G_ATOMIC_OR(ptr, val) do { uint8_t tmpSREG = SREG; cli(); (*(ptr) |=
(val)); SREG = tmpSREG; } while(0)
#define U8G_ATOMIC_AND(ptr, val) do { uint8_t tmpSREG = SREG; cli(); (*(ptr) &=
(val)); SREG = tmpSREG; } while(0)
only minimal overhead is added and a register is used or tmpSREG:
94a: 8f b7 in r24, 0x3f ; 63
if ( level == 0 )
{
U8G_ATOMIC_AND(tmp, ~_BV(internal_pin_number&7));
94c: f8 94 cli
94e: 90 81 ld r25, Z
950: 21 e0 ldi r18, 0x01 ; 1
952: 30 e0 ldi r19, 0x00 ; 0
954: 0c 2e mov r0, r28
956: 01 c0 rjmp .+2 ; 0x95a <u8g_SetPinLevel+0x2a>
958: 22 0f add r18, r18
95a: 0a 94 dec r0
95c: ea f7 brpl .-6 ; 0x958 <u8g_SetPinLevel+0x28>
95e: d1 11 cpse r29, r1
void u8g_SetPinLevel(uint8_t internal_pin_number, uint8_t level)
{
volatile uint8_t * tmp = u8g_get_avr_io_ptr(u8g_avr_port_P, internal_pin_number>>3);
if ( level == 0 )
960: 03 c0 rjmp .+6 ; 0x968 <u8g_SetPinLevel+0x38>
962: 20 95 com r18
{
U8G_ATOMIC_AND(tmp, ~_BV(internal_pin_number&7));
964: 92 23 and r25, r18
966: 01 c0 rjmp .+2 ; 0x96a <u8g_SetPinLevel+0x3a>
968: 92 2b or r25, r18
// *tmp &= ~_BV(internal_pin_number&7);
}
else
{
U8G_ATOMIC_OR(tmp, _BV(internal_pin_number&7));
96a: 90 83 st Z, r25
96c: 8f bf out 0x3f, r24 ; 63
Original comment by olikr...@gmail.com
on 31 May 2013 at 9:39
added the following macros to u8g.h
# define U8G_ATOMIC_START() do { global_SREG_backup = SREG; cli(); }
while(0)
# define U8G_ATOMIC_END() SREG = global_SREG_backup
# define U8G_ATOMIC_OR(ptr, val) do { uint8_t tmpSREG = SREG; cli();
(*(ptr) |= (val)); SREG = tmpSREG; } while(0)
# define U8G_ATOMIC_AND(ptr, val) do { uint8_t tmpSREG = SREG; cli();
(*(ptr) &= (val)); SREG = tmpSREG; } while(0)
U8G_ATOMIC_START / END --> global memory access, bigger code
U8G_ATOMIC_OR / END --> register variable, smaller code
u8g_com_io.c has been updated
other locations need to be identified and updated
Original comment by olikr...@gmail.com
on 31 May 2013 at 9:56
In order to come up with a good optimization for SW-SPI it would be nice to
have something like the proposed
U8G_SAVE_IRQ_STATE_AND_DISABLE_IRQ()
U8G_RESTORE_IRQ_STATE()
since reading/writing the SREG for every single bit looks like overkill.
I think operation like: block IRQ, do a burst of 8 bit, re-enable IRQ could be
the way to go.
Here is my first idea which should optimize to very good on AVR (reading of
PORT is minimized, output values will be inside of registers, loop can be
unrolled,
IRQ_OFF();
if( PORTdat == PORTclk )
{
tmp = PORTdat; //same as PORTclk;
tmp = tmp & CLK_MASK & DAT_MASK; //clear CLK and DAT in tmp
for( i=0; i<7; i++ )
{
tmpout = tmp | (databyte & 0x80) ? DAT_BIT : 0;
PORTdat = tmpout; //set dat and clear clk at same time (this optimization works only in case the value is taken on rising edge like 7920)
databyte <<= 1;
PORTclk = tmpout | CLK_BIT; //set clk
}
}
else
{
tmpdaton = PORTdat | DAT_BIT ;
tmpdatoff = PORTdat & DAT_MASK ;
tmpclkon = PORTclk | CLK_BIT;
tmpclkoff = PORTclk & CLK_MASK;
for( i=0; i<7; i++ )
{
PORTdat = (databyte & 0x80) ? tmpdaton : tmpdatoff;
PORTclk = tmpclkoff; //clear clk
databyte <<= 1;
PORTclk = tmpclkon; //set clk
}
}
IRQ_ON();
What do you think about this ?
Original comment by m...@stohn.de
on 31 May 2013 at 9:57
> What do you think about this ?
I guess, my update on this issue was done in parallel
Yes, indeed, this is why i think u8glib requires two pairs of macros.
U8G_ATOMIC_START / END --> for output of a complete byte, etc..
U8G_ATOMIC_OR / AND --> single, isolated bit change
Original comment by olikr...@gmail.com
on 31 May 2013 at 10:01
>I guess, my update on this issue was done in parallel
Yes... you are just to fast ;-)
Original comment by m...@stohn.de
on 31 May 2013 at 10:10
all files have been updated so far...
U8G_ATOMIC_OR(ptr, val)
U8G_ATOMIC_AND(ptr, val)
U8G_ATOMIC_START();
U8G_ATOMIC_END();
Original comment by olikr...@gmail.com
on 1 Jun 2013 at 1:16
Thanks for this fast update. I think issue #19 was solved by this update and
can be closed now.
Also this issue looks solved now.
Thanks.
Original comment by m...@stohn.de
on 2 Jun 2013 at 11:28
tests so far are ok
Original comment by olikr...@gmail.com
on 4 Jun 2013 at 7:58
Original comment by olikr...@gmail.com
on 4 Jun 2013 at 7:59
Original issue reported on code.google.com by
m...@stohn.de
on 30 May 2013 at 9:37