Closed beefdeadbeef closed 4 months ago
Hi, @beefdeadbeef!
First, `BGRT_ST_EEMPTY`, which I see often. What is its practical meaning?
The meaning is that we are trying to wake a process from a dirty sync object with an empty queue.
Dirty means some process is going to sleep but has not done so yet.
So that's a bug in the current mutex implementation. I'm fixing it now.
A rare and scary one, `BGRT_ST_SCALL` -- same questions.
Something went wrong!
Could you post your test code here?
For BGRT_ST_SCALL, it usually takes several hours of work under load until it fails; I'm not sure I could prepare a reasonably small test for that.
https://github.com/beefdeadbeef/ecmit/blob/master/mk/lwip/bgrt/sys_arch.c#L97
OK! I've studied the sync code. The code was written a long time ago, so I haven't read it for several years. The following quote from my previous post is incorrect:

> Dirty means some process is going to sleep but it did not yet. So that's a bug in the current mutex implementation. I'm fixing it now.

In fact, when some process calls `bgrt_mtx_free`, the mutex is owned by the calling process, so:

* `BGRT_ST_EEMPTY` is returned when there are no processes waiting to lock the mutex;
* `BGRT_ST_OK` is returned when there is at least one waiting process, which eventually locks the mutex.

So now I think that returning `BGRT_ST_EEMPTY` is normal...

> For BGRT_ST_SCALL, it usually takes several hours of work under load until fail, not sure if i could prepare reasonably small test for that.
> https://github.com/beefdeadbeef/ecmit/blob/master/mk/lwip/bgrt/sys_arch.c#L97

Can you help me with further investigation? I don't have a board which can run ecmit, so I'll be glad if you start a gdb session on your target board with the following breakpoints set:

* [one](https://github.com/shkolnick-kun/bugurtos/blob/master/kernel/sync.c#L839)
* [two](https://github.com/shkolnick-kun/bugurtos/blob/master/kernel/syscall.c#L101)
* [three](https://github.com/shkolnick-kun/bugurtos/blob/master/kernel/syscall.c#L171)

The first one means some mutex-related bug; the others mean something else. A call stack and some object states may be needed to trace the issue.
> So now I think that returning `BGRT_ST_EEMPTY` is normal...

So it seems my guess was close enough. Then again, why on earth is it necessary to distinguish between these two? How can calling code benefit from this? What would be lost if bgrt_mtx_free() just returned BGRT_ST_OK in both cases, behaving like any other mutex out there? Not that it's a big deal to check for both OK and EEMPTY in an assert, but still.
Sure, I could try with the first one, as it happens quite often. As for 2/3, do you still have an f4-disco around? It seems it would be more productive if I just ported this thing to an f407, which I also have (not a disco, but close).
here you go: https://github.com/beefdeadbeef/ecmit/tree/wip/disco

`make DEBUG=1 RTT=1` if you want verbose asserts (your probe must support RTT ofc), otherwise just `make`.
Can't build:

```
libopencm3/mk/genlink-config.mk:65: libopencm3/lib/libopencm3_at32f40x.a library variant for the selected device does not exist.
  GENHDR  at32/f40x
  BUILD   lib/at32/f40x
  CC      adc.c
  CC      adc_common_v1.c
  CC      can.c
  CC      crs.c
  CC      dac_common_all.c
  CC      dac_common_v1.c
  CC      desig_common_all.c
  CC      desig_common_v1.c
  CC      dma_common_at32.c
  CC      dma_common_l1f013.c
  CC      exti_common_all.c
  CC      gpio.c
  CC      gpio_common_all.c
  CC      i2c_common_v1.c
  CC      iwdg_common_all.c
  CC      pwr_common_v1.c
  CC      rcc.c
  CC      rcc_common_at32.c
  CC      rcc_common_all.c
  CC      rtc.c
  CC      spi_common_all.c
  CC      spi_common_v1.c
  CC      timer_common_at32.c
  CC      timer_common_all.c
  CC      usart_common_all.c
  CC      usart_common_f124.c
  CC      usb.c
  CC      usb_control.c
  CC      usb_standard.c
  CC      usb_msc.c
  CC      usb_hid.c
  CC      usb_audio.c
  CC      usb_cdc.c
  CC      usb_midi.c
  CC      st_usbfs_core.c
  CC      st_usbfs_v1.c
  CC      vector.c
  CC      systick.c
  CC      scb.c
  CC      nvic.c
  CC      assert.c
  CC      sync.c
  CC      dwt.c
  AR      libopencm3_at32f40x.a
make: *** No rule to make target «fw.elf», needed by «all». Stop.
```
```
Breakpoint 12, _do_int_scall (kblock=) at bgrt/kernel/kernel.c:93
93          uspd->scret = scret;
=> 0x0800ad38 <main.isra.0+1048>:  2308       movs r3, #8
   0x0800ad3a <main.isra.0+1050>:  f885 30c5  strb.w r3, [r5, #197]  @ 0xc5
(gdb) p *uspd
$1 = {
  scarg = 0x20002797 <memp_memory_POOL_1024_base+1999>,
  scnum = 21 '\025',
  scret = 0 '\000'
}
(gdb) x/4wx uspd->scarg
0x20002797 <memp_memory_POOL_1024_base+1999>:  0x007d7400  0x00006108  0x0001b000  0x0010fc20
```
In case of the default `syscall_handler` table, 20 is the last valid number.
Looks like memory corruption in `memp_memory_POOL_1024_base`.
That's why I don't like using dynamic memory: there is no reliable method to clarify what is allocated near the corrupted object and who has access to it...
> libopencm3/mk/genlink-config.mk:65: libopencm3/lib/libopencm3_at32f40x.a library variant for the selected device does not exist.
> GENHDR at32/f40x

it seems you're building the master branch

```
$ git clone --recursive https://github.com/beefdeadbeef/ecmit
$ git checkout -b wip/disco origin/wip/disco
$ make DEBUG=1 RTT=1
```

(the first time twice, for a notorious libopencm3 idiosyncrasy)
fw.lst.txt
Well, not much dynamic, just a few pools of fixed-size chunks -- but yes, that's possible. Still, memp_memory_POOL_1024 contains two 1k-sized chunks, the upper one used for the tcpip_thread process stack, and I see nothing incriminating so far: the memp_memory_POOL_1024_base+1999 offset is within that range. As for the scarg contents -- I'm not sure.
Yes, but the interesting thing is the memory allocation for the calling process: https://github.com/beefdeadbeef/ecmit/blob/master/mk/lwip/bgrt/sys_arch.c#L288

The `uspd` structure sits at the end of `struct bgrt_priv_proc_t`. So if there is some stack-like object next to `thread.proc`, then something can smash `scret` and then `scnum`...
I do not think this is the case: see lines 23-25 of the same file. sync, queue and proc objects are allocated in their own private pools.

```
(gdb) p memp_bgrt_proc.base
$10 = (u8_t *) 0x2000767c
```
Funny though, in one of the previous iterations I had exactly that: a combined 'struct bgrt_priv_proc_t + stack', and I smashed the bottom of said struct with insufficient stack space. A leftover of that apparently bad design is still on line 27 :]
```
(gdb) p &memp_memory_POOL_1024_base
$12 = (u8_t (*)[2059]) 0x20001fc8
```
Well... Then the only possible issue I can think of might be a system call leakage...
I've added one more status value to trace this
> So it seems my guess was close enough. Then again, why on earth is it necessary to distinguish between these two? How can calling code benefit from this?
And I also have addressed this issue.
Could you test the master branch?
straight to bull's eye. not sure whose, though.
```
[00000000] mbox=20008064 size=0
[00000000] mutex=20007CC8
[00000000] thread[tcpip_thread]=200078DC fn=08009339 prio=31 stack=256
[00000000] idle(): init done
[00000001] mutex=20007CC8 st=0
[00000001] thread[ecm_input]=20007814 fn=08002939 prio=31 stack=128
[00000064] mutex=20007CC8 st=0
[000000c8] mutex=20007CC8 st=0
[0000012c] mutex=20007CC8 st=0
[00000190] mutex=20007CC8 st=0
[000001e0] altset_cb: wIndex: 1 wValue: 1
[000001e0] mutex=20007CC8 st=11
[000001e0] Assertion "st == BGRT_ST_OK" failed at line 88 in mk/lwip/bgrt/sys_arch.c
```
with the debug line

`LWIP_DEBUGF(SYS_ARCH_DEBUG, ("mutex=%p st=%d\n", mutex->mtx, st));`

added just after the bgrt_mtx_lock() call here.
The first five are locks from the same process; this one:

`[000001e0] mutex=20007CC8 st=11`

is from another:
```
(gdb) bt
#0  0x0800929a in sys_mutex_lock.constprop.0 (mutex=<optimized out>) at mk/lwip/bgrt/sys_arch.c:88
#1  0x08001964 in tcpip_api_call (fn=<optimized out>, call=0x20017c98) at lwip/src/api/tcpip.c:478
#2  netifapi_netif_common (netif=0x20000000 <ecmif>, errtfunc=0x0, voidfunc=<optimized out>) at lwip/src/api/netifapi.c:313
#3  ecm_altset_cb (usbd_dev=<optimized out>, wIndex=1, wValue=1) at ecm.c:288
#4  0x08000314 in usb_standard_set_interface (usbd_dev=<optimized out>, req=0x200000b4 <st_usbfs_dev+80>, buf=<optimized out>, len=0x200000c0 <st_usbfs_dev+92>) at libopencm3/lib/at32/f40x/../../usb/usb_standard.c:315
#5  0x08000ff6 in usb_control_setup_read (usbd_dev=0x20000064 <st_usbfs_dev>, req=0x200000b4 <st_usbfs_dev+80>) at libopencm3/lib/at32/f40x/../../usb/usb_control.c:179
#6  0x08000de0 in st_usbfs_poll (dev=0x20000064 <st_usbfs_dev>) at libopencm3/lib/at32/f40x/../../stm32/common/st_usbfs_core.c:317
#7  0x08000202 in usbd_poll (usbd_dev=<optimized out>) at libopencm3/lib/at32/f40x/../../usb/usb.c:128
#8  usbd_intr (udev=<optimized out>) at ecm.c:358
#9  0x0800af50 in bgrt_vic_iterator (vic=<optimized out>) at bgrt/kernel/vint.c:181
#10 bgrt_kblock_do_work (kblock=<optimized out>) at bgrt/kernel/kernel.c:177
#11 bgrt_kblock_main (kblock=<optimized out>) at bgrt/kernel/kernel.c:215
#12 bgrt_start () at bgrt/arch/cm4f/gcc/bugurt_port.c:221
#13 main.isra.0 () at main.c:84
#14 0x08001402 in reset_handler () at libopencm3/lib/at32/f40x/../../cm3/vector.c:93
```
According to the backtrace, `sys_mutex_lock` was called from `bgrt_kblock_main`, i.e. from the kernel thread and not from some process. This leads to undefined behaviour...

I would recommend making a process for `usbd_poll` and communicating with it via a counting semaphore.

Anyway, thank you for the interesting puzzle!
do not get cocky yet :) this was just startup code path, and we're not in main loop yet, stay tuned :]
This is the root of the problem:

```c
BGRT_ISR(__usb_isr)
{
    nvic_disable_irq(__usb_irq);
    bgrt_vint_push_isr(&usbd_vint, &bgrt_kernel.kblock.vic);
}
```
Every `NVIC_USB_LP_IRQ` generates a virtual interrupt, which leads to a `usbd_poll` call from the kernel thread, which leads to `sys_mutex_lock` and `sys_mutex_unlock` (and maybe more) calls, each of which leads to a `bgrt_syscall` call, which leads to UB.
Yes, that is clear. But: the mutex ops in vint context were (mistakenly) added a few days ago, and I saw SCALLs earlier too. Now I've changed that to a bgrt_map_t flag set from there, with the actual work done in the ecm_input proc. No other bgrt API except bgrt_atm* and bgrt_sem_free_cs is used in usb vint context. I'm currently stress-testing, but this might take hours, let's see. On making usbd_poll() an ordinary proc -- no, that isn't an option, for performance and timing reasons.
There is definitely something wrong with the stack. A normal stack should have `bgrt_proc_terminate` at the bottom.
On Sat, Jun 8, 2024, at 23:16, beefdeadbeef wrote:
```
(gdb) r
Starting program: /home/lioka/src/mk/at32/f403/ecmit/fw.elf

Program received signal SIGTRAP, Trace/breakpoint trap.
0x08009366 in sys_mutex_lock.constprop.0 (mutex=<optimized out>) at mk/lwip/bgrt/sys_arch.c:88
88          LWIP_ASSERT("st == BGRT_ST_OK", st == BGRT_ST_OK);
   0x0800935a <sys_mutex_lock.constprop.0+106>:  4a0b       ldr r2, [pc, #44]   @ (0x8009388 <sys_mutex_lock.constprop.0+152>)
   0x0800935c <sys_mutex_lock.constprop.0+108>:  9300       str r3, [sp, #0]
   0x0800935e <sys_mutex_lock.constprop.0+110>:  4808       ldr r0, [pc, #32]   @ (0x8009380 <sys_mutex_lock.constprop.0+144>)
   0x08009360 <sys_mutex_lock.constprop.0+112>:  2358       movs r3, #88        @ 0x58
   0x08009362 <sys_mutex_lock.constprop.0+114>:  f7f8 f8c1  bl 0x80014e8 <printf>
=> 0x08009366 <sys_mutex_lock.constprop.0+118>:  be00       bkpt 0x0000
(gdb) bt
#0  0x08009366 in sys_mutex_lock.constprop.0 (mutex=<optimized out>) at mk/lwip/bgrt/sys_arch.c:88
#1  0x0800947e in tcpip_timeouts_mbox_fetch (mbox=<optimized out>, msg=<optimized out>) at lwip/src/api/tcpip.c:105
#2  tcpip_thread (arg=<optimized out>) at lwip/src/api/tcpip.c:142
#3  0x08007fbc in bgrt_syscall (num=<optimized out>, arg=0x4f080006) at bgrt/arch/cm4f/gcc/bugurt_port.c:252
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
```

```
[00000000] mbox=20008064 size=0
[00000000] mutex=20007CC8
[00000000] thread[tcpip_thread]=200078DC fn=08009415 prio=31 stack=256
[00000000] idle(): init done
[00000001] thread[ecm_input]=20007814 fn=08002911 prio=31 stack=128
[000001e1] altset_cb: wIndex: 1 wValue: 1
[000001e2] cs_cb: bRequest: 43 wValue: 000c wIndex: 0000 len: 0 data=14
[00000203] cs_cb: bRequest: 43 wValue: 000e wIndex: 0000 len: 0 data=1a
[00000214] cs_cb: bRequest: 43 wValue: 000e wIndex: 0000 len: 0 data=1a
[00000219] iface is up
[00003264] address: 172.22.22.133
[0000c670] mutex=20007CC8 st=8
[0000c670] Assertion "st == BGRT_ST_OK" failed at line 88 in mk/lwip/bgrt/sys_arch.c
```
sigh. where to look ?
Hi, @beefdeadbeef!
Is there any news on the issue?
I've got several stack traces like this, and no matter what I tried (raised stack sizes, different locations, etc.), they are all almost identical:
```
(gdb) bt
#0  0x08009362 in sys_mutex_lock.constprop.0 (mutex=<optimized out>)
    at mk/lwip/bgrt/sys_arch.c:88
#1  0x0800947a in tcpip_timeouts_mbox_fetch (mbox=<optimized out>,
    msg=<optimized out>) at lwip/src/api/tcpip.c:105
#2  tcpip_thread (arg=<optimized out>) at lwip/src/api/tcpip.c:142
#3  0x08007fb8 in bgrt_syscall (num=<optimized out>, arg=0x4f080006)
    at bgrt/arch/cm4f/gcc/bugurt_port.c:252
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(gdb) bt
#0  0x0800932c in sys_mutex_lock.constprop.0 (mutex=<optimized out>)
    at mk/lwip/bgrt/sys_arch.c:86
#1  0x08009436 in tcpip_timeouts_mbox_fetch (mbox=<optimized out>,
    msg=<optimized out>) at lwip/src/api/tcpip.c:105
#2  tcpip_thread (arg=<optimized out>) at lwip/src/api/tcpip.c:142
#3  0x08007ff4 in bgrt_syscall (num=<optimized out>, arg=0x4f080006)
    at bgrt/arch/cm4f/gcc/bugurt_port.c:252
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(gdb) bt
#0  0x080093a2 in sys_mutex_unlock.constprop.0 (mutex=<optimized out>)
    at mk/lwip/bgrt/sys_arch.c:97
#1  0x08009428 in tcpip_timeouts_mbox_fetch (mbox=<optimized out>,
    msg=<optimized out>) at lwip/src/api/tcpip.c:103
#2  tcpip_thread (arg=<optimized out>) at lwip/src/api/tcpip.c:142
#3  0x08007ff4 in bgrt_syscall (num=<optimized out>, arg=0x4f080006)
    at bgrt/arch/cm4f/gcc/bugurt_port.c:252
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(gdb) bt
#0  0x0800931a in sys_sem_signal.constprop.0 (sem=<optimized out>)
    at mk/lwip/bgrt/sys_arch.c:85
#1  0x0800992c in tcpip_timeouts_mbox_fetch (mbox=<optimized out>,
    msg=<optimized out>) at lwip/src/api/tcpip.c:103
#2  tcpip_thread (arg=<optimized out>) at lwip/src/api/tcpip.c:142
#3  0x08007ff4 in bgrt_syscall (num=<optimized out>, arg=0x4f080006)
    at bgrt/arch/cm4f/gcc/bugurt_port.c:252
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(gdb) bt
#0  0x080084d2 in sys_mutex_unlock (mutex=<optimized out>)
    at mk/lwip/bgrt/sys_arch.c:120
#1  sys_unlock_tcpip_core () at mk/lwip/bgrt/sys_arch.c:47
#2  0x08008914 in tcpip_timeouts_mbox_fetch (mbox=<optimized out>,
    msg=<optimized out>) at lwip/src/api/tcpip.c:103
#3  tcpip_thread (arg=<optimized out>) at lwip/src/api/tcpip.c:142
#4  0x08008098 in bgrt_syscall (num=<optimized out>, arg=0x64f0800)
    at bgrt/arch/cm4f/gcc/bugurt_port.c:252
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(gdb) bt
#0  0x0800931e in sys_mutex_lock.constprop.0 (mutex=<optimized out>)
    at mk/lwip/bgrt/sys_arch.c:87
#1  0x0800944e in tcpip_timeouts_mbox_fetch (mbox=<optimized out>,
    msg=<optimized out>) at lwip/src/api/tcpip.c:105
#2  tcpip_thread (arg=<optimized out>) at lwip/src/api/tcpip.c:142
#3  0x08007fb4 in bgrt_syscall (num=<optimized out>, arg=0x4f080006)
    at bgrt/arch/cm4f/gcc/bugurt_port.c:252
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
```
Note especially the arg= value in bgrt_syscall (even rotated over once). I'm out of ideas. And as LWIP has basically two concurrency models (one with a global mutex, another with a callback queue), I'm going for the latter, which works reliably for me.
btw

```
(gdb) x 0x0800064f
0x800064f:  0x034770e7
```

from libopencm3/lib/cm3/vector.c -- maybe having a cleverer handler than just while(1) there could be beneficial for bugurtos, no?
This is intended to be used for fault debugging. You can parse fault status registers in this while(1).
BTW, could you use -Og instead of -Os while debugging? In that case you can get rid of the `<optimized out>` thing.
Here it is with -Og and without -flto, after about half an hour of running:
```
(gdb) r
Starting program: /home/lioka/src/mk/at32/f403/ecmit/fw.elf

Program received signal SIGSEGV, Segmentation fault.
bgrt_priv_sync_wake (sync=0x4f85db0, proc=0x70b004eb, chown=71 'G') at bgrt/kernel/sync.c:801
801         owner = sync->owner;
=> 0x08011340 <bgrt_priv_sync_wake+16>:  f8d4 6084  ldr.w r6, [r4, #132]  @ 0x84
(gdb) bt
#0  bgrt_priv_sync_wake (sync=0x4f85db0, proc=0x70b004eb, chown=71 'G') at bgrt/kernel/sync.c:801
#1  0x0800fb14 in BGRT_SC_SYNC_WAKE_SR (va=<optimized out>) at bgrt/kernel/default/syscall_routines.h:256
#2  0x0800fbb2 in bgrt_priv_do_syscall (syscall_num=<optimized out>, syscall_arg=<optimized out>) at bgrt/kernel/syscall.c:176
#3  0x0800ef58 in _do_int_scall (kblock=kblock@entry=0x200080f0 <bgrt_kernel>) at bgrt/kernel/kernel.c:92
#4  0x0800f0f2 in bgrt_kblock_do_work (kblock=kblock@entry=0x200080f0 <bgrt_kernel>) at bgrt/kernel/kernel.c:185
#5  0x0800f11a in bgrt_kblock_main (kblock=0x200080f0 <bgrt_kernel>) at bgrt/kernel/kernel.c:215
#6  0x0801019e in bgrt_start () at bgrt/arch/cm4f/gcc/bugurt_port.c:221
#7  0x08000956 in main () at main.c:84
(gdb) f 4
#4  0x0800f0f2 in bgrt_kblock_do_work (kblock=kblock@entry=0x200080f0 <bgrt_kernel>) at bgrt/kernel/kernel.c:185
185         _do_int_scall(kblock);
   0x0800f0ec <bgrt_kblock_do_work+112>:  4620       mov r0, r4
   0x0800f0ee <bgrt_kblock_do_work+114>:  f7ff ff25  bl 0x800ef3c <_do_int_scall>
(gdb) p *kblock
$10 = {
  vic = {
    list = {
      item = {0x0 <repeats 32 times>},
      map = 0
    },
    prio = 32 ' '
  },
  hpmap = 0,
  lpmap = 0
}
(gdb) f 3
#3  0x0800ef58 in _do_int_scall (kblock=kblock@entry=0x200080f0 <bgrt_kernel>) at bgrt/kernel/kernel.c:92
92          scret = bgrt_priv_do_syscall(uspd->scnum, uspd->scarg);
   0x0800ef50 <_do_int_scall+20>:  f8d4 10c0  ldr.w r1, [r4, #192]  @ 0xc0
   0x0800ef54 <_do_int_scall+24>:  f000 fe1d  bl 0x800fb92 <bgrt_priv_do_syscall>
(gdb) p *uspd
$11 = {
  scarg = 0x20002478 <memp_memory_POOL_1024_base+1940>,
  scnum = 17 '\021',
  scret = 0 '\000'
}
(gdb) x/4wx uspd->scarg
0x20002478 <memp_memory_POOL_1024_base+1940>:  0x0800fbd5  0x00000063  0x20002490  0x0000000e
(gdb)
```
it seems it went from here:
```
0800fbc0 <bgrt_syscall_var>:
 800fbc0: b40f        push {r0, r1, r2, r3}
 800fbc2: b500        push {lr}
 800fbc4: b083        sub  sp, #12
 800fbc6: ab05        add  r3, sp, #20
 800fbc8: 9301        str  r3, [sp, #4]
 800fbca: f89d 0010   ldrb.w r0, [sp, #16]
 800fbce: a901        add  r1, sp, #4
 800fbd0: f000 fb0e   bl   80101f0 <bgrt_syscall>
 800fbd4: b003        add  sp, #12
 800fbd6: f85d eb04   ldr.w lr, [sp], #4
 800fbda: b004        add  sp, #16
 800fbdc: 4770        bx   lr
 800fbde: bf00        nop
```
> bgrt_priv_sync_wake (sync=0x4f85db0, proc=0x70b004eb, chown=71 'G')
>
> it seems it went from here: 0800fbc0 <bgrt_syscall_var>:
OK.

```c
//We use a variadic function to handle multiple arguments...
//It is implementation defined whether to use stack only or registers and stack...
bgrt_st_t bgrt_syscall_var(bgrt_syscall_t num, ...)
{
    bgrt_va_wr_t varg; //This variable is located on the stack...
    bgrt_st_t ret;
    va_start(varg.list, num); //Now all arguments are on the stack!
    ret = bgrt_syscall(num, (void *)&varg);
    va_end(varg.list);
    return ret;
}

//If the stack gets corrupted before calling the service routine...
BGRT_SC_SR(SYNC_WAKE, bgrt_va_wr_t * va) /* ADLINT:SL:[W0031] not used*/
{
    va_list param;
    bgrt_sync_t * sync;
    BGRT_PID_T pid;
    bgrt_flag_t chown;
    //Then we shall retrieve garbage from the process stack here
    va_copy(param, va->list);
    sync = (bgrt_sync_t *)va_arg(param, void *);
    pid = (BGRT_PID_T)va_arg(param, void *);
    chown = (bgrt_flag_t)va_arg(param, int);
    va_end(param);
    //And then we shall call
    //bgrt_priv_sync_wake (sync=0x4f85db0, proc=0x70b004eb, chown=71 'G')
    return bgrt_priv_sync_wake(sync, BGRT_PID_TO_PROC(pid), chown);
}
```
Seems like this is a stack corruption...
> bgrt_st_t bgrt_syscall_var(bgrt_syscall_t num, ...)

bgrt_syscall_t in every bugurt_config_example.h is defined as unsigned char. AFAIR the last named argument of a variadic function is subject to default argument promotion (to int in this case), and having a char there is asking for trouble just like that, no? Let me check this...
In case you're wondering -- nope, that's not it :] Back to square one.
I know this for sure. Have you managed to find out what it is?
third breakpoint, just before `return BGRT_ST_SCALL`:
```
Program received signal SIGTRAP, Trace/breakpoint trap.
bgrt_priv_do_syscall (syscall_arg=0x20002a3c <memp_memory_POOL_1024_base+1996>, syscall_num=21) at bgrt/kernel/syscall.c:172
172         __asm__ __volatile__ ("bkpt #2");
=> 0x0800d074 <main.isra.0+1004>:  be02  bkpt 0x0002
(gdb) bt
#0  bgrt_priv_do_syscall (syscall_arg=0x20002a3c <memp_memory_POOL_1024_base+1996>, syscall_num=21)
    at bgrt/kernel/syscall.c:172
#1  _do_int_scall (kblock=<optimized out>) at bgrt/kernel/kernel.c:92
#2  bgrt_kblock_do_work (kblock=<optimized out>) at bgrt/kernel/kernel.c:185
#3  bgrt_kblock_main (kblock=<optimized out>) at bgrt/kernel/kernel.c:215
#4  bgrt_start () at bgrt/arch/cm4f/gcc/bugurt_port.c:221
#5  main.isra.0 () at main.c:92
#6  0x080019e0 in reset_handler () at libopencm3/lib/at32/f40x/../../cm3/vector.c:93
```
```
(gdb) p *bgrt_kernel.sched.current_proc
$13 = {
  parent = {
    parent = {
      next = 0x20007f50 <memp_memory_bgrt_proc_base+208>,
      prev = 0x20017f10
    },
    list = 0x2000038c <bgrt_kernel+156>,
    prio = 31
  },
  flags = 99,
  base_prio = 31,
  lres = {
    counter = {0 <repeats 32 times>},
    map = 0
  },
  time_quant = 1,
  timer = 1,
  sync = 0x0,
  cnt_lock = 28151,
  pmain = 0x800b6bd <tcpip_thread>,
  sv_hook = 0x80089eb <sys_arch_dummy_cb>,
  rs_hook = 0x80089eb <sys_arch_dummy_cb>,
  arg = 0x20008afc <sys_arch_dummy_arg>,
  sstart = 0x20002a74 <memp_memory_POOL_1024_base+2052>,
  spointer = 0x200029e4 <memp_memory_POOL_1024_base+1908>,
  udata = {
    scarg = 0x20002a3c <memp_memory_POOL_1024_base+1996>,
    scnum = 21,
    scret = 0
  }
}
```
While most of the proc fields seem OK to me, proc.cnt_lock isn't -- and I have no bgrt_priv_proc_lock() at all. Do you have any idea where this might come from? No mutexes this time, just a bunch of sem_lock/sem_free.
Hi!
Can you define some small `BGRT_CONFIG_CNT_MAX` and define `BGRT_CONFIG_TEST`? This can help to track the counter leak...
And yes, scnum of 21 is wrong, and zero scret is another problem...
I mean, scnum of 21 is an error, and zero scret in this case is too, since we initialize scret with BGRT_ST_B4SC before the process-to-kernel context switch...
with BGRT_CONFIG_CNT_MAX defined to 32:
(gdb) bt
#0 0x08008936 in bgrt_priv_proc_lock () at bgrt/kernel/proc.c:290
#1 0x0800a4dc in _sem_lock_fsm (va=<optimized out>) at bgrt/libs/native/sem.c:149
#2 0x08008f46 in BGRT_SC_USER_SR (va=<optimized out>) at bgrt/kernel/default/syscall_routines.h:311
#3 0x0800cfa4 in bgrt_priv_do_syscall (syscall_arg=<optimized out>, syscall_num=<optimized out>) at bgrt/kernel/syscall.c:176
#4 _do_int_scall (kblock=<optimized out>) at bgrt/kernel/kernel.c:92
#5 bgrt_kblock_do_work (kblock=<optimized out>) at bgrt/kernel/kernel.c:185
#6 bgrt_kblock_main (kblock=<optimized out>) at bgrt/kernel/kernel.c:215
#7 bgrt_start () at bgrt/arch/cm4f/gcc/bugurt_port.c:221
#8 main.isra.0 () at main.c:89
#9 0x080019e0 in reset_handler () at libopencm3/lib/at32/f40x/../../cm3/vector.c:9
and BGRT_CNT_INC got inlined:
08008918 <bgrt_priv_proc_lock>:
8008918: 4b07 ldr r3, [pc, #28] @ (8008938 <bgrt_priv_proc_lock+0x20>)
800891a: f8d3 3090 ldr.w r3, [r3, #144] @ 0x90
800891e: 691a ldr r2, [r3, #16]
8008920: f042 0220 orr.w r2, r2, #32
8008924: 611a str r2, [r3, #16]
8008926: f8d3 20a8 ldr.w r2, [r3, #168] @ 0xa8
800892a: 2a1f cmp r2, #31
800892c: /-- d803 bhi.n 8008936 <bgrt_priv_proc_lock+0x1e>
800892e: | 3201 adds r2, #1
8008930: | f8c3 20a8 str.w r2, [r3, #168] @ 0xa8
8008934: | 4770 bx lr
8008936: +-- e7fe b.n 8008936 <bgrt_priv_proc_lock+0x1e>
8008938: 200007a8 .word 0x200007a8
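Read back into C, the listing above corresponds to something like the following self-contained sketch. The names `proc_t`, `CNT_MAX`, and the return-value convention are illustrative stand-ins, not the real bgrt definitions, and the real BGRT_CONFIG_TEST build spins forever at the self-branch at 8008936 instead of returning:

```c
#include <assert.h>
#include <stdint.h>

#define CNT_MAX 32u  /* stand-in for BGRT_CONFIG_CNT_MAX */

typedef struct {
    uint32_t flags;     /* loaded/stored at [r3, #16] in the listing */
    uint32_t cnt_lock;  /* loaded/stored at [r3, #168] in the listing */
} proc_t;

/* Mirrors the inlined BGRT_CNT_INC in bgrt_priv_proc_lock: set the lock
   flag, then increment cnt_lock unless it already exceeds CNT_MAX - 1.
   The real test build hangs at that point; here we return 0 instead. */
static int proc_lock(proc_t *p)
{
    p->flags |= 0x20u;               /* orr.w r2, r2, #32 */
    if (p->cnt_lock > CNT_MAX - 1u)  /* cmp r2, #31; bhi  */
        return 0;                    /* counter leak detected: trap */
    p->cnt_lock++;
    return 1;
}
```

With a leak (locks not balanced by frees), the counter only climbs, so a small cap turns hours of uptime into a quick, deterministic trap.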
Thank you!
I've fixed _sem_lock_fsm.
(gdb) r
Starting program: /home/lioka/src/mk/at32/f403/ecmit/fw.elf
Program received signal SIGTRAP, Trace/breakpoint trap.
0x08009948 in sys_arch_mbox_fetch (msg=msg@entry=0x20002a68 <memp_memory_POOL_1024_base+2044>, timeout=39, mbox=<optimized out>) at mk/lwip/bgrt/sys_arch.c:218
218 LWIP_ASSERT("st == BGRT_ST_ROLL", st == BGRT_ST_ROLL);
0x0800993c <sys_arch_mbox_fetch+80>: 4817 ldr r0, [pc, #92] @ (0x800999c <sys_arch_mbox_fetch+176>)
0x0800993e <sys_arch_mbox_fetch+82>: f8cd 9000 str.w r9, [sp]
0x08009942 <sys_arch_mbox_fetch+86>: 23da movs r3, #218 @ 0xda
0x08009944 <sys_arch_mbox_fetch+88>: f7f8 f8fc bl 0x8001b40 <printf_>
=> 0x08009948 <sys_arch_mbox_fetch+92>: be00 bkpt 0x0000
(gdb) bt
#0 0x08009948 in sys_arch_mbox_fetch (msg=msg@entry=0x20002a68 <memp_memory_POOL_1024_base+2044>, timeout=39, mbox=<optimized out>) at mk/lwip/bgrt/sys_arch.c:218
#1 0x0800b630 in tcpip_timeouts_mbox_fetch (mbox=<optimized out>, msg=<optimized out>) at lwip/src/api/tcpip.c:104
#2 tcpip_thread (arg=<optimized out>) at lwip/src/api/tcpip.c:142
#3 0x080097ac in bgrt_syscall (num=<optimized out>, arg=0x8000c2f <blocking_handler>) at bgrt/arch/cm4f/gcc/bugurt_port.c:252
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) p *bgrt_kernel.sched.current_proc
$1 = {
parent = {
parent = {
next = 0x20007f1c <memp_memory_bgrt_proc_base+208>,
prev = 0x20007fec <memp_memory_bgrt_proc_base+416>
},
list = 0x20000844 <bgrt_kernel+156>,
prio = 31
},
flags = 67,
base_prio = 31,
lres = {
counter = {0 <repeats 32 times>},
map = 0
},
time_quant = 1,
timer = 1,
sync = 0x0,
cnt_lock = 0,
pmain = 0x800b5e1 <tcpip_thread>,
sv_hook = 0x800891f <sys_arch_dummy_cb>,
rs_hook = 0x800891f <sys_arch_dummy_cb>,
arg = 0x200088f4 <sys_arch_dummy_arg>,
sstart = 0x20002a70 <memp_memory_POOL_1024_base+2052>,
spointer = 0x200029bc <memp_memory_POOL_1024_base+1872>,
udata = {
scarg = 0x20002a10 <memp_memory_POOL_1024_base+1956>,
scnum = 21,
scret = 8
}
}
So, the cnt_lock issue is gone indeed. As for the rest, I'll try to catch something useful later.
Good news, everyone! I've managed to fix a bug in my code (tl;dr: I f***d up the SysTick/PendSV priorities) and now it works as intended. Thanks for your patience, and of course for bugurtos. --> two cups of tea here <--
Good! You are welcome!
Ok, not flags, but return values. Consider two processes of equal lowest priority: one constantly looping over lock/free, and another trying to interact with a common shared resource -- classics. Both lock and free calls are followed by asserts on BGRT_ST_OK return values -- which regularly fail for me on mtx_free(). First, BGRT_ST_EEMPTY, often. What is the practical meaning? That no other processes were blocked on this lock lately? Why not return BGRT_ST_OK in that case and be done with it? Otherwise, is it safe to treat EEMPTY as a sort of OK? The rare and scary one, BGRT_ST_SCALL -- same questions.
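Given the maintainer's conclusion earlier in the thread -- BGRT_ST_EEMPTY from bgrt_mtx_free just means no process was waiting to lock the mutex -- a caller-side check can accept both codes. A self-contained sketch; the status names are from the thread, but the numeric values are illustrative assumptions, not the real bgrt definitions:

```c
#include <assert.h>

/* Illustrative stand-ins for the real bgrt status codes. */
typedef enum {
    BGRT_ST_OK = 0,
    BGRT_ST_EEMPTY,
    BGRT_ST_SCALL
} bgrt_st_t;

/* On bgrt_mtx_free, BGRT_ST_EEMPTY means nobody was waiting on the mutex,
   so treat it as success alongside BGRT_ST_OK; anything else is an error. */
static int mtx_free_succeeded(bgrt_st_t st)
{
    return (st == BGRT_ST_OK) || (st == BGRT_ST_EEMPTY);
}
```

With this, the looping lock/free test asserts on `mtx_free_succeeded(st)` instead of `st == BGRT_ST_OK`, and only genuinely abnormal codes such as BGRT_ST_SCALL trip the assert.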