Closed Irockasingranite closed 4 months ago
Thanks for posting, sorry you're having trouble. Not something we've seen I'm afraid: do you have a .map
file that might tell you what has been loaded at 0x0000842C
?
Checking zephyr.map
, the relevant part appears to be this:
.text.ubxlib_preinit
0x0000000000008424 0x18 app/libapp.a(u_port.c.obj)
.text.main 0x000000000000843c 0x28 app/libapp.a(main.c.obj)
0x000000000000843c main
, which further points at ubxlib as the 'culprit'.
Looking at ubxlib_preinit
in the zephyr specific u_port.c
, all it has is a call to k_thread_system_pool_assign
. Commenting out that line does in fact get rid of the securefault at boot (but probably badly breaks ubxlib :) ). The fact that it's an init function also explains why simply compiling the library is enough to trigger the issue, without explicitly linking to it from the application.
Checking the same zephyr.map
again, I see that the system heap that's being assigned is sitting at 0x2000c68c
, though, which should be well inside the 'nonsecure' ram area (the 'secure' region spans 0x20000000
to 0x20008000
):
k_heap_area 0x000000002000c68c 0x14 load address 0x0000000000011874
0x000000002000c68c _k_heap_list_start = .
*(SORT_BY_NAME(SORT_BY_ALIGNMENT(._k_heap.static.*)))
._k_heap.static._system_heap_
0x000000002000c68c 0x14 zephyr/kernel/libkernel.a(mempool.c.obj)
0x000000002000c68c _system_heap
0x000000002000c6a0 _k_heap_list_end = .
Maybe k_thread_system_pool_assign()
somehow refers to something within the secured area? Is it possible to attach a debugger and break point at the function then single-step, see in detail what it is trying to do?
k_thread_system_pool_assign()
itself isn't very complicated. From zephyr/kernel/mempool.c
:
K_HEAP_DEFINE(_system_heap, CONFIG_HEAP_MEM_POOL_SIZE);
#define _SYSTEM_HEAP (&_system_heap)
/* [...] */
void k_thread_system_pool_assign(struct k_thread *thread)
{
thread->resource_pool = _SYSTEM_HEAP;
}
Hmph. Stating the obvious, if both thread->resource_pool
and _SYSTEM_HEAP
are outside the range 0x20000000
to 0x20008000
, that suggests that, somehow, more than is intended is being secured. Could you maybe try reading from _SYSTEM_HEAP
and thread->resource_pool
, just as a standalone operation, slightly before this, see if those also hit the security issue?
Yes, testing those individually got me further: It's not the heap at all, it's writing to thread
. Stepping through shows that what k_current_get()
returns is 0xf75
, which... doesn't seem right.
So k_current_get()
isn't working correctly, which shouldn't really be surprising. ubxlib_preinit
is running at init level PRE_KERNEL_1
, for which the Zephyr documentation states: "These devices cannot use any kernel services during configuration, since the kernel services are not yet available".
Is there a reason ubxlib_preinit
needs to run this early? If I change it to POST_KERNEL
, k_current_get()
gives me 0x2000cc98
, which seems much more reasonable, and there is no securefault.
Progress, well done!
ubxlib_preinit()
has always been set to run at PRE_KERNEL_1
, I guess so that the heap is set up before there's a chance that any tasks get run? The change that introduced it, two years ago, is https://github.com/u-blox/ubxlib/commit/fb63a1f22333c840464a658c708327c04191826e, where the commit comment says:
Systempool is now allocated during startup in z_sys_init_run_level, threads created afterwards will inherit this pool. This resolves a known issue in Zephyr when calling UBXLIB API from threads that isn't the Zephyr main thread.
I'd guess that if POST_KERNEL
works for you then it is probably OK. Let me run that change through the test system and see what happens.
What I don't understand is why calling k_thread_system_pool_assign()
with the parameter k_current_get()
has worked for the last 2 years (and still works on nRF52/nRF53), if the return value of k_current_get()
is garbage at that time...?
Checking the blame on zephyr/include/zephyr/kernel.h
I see some changes relating to thread local storage in k_current_get()
a few months ago, perhaps that's the change pulled in by the newer sdk versions that broke this?
Maybe, there do seem to have been changes in this area, though we don't use CONFIG_THREAD_LOCAL_STORAGE
or CONFIG_CURRENT_THREAD_USE_TLS
(at least not directly) and we are definitely using nRFConnect 2.5.0 which I hope isn't a moveable fast as far as the underlying Zephyr is concerned...?
EDIT: Ah, I see there has been a 2.5.1
, a 2.5.2
and there seems to be a kind of "head-rev" 2.5.99-dev1
lying around as well.
Anyway, the statement you have quoted from the Zephyr documentation is pretty clear and replacing PRE_KERNEL1
with POST_KERNEL
in the call to SYS_INIT()
passes our tests. Let me get that change reviewed and, assuming everyone is OK with it, I will push it to master
here and update this issue when I've done that.
Fix is now in master
here, see https://github.com/u-blox/ubxlib/commit/3c8d2d9541c92089780a73d098c8c2a19d0578f6. I will close this issue now: please feel free to re-open, or open another, if there is more to discuss.
Hi, we're trying to use a u-blox gnss module on an nrf9160 based board, using the nrf sdk and pulling in ubxlib as a zephyr module. We recently updated the nrf sdk to v2.5.x, and since then our application cannot start, as Trusted Firmware-M crashes on a
SecureFault
:The issue does not appear when using nrf sdk v2.4.3, but does on v2.5.0.
This is likely not an issue specific to ubxlib, but it is the only correlation we've been able to identify. Since according to the documentation you support and test with nrf sdk v2.5.0, we're wondering if you've seen this or similar issues yourself, and know of a fix or workaround.
For a minimal example of the issue, see this repository.