Closed stephanosio closed 2 years ago
@stephanosio I found your https://github.com/stephanosio/zephyr/commits/aarch32_non_m_fp_alt branch and pulled those changes. I wanted to see if you have done any more work on this issue, specifically have you implemented anything mentioned in the "Specifications 3-7" section above?
@bbolen There is a Cortex-A (and -R) FP sharing implementation that sort of works here: https://github.com/ibirnbaum/zephyr/blob/armv7_cortex_a/arch/arm/core/aarch32/swap_helper.S
@stephanosio thank you
Can you elaborate on the description above with respect to items 5 and 6? I'm struggling to understand why the vfp registers would be saved on the exception stack and not the thread context in the normal case. I can see needing to temporarily put it on the exception stack in order for the exception handler to use the VFP unit, but I would assume they would be popped off that stack and pushed onto the thread context during the context switch.
Here is a working implementation of floating point support for Cortex-R. It does lazy context switching. It is based on v2.3.0. There are some conflicts with HEAD, but it will be a while before I can get around to looking at those. I'm putting this out there in case others need a starting point for FPU support before I can get this merge worthy.
Some R4F SoCs have double precision. For example, a quote from the TI Hercules brochure:
Floating Point Unit (FPU)
• FPU is compliant to IEEE754
• 16 double-word (64 bits) registers
• 32 single-word (32 bits) registers
• Supports features:
  – Single-precision and double-precision add, subtract, multiply, divide, multiply and accumulate, and square root operations
  – Conversions between fixed-point and floating-point data formats, etc
  – Comparisons
  – Underflow
  – Exceptions
Hi @bbolen, I am working on supporting a Cortex-R5F chip in Zephyr, and I've spliced in your code from this post: https://github.com/zephyrproject-rtos/zephyr/issues/19979#issuecomment-758091058
I have it building; however, when trying to flash it onto the board, I am encountering the following error:
Debug: 387 144 cortex_a.c:301 cortex_a_exec_opcode(): exec opcode 0xee000e15
Debug: 388 145 armv4_5.c:496 arm_set_cpsr(): set CPSR 0x000003db: Undefined instruction mode, ARM state
When looking at the ARM documentation here: https://developer.arm.com/documentation/ddi0406/b/System-Level-Architecture/The-System-Level-Programmers--Model/Exceptions/Undefined-Instruction-exception?lang=en
I found this section:
The Undefined Instruction exception can be used for:
- software emulation of a coprocessor in a system that does not have the physical coprocessor hardware
- lazy context switching of coprocessor registers
- general-purpose instruction set extension by software emulation
- signaling an illegal instruction execution
- division by zero errors.
Do you know if my error is a coincidence, or related in the way described? If so, do you have a suggestion?
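For anyone reading the two debug lines above, here is a small sketch that decodes the logged values using the standard ARMv7-A/R bit layouts for the CPSR and for the MCR/MRC coprocessor transfer encoding. The struct and function names are made up for illustration and are not part of OpenOCD or Zephyr. It does not diagnose the failure; it only shows what the numbers say.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper types; names are illustrative only. */
#define ARM_MODE_UND 0x1bu /* CPSR[4:0] value for Undefined mode */

struct cpsr_fields {
    unsigned mode;     /* CPSR[4:0] */
    bool thumb;        /* CPSR[5], T bit */
    bool fiq_masked;   /* CPSR[6], F bit */
    bool irq_masked;   /* CPSR[7], I bit */
    bool abort_masked; /* CPSR[8], A bit */
    bool big_endian;   /* CPSR[9], E bit */
};

struct cp_xfer {
    bool is_cp_xfer; /* bits[27:24] == 0b1110 and bit[4] == 1 */
    bool to_arm;     /* bit[20]: true = MRC, false = MCR */
    unsigned opc1, crn, rt, coproc, opc2, crm;
};

/* Split a CPSR value into the fields used in the log message. */
struct cpsr_fields decode_cpsr(uint32_t cpsr)
{
    struct cpsr_fields f = {
        .mode = cpsr & 0x1fu,
        .thumb = (cpsr >> 5) & 1u,
        .fiq_masked = (cpsr >> 6) & 1u,
        .irq_masked = (cpsr >> 7) & 1u,
        .abort_masked = (cpsr >> 8) & 1u,
        .big_endian = (cpsr >> 9) & 1u,
    };
    return f;
}

/* Decode an ARM-state MCR/MRC word; the condition field is ignored. */
struct cp_xfer decode_cp_xfer(uint32_t insn)
{
    struct cp_xfer x = {
        .is_cp_xfer = ((insn >> 24) & 0xfu) == 0xeu && ((insn >> 4) & 1u) == 1u,
        .to_arm = (insn >> 20) & 1u,
        .opc1 = (insn >> 21) & 0x7u,
        .crn = (insn >> 16) & 0xfu,
        .rt = (insn >> 12) & 0xfu,
        .coproc = (insn >> 8) & 0xfu,
        .opc2 = (insn >> 5) & 0x7u,
        .crm = insn & 0xfu,
    };
    return x;
}

/* Usage: check the two values that appear in the debug output. */
void check_logged_values(void)
{
    struct cpsr_fields c = decode_cpsr(0x000003dbu);
    struct cp_xfer x = decode_cp_xfer(0xee000e15u);

    assert(c.mode == ARM_MODE_UND && !c.thumb && c.big_endian);
    assert(x.is_cp_xfer && !x.to_arm && x.coproc == 14u);
}
```

Decoded this way, 0xee000e15 is `mcr p14, 0, r0, c0, c5, 0`, a write to coprocessor 14, whereas VFP instructions use coprocessors 10 and 11. The CPSR value has the mode field set to Undefined with the E (big-endian) bit set and IRQ and FIQ masked. Whether either detail is relevant to the flashing failure is not established here.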
It could be related. The FPU is usually disabled. When the code gets to a floating point instruction, an undefined instruction happens, the FPU gets enabled, and execution starts again on the floating point instruction that caused the fault. So one undefined instruction exception would be expected when using floating point, but it wouldn't crash anything.
I'm unavailable for the rest of the week, but I'll look closer at your details above on Monday.
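To make the trap-and-retry behaviour described above concrete, here is a minimal host-side model in C. It is only a sketch: `struct cpu_model` and the function names are hypothetical, the `fpu_enabled` flag stands in for the hardware enable bit (FPEXC.EN on VFP), and none of this is the actual code from the branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one CPU; not a Zephyr data structure. */
struct cpu_model {
    bool fpu_enabled;     /* stands in for the FPU enable bit */
    unsigned undef_traps; /* undefined-instruction exceptions taken */
    float acc;            /* stands in for an FP register */
};

/* Undefined-instruction handler: count the trap and enable the FPU. */
void undef_handler(struct cpu_model *cpu)
{
    assert(cpu != NULL);
    cpu->undef_traps++;
    cpu->fpu_enabled = true;
}

/*
 * "Execute" one FP add. With the FPU disabled the instruction traps,
 * the handler enables the FPU, and the same instruction is retried.
 */
void exec_fp_add(struct cpu_model *cpu, float x)
{
    assert(cpu != NULL);
    if (!cpu->fpu_enabled) {
        undef_handler(cpu);
    }
    cpu->acc += x; /* the retried instruction now completes */
}

/* Context switch: disable the FPU so the next FP use traps again. */
void context_switch(struct cpu_model *cpu)
{
    assert(cpu != NULL);
    cpu->fpu_enabled = false;
}
```

In this model, two FP adds in a row take exactly one trap, and a further FP add after `context_switch()` takes a second one. That is the "one undefined instruction exception would be expected" behaviour, and it is not an error by itself.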
`Open On-Chip Debugger 0.11.0+dev-00242-g7036ed509-dirty (2021-08-03-17:04)
Licensed under GNU GPL v2
For bug reports, read http://openocd.org/doc/doxygen/bugs.html
Info : TI BE-32 quirks mode is enabled
Info : XDS110: connected
Info : XDS110: vid/pid = 0451/bef3
Info : XDS110: firmware version = 3.0.0.16
Info : XDS110: hardware version = 0x0029
Info : XDS110: connected to target via JTAG
Info : XDS110: TCK set to 2500 kHz
Info : clock speed 1500 kHz
Info : JTAG tap: tms570.jrc tap/device found: 0x1b95a02f (mfg: 0x017 (Texas Instruments), part: 0xb95a, ver: 0x1)
Info : JTAG tap: tms570.cpu enabled
Info : tms570.cpu: hardware has 8 breakpoints, 8 watchpoints
Info : starting gdb server for tms570.cpu on 3333
Info : Listening on port 3333 for gdb connections
TargetName Type Endian TapName State
0* tms570.cpu cortex_r4 big tms570.cpu running
Info : JTAG tap: tms570.jrc tap/device found: 0x1b95a02f (mfg: 0x017 (Texas Instruments), part: 0xb95a, ver: 0x1)
Info : JTAG tap: tms570.cpu enabled
Warn : tms570.cpu: ran after reset and before halt ...
Info : tms570.cpu: MPIDR level2 0, cluster 0, core 0, mono core, no SMT
target halted in ARM state due to debug-request, current mode: Undefined instruction
cpsr: 0x000003db pc: 0x00000004
D-Cache: disabled, I-Cache: disabled
flash
flash bank bank_id driver_name base_address size_bytes chip_width_bytes
bus_width_bytes target [driver_options ...]
flash banks
flash init
flash list
gdb_flash_program ('enable'|'disable')
nand
program
Info : XDS110: disconnected
FATAL ERROR: command exited with status 1`
-- Application: /home/smith/zephyrproject/zephyr/samples/hello_world
-- Zephyr version: 2.7.0-rc1 (/home/smith/zephyrproject/zephyr), build: v1.12.0-34809-g29387287d9f7
-- Found Python3: /usr/bin/python3.8 (found suitable exact version "3.8.10") found components: Interpreter
-- Found west (found suitable version "0.11.1", minimum required is "0.7.1")
-- Board: hercules_tms570lc43x
-- Cache files will be written to: /home/smith/.cache/zephyr
-- Using toolchain: zephyr 0.13.0 (/home/smith/zephyr-sdk-0.13.0)
-- Open On-Chip Debugger 0.11.0+dev-00358-g6c1e1a212-dirty (2021-08-26-13:54)
My local zephyr repo was cloned from your repo here: https://github.com/bbolen/zephyr/commits/cortex_r_fpu and I updated it to the latest version.
At the time of writing, the ARM Cortex-R port does not support the use of the hardware floating-point unit (VFP and NEON).
Considering the common application scenarios for Cortex-R (real-time processing), it is imperative that hardware floating-point unit support be available for it; otherwise, the practical usability of the Cortex-R port becomes questionable.
Overview
An overview of the hardware floating-point unit for Cortex-R is as follows:
Specifications
ARM Cortex-R floating-point support feature shall:
1. support Unshared FP registers mode and Shared FP registers mode.
2. optionally support emulation of the VFP instructions that are unimplemented by hardware.

ARM Cortex-R floating-point support feature, for Shared FP registers mode, shall:
3. manage FP enable status at thread level, in conformance with the kernel FP interface.
   - `K_FP_REGS` option shall be used to specify whether thread-wide floating-point support is enabled.
   - `K_FP_REGS` option may be (re-)enabled only for the threads that were initially created with the same option.
4. disable FPU after a context switch and re-enable it upon exception.
   - `K_FP_REGS` option is set for the thread.
5. store s0-s15 and FPSCR in exception stack frame and s16-s31 in thread context.
6. implement lazy stacking of FP context.
7. preserve s16-s31 during context switch only when FPU is enabled.
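As a rough illustration of items 5 and 7, the sketch below models where each group of FP registers would be saved. The split matches the AAPCS convention in which s0-s15 are caller-saved and s16-s31 are callee-saved; that this is the reason for the split is an assumption here, not something stated in the list. All struct and function names are hypothetical and do not reflect the actual Zephyr layouts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts, for illustration only. */
struct fp_exc_frame {      /* part of the exception stack frame (item 5) */
    uint32_t s0_s15[16];
    uint32_t fpscr;
};

struct fp_thread_ctx {     /* part of the thread context (item 5) */
    uint32_t s16_s31[16];
};

struct fp_unit {           /* stands in for the hardware FP state */
    bool enabled;
    uint32_t s[32];
    uint32_t fpscr;
};

/* Exception entry: stack s0-s15 and FPSCR in the exception frame. */
void stack_caller_saved(const struct fp_unit *fpu, struct fp_exc_frame *frame)
{
    assert(fpu != NULL && frame != NULL);
    memcpy(frame->s0_s15, &fpu->s[0], sizeof(frame->s0_s15));
    frame->fpscr = fpu->fpscr;
}

/*
 * Context switch out: preserve s16-s31 in the thread context, but only
 * when the FPU is enabled (item 7). Returns true if a save happened.
 */
bool save_callee_saved(const struct fp_unit *fpu, struct fp_thread_ctx *ctx)
{
    assert(fpu != NULL && ctx != NULL);
    if (!fpu->enabled) {
        return false; /* thread never touched the FPU; nothing to save */
    }
    memcpy(ctx->s16_s31, &fpu->s[16], sizeof(ctx->s16_s31));
    return true;
}
```

In this model, a thread that never used the FPU skips the s16-s31 copy entirely on a context switch, which is the saving item 7 is after. Lazy stacking (item 6) is not modelled.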
Note