Open knightshrub opened 3 years ago
No, this only enables hardware acceleration. However when there are double-precision operations in the code, the compiler will not simply substitute float
for double
.
Using the compiler option -Wdouble-promotion
will throw a lot of warnings for floating point values being promoted to double
in expressions where the other operand is a double
or because the compiler treats constants like 1.0
as a double
.
The compiler option -fsingle-precision-constant
changes this behavior to interpret 1.0
the same as 1.0f
(both float
s), but this doesn't seem to fix the issue in case of liquid-dsp.
There is nothing to be done about it if liquid-dsp makes extensive use of double
in the code.
In that case I wouldn't call the library suitable for embedded-system though, unless embedded system refers to Linux SBCs like the Raspberry Pi.
@knightshrub this is odd as I try to avoid double-precision operations wherever possible. There are only a few cases where double-precision is needed, but I can't imagine anything to do with the ampmodem
would need double precision.
@jgaeddert thanks for the reply, I realize my previous comment might have come off as a little rude, I certainly didn't mean it that way.
First of all, this is how I compile libliquid:
# remove any leftovers
make distclean
./bootstrap.sh
# this might very well be the culprit, I'm not super familiar with autoconf
# I also had to comment out AC_FUNC_MALLOC and AC_FUNC_REALLOC in configure.ac
CFLAGS='-mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -mthumb -O3 -fsingle-precision-constant -Wdouble-promotion' LDFLAGS='-specs=nosys.specs' ./configure --host=arm-none-eabi
make
It appears that all the references to __aeabi_d.*
are introduced by the NCO module.
Does the NCO code really require double precision arithmetic?
As an example let's have a look at nco_crcf_get_freq
:
080010cc <nco_crcf_get_frequency>:
80010cc: f500 5080 add.w r0, r0, #4096 ; 0x1000
80010d0: b538 push {r3, r4, r5, lr}
80010d2: 6880 ldr r0, [r0, #8]
80010d4: ee07 0a90 vmov s15, r0
80010d8: eef8 7a67 vcvt.f32.u32 s15, s15
80010dc: ed2d 8b02 vpush {d8}
80010e0: ee17 0a90 vmov r0, s15
80010e4: f02d facc bl 802e680 <__aeabi_f2d>
80010e8: a315 add r3, pc, #84 ; (adr r3, 8001140 <nco_crcf_get_frequency+0x74>)
80010ea: e9d3 2300 ldrd r2, r3, [r3]
80010ee: f02d fb1f bl 802e730 <__aeabi_dmul>
80010f2: 4b15 ldr r3, [pc, #84] ; (8001148 <nco_crcf_get_frequency+0x7c>)
80010f4: 2200 movs r2, #0
80010f6: f02d fb1b bl 802e730 <__aeabi_dmul>
80010fa: f02d fe11 bl 802ed20 <__aeabi_d2f>
80010fe: ee08 0a10 vmov s16, r0
8001102: f02d fabd bl 802e680 <__aeabi_f2d>
8001106: a30c add r3, pc, #48 ; (adr r3, 8001138 <nco_crcf_get_frequency+0x6c>)
8001108: e9d3 2300 ldrd r2, r3, [r3]
800110c: 4604 mov r4, r0
800110e: 460d mov r5, r1
8001110: f02d fd9e bl 802ec50 <__aeabi_dcmpgt>
8001114: b150 cbz r0, 800112c <nco_crcf_get_frequency+0x60>
8001116: a30a add r3, pc, #40 ; (adr r3, 8001140 <nco_crcf_get_frequency+0x74>)
8001118: e9d3 2300 ldrd r2, r3, [r3]
800111c: 4620 mov r0, r4
800111e: 4629 mov r1, r5
8001120: f02d f94e bl 802e3c0 <__aeabi_dsub>
8001124: f02d fdfc bl 802ed20 <__aeabi_d2f>
8001128: ee08 0a10 vmov s16, r0
800112c: eeb0 0a48 vmov.f32 s0, s16
8001130: ecbd 8b02 vpop {d8}
8001134: bd38 pop {r3, r4, r5, pc}
8001136: bf00 nop
8001138: 54442d18 .word 0x54442d18
800113c: 400921fb .word 0x400921fb
8001140: 54442d18 .word 0x54442d18
8001144: 401921fb .word 0x401921fb
8001148: 3df00000 .word 0x3df00000
Looking at the source
https://github.com/jgaeddert/liquid-dsp/blob/b9de495b0d529977914a4bbcd1295208e5de07c0/src/nco/src/nco.c#L167-L172
reveals that the code uses M_PI
which according to the GNU libm documentation is a double
. This probably leads to the entire expression being promoted to double precision arithmetic, although the the -Wdouble-promotion
flag doesn't seem to complain about it.
A piece of code that gcc with -Wdouble-promotion
does complain about is this:
src/nco/src/nco.c: In function 'nco_crcf_mix_down':
src/nco/src/nco.c:283:14: warning: implicit conversion from 'complex float' to 'complex double' to match other operand of binary expression [-Wdouble-promotion]
283 | *_y = _x * conj(v);
Again looking at the source
https://github.com/jgaeddert/liquid-dsp/blob/b9de495b0d529977914a4bbcd1295208e5de07c0/src/nco/src/nco.c#L270-L285
reveals the use of conj
, which according to the GNU libm documentation returns a double
. This function should probably be conjf
instead.
The disassembly accordingly shows many references to software double precision floating point functions:
08001460 <nco_crcf_mix_down>:
8001460: e92d 4ff0 stmdb sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
8001464: ed2d 8b04 vpush {d8-d9}
8001468: ee10 5a10 vmov r5, s0
800146c: b085 sub sp, #20
800146e: 4688 mov r8, r1
8001470: a902 add r1, sp, #8
8001472: ee10 4a90 vmov r4, s1
8001476: f8cd 8004 str.w r8, [sp, #4]
800147a: f7ff ff87 bl 800138c <nco_crcf_cexpf>
800147e: 4628 mov r0, r5
8001480: f02d f8fe bl 802e680 <__aeabi_f2d>
8001484: 4680 mov r8, r0
8001486: 4620 mov r0, r4
8001488: 4689 mov r9, r1
800148a: f02d f8f9 bl 802e680 <__aeabi_f2d>
800148e: ec41 0b18 vmov d8, r0, r1
8001492: 9802 ldr r0, [sp, #8]
8001494: f02d f8f4 bl 802e680 <__aeabi_f2d>
8001498: ec41 0b19 vmov d9, r0, r1
800149c: 9803 ldr r0, [sp, #12]
800149e: f02d f8ef bl 802e680 <__aeabi_f2d>
80014a2: ec53 2b19 vmov r2, r3, d9
80014a6: 4606 mov r6, r0
80014a8: f101 4700 add.w r7, r1, #2147483648 ; 0x80000000
80014ac: 4640 mov r0, r8
80014ae: 4649 mov r1, r9
80014b0: f02d f93e bl 802e730 <__aeabi_dmul>
80014b4: 4632 mov r2, r6
80014b6: 4604 mov r4, r0
80014b8: 460d mov r5, r1
80014ba: 463b mov r3, r7
80014bc: ec51 0b18 vmov r0, r1, d8
80014c0: f02d f936 bl 802e730 <__aeabi_dmul>
80014c4: 4602 mov r2, r0
80014c6: 460b mov r3, r1
80014c8: 4620 mov r0, r4
80014ca: 4629 mov r1, r5
80014cc: f02c ff78 bl 802e3c0 <__aeabi_dsub>
80014d0: 4632 mov r2, r6
80014d2: 4682 mov sl, r0
80014d4: 468b mov fp, r1
80014d6: 463b mov r3, r7
80014d8: 4640 mov r0, r8
80014da: 4649 mov r1, r9
80014dc: f02d f928 bl 802e730 <__aeabi_dmul>
80014e0: ec53 2b19 vmov r2, r3, d9
80014e4: 4604 mov r4, r0
80014e6: 460d mov r5, r1
80014e8: ec51 0b18 vmov r0, r1, d8
80014ec: f02d f920 bl 802e730 <__aeabi_dmul>
80014f0: 4602 mov r2, r0
80014f2: 460b mov r3, r1
80014f4: 4620 mov r0, r4
80014f6: 4629 mov r1, r5
80014f8: f02c ff64 bl 802e3c4 <__adddf3>
80014fc: 4604 mov r4, r0
80014fe: 460d mov r5, r1
8001500: 4650 mov r0, sl
8001502: 4659 mov r1, fp
8001504: 4622 mov r2, r4
8001506: 462b mov r3, r5
8001508: f02d fbac bl 802ec64 <__aeabi_dcmpun>
800150c: b990 cbnz r0, 8001534 <nco_crcf_mix_down+0xd4>
800150e: 4659 mov r1, fp
8001510: 4650 mov r0, sl
8001512: f02d fc05 bl 802ed20 <__aeabi_d2f>
8001516: 4603 mov r3, r0
8001518: 4620 mov r0, r4
800151a: 9c01 ldr r4, [sp, #4]
800151c: 4629 mov r1, r5
800151e: 6023 str r3, [r4, #0]
8001520: f02d fbfe bl 802ed20 <__aeabi_d2f>
8001524: 4603 mov r3, r0
8001526: 2000 movs r0, #0
8001528: 6063 str r3, [r4, #4]
800152a: b005 add sp, #20
800152c: ecbd 8b04 vpop {d8-d9}
8001530: e8bd 8ff0 ldmia.w sp!, {r4, r5, r6, r7, r8, r9, sl, fp, pc}
8001534: ec47 6b13 vmov d3, r6, r7
8001538: eeb0 2a49 vmov.f32 s4, s18
800153c: eef0 2a69 vmov.f32 s5, s19
8001540: ec49 8b10 vmov d0, r8, r9
8001544: eeb0 1a48 vmov.f32 s2, s16
8001548: eef0 1a68 vmov.f32 s3, s17
800154c: f02d fd78 bl 802f040 <__muldc3>
8001550: ec5b ab10 vmov sl, fp, d0
8001554: ec55 4b11 vmov r4, r5, d1
8001558: e7d9 b.n 800150e <nco_crcf_mix_down+0xae>
As a test, replacing conj
with conjf
completely removes any references to double precision floating point functions
080013c4 <nco_crcf_mix_down>:
80013c4: b510 push {r4, lr}
80013c6: ed2d 8b02 vpush {d8}
80013ca: b082 sub sp, #8
80013cc: 460c mov r4, r1
80013ce: 4669 mov r1, sp
80013d0: eeb0 8a40 vmov.f32 s16, s0
80013d4: eef0 8a60 vmov.f32 s17, s1
80013d8: f7ff ff8a bl 80012f0 <nco_crcf_cexpf>
80013dc: eddd 6a01 vldr s13, [sp, #4]
80013e0: ed9d 0a00 vldr s0, [sp]
80013e4: ee66 7ac8 vnmul.f32 s15, s13, s16
80013e8: ee26 7aa8 vmul.f32 s14, s13, s17
80013ec: eee8 7a80 vfma.f32 s15, s17, s0
80013f0: eea8 7a00 vfma.f32 s14, s16, s0
80013f4: eeb4 7a67 vcmp.f32 s14, s15
80013f8: eef1 fa10 vmrs APSR_nzcv, fpscr
80013fc: d608 bvs.n 8001410 <nco_crcf_mix_down+0x4c>
80013fe: 2000 movs r0, #0
8001400: ed84 7a00 vstr s14, [r4]
8001404: edc4 7a01 vstr s15, [r4, #4]
8001408: b002 add sp, #8
800140a: ecbd 8b02 vpop {d8}
800140e: bd10 pop {r4, pc}
8001410: eef1 6a66 vneg.f32 s13, s13
8001414: eef0 1a68 vmov.f32 s3, s17
8001418: eeb0 1a48 vmov.f32 s2, s16
800141c: eef0 0a66 vmov.f32 s1, s13
8001420: f02b fb5e bl 802cae0 <__mulsc3>
8001424: eeb0 7a40 vmov.f32 s14, s0
8001428: eef0 7a60 vmov.f32 s15, s1
800142c: e7e7 b.n 80013fe <nco_crcf_mix_down+0x3a>
Similarly explicitly casting M_PI
to float
by replacing all references to M_PI
with (float)M_PI
fixes the issue with nco_crcf_get_frequency
080010ac <nco_crcf_get_frequency>:
80010ac: f500 5080 add.w r0, r0, #4096 ; 0x1000
80010b0: ed90 0a02 vldr s0, [r0, #8]
80010b4: eddf 7a09 vldr s15, [pc, #36] ; 80010dc <nco_crcf_get_frequency+0x30>
80010b8: eddf 6a09 vldr s13, [pc, #36] ; 80010e0 <nco_crcf_get_frequency+0x34>
80010bc: ed9f 7a09 vldr s14, [pc, #36] ; 80010e4 <nco_crcf_get_frequency+0x38>
80010c0: eeb8 0a40 vcvt.f32.u32 s0, s0
80010c4: ee20 0a27 vmul.f32 s0, s0, s15
80010c8: ee20 0a26 vmul.f32 s0, s0, s13
80010cc: eeb4 0ac7 vcmpe.f32 s0, s14
80010d0: eef1 fa10 vmrs APSR_nzcv, fpscr
80010d4: bfc8 it gt
80010d6: ee30 0a67 vsubgt.f32 s0, s0, s15
80010da: 4770 bx lr
80010dc: 40c90fdb .word 0x40c90fdb
80010e0: 2f800000 .word 0x2f800000
80010e4: 40490fdb .word 0x40490fdb
Oh, I didn't take it as rude at all! You're right in that M_PI
is interpreted as a double (difference between 3.14159
and 3.14159f
) because it's a preprocessor macro in the math.h
header.
Good catch with conj
vs conjf
as well. I'll submit a patch and see if that fixes things.
Made a few changes to the nco
module. They're temporary until I can figure out a better way to manage types and literals, but hopefully it should work for you.
C++ templates ?
@jgaeddert thanks for the temporary changes. I think I found another one in ampmodem.c, should probably be cargf
.
Seems like fixing this throughout the codebase might be quite a pain, it's unfortunate that gcc
can't infer the correct return type like in more modern languages like rust
and nim
.
https://github.com/jgaeddert/liquid-dsp/blob/3d9de34350c7ce3ba130672a1646cf92a927e81d/src/modem/src/ampmodem.c#L301
Fixed! Thanks!
I will also mention there is a branch of liquid that includes fixed-point algorithms; however the branch is a bit stale and a dry-run of merging master into it revealed a lot of conflicts. While it would take quite some time to get it up to speed again, it should work much faster on embedded systems, particularly those without floating-point accelerators. Keeping track of bit precision can be a pain, of course, but if low processing footprint is your goal, fixed-point is the way to go.
I am trying to use liquid-dsp in an embedded system with an ARM Cortex-M4F microcontroller which has hardware floating point acceleration, albeit single precision only (i.e.
float
notdouble
).Unfortunately when using the
ampmodem
the produced binary includes a lot of calls to software double-precision floating point library functions like__aeabi_dmul
, which are very slow on this particular system and means that I can't use it in real time.Is there a way to use single precision floating point throughout the library to make use of the available hardware acceleration?