ampmodem requires double precision floating point

knightshrub commented 3 years ago

I am trying to use liquid-dsp in an embedded system with an ARM Cortex-M4F microcontroller which has hardware floating point acceleration, albeit single precision only (i.e. float not double).

Unfortunately when using the ampmodem the produced binary includes a lot of calls to software double-precision floating point library functions like __aeabi_dmul, which are very slow on this particular system and means that I can't use it in real time.

Is there a way to use single precision floating point throughout the library to make use of the available hardware acceleration?

fsheikh commented 3 years ago

Isn't this a matter of correctly choosing (based on the ARM core under usage) mfpu option in the configure script . At least, this page lists a couple of compiler options for generating single precision float point numbers

knightshrub commented 3 years ago

No, this only enables hardware acceleration. However when there are double-precision operations in the code, the compiler will not simply substitute float for double.

Using the compiler option -Wdouble-promotion will throw a lot of warnings for floating point values being promoted to double in expressions where the other operand is a double or because the compiler treats constants like 1.0 as a double. The compiler option -fsingle-precision-constant changes this behavior to interpret 1.0 the same as 1.0f (both floats), but this doesn't seem to fix the issue in case of liquid-dsp. There is nothing to be done about it if liquid-dsp makes extensive use of double in the code.

In that case I wouldn't call the library suitable for embedded-system though, unless embedded system refers to Linux SBCs like the Raspberry Pi.

jgaeddert commented 3 years ago

@knightshrub this is odd as I try to avoid double-precision operations wherever possible. There are only a few cases where double-precision is needed, but I can't imagine anything to do with the ampmodem would need double precision.

knightshrub commented 3 years ago

@jgaeddert thanks for the reply, I realize my previous comment might have come off as a little rude, I certainly didn't mean it that way.

First of all, this is how I compile libliquid:

# remove any leftovers
make distclean
./bootstrap.sh
# this might very well be the culprit, I'm not super familiar with autoconf
# I also had to comment out AC_FUNC_MALLOC and AC_FUNC_REALLOC in configure.ac
CFLAGS='-mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -mthumb -O3 -fsingle-precision-constant -Wdouble-promotion' LDFLAGS='-specs=nosys.specs' ./configure --host=arm-none-eabi
make

It appears that all the references to __aeabi_d.* are introduced by the NCO module. Does the NCO code really require double precision arithmetic? As an example let's have a look at nco_crcf_get_freq:

080010cc <nco_crcf_get_frequency>:
 80010cc:   f500 5080   add.w   r0, r0, #4096   ; 0x1000
 80010d0:   b538        push    {r3, r4, r5, lr}
 80010d2:   6880        ldr r0, [r0, #8]
 80010d4:   ee07 0a90   vmov    s15, r0
 80010d8:   eef8 7a67   vcvt.f32.u32    s15, s15
 80010dc:   ed2d 8b02   vpush   {d8}
 80010e0:   ee17 0a90   vmov    r0, s15
 80010e4:   f02d facc   bl  802e680 <__aeabi_f2d>
 80010e8:   a315        add r3, pc, #84 ; (adr r3, 8001140 <nco_crcf_get_frequency+0x74>)
 80010ea:   e9d3 2300   ldrd    r2, r3, [r3]
 80010ee:   f02d fb1f   bl  802e730 <__aeabi_dmul>
 80010f2:   4b15        ldr r3, [pc, #84]   ; (8001148 <nco_crcf_get_frequency+0x7c>)
 80010f4:   2200        movs    r2, #0
 80010f6:   f02d fb1b   bl  802e730 <__aeabi_dmul>
 80010fa:   f02d fe11   bl  802ed20 <__aeabi_d2f>
 80010fe:   ee08 0a10   vmov    s16, r0
 8001102:   f02d fabd   bl  802e680 <__aeabi_f2d>
 8001106:   a30c        add r3, pc, #48 ; (adr r3, 8001138 <nco_crcf_get_frequency+0x6c>)
 8001108:   e9d3 2300   ldrd    r2, r3, [r3]
 800110c:   4604        mov r4, r0
 800110e:   460d        mov r5, r1
 8001110:   f02d fd9e   bl  802ec50 <__aeabi_dcmpgt>
 8001114:   b150        cbz r0, 800112c <nco_crcf_get_frequency+0x60>
 8001116:   a30a        add r3, pc, #40 ; (adr r3, 8001140 <nco_crcf_get_frequency+0x74>)
 8001118:   e9d3 2300   ldrd    r2, r3, [r3]
 800111c:   4620        mov r0, r4
 800111e:   4629        mov r1, r5
 8001120:   f02d f94e   bl  802e3c0 <__aeabi_dsub>
 8001124:   f02d fdfc   bl  802ed20 <__aeabi_d2f>
 8001128:   ee08 0a10   vmov    s16, r0
 800112c:   eeb0 0a48   vmov.f32    s0, s16
 8001130:   ecbd 8b02   vpop    {d8}
 8001134:   bd38        pop {r3, r4, r5, pc}
 8001136:   bf00        nop
 8001138:   54442d18    .word   0x54442d18
 800113c:   400921fb    .word   0x400921fb
 8001140:   54442d18    .word   0x54442d18
 8001144:   401921fb    .word   0x401921fb
 8001148:   3df00000    .word   0x3df00000

Looking at the source https://github.com/jgaeddert/liquid-dsp/blob/b9de495b0d529977914a4bbcd1295208e5de07c0/src/nco/src/nco.c#L167-L172 reveals that the code uses M_PI which according to the GNU libm documentation is a double. This probably leads to the entire expression being promoted to double precision arithmetic, although the the -Wdouble-promotion flag doesn't seem to complain about it.

A piece of code that gcc with -Wdouble-promotion does complain about is this:

src/nco/src/nco.c: In function 'nco_crcf_mix_down':
src/nco/src/nco.c:283:14: warning: implicit conversion from 'complex float' to 'complex double' to match other operand of binary expression [-Wdouble-promotion]
    283 | *_y = _x * conj(v);

Again looking at the source https://github.com/jgaeddert/liquid-dsp/blob/b9de495b0d529977914a4bbcd1295208e5de07c0/src/nco/src/nco.c#L270-L285 reveals the use of conj, which according to the GNU libm documentation returns a double. This function should probably be conjf instead.

The disassembly accordingly shows many references to software double precision floating point functions:

08001460 <nco_crcf_mix_down>:
 8001460:   e92d 4ff0   stmdb   sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
 8001464:   ed2d 8b04   vpush   {d8-d9}
 8001468:   ee10 5a10   vmov    r5, s0
 800146c:   b085        sub sp, #20
 800146e:   4688        mov r8, r1
 8001470:   a902        add r1, sp, #8
 8001472:   ee10 4a90   vmov    r4, s1
 8001476:   f8cd 8004   str.w   r8, [sp, #4]
 800147a:   f7ff ff87   bl  800138c <nco_crcf_cexpf>
 800147e:   4628        mov r0, r5
 8001480:   f02d f8fe   bl  802e680 <__aeabi_f2d>
 8001484:   4680        mov r8, r0
 8001486:   4620        mov r0, r4
 8001488:   4689        mov r9, r1
 800148a:   f02d f8f9   bl  802e680 <__aeabi_f2d>
 800148e:   ec41 0b18   vmov    d8, r0, r1
 8001492:   9802        ldr r0, [sp, #8]
 8001494:   f02d f8f4   bl  802e680 <__aeabi_f2d>
 8001498:   ec41 0b19   vmov    d9, r0, r1
 800149c:   9803        ldr r0, [sp, #12]
 800149e:   f02d f8ef   bl  802e680 <__aeabi_f2d>
 80014a2:   ec53 2b19   vmov    r2, r3, d9
 80014a6:   4606        mov r6, r0
 80014a8:   f101 4700   add.w   r7, r1, #2147483648 ; 0x80000000
 80014ac:   4640        mov r0, r8
 80014ae:   4649        mov r1, r9
 80014b0:   f02d f93e   bl  802e730 <__aeabi_dmul>
 80014b4:   4632        mov r2, r6
 80014b6:   4604        mov r4, r0
 80014b8:   460d        mov r5, r1
 80014ba:   463b        mov r3, r7
 80014bc:   ec51 0b18   vmov    r0, r1, d8
 80014c0:   f02d f936   bl  802e730 <__aeabi_dmul>
 80014c4:   4602        mov r2, r0
 80014c6:   460b        mov r3, r1
 80014c8:   4620        mov r0, r4
 80014ca:   4629        mov r1, r5
 80014cc:   f02c ff78   bl  802e3c0 <__aeabi_dsub>
 80014d0:   4632        mov r2, r6
 80014d2:   4682        mov sl, r0
 80014d4:   468b        mov fp, r1
 80014d6:   463b        mov r3, r7
 80014d8:   4640        mov r0, r8
 80014da:   4649        mov r1, r9
 80014dc:   f02d f928   bl  802e730 <__aeabi_dmul>
 80014e0:   ec53 2b19   vmov    r2, r3, d9
 80014e4:   4604        mov r4, r0
 80014e6:   460d        mov r5, r1
 80014e8:   ec51 0b18   vmov    r0, r1, d8
 80014ec:   f02d f920   bl  802e730 <__aeabi_dmul>
 80014f0:   4602        mov r2, r0
 80014f2:   460b        mov r3, r1
 80014f4:   4620        mov r0, r4
 80014f6:   4629        mov r1, r5
 80014f8:   f02c ff64   bl  802e3c4 <__adddf3>
 80014fc:   4604        mov r4, r0
 80014fe:   460d        mov r5, r1
 8001500:   4650        mov r0, sl
 8001502:   4659        mov r1, fp
 8001504:   4622        mov r2, r4
 8001506:   462b        mov r3, r5
 8001508:   f02d fbac   bl  802ec64 <__aeabi_dcmpun>
 800150c:   b990        cbnz    r0, 8001534 <nco_crcf_mix_down+0xd4>
 800150e:   4659        mov r1, fp
 8001510:   4650        mov r0, sl
 8001512:   f02d fc05   bl  802ed20 <__aeabi_d2f>
 8001516:   4603        mov r3, r0
 8001518:   4620        mov r0, r4
 800151a:   9c01        ldr r4, [sp, #4]
 800151c:   4629        mov r1, r5
 800151e:   6023        str r3, [r4, #0]
 8001520:   f02d fbfe   bl  802ed20 <__aeabi_d2f>
 8001524:   4603        mov r3, r0
 8001526:   2000        movs    r0, #0
 8001528:   6063        str r3, [r4, #4]
 800152a:   b005        add sp, #20
 800152c:   ecbd 8b04   vpop    {d8-d9}
 8001530:   e8bd 8ff0   ldmia.w sp!, {r4, r5, r6, r7, r8, r9, sl, fp, pc}
 8001534:   ec47 6b13   vmov    d3, r6, r7
 8001538:   eeb0 2a49   vmov.f32    s4, s18
 800153c:   eef0 2a69   vmov.f32    s5, s19
 8001540:   ec49 8b10   vmov    d0, r8, r9
 8001544:   eeb0 1a48   vmov.f32    s2, s16
 8001548:   eef0 1a68   vmov.f32    s3, s17
 800154c:   f02d fd78   bl  802f040 <__muldc3>
 8001550:   ec5b ab10   vmov    sl, fp, d0
 8001554:   ec55 4b11   vmov    r4, r5, d1
 8001558:   e7d9        b.n 800150e <nco_crcf_mix_down+0xae>

As a test, replacing conj with conjf completely removes any references to double precision floating point functions

080013c4 <nco_crcf_mix_down>:
 80013c4:   b510        push    {r4, lr}
 80013c6:   ed2d 8b02   vpush   {d8}
 80013ca:   b082        sub sp, #8
 80013cc:   460c        mov r4, r1
 80013ce:   4669        mov r1, sp
 80013d0:   eeb0 8a40   vmov.f32    s16, s0
 80013d4:   eef0 8a60   vmov.f32    s17, s1
 80013d8:   f7ff ff8a   bl  80012f0 <nco_crcf_cexpf>
 80013dc:   eddd 6a01   vldr    s13, [sp, #4]
 80013e0:   ed9d 0a00   vldr    s0, [sp]
 80013e4:   ee66 7ac8   vnmul.f32   s15, s13, s16
 80013e8:   ee26 7aa8   vmul.f32    s14, s13, s17
 80013ec:   eee8 7a80   vfma.f32    s15, s17, s0
 80013f0:   eea8 7a00   vfma.f32    s14, s16, s0
 80013f4:   eeb4 7a67   vcmp.f32    s14, s15
 80013f8:   eef1 fa10   vmrs    APSR_nzcv, fpscr
 80013fc:   d608        bvs.n   8001410 <nco_crcf_mix_down+0x4c>
 80013fe:   2000        movs    r0, #0
 8001400:   ed84 7a00   vstr    s14, [r4]
 8001404:   edc4 7a01   vstr    s15, [r4, #4]
 8001408:   b002        add sp, #8
 800140a:   ecbd 8b02   vpop    {d8}
 800140e:   bd10        pop {r4, pc}
 8001410:   eef1 6a66   vneg.f32    s13, s13
 8001414:   eef0 1a68   vmov.f32    s3, s17
 8001418:   eeb0 1a48   vmov.f32    s2, s16
 800141c:   eef0 0a66   vmov.f32    s1, s13
 8001420:   f02b fb5e   bl  802cae0 <__mulsc3>
 8001424:   eeb0 7a40   vmov.f32    s14, s0
 8001428:   eef0 7a60   vmov.f32    s15, s1
 800142c:   e7e7        b.n 80013fe <nco_crcf_mix_down+0x3a>

Similarly explicitly casting M_PI to float by replacing all references to M_PI with (float)M_PI fixes the issue with nco_crcf_get_frequency

080010ac <nco_crcf_get_frequency>:
 80010ac:   f500 5080   add.w   r0, r0, #4096   ; 0x1000
 80010b0:   ed90 0a02   vldr    s0, [r0, #8]
 80010b4:   eddf 7a09   vldr    s15, [pc, #36]  ; 80010dc <nco_crcf_get_frequency+0x30>
 80010b8:   eddf 6a09   vldr    s13, [pc, #36]  ; 80010e0 <nco_crcf_get_frequency+0x34>
 80010bc:   ed9f 7a09   vldr    s14, [pc, #36]  ; 80010e4 <nco_crcf_get_frequency+0x38>
 80010c0:   eeb8 0a40   vcvt.f32.u32    s0, s0
 80010c4:   ee20 0a27   vmul.f32    s0, s0, s15
 80010c8:   ee20 0a26   vmul.f32    s0, s0, s13
 80010cc:   eeb4 0ac7   vcmpe.f32   s0, s14
 80010d0:   eef1 fa10   vmrs    APSR_nzcv, fpscr
 80010d4:   bfc8        it  gt
 80010d6:   ee30 0a67   vsubgt.f32  s0, s0, s15
 80010da:   4770        bx  lr
 80010dc:   40c90fdb    .word   0x40c90fdb
 80010e0:   2f800000    .word   0x2f800000
 80010e4:   40490fdb    .word   0x40490fdb

jgaeddert commented 3 years ago

Oh, I didn't take it as rude at all! You're right in that M_PI is interpreted as a double (difference between 3.14159 and 3.14159f) because it's a preprocessor macro in the math.h header.

Good catch with conj vs conjf as well. I'll submit a patch and see if that fixes things.

jgaeddert commented 3 years ago

Made a few changes to the nco module. They're temporary until I can figure out a better way to manage types and literals, but hopefully it should work for you.

pfeatherstone commented 3 years ago

C++ templates ?

knightshrub commented 3 years ago

@jgaeddert thanks for the temporary changes. I think I found another one in ampmodem.c, should probably be cargf. Seems like fixing this throughout the codebase might be quite a pain, it's unfortunate that gcc can't infer the correct return type like in more modern languages like rust and nim. https://github.com/jgaeddert/liquid-dsp/blob/3d9de34350c7ce3ba130672a1646cf92a927e81d/src/modem/src/ampmodem.c#L301

jgaeddert commented 3 years ago

Fixed! Thanks!

jgaeddert commented 3 years ago

I will also mention there is a branch of liquid that includes fixed-point algorithms; however the branch is a bit stale and a dry-run of merging master into it revealed a lot of conflicts. While it would take quite some time to get it up to speed again, it should work much faster on embedded systems, particularly those without floating-point accelerators. Keeping track of bit precision can be a pain, of course, but if low processing footprint is your goal, fixed-point is the way to go.

jgaeddert / liquid-dsp

ampmodem requires double precision floating point #253