prusa3d / Prusa-Firmware

Firmware for Original Prusa i3 3D printer by PrusaResearch
GNU General Public License v3.0

optimisation: remove 4 calls to `__divsf3` #4587

Closed: gudnimg closed this 5 months ago

gudnimg commented 8 months ago

Found some redundant divisions in the assembly.

In the `mc_arc` call, changing `feedrate * feedmultiply / 60 / 100.0` to `(feedrate * feedmultiply) / 6000.f` removes one call to `__divsf3`.

Before:

```asm
139f2: 60 91 39 02  lds  r22, 0x0239 ; 0x800239
139f6: 70 91 3a 02  lds  r23, 0x023A ; 0x80023a
139fa: 07 2e        mov  r0, r23
139fc: 00 0c        add  r0, r0
139fe: 88 0b        sbc  r24, r24
13a00: 99 0b        sbc  r25, r25
13a02: 0f 94 81 d6  call 0x3ad02 ; 0x3ad02 <__floatsisf>
13a06: 20 91 90 02  lds  r18, 0x0290 ; 0x800290
13a0a: 30 91 91 02  lds  r19, 0x0291 ; 0x800291
13a0e: 40 91 92 02  lds  r20, 0x0292 ; 0x800292
13a12: 50 91 93 02  lds  r21, 0x0293 ; 0x800293
13a16: 0f 94 c0 d6  call 0x3ad80 ; 0x3ad80 <__mulsf3>
13a1a: 20 e0        ldi  r18, 0x00 ; 0
13a1c: 30 e0        ldi  r19, 0x00 ; 0
13a1e: 40 e7        ldi  r20, 0x70 ; 112
13a20: 52 e4        ldi  r21, 0x42 ; 66
13a22: 0f 94 1a de  call 0x3bc34 ; 0x3bc34 <__divsf3>
13a26: 20 e0        ldi  r18, 0x00 ; 0
13a28: 30 e0        ldi  r19, 0x00 ; 0
13a2a: 48 ec        ldi  r20, 0xC8 ; 200
13a2c: 52 e4        ldi  r21, 0x42 ; 66
13a2e: 0f 94 1a de  call 0x3bc34 ; 0x3bc34 <__divsf3>
13a32: aa 96        adiw r28, 0x2a ; 42
13a34: 6c af        std  Y+60, r22 ; 0x3c
13a36: 7d af        std  Y+61, r23 ; 0x3d
13a38: 8e af        std  Y+62, r24 ; 0x3e
13a3a: 9f af        std  Y+63, r25 ; 0x3f
13a3c: aa 97        sbiw r28, 0x2a ; 42
```

After:

```asm
139d0: 60 91 39 02  lds  r22, 0x0239 ; 0x800239
139d4: 70 91 3a 02  lds  r23, 0x023A ; 0x80023a
139d8: 07 2e        mov  r0, r23
139da: 00 0c        add  r0, r0
139dc: 88 0b        sbc  r24, r24
139de: 99 0b        sbc  r25, r25
139e0: 0f 94 5b d6  call 0x3acb6 ; 0x3acb6 <__floatsisf>
139e4: 20 91 90 02  lds  r18, 0x0290 ; 0x800290
139e8: 30 91 91 02  lds  r19, 0x0291 ; 0x800291
139ec: 40 91 92 02  lds  r20, 0x0292 ; 0x800292
139f0: 50 91 93 02  lds  r21, 0x0293 ; 0x800293
139f4: 0f 94 9a d6  call 0x3ad34 ; 0x3ad34 <__mulsf3>
139f8: 20 e0        ldi  r18, 0x00 ; 0
139fa: 30 e8        ldi  r19, 0x80 ; 128
139fc: 4b eb        ldi  r20, 0xBB ; 187
139fe: 55 e4        ldi  r21, 0x45 ; 69
13a00: 0f 94 f4 dd  call 0x3bbe8 ; 0x3bbe8 <__divsf3>
13a04: aa 96        adiw r28, 0x2a ; 42
13a06: 6c af        std  Y+60, r22 ; 0x3c
13a08: 7d af        std  Y+61, r23 ; 0x3d
13a0a: 8e af        std  Y+62, r24 ; 0x3e
13a0c: 9f af        std  Y+63, r25 ; 0x3f
13a0e: aa 97        sbiw r28, 0x2a ; 42
```

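Changing how `mm_per_arc_segment_sec` is calculated, from `(feed_rate / 60.0f) * (1.0f / cs.arc_segments_per_sec)` to `feed_rate / (60.f * float(cs.arc_segments_per_sec))`, turns two divisions and a multiplication into one multiplication and one division, so one call to `__divsf3` is removed at this site.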

Before:

```asm
13c80: 90 e0        ldi  r25, 0x00 ; 0
13c82: 80 e0        ldi  r24, 0x00 ; 0
13c84: 0f 94 7f d6  call 0x3acfe ; 0x3acfe <__floatunsisf>
13c88: 9b 01        movw r18, r22
13c8a: ac 01        movw r20, r24
13c8c: 60 e0        ldi  r22, 0x00 ; 0
13c8e: 70 e0        ldi  r23, 0x00 ; 0
13c90: 80 e8        ldi  r24, 0x80 ; 128
13c92: 9f e3        ldi  r25, 0x3F ; 63
13c94: 0f 94 1a de  call 0x3bc34 ; 0x3bc34 <__divsf3>
13c98: 2b 01        movw r4, r22
13c9a: 3c 01        movw r6, r24
13c9c: 20 e0        ldi  r18, 0x00 ; 0
13c9e: 30 e0        ldi  r19, 0x00 ; 0
13ca0: 40 e7        ldi  r20, 0x70 ; 112
13ca2: 52 e4        ldi  r21, 0x42 ; 66
13ca4: aa 96        adiw r28, 0x2a ; 42
13ca6: 6c ad        ldd  r22, Y+60 ; 0x3c
13ca8: 7d ad        ldd  r23, Y+61 ; 0x3d
13caa: 8e ad        ldd  r24, Y+62 ; 0x3e
13cac: 9f ad        ldd  r25, Y+63 ; 0x3f
13cae: aa 97        sbiw r28, 0x2a ; 42
13cb0: 0f 94 1a de  call 0x3bc34 ; 0x3bc34 <__divsf3>
13cb4: 9b 01        movw r18, r22
13cb6: ac 01        movw r20, r24
13cb8: c3 01        movw r24, r6
13cba: b2 01        movw r22, r4
13cbc: 0f 94 c0 d6  call 0x3ad80 ; 0x3ad80 <__mulsf3>
13cc0: 3b 01        movw r6, r22
13cc2: 4c 01        movw r8, r24
```

After:

```asm
13c52: 90 e0        ldi  r25, 0x00 ; 0
13c54: 80 e0        ldi  r24, 0x00 ; 0
13c56: 0f 94 59 d6  call 0x3acb2 ; 0x3acb2 <__floatunsisf>
13c5a: 20 e0        ldi  r18, 0x00 ; 0
13c5c: 30 e0        ldi  r19, 0x00 ; 0
13c5e: 40 e7        ldi  r20, 0x70 ; 112
13c60: 52 e4        ldi  r21, 0x42 ; 66
13c62: 0f 94 9a d6  call 0x3ad34 ; 0x3ad34 <__mulsf3>
13c66: 9b 01        movw r18, r22
13c68: ac 01        movw r20, r24
13c6a: aa 96        adiw r28, 0x2a ; 42
13c6c: 6c ad        ldd  r22, Y+60 ; 0x3c
13c6e: 7d ad        ldd  r23, Y+61 ; 0x3d
13c70: 8e ad        ldd  r24, Y+62 ; 0x3e
13c72: 9f ad        ldd  r25, Y+63 ; 0x3f
13c74: aa 97        sbiw r28, 0x2a ; 42
13c76: 0f 94 f4 dd  call 0x3bbe8 ; 0x3bbe8 <__divsf3>
13c7a: 3b 01        movw r6, r22
13c7c: 4c 01        movw r8, r24
```

Change in memory: Flash: -34 bytes, SRAM: 0 bytes

github-actions[bot] commented 8 months ago

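All values in bytes. Δ: delta to base.

| Target | ΔFlash | ΔSRAM | Used Flash | Used SRAM | Free Flash | Free SRAM |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| MK3S_MULTILANG | -34 | 0 | 247750 | 5653 | 6202 | 2539 |
| MK3_MULTILANG | -38 | 0 | 247048 | 5662 | 6904 | 2530 |
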
gudnimg commented 5 months ago

I believe this PR is ready now. I'll review the assembly files a bit before removing the 🚧 from the title.

For reference, here are the ASM files for the latest MK3 branch and for this PR's branch: ASM_files.zip

gudnimg commented 5 months ago

I removed two of the optimisations, since there is some risk they would reduce the precision of the calculations. I don't want to take that risk.

However, I've kept the two other optimisations, which only remove redundant divisions. Those changes should be safe.

gudnimg commented 5 months ago

Rebased so that the memory delta reported by the GitHub Actions bot is correct again :)