milkytracker / MilkyTracker

An FT2 compatible music tracker
http://milkytracker.github.io/
Other
1.69k stars 162 forks source link

Use previous implementation of fill_dword, minus asm #213

Closed exelotl closed 4 years ago

exelotl commented 4 years ago

The new implementation of fill_dword completely messed up the UI, because memset only fills a byte value, not a 32-bit value

The simplest implementation would be

while (len--)
    *(buff++) = dw;

But I went with the old implementation because I like it more, assuming that the compiler isn't gonna do some galaxy-brain optimisations to transform the above code into some even better bulk copy.

Deltafire commented 4 years ago

My bad, I should have spotted this!

Using your simplest implementation, GCC produces the following code:

fillword(unsigned long, int*, int):
        lea     r9, [rdi-1]
        mov     r8d, edx
        cmp     r9, 2
        jbe     .L2
        mov     rcx, rdi
        movd    xmm1, edx
        xor     eax, eax
        shr     rcx, 2
        pshufd  xmm0, xmm1, 0
.L3:
        mov     rdx, rax
        add     rax, 1
        sal     rdx, 4
        movups  XMMWORD PTR [rsi+rdx], xmm0
        cmp     rcx, rax
        jne     .L3
        mov     rax, rdi
        and     rax, -4
        lea     rsi, [rsi+rax*4]
        sub     r9, rax
        cmp     rdi, rax
        jne     .L2
.L5:
fillword(unsigned long, int*, int) [clone .cold]:
.L2:
        mov     DWORD PTR [rsi], r8d
        test    r9, r9
        je      .L5
        sub     r9, 1
        mov     DWORD PTR [rsi+4], r8d
        je      .L5
        mov     DWORD PTR [rsi+8], r8d
        jmp     .L5

I see no need for us to do anything clever here, we'll use the simple version and leave the trickery to the compiler :)

exelotl commented 4 years ago

Oh wow ok! Nice one GCC :)

Simplified version pushed

Deltafire commented 4 years ago

Merged.