SSE instructions.

FakelsHub commented 3 years ago

https://github.com/phobos2077/sfall/blob/6c3dd72ea757d047c7f84002246de039b80f31a8/sfall/ddraw.vcxproj#L39-L44 Setting this parameter, <CharacterSet>NotSet</CharacterSet>, seems to have no effect: the compiler still emits SSE instructions.
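One way to see what the compiler is actually targeting is to ask it at compile time. The sketch below is only an illustration: `targeted_sse_level` and `print_targeted_sse_level` are hypothetical names, not part of sfall. It relies on the predefined macro `_M_IX86_FP` (MSVC, 32-bit x86) and on `__SSE__` / `__SSE2__` (GCC/Clang). As far as I know, recent MSVC versions target SSE2 by default for 32-bit builds when no instruction set is chosen, which would match the disassembly in this thread.

```cpp
#include <cstdio>

// Hypothetical helper: returns 0 for plain x87 code generation, 1 for SSE,
// 2 for SSE2 or newer, and -1 when the target is not recognized as x86.
constexpr int targeted_sse_level() {
#if defined(_M_IX86_FP)
    // MSVC, 32-bit x86: 0 = /arch:IA32, 1 = /arch:SSE, 2 = /arch:SSE2 and above.
    return _M_IX86_FP;
#elif defined(_M_X64) || defined(__SSE2__)
    // x64 targets always include SSE2; GCC/Clang define __SSE2__ when it is enabled.
    return 2;
#elif defined(__SSE__)
    return 1;
#elif defined(__i386__)
    return 0;
#else
    return -1;
#endif
}

// Prints the level so it can be checked from a test build.
void print_targeted_sse_level() {
    std::printf("targeted SSE level: %d\n", targeted_sse_level());
}
```

If this reports 2 in a 32-bit build, double->int conversions can be expected to use cvttsd2si unless the instruction set is explicitly lowered.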

Here is an example of recent code from your sfall build that uses the SSE2 instruction cvttsd2si:

01468620 sub_1468620 proc near
01468620 fld     dbl_14EFC38
01468626 fld     dbl_14C3580
0146862C fmul    st, st(1)
0146862E call    near ptr sub_1454510
01468633 fmul    dbl_14C3590
01468639 mov     ds:_GNW95_repeat_rate, eax
0146863E call    near ptr sub_1454510
01468643 mov     ds:_GNW95_repeat_delay, eax
01468648 mov     byte_14CAA77, 0
0146864F retn
0146864F sub_1468620 endp

01454510 sub_1454510 proc near 
01454510 var_24= dword ptr -24h
01454510 var_14= qword ptr -14h
01454510 var_C= qword ptr -0Ch
01454510
01454510 cmp     dword_14CF360, 0
01454517 jz      short loc_1454550
01454519 push    ebp
0145451A mov     ebp, esp
0145451C sub     esp, 8
0145451F and     esp, 0FFFFFFF8h
01454522 fstp    [esp+0Ch+var_C]
01454525 cvttsd2si eax, [esp+0Ch+var_C]
0145452A leave
0145452B retn

FakelsHub commented 3 years ago

The same code with SSE (without the old FPU)

01542DD0 sub_1542DD0 proc near
01542DD0 movsd   xmm1, qword_15E3E80
01542DD8 movaps  xmm0, xmm1
01542DDB mov     byte_15ADAC1, 0
01542DE2 mulsd   xmm0, qword_15A1F78
01542DEA mulsd   xmm1, qword_15A1F88
01542DF2 cvttsd2si eax, xmm0
01542DF6 mov     ds:_GNW95_repeat_rate, eax
01542DFB cvttsd2si eax, xmm1
01542DFF mov     ds:_GNW95_repeat_delay, eax
01542E04 retn
01542E04 sub_1542DD0 endp

FakelsHub commented 3 years ago

It looks like you need to use float instead of double for the compiler to use CVTTSS2SI:

CVTTSS2SI | Convert scalar single-precision float to signed doubleword integer, with truncation (SSE)
CVTTSD2SI | Convert scalar double-precision float to signed doubleword integer, with truncation (SSE2)
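A minimal sketch of the two conversions at source level; `from_double` and `from_float` are hypothetical names used only for illustration. Both casts truncate toward zero, as C++ requires. Which instruction the compiler picks depends on the source type and the targeted instruction set, so the comments describe what is likely, not guaranteed.

```cpp
// double -> int: with SSE2 code generation this typically becomes cvttsd2si.
int from_double(double v) {
    return static_cast<int>(v);
}

// float -> int: with SSE code generation this typically becomes cvttss2si.
int from_float(float v) {
    return static_cast<int>(v);
}
```

For example, `from_double(2.9)` and `from_float(2.9f)` both give 2, and the negative inputs `-2.9` and `-2.9f` both give -2.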

NovaRain commented 3 years ago

What's the function/feature of that section of code using SSE2 instruction?

FakelsHub commented 3 years ago

I didn't understand you, but I suspect you wanted to know what part of the code relates to these functions? Any double->int conversion

NovaRain commented 3 years ago

> I didn't understand you, but I suspect you wanted to know what part of the code relates to these functions? Any double->int conversion

Yes, that's what I mean. I haven't tested SpeedPatch on my old servers yet. I'll check them later.

There are about 35 cases of double variables in the code. I don't know if it'd be safer to change them to float, as the extra decimal precision isn't really necessary in most cases IMO.
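To put a number on "extra precision": float stores whole numbers exactly only up to 2^24 (16777216), while double covers the full 32-bit int range. The sketch below checks whether an int survives a round trip through each type. `survives_float` and `survives_double` are hypothetical helpers, not sfall code, and the results assume the usual IEEE 754 float/double on an SSE target.

```cpp
// Returns true if v can be stored in a float and read back unchanged.
// The comparison uses long long so the back-conversion cannot overflow.
bool survives_float(int v) {
    float f = static_cast<float>(v);
    return static_cast<long long>(f) == v;
}

// Same check for double, which represents every 32-bit int exactly.
bool survives_double(int v) {
    double d = static_cast<double>(v);
    return static_cast<long long>(d) == v;
}
```

Small values such as a percentage of 300 survive both round trips. 16777217 (2^24 + 1) is lost in float but kept in double, so float is only a concern where values that large are stored.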

NovaRain commented 3 years ago

OK, I tried the current 4.3.1 build on my PII potato with Win2000 (I use some system DLL hacks to make sfall 4.x work). Enabling the speed patch doesn't crash the game or cause anything unusual, but because the game itself already runs slowly in HRP 4.1.8 windowed mode (640x480), setting 300% speed is barely noticeable in game (NPCs play their idle animations more frequently).

I think the cmp dword_14CF360, 0 is about checking CPU model/features or something, and it jumps to the corresponding double->int conversion code. There are some other checks on dword_14CF360 in the code of the phobos build.

FakelsHub commented 3 years ago

> I think the cmp dword_14CF360, 0 is about checking CPU model/features or something

This is unknown. I have this set to 1. What argues against this assumption is the use of fstp: why use the FPU if SSE is available?

NovaRain commented 3 years ago

Just curious, does changing the datatype of variables in SpeedPatch.cpp from double to float help the case? I still see the same ASM code when using dumpbin /disasm to disassemble the binary. There are still some modules that use 'double' variables and might have the same double->int conversion: InputFuncs.cpp, Combat.cpp, Skills.cpp, Stats.cpp, WindowRender.cpp, Worldmap.cpp.

FakelsHub commented 3 years ago

For me, the SpeedPatch instructions have changed from double to float: CVTTSD2SI -> CVTTSS2SI

NovaRain commented 3 years ago

Oh, OK. I only checked the code of the supposed double->int conversion. [image: conv]

I think the code from your 2nd comment is now this, using cvttss2si: [image: conv2]

I'm not sure whether, if you don't use any double, the first code would still be in the binary, or whether all of the lines that were calling the double->int conversion would be replaced with cvttss2si.

FakelsHub commented 3 years ago

Found on the internet :-)

013B1030  call        _ftol2_sse (13B19A0h)

013B19A0  cmp         dword ptr [___sse2_available (13B3378h)],0
013B19A7  je          _ftol2 (13B19D6h)
013B19A9  push        ebp
013B19AA  mov         ebp,esp
013B19AC  sub         esp,8
013B19AF  and         esp,0FFFFFFF8h
013B19B2  fstp        qword ptr [esp]
013B19B5  cvttsd2si   eax,mmword ptr [esp]
013B19BA  leave
013B19BB  ret
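In C++ terms, the listing above reads roughly like the sketch below: check a flag, then take either the SSE2 path or a fallback. This is only an illustration under stated assumptions. `sse2_available`, `to_int_fallback` and `to_int_dispatch` are hypothetical names, and the flag here is simply initialized from compile-time macros, whereas the real runtime sets its own flag by querying the CPU. The intrinsics `_mm_set_sd` and `_mm_cvttsd_si32` come from `<emmintrin.h>` and map to the cvttsd2si instruction.

```cpp
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1
#else
#define SKETCH_HAS_SSE2 0
#endif

// Hypothetical stand-in for the runtime's SSE2 flag (assumption: the real
// one is filled in at startup from a CPU feature query, not from macros).
static bool sse2_available = (SKETCH_HAS_SSE2 != 0);

// Fallback path, playing the role _ftol2 has in the listing.
static int to_int_fallback(double v) {
    return static_cast<int>(v);
}

// Dispatching conversion: same truncating result on both paths.
int to_int_dispatch(double v) {
    if (!sse2_available) {
        return to_int_fallback(v);
    }
#if SKETCH_HAS_SSE2
    return _mm_cvttsd_si32(_mm_set_sd(v)); // truncating double -> int32
#else
    return to_int_fallback(v);
#endif
}
```

Both paths truncate toward zero, so callers get the same integer whichever branch runs; only the instructions used differ.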

NovaRain commented 3 years ago

Oh, so the code is part of the generic ftol2 function, and the cmp dword is indeed a CPU feature (SSE2) check. At least it looks like I don't have to worry about existing features suddenly crashing on older systems.

I'll toy with the idea of replacing other 'double' type variables with float later.

FakelsHub commented 3 years ago

> I'll toy with the idea of replacing other 'double' type variables with float later.

it doesn't make sense.

NovaRain commented 3 years ago

> > I'll toy with the idea of replacing other 'double' type variables with float later.

> it doesn't make sense.

Maybe, but what's the difference between changing double type variables in SpeedPatch and in InputFuncs or Worldmap? What's special about SpeedPatch? Is it because it's the only one that uses cvttsd2si for double->int conversion?

FakelsHub commented 3 years ago

I changed this to float because the function is called very often, and it is possible that SSE float instructions work faster (but this is not a given). For the FPU, I do not know how this will affect things, because without SSE there is a large amount of code for the conversion. Compare your code with double and float.

NovaRain commented 3 years ago

> Compare your code with double and float.

OK, did some silly comparisons in your build:

WindowRender.cpp - fadeMulti:

mmword -> dword
A few sd instructions to ss (Scalar Double-Precision -> Scalar Single-Precision)
Doesn't matter much I guess.

Worldmap.cpp - Skipped, because the original FO1 also uses the double type for tick calculation on the world map.

InputFuncs.cpp - mouse speed

cvttsd2si -> cvttss2si
Some other sd instructions to ss
Slightly less generated code from dumpbin.

Skills.cpp - multipliers

Similar to the change on WindowRender.cpp

Stats.cpp - StatFormula.multi[]

Similar to the change on WindowRender.cpp
Slightly more generated code from dumpbin.

Combat.cpp - KnockbackModifier.value

Because the knockback script functions pass the value as float, I thought it might be reasonable to change the others.
Generated code from dumpbin increases by about 7 KB. On 3.8 it's the opposite: generated code shrinks by 13 KB. Probably due to how the compiler handles the optimization here.

Verdict: yep, changing double to float doesn't matter much, as the majority of the rest don't get called as frequently as SpeedPatch, except maybe InputFuncs (mouse movement obviously happens a lot in game).

sfall-team / sfall

SSE instructions. #407