x87 functions

Torinde commented 8 months ago

x87 (instructions list: current, obsolete Intel/IIT/Cyrix, obsolete NEC: part1, part2)

Code for most of these is available in SoftFloat (Bochs, 86Box, QEMU, Berkeley)
Useful even on x86, because x87 may be removed from a future x86 CPU (it is already deprecated in Windows and unsupported in MSVC)
POWER9, IBM z and RISC-V (Q/L extensions) support 128-bit precision, which may speed up emulating the 80-bit x87
Relevant for DOS/Win9x/WinXP emulators (for games and other software)
- some of which are still used even for civil engineering calculations
- especially when run on non-x86 platforms, where precision is sometimes reduced to 64-bit (double precision)
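To make the 80-bit format concrete, here is a minimal sketch that decodes a value stored by FSTP m80 into a host double. The helper f80_to_double is hypothetical and is not code from SoftFloat or any of the emulators listed. This conversion is the step where the 64-bit significand gets cut to 53 bits on hosts that lack an 80-bit type.

```c
#include <math.h>
#include <stdint.h>

/* Decode a little-endian x87 80-bit extended value (as written by FSTP m80)
 * into a double. Layout: bytes 0-7 = 64-bit significand with an explicit
 * integer bit (bit 63); bytes 8-9 = sign (bit 15) + 15-bit biased exponent.
 * Sketch only: unnormal/pseudo encodings are not rejected, and results in
 * the double subnormal range may be double-rounded. */
double f80_to_double(const uint8_t b[10])
{
    uint64_t mant = 0;
    for (int i = 7; i >= 0; i--)
        mant = (mant << 8) | b[i];

    uint16_t se = (uint16_t)(b[8] | (b[9] << 8));
    int sign = se >> 15;
    int bexp = se & 0x7FFF;
    double r;

    if (bexp == 0x7FFF) {
        /* Infinity if the fraction bits below the integer bit are zero, else NaN. */
        r = (mant << 1) == 0 ? INFINITY : NAN;
    } else {
        /* value = mant * 2^(bexp - 16383 - 63); denormals use exponent 1. */
        int e = (bexp == 0 ? 1 : bexp) - 16383 - 63;
        r = ldexp((double)mant, e); /* (double)mant rounds 64 -> 53 bits */
    }
    return sign ? -r : r;
}
```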

The corresponding CPU flag in Linux (/proc/cpuinfo) is fpu.
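Here is a minimal sketch of testing for the flag from a program. It assumes the caller has already read the flags line of /proc/cpuinfo into a string, and cpuinfo_has_flag is a hypothetical helper. It matches whole tokens, so a flag name that is a substring of another flag cannot match by accident.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if `flag` appears as a whole whitespace-separated token after
 * the ':' of a "flags : ..." line from /proc/cpuinfo. */
bool cpuinfo_has_flag(const char *line, const char *flag)
{
    const char *p = strchr(line, ':');
    p = p ? p + 1 : line;
    size_t n = strlen(flag);

    while (*p) {
        while (*p == ' ' || *p == '\t')
            p++;
        const char *start = p;
        while (*p && *p != ' ' && *p != '\t' && *p != '\n')
            p++;
        if ((size_t)(p - start) == n && strncmp(start, flag, n) == 0)
            return true;
        if (*p == '\n')
            p++;
    }
    return false;
}
```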

x87 Non-Waiting FPU Control Instructions

[ ] FNINIT DB E3 FINIT Initialize x87 FPU
[ ] FLDCW m16 D9 /5 Load x87 Control Word
[ ] FNSTCW m16 D9 /7 FSTCW Store x87 Control Word
[ ] FNSTSW m16 DD /7 FSTSW Store x87 Status Word
[ ] FNCLEX DB E2 FCLEX Clear x87 Exception Flags
[ ] FLDENV m112/m224[c] D9 /4 Load x87 FPU Environment
[ ] FNSTENV m112/m224[c] D9 /6 FSTENV Store x87 FPU Environment
[ ] FNSAVE m752/m864[c] DD /6 FSAVE Save x87 FPU State, then initialize x87 FPU
[ ] FRSTOR m752/m864[c] DD /4 Restore x87 FPU State
[ ] FNENI DB E0 FENI Enable Interrupts (8087 only)[d]
[ ] FNDISI DB E1 FDISI Disable Interrupts (8087 only)[d]
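The control word loaded by FLDCW and stored by FNSTCW packs three things: the exception masks, precision control and rounding control. Here is a decoding sketch using the field layout from the Intel SDM. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of the x87 control word. Bits 0-5: exception masks (IM, DM, ZM,
 * OM, UM, PM); bits 8-9: precision control; bits 10-11: rounding control. */
typedef struct {
    unsigned exception_masks; /* 1 = masked */
    unsigned precision;       /* 0 = 24-bit, 2 = 53-bit, 3 = 64-bit significand (1 reserved) */
    unsigned rounding;        /* 0 = nearest-even, 1 = down, 2 = up, 3 = toward zero */
} x87_cw_fields;

x87_cw_fields x87_decode_cw(uint16_t cw)
{
    x87_cw_fields f;
    f.exception_masks = cw & 0x3F;
    f.precision = (cw >> 8) & 3;
    f.rounding = (cw >> 10) & 3;
    return f;
}

/* Control word set by FNINIT: all exceptions masked, 64-bit precision,
 * round to nearest. */
#define X87_CW_DEFAULT 0x037F
```

For example, decoding X87_CW_DEFAULT yields masks 0x3F, precision 3 and rounding 0.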

x87 Floating-point Load/Store/Move Instructions

[ ] FLD m32 D9 /0 Load floating-point value onto stack
[ ] FLD m64 DD /0 Load floating-point value onto stack
[ ] FLD m80 DB /5 Load floating-point value onto stack
[ ] FLD st(i) D9 C0+i Load floating-point value onto stack
[ ] FST m32 D9 /2 Store top-of-stack floating-point value to memory or stack register
[ ] FST m64 DD /2 Store top-of-stack floating-point value to memory or stack register
[ ] FST st(i)[e] DD D0+i Store top-of-stack floating-point value to memory or stack register
[ ] FSTP m32 D9 /3 Store top-of-stack floating-point value to memory or stack register, then pop
[ ] FSTP m64 DD /3 Store top-of-stack floating-point value to memory or stack register, then pop
[ ] FSTP m80[e] DB /7 Store top-of-stack floating-point value to memory or stack register, then pop
[ ] FSTP st(i)[e][f] DD D8+i Store top-of-stack floating-point value to memory or stack register, then pop
[ ] FSTP st(i)[e][f] DF D0+i[g] Store top-of-stack floating-point value to memory or stack register, then pop
[ ] FSTP st(i)[e][f] DF D8+i[g] Store top-of-stack floating-point value to memory or stack register, then pop
[ ] FLDZ D9 EE Push +0.0 onto stack
[ ] FLD1 D9 E8 Push +1.0 onto stack
[ ] FLDPI D9 EB Push π (approximately 3.14159) onto stack
[ ] FLDL2T D9 E9 Push log2(10) (approximately 3.32193) onto stack
[ ] FLDL2E D9 EA Push log2(e) (approximately 1.44269) onto stack
[ ] FLDLG2 D9 EC Push log10(2) (approximately 0.30103) onto stack
[ ] FLDLN2 D9 ED Push ln(2) (approximately 0.69315) onto stack
[ ] FXCH st(i)[i][j] D9 C8+i Exchange top-of-stack register with other stack register
[ ] FXCH st(i)[i][j] DD C8+i[g] Exchange top-of-stack register with other stack register
[ ] FXCH st(i)[i][j] DF C8+i[g] Exchange top-of-stack register with other stack register
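The load, store and exchange forms listed above all operate on a rotating register stack. Here is a minimal emulator-style sketch with hypothetical names. A double stands in for the 80-bit format, and tags, overflow/underflow and exceptions are left out.

```c
/* 8 physical registers, a 3-bit TOP field, and st(i) = phys[(TOP + i) mod 8]. */
typedef struct {
    double phys[8];
    unsigned top; /* 0..7 */
} x87_stack;

static double *st(x87_stack *s, unsigned i) { return &s->phys[(s->top + i) & 7]; }

/* Push (FLD, FLD1, FLDZ, FLDPI, ...): decrement TOP, then write st(0). */
void x87_push(x87_stack *s, double v)
{
    s->top = (s->top - 1) & 7;
    *st(s, 0) = v;
}

/* FLD st(i): read st(i) before TOP moves, then push the copy. */
void x87_fld_st(x87_stack *s, unsigned i)
{
    double v = *st(s, i);
    x87_push(s, v);
}

/* FSTP st(i): copy st(0) to st(i), then pop (increment TOP). */
void x87_fstp_st(x87_stack *s, unsigned i)
{
    *st(s, i) = *st(s, 0);
    s->top = (s->top + 1) & 7;
}

/* FXCH st(i): swap st(0) and st(i). */
void x87_fxch(x87_stack *s, unsigned i)
{
    double t = *st(s, 0);
    *st(s, 0) = *st(s, i);
    *st(s, i) = t;
}
```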

x87 Integer Load/Store Instructions

[ ] FILD m16 DF /0 Load signed integer value onto stack from memory, with conversion to floating-point
[ ] FILD m32 DB /0 Load signed integer value onto stack from memory, with conversion to floating-point
[ ] FILD m64 DF /5 Load signed integer value onto stack from memory, with conversion to floating-point
[ ] FIST m16 DF /2 Store top-of-stack value to memory, with conversion to signed integer
[ ] FIST m32 DB /2 Store top-of-stack value to memory, with conversion to signed integer
[ ] FISTP m16 DF /3 Store top-of-stack value to memory, with conversion to signed integer, then pop stack
[ ] FISTP m32 DB /3 Store top-of-stack value to memory, with conversion to signed integer, then pop stack
[ ] FISTP m64 DF /7 Store top-of-stack value to memory, with conversion to signed integer, then pop stack
[ ] FBLD m80[k] DF /4 Load 18-digit Binary-Coded-Decimal integer value onto stack from memory, with conversion to floating-point
[ ] FBSTP m80 DF /6 Store top-of-stack value to memory, with conversion to 18-digit Binary-Coded-Decimal integer, then pop stack
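FBLD and FBSTP use an 80-bit packed-BCD format. Here is a sketch for decoding and encoding it, with hypothetical helper names. It assumes the standard layout: 18 digits in bytes 0-8, least significant byte first, and the sign in bit 7 of byte 9.

```c
#include <stdint.h>

/* Decode an FBLD operand. Each of bytes 0-8 holds two digits (low nibble =
 * less significant digit). Invalid nibbles (A-F) are not checked here. */
int64_t bcd80_to_int(const uint8_t b[10])
{
    int64_t v = 0;
    for (int i = 8; i >= 0; i--) {
        v = v * 10 + (b[i] >> 4);
        v = v * 10 + (b[i] & 0x0F);
    }
    return (b[9] & 0x80) ? -v : v;
}

/* Encode like FBSTP does after rounding to an integer; requires |v| < 10^18. */
void int_to_bcd80(int64_t v, uint8_t b[10])
{
    uint64_t u = v < 0 ? (uint64_t)0 - (uint64_t)v : (uint64_t)v;
    for (int i = 0; i < 9; i++) {
        unsigned lo = (unsigned)(u % 10); u /= 10;
        unsigned hi = (unsigned)(u % 10); u /= 10;
        b[i] = (uint8_t)((hi << 4) | lo);
    }
    b[9] = v < 0 ? 0x80 : 0x00;
}
```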

x87 Basic Arithmetic Instructions

[ ] FADD m32 D8 /0 Floating-point add dst <- dst + src
[ ] FADD m64 DC /0 Floating-point add dst <- dst + src
[ ] FADD st,st(i) D8 C0+i Floating-point add dst <- dst + src
[ ] FADD st(i),st DC C0+i Floating-point add dst <- dst + src
[ ] FMUL m32 D8 /1 Floating-point multiply dst <- dst * src
[ ] FMUL m64 DC /1 Floating-point multiply dst <- dst * src
[ ] FMUL st,st(i) D8 C8+i Floating-point multiply dst <- dst * src
[ ] FMUL st(i),st DC C8+i Floating-point multiply dst <- dst * src
[ ] FSUB m32 D8 /4 Floating-point subtract dst <- dst – src
[ ] FSUB m64 DC /4 Floating-point subtract dst <- dst – src
[ ] FSUB st,st(i) D8 E0+i Floating-point subtract dst <- dst – src
[ ] FSUB st(i),st DC E8+i Floating-point subtract dst <- dst – src
[ ] FSUBR m32 D8 /5 Floating-point reverse subtract dst <- src – dst
[ ] FSUBR m64 DC /5 Floating-point reverse subtract dst <- src – dst
[ ] FSUBR st,st(i) D8 E8+i Floating-point reverse subtract dst <- src – dst
[ ] FSUBR st(i),st DC E0+i Floating-point reverse subtract dst <- src – dst
[ ] FDIV m32 D8 /6 Floating-point divide[l] dst <- dst / src
[ ] FDIV m64 DC /6 Floating-point divide[l] dst <- dst / src
[ ] FDIV st,st(i) D8 F0+i Floating-point divide[l] dst <- dst / src
[ ] FDIV st(i),st DC F8+i Floating-point divide[l] dst <- dst / src
[ ] FDIVR m32 D8 /7 Floating-point reverse divide dst <- src / dst
[ ] FDIVR m64 DC /7 Floating-point reverse divide dst <- src / dst
[ ] FDIVR st,st(i) D8 F8+i Floating-point reverse divide dst <- src / dst
[ ] FDIVR st(i),st DC F0+i Floating-point reverse divide dst <- src / dst
[ ] FCOM m32 D8 /2 Floating-point compare CC <- result_of( st(0) – src ). Same operation as subtract, except that it updates the condition-code bits (C0, C2, C3) of the x87 status word instead of any FPU stack register
[ ] FCOM m64 DC /2 Floating-point compare CC <- result_of( st(0) – src ). Same operation as subtract, except that it updates the condition-code bits (C0, C2, C3) of the x87 status word instead of any FPU stack register
[ ] FCOM st(i)[i] D8 D0+i Floating-point compare CC <- result_of( st(0) – src ). Same operation as subtract, except that it updates the condition-code bits (C0, C2, C3) of the x87 status word instead of any FPU stack register
[ ] FCOM st(i)[i] DC D0+i[g] Floating-point compare CC <- result_of( st(0) – src ). Same operation as subtract, except that it updates the condition-code bits (C0, C2, C3) of the x87 status word instead of any FPU stack register
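One detail in the register forms in the table above is easy to get wrong in a decoder. For the DC st(i),st forms, the reg-field values for SUB/SUBR and DIV/DIVR are swapped relative to D8 (for example, DC E8+i is FSUB and DC E0+i is FSUBR). Here is a hypothetical disassembler helper that shows this.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decode the register forms (modrm >= 0xC0) of opcodes D8 and DC into a
 * mnemonic string. Bits 5:3 of ModRM select the operation, bits 2:0 select st(i). */
void x87_decode_d8_dc(uint8_t op, uint8_t modrm, char *out, size_t n)
{
    static const char *d8[8] = {"FADD", "FMUL", "FCOM", "FCOMP",
                                "FSUB", "FSUBR", "FDIV", "FDIVR"};
    static const char *dc[8] = {"FADD", "FMUL", "FCOM", "FCOMP",
                                "FSUBR", "FSUB", "FDIVR", "FDIV"};
    unsigned reg = (modrm >> 3) & 7, i = modrm & 7;
    const char **names = (op == 0xD8) ? d8 : dc;

    if (reg == 2 || reg == 3)          /* compares take a single operand */
        snprintf(out, n, "%s st(%u)", names[reg], i);
    else if (op == 0xD8)               /* D8: st(0) is the destination */
        snprintf(out, n, "%s st,st(%u)", names[reg], i);
    else                               /* DC: st(i) is the destination */
        snprintf(out, n, "%s st(%u),st", names[reg], i);
}
```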

x87 Basic Arithmetic Instructions with Stack Pop

[ ] FADDP st(i),st[i] DE C0+i Floating-point add and pop
[ ] FMULP st(i),st[i] DE C8+i Floating-point multiply and pop
[ ] FSUBP st(i),st[i] DE E8+i Floating-point subtract and pop
[ ] FSUBRP st(i),st[i] DE E0+i Floating-point reverse-subtract and pop
[ ] FDIVP st(i),st[i] DE F8+i Floating-point divide and pop
[ ] FDIVRP st(i),st[i] DE F0+i Floating-point reverse-divide and pop
[ ] FCOMP m32 D8 /3 Floating-point compare and pop
[ ] FCOMP m64 DC /3 Floating-point compare and pop
[ ] FCOMP st(i)[i] D8 D8+i Floating-point compare and pop
[ ] FCOMP st(i)[i] DC D8+i[g] Floating-point compare and pop
[ ] FCOMP st(i)[i] DE D0+i[g] Floating-point compare and pop
[ ] FCOMPP DE D9 Floating-point compare to st(1), then pop twice
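FCOM, FCOMP and FCOMPP report their result only through the condition-code bits of the status word. Here is a sketch of that mapping with a hypothetical function name. Exceptions are not modelled: FCOM raises invalid-operation for any NaN operand, while FUCOM raises it only for signalling NaNs.

```c
#include <math.h>
#include <stdint.h>

/* Condition-code bits of the x87 status word. */
#define X87_C0 (1u << 8)
#define X87_C2 (1u << 10)
#define X87_C3 (1u << 14)

/* C3/C2/C0 after comparing st(0) with src:
 *   st(0) > src  -> none
 *   st(0) < src  -> C0
 *   st(0) == src -> C3
 *   unordered    -> C3, C2, C0 */
uint16_t x87_fcom_cc(double st0, double src)
{
    if (isnan(st0) || isnan(src))
        return X87_C3 | X87_C2 | X87_C0;
    if (st0 < src)
        return X87_C0;
    if (st0 == src)
        return X87_C3;
    return 0;
}
```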

x87 Basic Arithmetic Instructions with Integer Source Argument

[ ] FIADD m16 DE /0 Floating-point add by integer
[ ] FIADD m32 DA /0 Floating-point add by integer
[ ] FIMUL m16 DE /1 Floating-point multiply by integer
[ ] FIMUL m32 DA /1 Floating-point multiply by integer
[ ] FISUB m16 DE /4 Floating-point subtract by integer
[ ] FISUB m32 DA /4 Floating-point subtract by integer
[ ] FISUBR m16 DE /5 Floating-point reverse-subtract by integer
[ ] FISUBR m32 DA /5 Floating-point reverse-subtract by integer
[ ] FIDIV m16 DE /6 Floating-point divide by integer
[ ] FIDIV m32 DA /6 Floating-point divide by integer
[ ] FIDIVR m16 DE /7 Floating-point reverse-divide by integer
[ ] FIDIVR m32 DA /7 Floating-point reverse-divide by integer
[ ] FICOM m16 DE /2 Floating-point compare to integer
[ ] FICOM m32 DA /2 Floating-point compare to integer
[ ] FICOMP m16 DE /3 Floating-point compare to integer, and stack pop
[ ] FICOMP m32 DA /3 Floating-point compare to integer, and stack pop
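In the Intel SDM encoding, the first opcode byte of these memory forms selects the operand type. DA means a 32-bit integer and DE a 16-bit integer, and D8 and DC play the same role for 32- and 64-bit floats. Here is a lookup sketch with hypothetical names.

```c
#include <stdint.h>

typedef enum {
    X87_M32FP,   /* D8: FADD m32 ... FDIVR m32 */
    X87_M32INT,  /* DA: FIADD m32 ... FIDIVR m32 */
    X87_M64FP,   /* DC: FADD m64 ... FDIVR m64 */
    X87_M16INT,  /* DE: FIADD m16 ... FIDIVR m16 */
    X87_NOT_ARITH
} x87_mem_operand;

/* Operand type of the memory forms (mod != 11) of the arithmetic opcodes. */
x87_mem_operand x87_arith_mem_operand(uint8_t op)
{
    switch (op) {
    case 0xD8: return X87_M32FP;
    case 0xDA: return X87_M32INT;
    case 0xDC: return X87_M64FP;
    case 0xDE: return X87_M16INT;
    default:   return X87_NOT_ARITH;
    }
}
```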

x87 Additional Arithmetic Instructions

[ ] FCHS D9 E0 Floating-point change sign
[ ] FABS D9 E1 Floating-point absolute value
[ ] FTST D9 E4 Floating-point compare top-of-stack value to 0
[ ] FXAM D9 E5 Classify top-of-stack st(0) register value.
[ ] FXTRACT D9 F4 Split the st(0) value into two values E and M representing the exponent and mantissa of st(0).
[ ] FPREM D9 F8 Floating-point partial[o] remainder (not IEEE 754 compliant)
[ ] FSQRT D9 FA Floating-point square root
[ ] FRNDINT D9 FC Floating-point round to integer
[ ] FSCALE D9 FD Floating-point power-of-2 scaling. Rounds the value of st(1) to integer with round-to-zero, then uses it as a scale factor for st(0):[q] st(0) <- st(0) * 2^trunc(st(1))

x87 Transcendental Instructions

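[ ] F2XM1 D9 F0 Base-2 exponential minus 1, with extra precision for st(0) close to 0: st(0) <- 2^st(0) - 1
[ ] FYL2X[t] D9 F1 Base-2 Logarithm: st(1) <- st(1) * log2(st(0)), then pop
[ ] FPTAN D9 F2 Partial Tangent: Computes from st(0) a pair of values X and Y, such that Y/X = tan(st(0)); Y replaces st(0), then X is pushed (on the 80387 and later, X is always 1.0)
[ ] FPATAN D9 F3 Two-argument arctangent with quadrant adjustment:[u] st(1) <- arctan(st(1) / st(0)), then pop
[ ] FYL2XP1[t] D9 F9 Base-2 Logarithm plus 1, with extra precision for st(0) close to 0: st(1) <- st(1) * log2(st(0) + 1), then pop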

Other x87 Instructions

[ ] FNOP D9 D0 No operation[v]
[ ] FDECSTP D9 F6 Decrement x87 FPU Register Stack Pointer
[ ] FINCSTP D9 F7 Increment x87 FPU Register Stack Pointer
[ ] FFREE st(i) DD C0+i Free x87 FPU Register
[ ] WAIT, FWAIT 9B Check and handle pending unmasked x87 FPU exceptions
[ ] FSTPNCE st(i) D9 D8+i[g] Floating-point store and pop, without stack underflow exception
[ ] FFREEP st(i) DF C0+i[g] Free x87 register, then stack pop
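FFREE, FFREEP, FINCSTP and FDECSTP only change the tag word and the TOP field. Here is a sketch of that bookkeeping with hypothetical names; register contents are not modelled. The tag word holds 2 bits per physical register (00 valid, 01 zero, 10 special, 11 empty).

```c
#include <stdint.h>

enum { TAG_VALID = 0, TAG_ZERO = 1, TAG_SPECIAL = 2, TAG_EMPTY = 3 };

typedef struct {
    uint16_t tag;  /* 2 bits per physical register R0..R7 */
    unsigned top;  /* 0..7 */
} x87_ctl;

/* st(i) lives in physical register (TOP + i) mod 8. */
static unsigned phys(const x87_ctl *c, unsigned i) { return (c->top + i) & 7; }

unsigned x87_tag_of(const x87_ctl *c, unsigned i)
{
    return (c->tag >> (2 * phys(c, i))) & 3;
}

/* FFREE st(i): mark the register empty; TOP is unchanged. */
void x87_ffree(x87_ctl *c, unsigned i)
{
    c->tag |= (uint16_t)(TAG_EMPTY << (2 * phys(c, i)));
}

/* FFREEP st(i): FFREE st(i), then pop. */
void x87_ffreep(x87_ctl *c, unsigned i)
{
    x87_ffree(c, i);
    c->top = (c->top + 1) & 7;
}

/* FINCSTP/FDECSTP only rotate TOP; unlike a pop, no tag is marked empty. */
void x87_fincstp(x87_ctl *c) { c->top = (c->top + 1) & 7; }
void x87_fdecstp(x87_ctl *c) { c->top = (c->top - 1) & 7; }
```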

x87 Non-Waiting Control Instructions added in 80287

[ ] FNSETPM DB E4 FSETPM Notify FPU of entry into Protected Mode[a]
[ ] FNSTSW AX DF E0 FSTSW AX Store x87 Status Word to AX
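On processors without FCOMI, FNSTSW AX is typically followed by SAHF, and the compare result is then tested with an ordinary conditional jump. Here is a sketch of where the status-word bits end up, with hypothetical helper names. SAHF copies AH (status-word bits 8-15) into EFLAGS, so C0 lands in CF, C2 in PF and C3 in ZF.

```c
#include <stdint.h>

/* TOP field of the status word (bits 11-13). */
unsigned x87_sw_top(uint16_t sw) { return (sw >> 11) & 7; }

/* EFLAGS bits CF (bit 0), PF (bit 2) and ZF (bit 6) as set by SAHF after
 * FNSTSW AX: C0 (sw bit 8) -> CF, C2 (sw bit 10) -> PF, C3 (sw bit 14) -> ZF. */
unsigned x87_sw_to_eflags(uint16_t sw)
{
    uint8_t ah = (uint8_t)(sw >> 8);
    return ah & ((1u << 0) | (1u << 2) | (1u << 6));
}
```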

x87 Instructions added in 80387

[ ] FUCOM st(i)[c] DD E0+i Floating-point unordered compare.
[ ] FUCOMP st(i)[c] DD E8+i Floating-point unordered compare and pop
[ ] FUCOMPP DA E9 Floating-point unordered compare to st(1), then pop twice
[ ] FPREM1 D9 F5 IEEE 754 compliant floating-point partial remainder.[d]
[ ] FSINCOS D9 FB Floating-point sine and cosine.
[ ] FSIN D9 FE Floating-point sine.[e]
[ ] FCOS D9 FF Floating-point cosine.[e]

x87 Instructions added in Pentium Pro

[ ] FCMOVB st(0),st(i) DA C0+i Floating-point conditional move to st(0) if below (CF=1)
[ ] FCMOVE st(0),st(i) DA C8+i Floating-point conditional move to st(0) if equal (ZF=1)
[ ] FCMOVBE st(0),st(i) DA D0+i Floating-point conditional move to st(0) if below or equal (CF=1 or ZF=1)
[ ] FCMOVU st(0),st(i) DA D8+i Floating-point conditional move to st(0) if unordered (PF=1)
[ ] FCMOVNB st(0),st(i) DB C0+i Floating-point conditional move to st(0) if not below (CF=0)
[ ] FCMOVNE st(0),st(i) DB C8+i Floating-point conditional move to st(0) if not equal (ZF=0)
[ ] FCMOVNBE st(0),st(i) DB D0+i Floating-point conditional move to st(0) if not below or equal (CF=0 and ZF=0)
[ ] FCMOVNU st(0),st(i) DB D8+i Floating-point conditional move to st(0) if not unordered (PF=0)
[ ] FCOMI st(0),st(i) DB F0+i Floating-point compare and set EFLAGS.
[ ] FCOMIP st(0),st(i) DF F0+i Floating-point compare and set EFLAGS, then pop
[ ] FUCOMI st(0),st(i) DB E8+i Floating-point unordered compare and set EFLAGS
[ ] FUCOMIP st(0),st(i) DF E8+i Floating-point unordered compare and set EFLAGS, then pop
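FCOMI and FUCOMI write ZF, PF and CF directly, and the FCMOVcc instructions test those same flags. Here is a sketch of both with hypothetical names. The condition is taken from the opcode and ModRM byte, following the encodings in the table above.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

enum { X86_CF = 1u << 0, X86_PF = 1u << 2, X86_ZF = 1u << 6 };

/* st(0) > st(i): none; st(0) < st(i): CF; equal: ZF; unordered: ZF, PF, CF. */
unsigned x87_fcomi_flags(double st0, double sti)
{
    if (isnan(st0) || isnan(sti))
        return X86_ZF | X86_PF | X86_CF;
    if (st0 < sti)
        return X86_CF;
    if (st0 == sti)
        return X86_ZF;
    return 0;
}

/* Would FCMOVcc move st(i) into st(0)? op DA = move if condition, DB = move
 * if NOT condition; bits 4:3 of ModRM: 0 = B (CF), 1 = E (ZF),
 * 2 = BE (CF or ZF), 3 = U (PF). */
bool x87_fcmov_taken(uint8_t op, uint8_t modrm, unsigned flags)
{
    bool cond;
    switch ((modrm >> 3) & 3) {
    case 0:  cond = (flags & X86_CF) != 0; break;
    case 1:  cond = (flags & X86_ZF) != 0; break;
    case 2:  cond = (flags & (X86_CF | X86_ZF)) != 0; break;
    default: cond = (flags & X86_PF) != 0; break;
    }
    return op == 0xDB ? !cond : cond;
}
```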

x87 Non-Waiting Instructions added in Pentium II, AMD K7 and SSE

[ ] FXSAVE m512byte NP 0F AE /0 FXSAVE64 m512byte Save x87, MMX and SSE state to 512-byte data structure (intrinsics: _fxsave, _fxsave64)
[ ] FXRSTOR m512byte NP 0F AE /1 FXRSTOR64 m512byte Restore x87, MMX and SSE state from 512-byte data structure (intrinsics: _fxrstor, _fxrstor64)
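Here is a sketch of the 512-byte area as a C struct, with a hypothetical type name. It shows the 32-bit (non-REX.W) layout. With FXSAVE64, the FPU IP and DP fields become 64-bit offsets at bytes 8 and 16.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t fcw;           /*   0: control word */
    uint16_t fsw;           /*   2: status word */
    uint8_t  ftw;           /*   4: abridged tag, 1 bit per register (1 = not empty) */
    uint8_t  reserved0;     /*   5 */
    uint16_t fop;           /*   6: last x87 opcode */
    uint32_t fip;           /*   8: FPU instruction pointer offset */
    uint16_t fcs;           /*  12 */
    uint16_t reserved1;     /*  14 */
    uint32_t fdp;           /*  16: FPU data pointer offset */
    uint16_t fds;           /*  20 */
    uint16_t reserved2;     /*  22 */
    uint32_t mxcsr;         /*  24 */
    uint32_t mxcsr_mask;    /*  28 */
    uint8_t  st[8][16];     /*  32: ST0-ST7 / MM0-MM7, 10 bytes used + 6 reserved */
    uint8_t  xmm[16][16];   /* 160: XMM0-XMM15 (XMM8-15 only in 64-bit mode) */
    uint8_t  reserved3[96]; /* 416 */
} fxsave_area;

_Static_assert(sizeof(fxsave_area) == 512, "FXSAVE area is 512 bytes");
_Static_assert(offsetof(fxsave_area, mxcsr) == 24, "MXCSR offset");
_Static_assert(offsetof(fxsave_area, st) == 32, "ST0 offset");
_Static_assert(offsetof(fxsave_area, xmm) == 160, "XMM0 offset");
```

On x86 compilers, such an area can be filled with the _fxsave intrinsic from <immintrin.h>, which may require enabling the FXSR target feature. The buffer must be 16-byte aligned.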

x87 Instructions added as part of SSE3

[ ] FISTTP m16 DF /1 Floating-point store integer and pop, with round-to-zero
[ ] FISTTP m32 DB /1 Floating-point store integer and pop, with round-to-zero
[ ] FISTTP m64 DD /1 Floating-point store integer and pop, with round-to-zero

x87 Instructions present in specific x87 coprocessor models

[ ] FRSTPM DB F4[56] Intel 287XL FPU Reset Protected Mode. Instruction to signal to the FPU that the main CPU is exiting protected mode, similar to how the FSETPM instruction is used to signal to the FPU that the CPU is entering protected mode. Different sources provide different encodings for this instruction.
[ ] FRSTPM DB E5[9] Intel 287XL FPU Reset Protected Mode. Instruction to signal to the FPU that the main CPU is exiting protected mode, similar to how the FSETPM instruction is used to signal to the FPU that the CPU is entering protected mode. Different sources provide different encodings for this instruction.
[ ] FNSTDW AX DF E1 Intel 387SL[9][57] Store FPU Device Word to AX
[ ] FNSTSG AX DF E2 Intel 387SL[9][57] Store FPU Signature Register to AX[a]
[ ] FSBP0 DB E8 IIT 2c87, 3c87[9][59] Select Coprocessor Register Bank 0
[ ] FSBP1 DB EB IIT 2c87, 3c87[9][59] Select Coprocessor Register Bank 1
[ ] FSBP2 DB EA IIT 2c87, 3c87[9][59] Select Coprocessor Register Bank 2
[ ] FSBP3 DB E9[60] IIT 2c87, 3c87[9][59] Select Coprocessor Register Bank 3 (undocumented)
[ ] F4X4, FMUL4X4 DB F1 IIT 2c87, 3c87[9][59] Multiply 4-component vector with 4x4 matrix. For proper operation, the matrix must be preloaded into Coprocessor Register banks 1 and 2 (unique to IIT FPUs), and the vector must be loaded into Coprocessor Register Bank 0. Example code is available.[59][61]
[ ] FTSTP D9 E6 Cyrix 387+[61] Equivalent to FTST followed by a stack pop.
[ ] FRINT2 DB FC Cyrix EMC87, 83s87, 83d87, 387+[61][9] Round st(0) to integer, with round-to-nearest rounding.
[ ] FRICHOP DD FC Cyrix EMC87, 83s87, 83d87, 387+[61][9] Round st(0) to integer, with round-to-zero rounding.
[ ] FRINEAR DF FC Cyrix EMC87, 83s87, 83d87, 387+[61][9] Round st(0) to integer, with round-to-nearest ties-away-from-zero rounding.
[ ] FIDIVRP Floating Point Integer Divide, Reversed, Pop (Cyrix Cx486DX/ZFx86; the same documentation also mentions FLY2X and FLY2XP1, which look like mnemonic typos for FYL2X and FYL2XP1)
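Setting aside the IIT register-bank setup, F4X4 computes an ordinary 4x4 matrix-vector product. Here is a plain software sketch with a hypothetical name. The row-major layout and the out = M * v convention are assumptions and are not taken from IIT documentation.

```c
/* out = m * v, with m stored row-major; out must not alias v. */
void mul4x4(double m[4][4], const double v[4], double out[4])
{
    for (int r = 0; r < 4; r++) {
        double acc = 0.0;
        for (int c = 0; c < 4; c++)
            acc += m[r][c] * v[c];
        out[r] = acc;
    }
}
```

For example, a translation matrix with offsets (10, 20, 30) maps the point (1, 2, 3, 1) to (11, 22, 33, 1).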

x87 Instructions present in NEC μPD72091

[ ] FTAN Trigonometric Instructions
[ ] FATAN Inverse Trigonometric Instructions
[ ] FTANH Hyperbolic Functions
[ ] FXP2 Exponential Instructions
[ ] FXPT Exponential Instructions
[ ] FXPE Exponential Instructions
[ ] FLGTX Exponential Instructions
[ ] FLGEX Exponential Instructions
[ ] FREM IEEE Remainder
[ ] FLDDTR Constant Load
[ ] FSAVEA Save/Restore All Environment
[ ] FNSAVEA Save/Restore All Environment
[ ] FRSTORA Save/Restore All Environment
[ ] FSTENVA Save/Restore All Environment
[ ] FNSTENVA Save/Restore All Environment
[ ] FLDENVA Save/Restore All Environment
[ ] FLDCWA Save/Restore All Environment

x87 Instructions present in NEC μPD72191/D9008D

[ ] FPOWER Power function x^y. This function is hard to implement, both because of its complex definition and because of the accuracy required. Evaluating x^y = e^(y*ln x) directly does not give good accuracy, because the error of the log function is amplified by the exponential function. The FPP solves this problem by providing a 74-bit data width for the mantissa data bus.

mr-c commented 8 months ago

@Torinde Do you know of any header files for these functions?

Torinde commented 8 months ago

No. @kklobe, do you know a header file for x87 functions?

kklobe commented 8 months ago

I'm not aware of any. I think a header file for these functions would be a tall order, especially on non-x86 platforms, where the 80-bit extended-precision calculations would have to be done in software.

Torinde commented 8 months ago

Isn't that taken care of by SoftFloat (and the projects using it - see links at the first bullet in OP)?

The latest release of SoftFloat implements five floating-point formats: 16-bit half-precision, 32-bit single-precision, 64-bit double-precision, 80-bit double-extended-precision, and 128-bit quadruple-precision. All required rounding modes, exception flags, and special values are supported. Fused multiply-add is also implemented for all formats except 80-bit double-extended-precision. Target-specific code is provided for various Intel x86 and ARM processors.

kklobe commented 8 months ago

That strikes me as quite outside the scope of this project. The x87 instructions aren't really SIMD, and would require adding something like SoftFloat as a dependency, so now you no longer have a header-only solution to translate from SIMD instruction set to SIMD instruction set.

If I'm misunderstanding your suggestion, let me know.

Torinde commented 8 months ago

I thought parts of SoftFloat could be useful for creating a header file.

Torinde commented 4 months ago

xbyak:

a JIT assembler for x86(IA-32)/x64(AMD64, x86-64) MMX/SSE/SSE2/SSE3/SSSE3/SSE4/FPU/AVX/AVX2/AVX-512 by C++ header

Will that be useful?

mr-c commented 4 months ago

I won't speak for others, but that sounds too big to be a part of this project. However, it could be a companion project to SIMDe, managed here in this GitHub organization or elsewhere. Code sharing is welcome, of course.

Torinde commented 4 months ago

that sounds too big

Sorry, I meant picking only the x87/FPU part of it (not everything), e.g. as an answer to:

Do you know of any header files for these functions?

simd-everywhere / simde

x87 functions #1161