Constant limit issues with r500 hardware

ondracka commented 2 years ago

This is to continue discussion about how to make nine and r300 work with D3D games (copying also the relevant discussion from the thread https://github.com/iXit/wine-nine-standalone/issues/133).

axeldavy : I've some code that should fix r500's support of 256 constants (Last 3 commits of https://github.com/iXit/Mesa-3D/commits/master) in case you want to test.

ondracka: I did a quick testing and now we fail to compile pretty much any vertex shader using relative addressing. The issue with r500 hardware is that not only constants but also immediates (I'm using the TGSI nomenclature here, as I don't know much about DX) must fit into the 256 constants limit. And what sucks the most is that for vertex shaders we don't have any inlining options (we can inline 1, 0, -1 using the constant swizzles and thats it). To be honest I have no clue how this is supposed to work (how this works on Windows). I'd be happy to implement/enhance stuff on the driver side, but I really have no idea. As I see it, there are two options:

we can overwrite the last constant with the immediates. This is what Wine and if I understand it correctly also nine (before this series) does, we can do it a bit more efficiently in the driver because we can get rid at least of the 0 and 1. crazy idea could be to potentially construct some of the constants (at least the easy ones like 2, 0.5, etc..) in the shaders, but that would have obvious code size and speed disadvantages.
But maybe this would be better discussed in a separate thread/issue, feel free to CC me to anything related.

BTW regarding the NPOT support discussed here, r500 emulates ALL of NPOT support in shaders (we don't have even partial support, we support only rectangular textures), so if there is something not working, open a bug at fdo and I'll take a look if I can enhance the shader emulation.

axeldavy:

@ondracka Is there an option to inline the offset for relative addressing ?

The difference indeed with this series is that when relative addressing is used, all 256 constants will be declared as used, thus leaving no space for the immediates. Something that nine does is that it reads the immediate if it has them, and write the immediate value into the constant buffer for it to work with relative addressing. We could instead always read the constant buffer, not the immediate, for r500. The question is whether that will be enough for the few immediate that the d3d asm -> TGSI translation generates.

axeldavy:

I think if you can work with constructing the few immediates that are generated by the asm conversion (-1, 0, 1, 0.5, 2., If I checked correctly), and the integer offset of relative addressing, then everything could fit.

EDIT: Actually this is more complicated than that. Many of these constants are not in paths that can trigger when we use relative addressing. However there are a few immediates that are used to enforce d3d9 behaviour for a few things that I bet r500 doesn't need. For example the a0 clamping eats a few immediates, as well as the pointsize clamping. We'll need to continue this discussion elsewhere.

ondracka commented 2 years ago

Inlining the relative addressing offset works fine. For example converting :

IMM[0] FLT32 {   10.0000,   0.0000,    0.0000,    0.0000}
...
   ADD TEMP[1], TEMP[0].xxxx, IMM[0].xxxx
   ARL ADDR[0].x, TEMP[1].xxxx
   MOV TEMP[2], CONST[0][ADDR[0].x]
...

into

...
   ARL ADDR[0].x, TEMP[0].xxxx
   MOV TEMP[2], CONST[0][10 + ADDR[0].x]
...

This will work OK and I think it would be a good start. TGSI supports this syntax, so this would preferably happen already at the nine level. But if not I can add some heuristic to the driver. So I think this we should start with, than we should take a look what fixup immediates we really need for R500 and after that I can take care of anything that remains at the driver level.

ondracka commented 2 years ago

Hm, but actually the integer addressing offsets are already properly inlined as they should be (just one random shader from Anno 1404)

VS3.0
DEF c221 { 3.000000 0.050000 1.000000 5.000000 }
DEF c222 { 0.500000 1.000000 0.000000 0.000000 }
DCL v0 POSITION0
DCL v1 TEXCOORD0
DCL o0 POSITION0
DCL o1.xy__ TEXCOORD0
DCL o2.x___ TEXCOORD1
MUL r0.x___ c221.xxxx v1.xxxx 
FRC r0.x___ r0.xxxx 
MAD r0.x___ v1.xxxx c221.xxxx -(r0).xxxx 
MOVA A0.x___ r0.xxxx 
MOV r0 c[A0.xxxx+0] 
ADD r0 -(r0) c[A0.xxxx+1] 
MAD r1 v0.yyyy r0 c[A0.xxxx+0] 
MOV o2.x___ c[A0.xxxx+2].xxxx 
NRM r2.xyz_ r0 
MUL r0.xyz_ r2.zxyw c220.yzxw 
MAD r0.xyz_ r2.yzxw c220.zxyw -(r0) 
MUL r2.xyz_ r2 c221.yyyy 
MUL r0.xyz_ r0 v0.xxxx 
NRM r3.xyz_ r0 
MAD r0.xyz_ r3 r1.wwww r1 
MAD r0.xyz_ r2 v0.yyyy r0 
ADD r0.___w c221.zzzz -(v0).yyyy 
MAD r0.xyz_ r2 -(r0).wwww r0 
ADD r0._y__ r0.yyyy c221.wwww 
MUL r1 r0.yyyy c217 
MAD r1 c216 r0.xxxx r1 
MAD r0 c218 r0.zzzz r1 
ADD o0 r0 c219 
MAD o1.xy__ v0 c222 c222.xzzw 
shader type: VS, preferred IR: TGSI, selected IR: TGSI
r300: Initial vertex program
VERT
PROPERTY NEXT_SHADER FRAG
PROPERTY LEGACY_MATH_RULES 1
DCL IN[0]
DCL IN[1]
DCL OUT[0], POSITION
DCL OUT[1].xy, GENERIC[0]
DCL OUT[2].x, GENERIC[1]
DCL CONST[0][0..255]
DCL TEMP[0..1]
DCL TEMP[2..3], LOCAL
DCL TEMP[4..6]
DCL ADDR[0]
IMM[0] FLT32 {    3.0000,     0.0500,     1.0000,     5.0000}
IMM[1] FLT32 {    0.5000,     1.0000,     0.0000,  -256.0000}
IMM[2] FLT32 {    1.5000,     2.5000,     0.0000,     0.0000}
  0: MUL TEMP[0].x, IMM[0].xxxx, IN[1].xxxx
  1: FRC TEMP[0].x, TEMP[0].xxxx
  2: MAD TEMP[0].x, IN[1].xxxx, IMM[0].xxxx, -TEMP[0].xxxx
  3: MOV TEMP[1].x, TEMP[0].xxxx
  4: ARR ADDR[0], TEMP[1]
  5: ADD TEMP[2], TEMP[1].xxxx, IMM[1].xxxx
  6: CMP TEMP[3], TEMP[2], IMM[1].zzzz, CONST[0][ADDR[0].x]
  7: ADD TEMP[2], TEMP[2], IMM[1].wwww
  8: CMP TEMP[3], TEMP[2], TEMP[3], IMM[1].zzzz
  9: MOV TEMP[0], TEMP[3]
 10: ARR ADDR[0], TEMP[1]
 11: ADD TEMP[2], TEMP[1].xxxx, IMM[2].xxxx
 12: CMP TEMP[3], TEMP[2], IMM[1].zzzz, CONST[0][ADDR[0].x+1]
 13: ADD TEMP[2], TEMP[2], IMM[1].wwww
 14: CMP TEMP[3], TEMP[2], TEMP[3], IMM[1].zzzz
 15: ADD TEMP[0], -TEMP[0], TEMP[3]
 16: ARR ADDR[0], TEMP[1]
 17: ADD TEMP[2], TEMP[1].xxxx, IMM[1].xxxx
 18: CMP TEMP[3], TEMP[2], IMM[1].zzzz, CONST[0][ADDR[0].x]
 19: ADD TEMP[2], TEMP[2], IMM[1].wwww
 20: CMP TEMP[3], TEMP[2], TEMP[3], IMM[1].zzzz
 21: MAD TEMP[4], IN[0].yyyy, TEMP[0], TEMP[3]
 22: ARR ADDR[0], TEMP[1]
 23: ADD TEMP[2], TEMP[1].xxxx, IMM[2].yyyy
 24: CMP TEMP[3], TEMP[2], IMM[1].zzzz, CONST[0][ADDR[0].x+2]
 25: ADD TEMP[2], TEMP[2], IMM[1].wwww
 26: CMP TEMP[3], TEMP[2], TEMP[3], IMM[1].zzzz
 27: MOV OUT[2].x, TEMP[3].xxxx
 28: DP3 TEMP[2].x, TEMP[0], TEMP[0]
 29: RSQ TEMP[2].x, TEMP[2].xxxx
 30: MUL TEMP[5].xyz, TEMP[0], TEMP[2].xxxx
 31: MUL TEMP[0].xyz, TEMP[5].zxyw, CONST[0][220].yzxw
 32: MAD TEMP[0].xyz, TEMP[5].yzxw, CONST[0][220].zxyw, -TEMP[0]
 33: MUL TEMP[5].xyz, TEMP[5], IMM[0].yyyy
 34: MUL TEMP[0].xyz, TEMP[0], IN[0].xxxx
 35: DP3 TEMP[2].x, TEMP[0], TEMP[0]
 36: RSQ TEMP[2].x, TEMP[2].xxxx
 37: MUL TEMP[6].xyz, TEMP[0], TEMP[2].xxxx
 38: MAD TEMP[0].xyz, TEMP[6], TEMP[4].wwww, TEMP[4]
 39: MAD TEMP[0].xyz, TEMP[5], IN[0].yyyy, TEMP[0]
 40: ADD TEMP[0].w, IMM[0].zzzz, -IN[0].yyyy
 41: MAD TEMP[0].xyz, TEMP[5], -TEMP[0].wwww, TEMP[0]
 42: ADD TEMP[0].y, TEMP[0].yyyy, IMM[0].wwww
 43: MUL TEMP[4], TEMP[0].yyyy, CONST[0][217]
 44: MAD TEMP[4], CONST[0][216], TEMP[0].xxxx, TEMP[4]
 45: MAD TEMP[0], CONST[0][218], TEMP[0].zzzz, TEMP[4]
 46: ADD OUT[0], TEMP[0], CONST[0][219]
 47: MAD OUT[1].xy, IN[0], IMM[1].xyzz, IMM[1].xzzz
 48: END

ondracka commented 2 years ago

So the immediates are mostly what the shader declares, except the -256, 1.5 and 2.5 that we use for the A0 fixups I assume.

ondracka commented 2 years ago

BTW I have mostly no clue about D3D, how is is is supposed to work? Are the immediates supposed to be the last constants, i.e., if the shader defines immediates c221 and c222, can we assume that nothing after c222 will be accessed by the relative addressing?

axeldavy commented 2 years ago

Yes exactly for -256, 1.5 and 2.5. We use them to handle out of bounds read of a0 (we have bound checks). But I would guess that r500 would natively already handle them the d3d9 way, thus we could disable that code for it.

The shader has 256 constants, and the value of those are defined via API (you set them manually, you don't bind a buffer). In your shader c221 and c222 are defined in the shader. In that case those values overwrite whatever the user has set via the API for them. That also affects relative adressing.

So basically when we see the use of a0, we upload all 256 constants (we cannot guess where a0 will stop reading), and c221 and c222 will overwrite the constants set by the user there. For efficiency, nine will still inline the direct uses of c221 and c222, thus they will be present both as immediates (for direct use), and in the constant buffer (for relative adressing). But since r500 cannot inline immediates in the code, this seems counterproductive for it. In addition it means more constants are used.

ondracka commented 2 years ago

Regarding the a0 bound check, the hardware docs say: _The address registers are loaded using a MOV instruction from any of the IVM, CSM, TRM or ATRM. .... The value is clamped between the range of –256 and 255. When this value is added to the constant address of the current operation, the result is tested for in the range of 0 to MAX_SHADER_CONST where MAX_SHADER_CONST is determined by the driver as the maximum constant address provided by the shader declaration. If the resultant address is out of the range 0 to MAX_SHADERCONST, (0,0,0,0) is returned on the data path. Is this what d3d9 expects? If so, we can hopefully just add a pipe shader cap and stop emitting the workaround. This should be straightforward enough that I could write a patch :-)

Regarding the constants, it seems that indeed when relative addressing is present, the only way how we could solve it is to not tell the driver that something is an immediate and just upload it with the others constant.

One possible optimization I see would be to just upload 0 and 1 as an immediates. For example in the shader above: DEF c221 { 3.000000 0.050000 1.000000 5.000000 } ... MUL r2.xyz_ r2 c221.yyyy /// here we would read the normal constant ... ADD r0.___w c221.zzzz -(v0).yyyy /// here one would add the 1.0 immediate and use it ... If the immediates contain only 1 and 0 we can get rid of them in the optimization passes, so they are not included in the limit. But this can be done as a follow up...

axeldavy commented 2 years ago

Yes that's exactly what d3d9 expects.

As for the immediates, why not. First step is not even test for 0 or 1.

axeldavy commented 2 years ago

Alright, here is a branch with the use of constants instead of immediates when relative addressing is used: https://gitlab.freedesktop.org/axeldavy/mesa/-/commits/r500_constant_number

ondracka commented 2 years ago

I'm not sure if its working, I still see the constants from the shaders as an immediates. Here are dumped shaders (RADEON_DEBUG=use_tgsi,vp NINE_DEBUG=shaders) from my testcase Anno1404: Anno1404log.zip I don't see the ARR fixups anymore so that is something

.

axeldavy commented 2 years ago

the 'use constants instead of immediates' is only activated when relative addressing is used.

ondracka commented 2 years ago

It looks like it doesn't happen also for relative adressing, for example from the previously attached log:

nine:nine_translate_shader Recompiling shader for constant compaction
DEF c226 { 0.500000 0.015625 1.000000 0.000000 }
DCL v0 POSITION0
DCL o0 POSITION0
DCL o1.xy__ TEXCOORD0
DCL o2.xy__ TEXCOORD1
DCL o3 TEXCOORD2
FRC r0.x___ v0.zzzz 
ADD r0.x___ -(r0).xxxx v0.zzzz 
MOVA A0.x___ r0.xxxx 
MAD r0.xy__ v0 c226.xxxx c226.xxxx 
MOV r0.__z_ c224.zzzz 
MAD r0.xy__ r0 r0.zzzz c[A0.xxxx+0].xzzw 
MOV r1 c220 
MAD r1 r1 r0.xxxx c221 
MAD r1 c222 r0.yyyy r1 
ADD o0 r1 c223 
MUL o1.xy__ r0 c225 
MUL o2.xy__ r0 c226.yyyy 
MOV o3.x_z_ r0.xyyw 
MOV o3._y_w c226.zzzz 
shader type: VS, preferred IR: TGSI, selected IR: TGSI
r300: Initial vertex program
VERT
PROPERTY NEXT_SHADER FRAG
PROPERTY LEGACY_MATH_RULES 1
DCL IN[0]
DCL OUT[0], POSITION
DCL OUT[1].xy, GENERIC[0]
DCL OUT[2].xy, GENERIC[1]
DCL OUT[3], GENERIC[2]
DCL CONST[0][0..255]
DCL TEMP[0..2]
DCL ADDR[0]
IMM[0] FLT32 {    0.5000,     0.0156,     1.0000,     0.0000}
  0: FRC TEMP[0].x, IN[0].zzzz
  1: ADD TEMP[0].x, -TEMP[0].xxxx, IN[0].zzzz
  2: MOV TEMP[1].x, TEMP[0].xxxx
  3: MAD TEMP[0].xy, IN[0], IMM[0].xxxx, IMM[0].xxxx
  4: MOV TEMP[0].z, CONST[0][224].zzzz
  5: ARR ADDR[0], TEMP[1]
  6: MAD TEMP[0].xy, TEMP[0], TEMP[0].zzzz, CONST[0][ADDR[0].x].xzzw
  7: MOV TEMP[2], CONST[0][220]
  8: MAD TEMP[2], TEMP[2], TEMP[0].xxxx, CONST[0][221]
  9: MAD TEMP[2], CONST[0][222], TEMP[0].yyyy, TEMP[2]
 10: ADD OUT[0], TEMP[2], CONST[0][223]
 11: MUL OUT[1].xy, TEMP[0], CONST[0][225]
 12: MUL OUT[2].xy, TEMP[0], IMM[0].yyyy
 13: MOV OUT[3].xz, TEMP[0].xyyw
 14: MOV OUT[3].yw, IMM[0].zzzz
 15: END

axeldavy commented 2 years ago

The message "Recompiling shader for constant compact" seems to indicate relative addressing was not detected. Somehow indirect_const_access is False. That's definitely a bug.

Could you investigate this ? Find which use of ureg_src_indirect is related to that ADDR usage, and should set indirect_const_access to TRUE ? The reason other uses of ureg_src_indirect do not set it is because they are supposed to be relative addressing of inputs/outputs, not constants... Somehow that's not always true.

ondracka commented 2 years ago

I think that DBG Recompiling shader for constant compact is misleading because looking at the code is is actually in both in the if (!tx->indirect_const_access && !info->swvp_on && tx->num_slots > 0) and else if (tx->indirect_const_access && device->driver_caps.reduce_immediates) branches, however how about https://gitlab.freedesktop.org/axeldavy/mesa/-/commit/0d337d2c17fbd1d7d925bc51bc05a5d5b1f0fb89#8b58653d347461ec06666d96968a9c6867a2693c_4166_4187 Is that line correct? I think that should be true instead of false, right?

ondracka commented 2 years ago

When setting tx->reduce_immediates = true; its better, the only shaders that fail are the ones using sin and cos. The problem is that the hardware needs the sin and cos inputs to be in the (-pi,+pi) range, so we do some magic to normalize the inputs, but for that we need constants (at least pi, but in practice pi, 2pi and 1/(2pi)). The game probably knows this and the shaders normalize the inputs already, however when we don't see what the immediates are we can't know if its normalized or not and have to do it anyway. I guess we would just have to skip it when we can't add any more constants and hope for the best...

ondracka commented 2 years ago

Just one example

DEF c212 { 0.033333 3.000000 0.333333 0.090909 }
DEF c213 { 0.000000 1.000000 0.000000 0.000000 }
DEF c214 { 0.159155 0.500000 6.283185 -3.141593 }
DCL v0 POSITION0
DCL v1 COLOR0
DCL v2 TEXCOORD0
DCL v3 TEXCOORD1
DCL o0 POSITION0
DCL o1.xy__ TEXCOORD0
DCL o2.x___ TEXCOORD1
DCL o3.x___ COLOR0
MUL r0.x___ c212.yyyy v3.xxxx 
FRC r0.x___ r0.xxxx 
MAD r0.x___ v3.xxxx c212.yyyy -(r0).xxxx 
MOVA A0.x___ r0.xxxx 
MUL r0.xyz_ v0 c[A0.xxxx+0].wwww 
MUL r1.xyz_ r0.xzxw c[A0.xxxx+1].yxxw 
MAD r0.__z_ c[A0.xxxx+1].yyyy r0.zzzz -(r1).zzzz 
ADD r0.x___ r1.yyyy r1.xxxx 
ADD r0.xyz_ r0 c[A0.xxxx+0] 
MOV r1.__zw c212 
MAD r1.xy__ r0.xzzw r1.zwzw c210 
MAD r1.xy__ r1 c214.xxxx c214.yyyy 
FRC r1.xy__ r1 
MAD r1.xy__ r1 c214.zzzz c214.wwww 
SINCOS r2._y__ r1.xxxx 
SINCOS r3.x___ r1.yyyy 
MUL r0.___w r2.yyyy r3.xxxx 
MOV r1.x___ r2.yyyy 
MOV r1.__z_ r3.xxxx 
MAD r0.___w r0.wwww c214.yyyy c214.yyyy 
MUL r1._y__ v0.yyyy v0.yyyy 
MUL r1._y__ r1.yyyy v0.yyyy 
MUL r1._y__ r1.yyyy v1.yyyy 
MUL r1._y__ r1.yyyy c212.xxxx 
MUL r1._y__ r1.yyyy c[A0.xxxx+0].wwww 
MOV o3.x___ c[A0.xxxx+2].wwww 
MUL r2.xy__ r1.yyyy c211 
MUL r2.xy__ r0.wwww r2 
MUL r1.x_z_ r1 r1.yyyy 
SLT r0.___w -abs(r1).yyyy abs(r1).yyyy 
MOV r1._y_w c209.zzzz 
MAD r1.xy__ r1.xzzw r1.ywzw r2 
MUL r1.x_z_ r0.wwww r1.xyyw 
MOV r1._y__ c213.xxxx 
ADD r0.xyz_ r0 r1 
MUL r1.xyz_ r0.yyyy c205.xyww 
MAD r1.xyz_ c204.xyww r0.xxxx r1 
MAD r1.xyz_ c206.xyww r0.zzzz r1 
ADD r1.xyz_ r1 c207.xyww 
MOV r0.___w c213.yyyy 
DP4 r0.x___ c208 r0 
MUL o0.__z_ r1.zzzz r0.xxxx 
MOV o0.xy_w r1.xyzz 
MOV o2.x___ r0.xxxx 
MOV o1.xy__ v2 
shader type: VS, preferred IR: TGSI, selected IR: TGSI
r300: Initial vertex program
VERT
PROPERTY NEXT_SHADER FRAG
PROPERTY LEGACY_MATH_RULES 1
DCL IN[0]
DCL IN[1]
DCL IN[2]
DCL IN[3]
DCL OUT[0], POSITION
DCL OUT[1].xy, GENERIC[0]
DCL OUT[2].x, GENERIC[1]
DCL OUT[3].x, COLOR
DCL CONST[0][0..255]
DCL TEMP[0..3]
DCL TEMP[4], LOCAL
DCL TEMP[5]
DCL ADDR[0]
IMM[0] FLT32 {    0.0333,     3.0000,     0.3333,     0.0909}
IMM[1] FLT32 {    0.0000,     1.0000,     0.1592,     0.5000}
IMM[2] FLT32 {    0.1592,     0.5000,     6.2832,    -3.1416}
  0: MUL TEMP[0].x, CONST[0][212].yyyy, IN[3].xxxx
  1: FRC TEMP[0].x, TEMP[0].xxxx
  2: MAD TEMP[0].x, IN[3].xxxx, CONST[0][212].yyyy, -TEMP[0].xxxx
  3: MOV TEMP[1].x, TEMP[0].xxxx
  4: ARR ADDR[0], TEMP[1]
  5: MUL TEMP[0].xyz, IN[0], CONST[0][ADDR[0].x].wwww
  6: ARR ADDR[0], TEMP[1]
  7: MUL TEMP[2].xyz, TEMP[0].xzxw, CONST[0][ADDR[0].x+1].yxxw
  8: ARR ADDR[0], TEMP[1]
  9: MAD TEMP[0].z, CONST[0][ADDR[0].x+1].yyyy, TEMP[0].zzzz, -TEMP[2].zzzz
 10: ADD TEMP[0].x, TEMP[2].yyyy, TEMP[2].xxxx
 11: ARR ADDR[0], TEMP[1]
 12: ADD TEMP[0].xyz, TEMP[0], CONST[0][ADDR[0].x]
 13: MOV TEMP[2].zw, CONST[0][212]
 14: MAD TEMP[2].xy, TEMP[0].xzzw, TEMP[2].zwzw, CONST[0][210]
 15: MAD TEMP[2].xy, TEMP[2], CONST[0][214].xxxx, CONST[0][214].yyyy
 16: FRC TEMP[2].xy, TEMP[2]
 17: MAD TEMP[2].xy, TEMP[2], CONST[0][214].zzzz, CONST[0][214].wwww
 18: MOV TEMP[4].x, TEMP[2].xxxx
 19: SIN TEMP[3].y, TEMP[4].xxxx
 20: MOV TEMP[4].x, TEMP[2].yyyy
 21: COS TEMP[5].x, TEMP[4].xxxx
 22: MUL TEMP[0].w, TEMP[3].yyyy, TEMP[5].xxxx
 23: MOV TEMP[2].x, TEMP[3].yyyy
 24: MOV TEMP[2].z, TEMP[5].xxxx
 25: MAD TEMP[0].w, TEMP[0].wwww, CONST[0][214].yyyy, CONST[0][214].yyyy
 26: MUL TEMP[2].y, IN[0].yyyy, IN[0].yyyy
 27: MUL TEMP[2].y, TEMP[2].yyyy, IN[0].yyyy
 28: MUL TEMP[2].y, TEMP[2].yyyy, IN[1].yyyy
 29: MUL TEMP[2].y, TEMP[2].yyyy, CONST[0][212].xxxx
 30: ARR ADDR[0], TEMP[1]
 31: MUL TEMP[2].y, TEMP[2].yyyy, CONST[0][ADDR[0].x].wwww
 32: ARR ADDR[0], TEMP[1]
 33: MOV OUT[3].x, CONST[0][ADDR[0].x+2].wwww
 34: MUL TEMP[3].xy, TEMP[2].yyyy, CONST[0][211]
 35: MUL TEMP[3].xy, TEMP[0].wwww, TEMP[3]
 36: MUL TEMP[2].xz, TEMP[2], TEMP[2].yyyy
 37: SLT TEMP[0].w, -|TEMP[2].yyyy|, |TEMP[2].yyyy|
 38: MOV TEMP[2].yw, CONST[0][209].zzzz
 39: MAD TEMP[2].xy, TEMP[2].xzzw, TEMP[2].ywzw, TEMP[3]
 40: MUL TEMP[2].xz, TEMP[0].wwww, TEMP[2].xyyw
 41: MOV TEMP[2].y, CONST[0][213].xxxx
 42: ADD TEMP[0].xyz, TEMP[0], TEMP[2]
 43: MUL TEMP[2].xyz, TEMP[0].yyyy, CONST[0][205].xyww
 44: MAD TEMP[2].xyz, CONST[0][204].xyww, TEMP[0].xxxx, TEMP[2]
 45: MAD TEMP[2].xyz, CONST[0][206].xyww, TEMP[0].zzzz, TEMP[2]
 46: ADD TEMP[2].xyz, TEMP[2], CONST[0][207].xyww
 47: MOV TEMP[0].w, CONST[0][213].yyyy
 48: DP4 TEMP[0].x, CONST[0][208], TEMP[0]
 49: MUL OUT[0].z, TEMP[2].zzzz, TEMP[0].xxxx
 50: MOV OUT[0].xyw, TEMP[2].xyzz
 51: MOV OUT[2].x, TEMP[0].xxxx
 52: MOV OUT[1].xy, IN[2]
 53: END

axeldavy commented 2 years ago

You are right about the mistake.

https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/sincos---ps -> It's already in the correct [-pi; pi]

lorn10 commented 2 years ago

Regarding the point:

TODO better name

How would be something like d3d9c_register_rules or d3d9c_register_limits? The idea for this "sheming" is derived from legacy_math_rules.

Furthermore this would theoretically allow easily to introduce at some point in the future a corresponding d3d9b_register_rules or d3d9a_register_rules cap. So for the case that Gallium Nine will eventually sometimes support again shader model 2.0a and 2.0b class hardware.:wink:

axeldavy commented 2 years ago

@ondracka What's your view with the patch proposal ? Should I keep the current status quo ? Should I merge the patch (thus breaking some games that do currently work until the situation is fixed driver side) and let you work on it ?

ondracka commented 2 years ago

I would keep the status quo for now, it looks like there are more issues at the r300 driver side (and not just related to the constants limit). Your patches are definitely step in the right direction and something we want in the long run, but the driver is not ready, so IMO it doesn't make sense to push it now. I really appreciate the help, but unfortunately now I have to fix the driver to be able to benefit from it.

Also BTW what do you use for debugging D3D apps under wine/nine? I wasn't able to find anything convenient so far. When it comes to normal debbugers, winedbg is a huge PITA, I'm for example not able to set a breakpoint in a mesa code... (using gdb as a proxy sometimes works, but also crashes a lot). So at some point I gave up and now I'm back at the good old put printfs everywhere approach, that isn't especially effective though... for OpenGL I'l quite fond of apitrace, but the windows apitrace under wine keeps crashing for me and when it works it has much less features compared to the OpenGL one.

ondracka commented 2 years ago

I would like to revisit this, now that https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19277 is in and https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19618 and https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18933 are ready to merge, we should be in much better shape on the r300 driver side. However, it looks like the https://gitlab.freedesktop.org/axeldavy/mesa/-/commits/r500_constant_number branch doesn't really help. The shaders look OK, there are no miscompilations however the rendering looks mostly the same as without the branch. Can I check somewhere in the nine logs what constants are actually being uploaded?

iXit / wine-nine-standalone

Constant limit issues with r500 hardware #140