Open ondracka opened 2 years ago
Inlining the relative addressing offset works fine. For example converting :
IMM[0] FLT32 { 10.0000, 0.0000, 0.0000, 0.0000}
...
ADD TEMP[1], TEMP[0].xxxx, IMM[0].xxxx
ARL ADDR[0].x, TEMP[1].xxxx
MOV TEMP[2], CONST[0][ADDR[0].x]
...
into
...
ARL ADDR[0].x, TEMP[0].xxxx
MOV TEMP[2], CONST[0][10 + ADDR[0].x]
...
This will work OK and I think it would be a good start. TGSI supports this syntax, so this would preferably happen already at the nine level. But if not I can add some heuristic to the driver. So I think this we should start with, than we should take a look what fixup immediates we really need for R500 and after that I can take care of anything that remains at the driver level.
Hm, but actually the integer addressing offsets are already properly inlined as they should be (just one random shader from Anno 1404)
VS3.0
DEF c221 { 3.000000 0.050000 1.000000 5.000000 }
DEF c222 { 0.500000 1.000000 0.000000 0.000000 }
DCL v0 POSITION0
DCL v1 TEXCOORD0
DCL o0 POSITION0
DCL o1.xy__ TEXCOORD0
DCL o2.x___ TEXCOORD1
MUL r0.x___ c221.xxxx v1.xxxx
FRC r0.x___ r0.xxxx
MAD r0.x___ v1.xxxx c221.xxxx -(r0).xxxx
MOVA A0.x___ r0.xxxx
MOV r0 c[A0.xxxx+0]
ADD r0 -(r0) c[A0.xxxx+1]
MAD r1 v0.yyyy r0 c[A0.xxxx+0]
MOV o2.x___ c[A0.xxxx+2].xxxx
NRM r2.xyz_ r0
MUL r0.xyz_ r2.zxyw c220.yzxw
MAD r0.xyz_ r2.yzxw c220.zxyw -(r0)
MUL r2.xyz_ r2 c221.yyyy
MUL r0.xyz_ r0 v0.xxxx
NRM r3.xyz_ r0
MAD r0.xyz_ r3 r1.wwww r1
MAD r0.xyz_ r2 v0.yyyy r0
ADD r0.___w c221.zzzz -(v0).yyyy
MAD r0.xyz_ r2 -(r0).wwww r0
ADD r0._y__ r0.yyyy c221.wwww
MUL r1 r0.yyyy c217
MAD r1 c216 r0.xxxx r1
MAD r0 c218 r0.zzzz r1
ADD o0 r0 c219
MAD o1.xy__ v0 c222 c222.xzzw
shader type: VS, preferred IR: TGSI, selected IR: TGSI
r300: Initial vertex program
VERT
PROPERTY NEXT_SHADER FRAG
PROPERTY LEGACY_MATH_RULES 1
DCL IN[0]
DCL IN[1]
DCL OUT[0], POSITION
DCL OUT[1].xy, GENERIC[0]
DCL OUT[2].x, GENERIC[1]
DCL CONST[0][0..255]
DCL TEMP[0..1]
DCL TEMP[2..3], LOCAL
DCL TEMP[4..6]
DCL ADDR[0]
IMM[0] FLT32 { 3.0000, 0.0500, 1.0000, 5.0000}
IMM[1] FLT32 { 0.5000, 1.0000, 0.0000, -256.0000}
IMM[2] FLT32 { 1.5000, 2.5000, 0.0000, 0.0000}
0: MUL TEMP[0].x, IMM[0].xxxx, IN[1].xxxx
1: FRC TEMP[0].x, TEMP[0].xxxx
2: MAD TEMP[0].x, IN[1].xxxx, IMM[0].xxxx, -TEMP[0].xxxx
3: MOV TEMP[1].x, TEMP[0].xxxx
4: ARR ADDR[0], TEMP[1]
5: ADD TEMP[2], TEMP[1].xxxx, IMM[1].xxxx
6: CMP TEMP[3], TEMP[2], IMM[1].zzzz, CONST[0][ADDR[0].x]
7: ADD TEMP[2], TEMP[2], IMM[1].wwww
8: CMP TEMP[3], TEMP[2], TEMP[3], IMM[1].zzzz
9: MOV TEMP[0], TEMP[3]
10: ARR ADDR[0], TEMP[1]
11: ADD TEMP[2], TEMP[1].xxxx, IMM[2].xxxx
12: CMP TEMP[3], TEMP[2], IMM[1].zzzz, CONST[0][ADDR[0].x+1]
13: ADD TEMP[2], TEMP[2], IMM[1].wwww
14: CMP TEMP[3], TEMP[2], TEMP[3], IMM[1].zzzz
15: ADD TEMP[0], -TEMP[0], TEMP[3]
16: ARR ADDR[0], TEMP[1]
17: ADD TEMP[2], TEMP[1].xxxx, IMM[1].xxxx
18: CMP TEMP[3], TEMP[2], IMM[1].zzzz, CONST[0][ADDR[0].x]
19: ADD TEMP[2], TEMP[2], IMM[1].wwww
20: CMP TEMP[3], TEMP[2], TEMP[3], IMM[1].zzzz
21: MAD TEMP[4], IN[0].yyyy, TEMP[0], TEMP[3]
22: ARR ADDR[0], TEMP[1]
23: ADD TEMP[2], TEMP[1].xxxx, IMM[2].yyyy
24: CMP TEMP[3], TEMP[2], IMM[1].zzzz, CONST[0][ADDR[0].x+2]
25: ADD TEMP[2], TEMP[2], IMM[1].wwww
26: CMP TEMP[3], TEMP[2], TEMP[3], IMM[1].zzzz
27: MOV OUT[2].x, TEMP[3].xxxx
28: DP3 TEMP[2].x, TEMP[0], TEMP[0]
29: RSQ TEMP[2].x, TEMP[2].xxxx
30: MUL TEMP[5].xyz, TEMP[0], TEMP[2].xxxx
31: MUL TEMP[0].xyz, TEMP[5].zxyw, CONST[0][220].yzxw
32: MAD TEMP[0].xyz, TEMP[5].yzxw, CONST[0][220].zxyw, -TEMP[0]
33: MUL TEMP[5].xyz, TEMP[5], IMM[0].yyyy
34: MUL TEMP[0].xyz, TEMP[0], IN[0].xxxx
35: DP3 TEMP[2].x, TEMP[0], TEMP[0]
36: RSQ TEMP[2].x, TEMP[2].xxxx
37: MUL TEMP[6].xyz, TEMP[0], TEMP[2].xxxx
38: MAD TEMP[0].xyz, TEMP[6], TEMP[4].wwww, TEMP[4]
39: MAD TEMP[0].xyz, TEMP[5], IN[0].yyyy, TEMP[0]
40: ADD TEMP[0].w, IMM[0].zzzz, -IN[0].yyyy
41: MAD TEMP[0].xyz, TEMP[5], -TEMP[0].wwww, TEMP[0]
42: ADD TEMP[0].y, TEMP[0].yyyy, IMM[0].wwww
43: MUL TEMP[4], TEMP[0].yyyy, CONST[0][217]
44: MAD TEMP[4], CONST[0][216], TEMP[0].xxxx, TEMP[4]
45: MAD TEMP[0], CONST[0][218], TEMP[0].zzzz, TEMP[4]
46: ADD OUT[0], TEMP[0], CONST[0][219]
47: MAD OUT[1].xy, IN[0], IMM[1].xyzz, IMM[1].xzzz
48: END
So the immediates are mostly what the shader declares, except the -256, 1.5 and 2.5 that we use for the A0 fixups I assume.
BTW I have mostly no clue about D3D, how is is is supposed to work? Are the immediates supposed to be the last constants, i.e., if the shader defines immediates c221 and c222, can we assume that nothing after c222 will be accessed by the relative addressing?
Yes exactly for -256, 1.5 and 2.5. We use them to handle out of bounds read of a0 (we have bound checks). But I would guess that r500 would natively already handle them the d3d9 way, thus we could disable that code for it.
The shader has 256 constants, and the value of those are defined via API (you set them manually, you don't bind a buffer). In your shader c221 and c222 are defined in the shader. In that case those values overwrite whatever the user has set via the API for them. That also affects relative adressing.
So basically when we see the use of a0, we upload all 256 constants (we cannot guess where a0 will stop reading), and c221 and c222 will overwrite the constants set by the user there. For efficiency, nine will still inline the direct uses of c221 and c222, thus they will be present both as immediates (for direct use), and in the constant buffer (for relative adressing). But since r500 cannot inline immediates in the code, this seems counterproductive for it. In addition it means more constants are used.
Regarding the a0 bound check, the hardware docs say: _The address registers are loaded using a MOV instruction from any of the IVM, CSM, TRM or ATRM. .... The value is clamped between the range of –256 and 255. When this value is added to the constant address of the current operation, the result is tested for in the range of 0 to MAX_SHADER_CONST where MAX_SHADER_CONST is determined by the driver as the maximum constant address provided by the shader declaration. If the resultant address is out of the range 0 to MAX_SHADERCONST, (0,0,0,0) is returned on the data path. Is this what d3d9 expects? If so, we can hopefully just add a pipe shader cap and stop emitting the workaround. This should be straightforward enough that I could write a patch :-)
Regarding the constants, it seems that indeed when relative addressing is present, the only way how we could solve it is to not tell the driver that something is an immediate and just upload it with the others constant.
One possible optimization I see would be to just upload 0 and 1 as an immediates. For example in the shader above: DEF c221 { 3.000000 0.050000 1.000000 5.000000 } ... MUL r2.xyz_ r2 c221.yyyy /// here we would read the normal constant ... ADD r0.___w c221.zzzz -(v0).yyyy /// here one would add the 1.0 immediate and use it ... If the immediates contain only 1 and 0 we can get rid of them in the optimization passes, so they are not included in the limit. But this can be done as a follow up...
Yes that's exactly what d3d9 expects.
As for the immediates, why not. First step is not even test for 0 or 1.
Alright, here is a branch with the use of constants instead of immediates when relative addressing is used: https://gitlab.freedesktop.org/axeldavy/mesa/-/commits/r500_constant_number
I'm not sure if its working, I still see the constants from the shaders as an immediates. Here are dumped shaders (RADEON_DEBUG=use_tgsi,vp NINE_DEBUG=shaders
) from my testcase Anno1404: Anno1404log.zip I don't see the ARR fixups anymore so that is something
.
the 'use constants instead of immediates' is only activated when relative addressing is used.
It looks like it doesn't happen also for relative adressing, for example from the previously attached log:
nine:nine_translate_shader Recompiling shader for constant compaction
DEF c226 { 0.500000 0.015625 1.000000 0.000000 }
DCL v0 POSITION0
DCL o0 POSITION0
DCL o1.xy__ TEXCOORD0
DCL o2.xy__ TEXCOORD1
DCL o3 TEXCOORD2
FRC r0.x___ v0.zzzz
ADD r0.x___ -(r0).xxxx v0.zzzz
MOVA A0.x___ r0.xxxx
MAD r0.xy__ v0 c226.xxxx c226.xxxx
MOV r0.__z_ c224.zzzz
MAD r0.xy__ r0 r0.zzzz c[A0.xxxx+0].xzzw
MOV r1 c220
MAD r1 r1 r0.xxxx c221
MAD r1 c222 r0.yyyy r1
ADD o0 r1 c223
MUL o1.xy__ r0 c225
MUL o2.xy__ r0 c226.yyyy
MOV o3.x_z_ r0.xyyw
MOV o3._y_w c226.zzzz
shader type: VS, preferred IR: TGSI, selected IR: TGSI
r300: Initial vertex program
VERT
PROPERTY NEXT_SHADER FRAG
PROPERTY LEGACY_MATH_RULES 1
DCL IN[0]
DCL OUT[0], POSITION
DCL OUT[1].xy, GENERIC[0]
DCL OUT[2].xy, GENERIC[1]
DCL OUT[3], GENERIC[2]
DCL CONST[0][0..255]
DCL TEMP[0..2]
DCL ADDR[0]
IMM[0] FLT32 { 0.5000, 0.0156, 1.0000, 0.0000}
0: FRC TEMP[0].x, IN[0].zzzz
1: ADD TEMP[0].x, -TEMP[0].xxxx, IN[0].zzzz
2: MOV TEMP[1].x, TEMP[0].xxxx
3: MAD TEMP[0].xy, IN[0], IMM[0].xxxx, IMM[0].xxxx
4: MOV TEMP[0].z, CONST[0][224].zzzz
5: ARR ADDR[0], TEMP[1]
6: MAD TEMP[0].xy, TEMP[0], TEMP[0].zzzz, CONST[0][ADDR[0].x].xzzw
7: MOV TEMP[2], CONST[0][220]
8: MAD TEMP[2], TEMP[2], TEMP[0].xxxx, CONST[0][221]
9: MAD TEMP[2], CONST[0][222], TEMP[0].yyyy, TEMP[2]
10: ADD OUT[0], TEMP[2], CONST[0][223]
11: MUL OUT[1].xy, TEMP[0], CONST[0][225]
12: MUL OUT[2].xy, TEMP[0], IMM[0].yyyy
13: MOV OUT[3].xz, TEMP[0].xyyw
14: MOV OUT[3].yw, IMM[0].zzzz
15: END
The message "Recompiling shader for constant compact" seems to indicate relative addressing was not detected. Somehow indirect_const_access is False. That's definitely a bug.
Could you investigate this ? Find which use of ureg_src_indirect is related to that ADDR usage, and should set indirect_const_access to TRUE ? The reason other uses of ureg_src_indirect do not set it is because they are supposed to be relative addressing of inputs/outputs, not constants... Somehow that's not always true.
I think that DBG Recompiling shader for constant compact is misleading because looking at the code is is actually in both in the if (!tx->indirect_const_access && !info->swvp_on && tx->num_slots > 0)
and else if (tx->indirect_const_access && device->driver_caps.reduce_immediates)
branches, however how about
https://gitlab.freedesktop.org/axeldavy/mesa/-/commit/0d337d2c17fbd1d7d925bc51bc05a5d5b1f0fb89#8b58653d347461ec06666d96968a9c6867a2693c_4166_4187
Is that line correct? I think that should be true instead of false, right?
When setting tx->reduce_immediates = true
; its better, the only shaders that fail are the ones using sin and cos. The problem is that the hardware needs the sin and cos inputs to be in the (-pi,+pi) range, so we do some magic to normalize the inputs, but for that we need constants (at least pi, but in practice pi, 2pi and 1/(2pi)). The game probably knows this and the shaders normalize the inputs already, however when we don't see what the immediates are we can't know if its normalized or not and have to do it anyway. I guess we would just have to skip it when we can't add any more constants and hope for the best...
Just one example
DEF c212 { 0.033333 3.000000 0.333333 0.090909 }
DEF c213 { 0.000000 1.000000 0.000000 0.000000 }
DEF c214 { 0.159155 0.500000 6.283185 -3.141593 }
DCL v0 POSITION0
DCL v1 COLOR0
DCL v2 TEXCOORD0
DCL v3 TEXCOORD1
DCL o0 POSITION0
DCL o1.xy__ TEXCOORD0
DCL o2.x___ TEXCOORD1
DCL o3.x___ COLOR0
MUL r0.x___ c212.yyyy v3.xxxx
FRC r0.x___ r0.xxxx
MAD r0.x___ v3.xxxx c212.yyyy -(r0).xxxx
MOVA A0.x___ r0.xxxx
MUL r0.xyz_ v0 c[A0.xxxx+0].wwww
MUL r1.xyz_ r0.xzxw c[A0.xxxx+1].yxxw
MAD r0.__z_ c[A0.xxxx+1].yyyy r0.zzzz -(r1).zzzz
ADD r0.x___ r1.yyyy r1.xxxx
ADD r0.xyz_ r0 c[A0.xxxx+0]
MOV r1.__zw c212
MAD r1.xy__ r0.xzzw r1.zwzw c210
MAD r1.xy__ r1 c214.xxxx c214.yyyy
FRC r1.xy__ r1
MAD r1.xy__ r1 c214.zzzz c214.wwww
SINCOS r2._y__ r1.xxxx
SINCOS r3.x___ r1.yyyy
MUL r0.___w r2.yyyy r3.xxxx
MOV r1.x___ r2.yyyy
MOV r1.__z_ r3.xxxx
MAD r0.___w r0.wwww c214.yyyy c214.yyyy
MUL r1._y__ v0.yyyy v0.yyyy
MUL r1._y__ r1.yyyy v0.yyyy
MUL r1._y__ r1.yyyy v1.yyyy
MUL r1._y__ r1.yyyy c212.xxxx
MUL r1._y__ r1.yyyy c[A0.xxxx+0].wwww
MOV o3.x___ c[A0.xxxx+2].wwww
MUL r2.xy__ r1.yyyy c211
MUL r2.xy__ r0.wwww r2
MUL r1.x_z_ r1 r1.yyyy
SLT r0.___w -abs(r1).yyyy abs(r1).yyyy
MOV r1._y_w c209.zzzz
MAD r1.xy__ r1.xzzw r1.ywzw r2
MUL r1.x_z_ r0.wwww r1.xyyw
MOV r1._y__ c213.xxxx
ADD r0.xyz_ r0 r1
MUL r1.xyz_ r0.yyyy c205.xyww
MAD r1.xyz_ c204.xyww r0.xxxx r1
MAD r1.xyz_ c206.xyww r0.zzzz r1
ADD r1.xyz_ r1 c207.xyww
MOV r0.___w c213.yyyy
DP4 r0.x___ c208 r0
MUL o0.__z_ r1.zzzz r0.xxxx
MOV o0.xy_w r1.xyzz
MOV o2.x___ r0.xxxx
MOV o1.xy__ v2
shader type: VS, preferred IR: TGSI, selected IR: TGSI
r300: Initial vertex program
VERT
PROPERTY NEXT_SHADER FRAG
PROPERTY LEGACY_MATH_RULES 1
DCL IN[0]
DCL IN[1]
DCL IN[2]
DCL IN[3]
DCL OUT[0], POSITION
DCL OUT[1].xy, GENERIC[0]
DCL OUT[2].x, GENERIC[1]
DCL OUT[3].x, COLOR
DCL CONST[0][0..255]
DCL TEMP[0..3]
DCL TEMP[4], LOCAL
DCL TEMP[5]
DCL ADDR[0]
IMM[0] FLT32 { 0.0333, 3.0000, 0.3333, 0.0909}
IMM[1] FLT32 { 0.0000, 1.0000, 0.1592, 0.5000}
IMM[2] FLT32 { 0.1592, 0.5000, 6.2832, -3.1416}
0: MUL TEMP[0].x, CONST[0][212].yyyy, IN[3].xxxx
1: FRC TEMP[0].x, TEMP[0].xxxx
2: MAD TEMP[0].x, IN[3].xxxx, CONST[0][212].yyyy, -TEMP[0].xxxx
3: MOV TEMP[1].x, TEMP[0].xxxx
4: ARR ADDR[0], TEMP[1]
5: MUL TEMP[0].xyz, IN[0], CONST[0][ADDR[0].x].wwww
6: ARR ADDR[0], TEMP[1]
7: MUL TEMP[2].xyz, TEMP[0].xzxw, CONST[0][ADDR[0].x+1].yxxw
8: ARR ADDR[0], TEMP[1]
9: MAD TEMP[0].z, CONST[0][ADDR[0].x+1].yyyy, TEMP[0].zzzz, -TEMP[2].zzzz
10: ADD TEMP[0].x, TEMP[2].yyyy, TEMP[2].xxxx
11: ARR ADDR[0], TEMP[1]
12: ADD TEMP[0].xyz, TEMP[0], CONST[0][ADDR[0].x]
13: MOV TEMP[2].zw, CONST[0][212]
14: MAD TEMP[2].xy, TEMP[0].xzzw, TEMP[2].zwzw, CONST[0][210]
15: MAD TEMP[2].xy, TEMP[2], CONST[0][214].xxxx, CONST[0][214].yyyy
16: FRC TEMP[2].xy, TEMP[2]
17: MAD TEMP[2].xy, TEMP[2], CONST[0][214].zzzz, CONST[0][214].wwww
18: MOV TEMP[4].x, TEMP[2].xxxx
19: SIN TEMP[3].y, TEMP[4].xxxx
20: MOV TEMP[4].x, TEMP[2].yyyy
21: COS TEMP[5].x, TEMP[4].xxxx
22: MUL TEMP[0].w, TEMP[3].yyyy, TEMP[5].xxxx
23: MOV TEMP[2].x, TEMP[3].yyyy
24: MOV TEMP[2].z, TEMP[5].xxxx
25: MAD TEMP[0].w, TEMP[0].wwww, CONST[0][214].yyyy, CONST[0][214].yyyy
26: MUL TEMP[2].y, IN[0].yyyy, IN[0].yyyy
27: MUL TEMP[2].y, TEMP[2].yyyy, IN[0].yyyy
28: MUL TEMP[2].y, TEMP[2].yyyy, IN[1].yyyy
29: MUL TEMP[2].y, TEMP[2].yyyy, CONST[0][212].xxxx
30: ARR ADDR[0], TEMP[1]
31: MUL TEMP[2].y, TEMP[2].yyyy, CONST[0][ADDR[0].x].wwww
32: ARR ADDR[0], TEMP[1]
33: MOV OUT[3].x, CONST[0][ADDR[0].x+2].wwww
34: MUL TEMP[3].xy, TEMP[2].yyyy, CONST[0][211]
35: MUL TEMP[3].xy, TEMP[0].wwww, TEMP[3]
36: MUL TEMP[2].xz, TEMP[2], TEMP[2].yyyy
37: SLT TEMP[0].w, -|TEMP[2].yyyy|, |TEMP[2].yyyy|
38: MOV TEMP[2].yw, CONST[0][209].zzzz
39: MAD TEMP[2].xy, TEMP[2].xzzw, TEMP[2].ywzw, TEMP[3]
40: MUL TEMP[2].xz, TEMP[0].wwww, TEMP[2].xyyw
41: MOV TEMP[2].y, CONST[0][213].xxxx
42: ADD TEMP[0].xyz, TEMP[0], TEMP[2]
43: MUL TEMP[2].xyz, TEMP[0].yyyy, CONST[0][205].xyww
44: MAD TEMP[2].xyz, CONST[0][204].xyww, TEMP[0].xxxx, TEMP[2]
45: MAD TEMP[2].xyz, CONST[0][206].xyww, TEMP[0].zzzz, TEMP[2]
46: ADD TEMP[2].xyz, TEMP[2], CONST[0][207].xyww
47: MOV TEMP[0].w, CONST[0][213].yyyy
48: DP4 TEMP[0].x, CONST[0][208], TEMP[0]
49: MUL OUT[0].z, TEMP[2].zzzz, TEMP[0].xxxx
50: MOV OUT[0].xyw, TEMP[2].xyzz
51: MOV OUT[2].x, TEMP[0].xxxx
52: MOV OUT[1].xy, IN[2]
53: END
You are right about the mistake.
https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/sincos---ps -> It's already in the correct [-pi; pi]
Regarding the point:
TODO better name
How would be something like d3d9c_register_rules
or d3d9c_register_limits
?
The idea for this "sheming" is derived from legacy_math_rules
.
Furthermore this would theoretically allow easily to introduce at some point in the future a corresponding d3d9b_register_rules
or d3d9a_register_rules
cap. So for the case that Gallium Nine will eventually sometimes support again shader model 2.0a and 2.0b class hardware.:wink:
@ondracka What's your view with the patch proposal ? Should I keep the current status quo ? Should I merge the patch (thus breaking some games that do currently work until the situation is fixed driver side) and let you work on it ?
I would keep the status quo for now, it looks like there are more issues at the r300 driver side (and not just related to the constants limit). Your patches are definitely step in the right direction and something we want in the long run, but the driver is not ready, so IMO it doesn't make sense to push it now. I really appreciate the help, but unfortunately now I have to fix the driver to be able to benefit from it.
Also BTW what do you use for debugging D3D apps under wine/nine? I wasn't able to find anything convenient so far. When it comes to normal debbugers, winedbg is a huge PITA, I'm for example not able to set a breakpoint in a mesa code... (using gdb as a proxy sometimes works, but also crashes a lot). So at some point I gave up and now I'm back at the good old put printfs everywhere approach, that isn't especially effective though... for OpenGL I'l quite fond of apitrace, but the windows apitrace under wine keeps crashing for me and when it works it has much less features compared to the OpenGL one.
I would like to revisit this, now that https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19277 is in and https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19618 and https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18933 are ready to merge, we should be in much better shape on the r300 driver side. However, it looks like the https://gitlab.freedesktop.org/axeldavy/mesa/-/commits/r500_constant_number branch doesn't really help. The shaders look OK, there are no miscompilations however the rendering looks mostly the same as without the branch. Can I check somewhere in the nine logs what constants are actually being uploaded?
This is to continue discussion about how to make nine and r300 work with D3D games (copying also the relevant discussion from the thread https://github.com/iXit/wine-nine-standalone/issues/133).
axeldavy : I've some code that should fix r500's support of 256 constants (Last 3 commits of https://github.com/iXit/Mesa-3D/commits/master) in case you want to test.
ondracka: I did a quick testing and now we fail to compile pretty much any vertex shader using relative addressing. The issue with r500 hardware is that not only constants but also immediates (I'm using the TGSI nomenclature here, as I don't know much about DX) must fit into the 256 constants limit. And what sucks the most is that for vertex shaders we don't have any inlining options (we can inline 1, 0, -1 using the constant swizzles and thats it). To be honest I have no clue how this is supposed to work (how this works on Windows). I'd be happy to implement/enhance stuff on the driver side, but I really have no idea. As I see it, there are two options:
BTW regarding the NPOT support discussed here, r500 emulates ALL of NPOT support in shaders (we don't have even partial support, we support only rectangular textures), so if there is something not working, open a bug at fdo and I'll take a look if I can enhance the shader emulation.
axeldavy:
@ondracka Is there an option to inline the offset for relative addressing ?
The difference indeed with this series is that when relative addressing is used, all 256 constants will be declared as used, thus leaving no space for the immediates. Something that nine does is that it reads the immediate if it has them, and write the immediate value into the constant buffer for it to work with relative addressing. We could instead always read the constant buffer, not the immediate, for r500. The question is whether that will be enough for the few immediate that the d3d asm -> TGSI translation generates.
axeldavy:
I think if you can work with constructing the few immediates that are generated by the asm conversion (-1, 0, 1, 0.5, 2., If I checked correctly), and the integer offset of relative addressing, then everything could fit.
EDIT: Actually this is more complicated than that. Many of these constants are not in paths that can trigger when we use relative addressing. However there are a few immediates that are used to enforce d3d9 behaviour for a few things that I bet r500 doesn't need. For example the a0 clamping eats a few immediates, as well as the pointsize clamping. We'll need to continue this discussion elsewhere.