Open oscarbg opened 1 year ago
This issue is quite old, but it looks like there might be something off with the test? The cbuffer results look horrendously slow.
Thanks for the reminder to test again. I have updated to the latest 545 drivers (530 at the time). There is one cbuffer change:
before: cbuffer{float4} load uniform: 78.939ms 0.151x
after: cbuffer{float4} load uniform: 69.179ms 0.172x
EDIT: latest cbuffer results with max OC on this card:
cbuffer{float4} load uniform: 57.818ms 0.188x
cbuffer{float4} load linear: 305.236ms 0.036x
cbuffer{float4} load random: 115.551ms 0.094x
I see similar results on a 4070 Super. I also have another observation, though: if I remove the masking that uses a runtime constant, like so:
//uint elemIdx = (htid + i) | loadConstants.elementsMask;
uint elemIdx = (htid + i);
I get the following results instead:
cbuffer{float4} load uniform: 2.533ms 3.950x
cbuffer{float4} load linear: 273.011ms 0.037x
cbuffer{float4} load random: 100.094ms 0.100x
whereas in other cases like for structured buffers the change doesn't seem to make much of a difference. It seems like for some reason Ada struggles with dynamically indexing into the constant buffer here, even if the index is uniform. But when the index ends up as a constant, it outperforms the other buffers again.
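For anyone who wants to reproduce the comparison in isolation, here is a minimal sketch of the two variants, not the actual test shader. Only `loadConstants.elementsMask`, `htid`, `i`, and `elemIdx` come from the snippet above. The buffer names (`CB0`, `CB1`, `sourceData`, `result`), the array size, the loop count, the thread group size, the `MASKED_INDEX` macro, and the assumption that `htid` is 0 in the uniform case are all hypothetical.

```hlsl
// Sketch only: buffer names, sizes, and register slots are assumptions,
// not taken from the real test.
struct LoadConstants
{
    uint elementsMask;  // runtime value supplied by the host; the compiler cannot fold it
    uint3 padding;
};

cbuffer CB0 : register(b0)
{
    LoadConstants loadConstants;
};

cbuffer CB1 : register(b1)
{
    float4 sourceData[256];  // hypothetical cbuffer array being read
};

RWStructuredBuffer<float4> result : register(u0);

[numthreads(64, 1, 1)]
void main(uint3 dtid : SV_DispatchThreadID)
{
    const uint htid = 0;  // assumed: in the uniform case every thread starts at the same element
    float4 sum = 0;

    [unroll]
    for (uint i = 0; i < 16; ++i)
    {
#ifdef MASKED_INDEX
        // Index depends on a runtime constant: uniform across threads,
        // but unknown to the compiler, so the cbuffer is indexed dynamically.
        uint elemIdx = (htid + i) | loadConstants.elementsMask;
#else
        // After unrolling, htid + i is a compile-time constant in each iteration.
        uint elemIdx = (htid + i);
#endif
        sum += sourceData[elemIdx];
    }

    // Write the sum so the loads are not optimized away.
    result[dtid.x] = sum;
}
```

Compiling once with `MASKED_INDEX` defined and once without (for example `dxc -T cs_6_0 -E main -D MASKED_INDEX`) and comparing the generated code should show whether the driver treats the masked index as a dynamic constant-buffer access.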