sebbbi / perftest

GPU texture/buffer performance tester
MIT License

Ada results (4070)! #20

Open oscarbg opened 1 year ago

oscarbg commented 1 year ago
PerfTest
To select adapter, use: PerfTest.exe [ADAPTER_INDEX]

Adapters found:
0: NVIDIA GeForce RTX 4070
1: Microsoft Basic Render Driver
Using adapter 0

Running 30 warm-up frames and 30 benchmark frames:
.............................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Performance compared to Buffer<RGBA8>.Load random

Buffer<R8>.Load uniform: 8.139ms 1.463x
Buffer<R8>.Load linear: 8.502ms 1.401x
Buffer<R8>.Load random: 10.092ms 1.180x
Buffer<RG8>.Load uniform: 10.089ms 1.180x
Buffer<RG8>.Load linear: 9.256ms 1.287x
Buffer<RG8>.Load random: 8.333ms 1.429x
Buffer<RGBA8>.Load uniform: 8.154ms 1.460x
Buffer<RGBA8>.Load linear: 8.229ms 1.447x
Buffer<RGBA8>.Load random: 11.908ms 1.000x
Buffer<R16f>.Load uniform: 7.930ms 1.502x
Buffer<R16f>.Load linear: 7.982ms 1.492x
Buffer<R16f>.Load random: 8.136ms 1.464x
Buffer<RG16f>.Load uniform: 7.984ms 1.491x
Buffer<RG16f>.Load linear: 8.150ms 1.461x
Buffer<RG16f>.Load random: 11.856ms 1.004x
Buffer<RGBA16f>.Load uniform: 8.125ms 1.466x
Buffer<RGBA16f>.Load linear: 8.248ms 1.444x
Buffer<RGBA16f>.Load random: 8.623ms 1.381x
Buffer<R32f>.Load uniform: 7.950ms 1.498x
Buffer<R32f>.Load linear: 7.956ms 1.497x
Buffer<R32f>.Load random: 11.854ms 1.005x
Buffer<RG32f>.Load uniform: 7.944ms 1.499x
Buffer<RG32f>.Load linear: 8.006ms 1.487x
Buffer<RG32f>.Load random: 7.972ms 1.494x
Buffer<RGBA32f>.Load uniform: 15.772ms 0.755x
Buffer<RGBA32f>.Load linear: 15.816ms 0.753x
Buffer<RGBA32f>.Load random: 15.837ms 0.752x
ByteAddressBuffer.Load uniform: 7.205ms 1.653x
ByteAddressBuffer.Load linear: 6.301ms 1.890x
ByteAddressBuffer.Load random: 6.112ms 1.948x
ByteAddressBuffer.Load2 uniform: 10.265ms 1.160x
ByteAddressBuffer.Load2 linear: 9.044ms 1.317x
ByteAddressBuffer.Load2 random: 9.039ms 1.317x
ByteAddressBuffer.Load3 uniform: 12.291ms 0.969x
ByteAddressBuffer.Load3 linear: 12.033ms 0.990x
ByteAddressBuffer.Load3 random: 11.978ms 0.994x
ByteAddressBuffer.Load4 uniform: 15.934ms 0.747x
ByteAddressBuffer.Load4 linear: 19.940ms 0.597x
ByteAddressBuffer.Load4 random: 15.964ms 0.746x
ByteAddressBuffer.Load2 unaligned uniform: 10.423ms 1.142x
ByteAddressBuffer.Load2 unaligned linear: 9.037ms 1.318x
ByteAddressBuffer.Load2 unaligned random: 9.016ms 1.321x
ByteAddressBuffer.Load4 unaligned uniform: 15.938ms 0.747x
ByteAddressBuffer.Load4 unaligned linear: 19.903ms 0.598x
ByteAddressBuffer.Load4 unaligned random: 15.955ms 0.746x
StructuredBuffer<float>.Load uniform: 7.030ms 1.694x
StructuredBuffer<float>.Load linear: 5.768ms 2.064x
StructuredBuffer<float>.Load random: 5.749ms 2.071x
StructuredBuffer<float2>.Load uniform: 8.017ms 1.485x
StructuredBuffer<float2>.Load linear: 8.032ms 1.483x
StructuredBuffer<float2>.Load random: 5.807ms 2.051x
StructuredBuffer<float4>.Load uniform: 8.560ms 1.391x
StructuredBuffer<float4>.Load linear: 8.521ms 1.398x
StructuredBuffer<float4>.Load random: 8.696ms 1.369x
cbuffer{float4} load uniform: 78.939ms 0.151x
cbuffer{float4} load linear: 330.084ms 0.036x
cbuffer{float4} load random: 125.805ms 0.095x
Texture2D<R8>.Load uniform: 7.969ms 1.494x
Texture2D<R8>.Load linear: 7.993ms 1.490x
Texture2D<R8>.Load random: 7.967ms 1.495x
Texture2D<RG8>.Load uniform: 8.197ms 1.453x
Texture2D<RG8>.Load linear: 8.385ms 1.420x
Texture2D<RG8>.Load random: 8.205ms 1.451x
Texture2D<RGBA8>.Load uniform: 8.318ms 1.432x
Texture2D<RGBA8>.Load linear: 11.926ms 0.999x
Texture2D<RGBA8>.Load random: 16.152ms 0.737x
Texture2D<R16F>.Load uniform: 7.970ms 1.494x
Texture2D<R16F>.Load linear: 7.970ms 1.494x
Texture2D<R16F>.Load random: 7.979ms 1.492x
Texture2D<RG16F>.Load uniform: 7.979ms 1.492x
Texture2D<RG16F>.Load linear: 12.097ms 0.984x
Texture2D<RG16F>.Load random: 16.136ms 0.738x
Texture2D<RGBA16F>.Load uniform: 8.157ms 1.460x
Texture2D<RGBA16F>.Load linear: 21.618ms 0.551x
Texture2D<RGBA16F>.Load random: 31.902ms 0.373x
Texture2D<R32F>.Load uniform: 7.944ms 1.499x
Texture2D<R32F>.Load linear: 12.044ms 0.989x
Texture2D<R32F>.Load random: 16.292ms 0.731x
Texture2D<RG32F>.Load uniform: 7.999ms 1.489x
Texture2D<RG32F>.Load linear: 21.805ms 0.546x
Texture2D<RG32F>.Load random: 31.726ms 0.375x
Texture2D<RGBA32F>.Load uniform: 15.820ms 0.753x
Texture2D<RGBA32F>.Load linear: 32.516ms 0.366x
Texture2D<RGBA32F>.Load random: 31.546ms 0.377x
Texture2D<R8>.Sample(nearest) uniform: 16.020ms 0.743x
Texture2D<R8>.Sample(nearest) linear: 15.839ms 0.752x
Texture2D<R8>.Sample(nearest) random: 16.225ms 0.734x
Texture2D<RG8>.Sample(nearest) uniform: 16.323ms 0.730x
Texture2D<RG8>.Sample(nearest) linear: 15.803ms 0.754x
Texture2D<RG8>.Sample(nearest) random: 15.788ms 0.754x
Texture2D<RGBA8>.Sample(nearest) uniform: 15.974ms 0.745x
Texture2D<RGBA8>.Sample(nearest) linear: 16.169ms 0.736x
Texture2D<RGBA8>.Sample(nearest) random: 16.185ms 0.736x
Texture2D<R16F>.Sample(nearest) uniform: 16.365ms 0.728x
Texture2D<R16F>.Sample(nearest) linear: 16.029ms 0.743x
Texture2D<R16F>.Sample(nearest) random: 15.818ms 0.753x
Texture2D<RG16F>.Sample(nearest) uniform: 15.780ms 0.755x
Texture2D<RG16F>.Sample(nearest) linear: 16.151ms 0.737x
Texture2D<RG16F>.Sample(nearest) random: 15.795ms 0.754x
Texture2D<RGBA16F>.Sample(nearest) uniform: 16.326ms 0.729x
Texture2D<RGBA16F>.Sample(nearest) linear: 16.014ms 0.744x
Texture2D<RGBA16F>.Sample(nearest) random: 31.503ms 0.378x
Texture2D<R32F>.Sample(nearest) uniform: 16.004ms 0.744x
Texture2D<R32F>.Sample(nearest) linear: 15.830ms 0.752x
Texture2D<R32F>.Sample(nearest) random: 16.198ms 0.735x
Texture2D<RG32F>.Sample(nearest) uniform: 15.928ms 0.748x
Texture2D<RG32F>.Sample(nearest) linear: 15.985ms 0.745x
Texture2D<RG32F>.Sample(nearest) random: 31.506ms 0.378x
Texture2D<RGBA32F>.Sample(nearest) uniform: 31.343ms 0.380x
Texture2D<RGBA32F>.Sample(nearest) linear: 31.767ms 0.375x
Texture2D<RGBA32F>.Sample(nearest) random: 31.557ms 0.377x
Texture2D<R8>.Sample(bilinear) uniform: 15.994ms 0.745x
Texture2D<R8>.Sample(bilinear) linear: 16.214ms 0.734x
Texture2D<R8>.Sample(bilinear) random: 15.821ms 0.753x
Texture2D<RG8>.Sample(bilinear) uniform: 15.786ms 0.754x
Texture2D<RG8>.Sample(bilinear) linear: 15.774ms 0.755x
Texture2D<RG8>.Sample(bilinear) random: 15.800ms 0.754x
Texture2D<RGBA8>.Sample(bilinear) uniform: 15.939ms 0.747x
Texture2D<RGBA8>.Sample(bilinear) linear: 15.820ms 0.753x
Texture2D<RGBA8>.Sample(bilinear) random: 15.778ms 0.755x
Texture2D<R16F>.Sample(bilinear) uniform: 15.992ms 0.745x
Texture2D<R16F>.Sample(bilinear) linear: 15.820ms 0.753x
Texture2D<R16F>.Sample(bilinear) random: 15.821ms 0.753x
Texture2D<RG16F>.Sample(bilinear) uniform: 15.756ms 0.756x
Texture2D<RG16F>.Sample(bilinear) linear: 15.796ms 0.754x
Texture2D<RG16F>.Sample(bilinear) random: 15.760ms 0.756x
Texture2D<RGBA16F>.Sample(bilinear) uniform: 15.779ms 0.755x
Texture2D<RGBA16F>.Sample(bilinear) linear: 15.790ms 0.754x
Texture2D<RGBA16F>.Sample(bilinear) random: 31.697ms 0.376x
Texture2D<R32F>.Sample(bilinear) uniform: 15.805ms 0.753x
Texture2D<R32F>.Sample(bilinear) linear: 15.847ms 0.751x
Texture2D<R32F>.Sample(bilinear) random: 15.996ms 0.744x
Texture2D<RG32F>.Sample(bilinear) uniform: 15.761ms 0.756x
Texture2D<RG32F>.Sample(bilinear) linear: 15.770ms 0.755x
Texture2D<RG32F>.Sample(bilinear) random: 31.517ms 0.378x
Texture2D<RGBA32F>.Sample(bilinear) uniform: 62.698ms 0.190x
Texture2D<RGBA32F>.Sample(bilinear) linear: 62.823ms 0.190x
Texture2D<RGBA32F>.Sample(bilinear) random: 93.925ms 0.127x
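The x column appears to be the baseline time (Buffer<RGBA8>.Load random, 11.908ms) divided by each test's time, so values above 1.0 are faster than the baseline and values below 1.0 are slower. The Python sketch below reproduces a few of the printed ratios under that assumption:

```python
def relative_speed(baseline_ms, test_ms):
    """Speed relative to the baseline: >1.0 is faster, <1.0 is slower."""
    return baseline_ms / test_ms

# Baseline row from the run above: Buffer<RGBA8>.Load random
BASELINE_MS = 11.908

# A few rows copied from the output above (time in ms)
results = {
    "Buffer<R8>.Load uniform": 8.139,
    "Buffer<RGBA32f>.Load uniform": 15.772,
    "cbuffer{float4} load linear": 330.084,
}

for name, ms in results.items():
    # Print in the same "<time>ms <ratio>x" style as PerfTest
    print(f"{name}: {ms:.3f}ms {relative_speed(BASELINE_MS, ms):.3f}x")
```

This prints 1.463x, 0.755x and 0.036x, which match the output. So the cbuffer linear case takes roughly 28 times as long as the baseline.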
TravisGesslein commented 10 months ago

This issue is quite old, but it looks like there might be something off with the test? The cbuffer results look horrendously slow.

oscarbg commented 9 months ago

Thanks for the reminder to test again. I updated from the 530 drivers I had at the time to the latest 545 drivers. One cbuffer result changed:
before: cbuffer{float4} load uniform: 78.939ms 0.151x
after: cbuffer{float4} load uniform: 69.179ms 0.172x

EDIT: latest cbuffer results with this card at its maximum overclock:
cbuffer{float4} load uniform: 57.818ms 0.188x
cbuffer{float4} load linear: 305.236ms 0.036x
cbuffer{float4} load random: 115.551ms 0.094x
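The driver update and the overclock improve the absolute cbuffer uniform time noticeably. The Python sketch below computes the speedup from the reported times (before time divided by after time):

```python
def speedup(before_ms, after_ms):
    """How many times faster the 'after' run is (>1.0 means faster)."""
    return before_ms / after_ms

# cbuffer{float4} load uniform times reported in this thread (ms)
CBUFFER_UNIFORM_530_MS = 78.939     # 530 driver, stock clocks
CBUFFER_UNIFORM_545_MS = 69.179     # 545 driver, stock clocks
CBUFFER_UNIFORM_545_OC_MS = 57.818  # 545 driver, maximum overclock

print(f"driver update: {speedup(CBUFFER_UNIFORM_530_MS, CBUFFER_UNIFORM_545_MS):.3f}x")
print(f"driver update + OC: {speedup(CBUFFER_UNIFORM_530_MS, CBUFFER_UNIFORM_545_OC_MS):.3f}x")
```

This gives about 1.141x for the driver update alone and 1.365x with the overclock. The x column moves much less (0.151x to 0.188x), presumably because the overclock also speeds up the Buffer<RGBA8>.Load random baseline. Relative to the other loads, the cbuffer path remains very slow.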

Dolkar commented 3 months ago

I see similar results on a 4070 Super. I also have another observation, though: if I remove the masking with a runtime constant like so:

//uint elemIdx = (htid + i) | loadConstants.elementsMask;
uint elemIdx = (htid + i);

I get the following results instead:

cbuffer{float4} load uniform: 2.533ms 3.950x
cbuffer{float4} load linear: 273.011ms 0.037x
cbuffer{float4} load random: 100.094ms 0.100x

whereas in other cases, such as structured buffers, the change doesn't seem to make much of a difference. For some reason Ada seems to struggle with dynamic indexing into the constant buffer here, even when the index is uniform. But when the index ends up as a compile-time constant, the cbuffer outperforms the other buffers again.
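One possible partial explanation for the linear and random cbuffer results is that NVIDIA's constant cache is optimized for warp-uniform reads. The CUDA programming guide documents that constant-memory requests are split into one request per distinct address within a warp. Whether D3D cbuffer dynamic indexing on Ada takes the same path is an assumption, not something shown in this thread. The Python sketch below counts distinct addresses per 32-lane warp. The index patterns are hypothetical stand-ins for PerfTest's uniform/linear/random modes, not its actual index math:

```python
import random

WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs

def uniform_indices(i):
    # Hypothetical "uniform" pattern: every lane reads the same element.
    return [i] * WARP_SIZE

def linear_indices(i):
    # Hypothetical "linear" pattern: each lane reads its own consecutive element.
    return [lane + i for lane in range(WARP_SIZE)]

def random_indices(i, num_elements=4096, seed=0):
    # Hypothetical "random" pattern: each lane reads a pseudo-random element.
    # 4096 float4 elements is the 64 KB D3D11 constant buffer size limit.
    rng = random.Random(seed * 1_000_003 + i)
    return [rng.randrange(num_elements) for _ in range(WARP_SIZE)]

def serialized_accesses(indices):
    # Simplified constant-cache model: one access per distinct address in the warp.
    return len(set(indices))

for name, pattern in (("uniform", uniform_indices),
                      ("linear", linear_indices),
                      ("random", random_indices)):
    print(f"{name}: {serialized_accesses(pattern(0))} access(es) per warp")
```

Under this model, a uniform load needs 1 access per warp and a linear load needs 32. This is consistent with uniform being the fastest cbuffer case. The model does not explain why linear (330.084ms) measured slower than random (125.805ms). It also does not explain why the masked uniform case (78.939ms) is so much slower than the buffer loads while the unmasked one takes 2.533ms. That gap fits the observation above that the index becomes a compile-time constant once the mask is removed.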