google / uVkCompute

A micro Vulkan compute pipeline and a collection of benchmarking compute shaders
Apache License 2.0

Add vector-times-matrix-transposed benchmark (V2) #40

Closed kuhar closed 11 months ago

kuhar commented 1 year ago

Based on https://github.com/google/uVkCompute/pull/38 by @qedawkins, and earlier mmt by @kuhar.

Add benchmarks for vmt, with very similar supporting structure to the existing mmt benchmark.
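
For readers unfamiliar with the operation, here is a minimal reference sketch of vector-times-matrix-transposed with i8 inputs and an i32 accumulator. The function name `vmt_reference` and the row-major N x K layout of the transposed matrix are assumptions made for illustration; they are not taken from the benchmark sources.

```python
def vmt_reference(vec, mat_t):
    """Compute out[n] = sum_k vec[k] * mat_t[n][k].

    vec:   list of K ints in the i8 range [-128, 127]
    mat_t: list of N rows, each with K ints in the i8 range
           (assumed layout: the matrix is stored transposed, so each
           output element is a dot product with one contiguous row)
    """
    k = len(vec)
    out = []
    for row in mat_t:
        if len(row) != k:
            raise ValueError("row length must match vector length")
        acc = 0  # stands in for the i32 accumulator
        for a, b in zip(vec, row):
            acc += a * b
        out.append(acc)
    return out
```

Because every matrix element is read exactly once and used in a single multiply-add, the operation is expected to be limited by memory bandwidth rather than by arithmetic.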

Changes compared to #38:

Performance depends heavily on the problem size. On the 7900 XTX, I'm seeing up to 945 GB/s at the 8k problem size.
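
As a rough sanity check on bandwidth figures like these, the sketch below estimates bytes per second from the problem size and the measured time. It assumes the traffic is the i8 matrix, the i8 vector, and the i32 output, each touched once; how the benchmark actually computes its counter is not stated in this thread, and `estimated_bandwidth` is a hypothetical helper name.

```python
def estimated_bandwidth(n, k, seconds, in_bytes=1, out_bytes=4):
    """Estimate bytes per second for one vmt run.

    Assumption: the N x K matrix and the K-element vector (i8) are each
    read once, and the N-element i32 output is written once.
    """
    total_bytes = n * k * in_bytes + k * in_bytes + n * out_bytes
    return total_bytes / seconds
```

For an 8192x8192 problem timed at 76.9 us, this gives roughly 873 GB/s. The matrix dominates the byte count, so the estimate is insensitive to how the vector and output are counted.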

kuhar commented 1 year ago

I added code to prefetch the LHS and RHS in the hope of hiding latency. I'm seeing better numbers now:

-----------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                     Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------------------------------------
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[1x8]/Workgroup[64x1x1]/manual_time          53.5 us         12.5 us        11920 Bytes=314.242G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[1x16]/Workgroup[64x1x1]/manual_time         58.0 us         15.7 us        12337 Bytes=289.814G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[2x8]/Workgroup[64x1x1]/manual_time          60.7 us         16.4 us        11211 Bytes=276.524G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[2x16]/Workgroup[64x1x1]/manual_time         58.1 us         12.8 us        10386 Bytes=289.079G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[4x8]/Workgroup[64x1x1]/manual_time          56.1 us         14.0 us        12489 Bytes=299.303G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[4x16]/Workgroup[64x1x1]/manual_time         53.4 us         11.8 us        11683 Bytes=314.342G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[2x8]/Workgroup[64x2x1]/manual_time          64.4 us         13.4 us         9679 Bytes=260.648G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[2x16]/Workgroup[64x2x1]/manual_time         61.3 us         12.4 us         9736 Bytes=274.182G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[4x8]/Workgroup[64x2x1]/manual_time          67.9 us         16.9 us        10330 Bytes=247.387G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[4x16]/Workgroup[64x2x1]/manual_time         51.1 us         11.5 us        10830 Bytes=328.514G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[4x8]/Workgroup[64x4x1]/manual_time          79.2 us         19.5 us         9463 Bytes=212.14G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[4x16]/Workgroup[64x4x1]/manual_time         59.2 us         11.0 us         9671 Bytes=283.908G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[1x8]/Workgroup[64x1x1]/manual_time          76.9 us         14.8 us         7301 Bytes=873.317G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[1x16]/Workgroup[64x1x1]/manual_time         85.7 us         18.8 us         7442 Bytes=783.545G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[2x8]/Workgroup[64x1x1]/manual_time          77.7 us         10.6 us         7309 Bytes=864.229G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[2x16]/Workgroup[64x1x1]/manual_time         76.4 us         10.1 us         7115 Bytes=879.334G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[4x8]/Workgroup[64x1x1]/manual_time          80.4 us         16.7 us         6536 Bytes=835.583G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[4x16]/Workgroup[64x1x1]/manual_time         83.8 us         18.4 us         7601 Bytes=801.437G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[2x8]/Workgroup[64x2x1]/manual_time           102 us         10.8 us         6059 Bytes=657.937G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[2x16]/Workgroup[64x2x1]/manual_time          110 us         15.9 us         6062 Bytes=609.361G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[4x8]/Workgroup[64x2x1]/manual_time           103 us         17.6 us         6073 Bytes=651.854G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[4x16]/Workgroup[64x2x1]/manual_time          104 us         11.5 us         6162 Bytes=647.435G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[4x8]/Workgroup[64x4x1]/manual_time           146 us         11.9 us         4441 Bytes=459.739G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[4x16]/Workgroup[64x4x1]/manual_time          149 us         13.8 us         4380 Bytes=451.568G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[1x8]/Workgroup[64x1x1]/manual_time         358 us         11.2 us         1935 Bytes=751.019G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[1x16]/Workgroup[64x1x1]/manual_time        355 us         15.1 us         1958 Bytes=756.893G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[2x8]/Workgroup[64x1x1]/manual_time         361 us         11.4 us         1920 Bytes=744.092G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[2x16]/Workgroup[64x1x1]/manual_time        357 us         12.9 us         1948 Bytes=752.493G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[4x8]/Workgroup[64x1x1]/manual_time         372 us         12.4 us         1860 Bytes=722.532G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[4x16]/Workgroup[64x1x1]/manual_time        369 us         11.0 us         1879 Bytes=727.972G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[2x8]/Workgroup[64x2x1]/manual_time         431 us         15.9 us         1338 Bytes=622.646G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[2x16]/Workgroup[64x2x1]/manual_time        425 us         11.1 us         1337 Bytes=631.244G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[4x8]/Workgroup[64x2x1]/manual_time         415 us         10.7 us         1356 Bytes=647.791G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[4x16]/Workgroup[64x2x1]/manual_time        415 us         11.6 us         1358 Bytes=646.999G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[4x8]/Workgroup[64x4x1]/manual_time         596 us         11.2 us          960 Bytes=450.851G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[4x16]/Workgroup[64x4x1]/manual_time        595 us         10.9 us          967 Bytes=451.092G/s
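
The prefetching mentioned above the table can be sketched as a software-pipelined loop: the loads for tile i+1 are issued before the arithmetic on tile i. This is only an illustration of the loop ordering, not the shader code from this PR; Python cannot demonstrate the latency hiding itself, and the names `dot_with_prefetch` and `load` are hypothetical.

```python
def dot_with_prefetch(lhs, rhs, tile=4):
    """Dot product over tiles, loading tile i+1 before computing tile i."""
    if len(lhs) != len(rhs):
        raise ValueError("inputs must have the same length")
    num_tiles = (len(lhs) + tile - 1) // tile

    def load(i):
        # Stands in for a global-memory load of one tile of each operand.
        return lhs[i * tile:(i + 1) * tile], rhs[i * tile:(i + 1) * tile]

    acc = 0
    if num_tiles == 0:
        return acc
    cur_lhs, cur_rhs = load(0)  # prologue: fetch the first tile
    for i in range(num_tiles):
        if i + 1 < num_tiles:
            # Issue the next loads before the math that uses the current tile.
            next_lhs, next_rhs = load(i + 1)
        for a, b in zip(cur_lhs, cur_rhs):
            acc += a * b
        if i + 1 < num_tiles:
            cur_lhs, cur_rhs = next_lhs, next_rhs
    return acc
```

On a GPU, the point of this ordering is that the in-flight loads for the next tile can overlap with the multiply-adds on the current one, at the cost of extra registers to hold both tiles.
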
kuhar commented 12 months ago

New numbers with a wider load type:

Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[1x16]/Workgroup[64x1x1]/manual_time          54.3 us         12.0 us        10363 Bytes=309.157G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[1x32]/Workgroup[64x1x1]/manual_time          52.0 us         10.9 us        12351 Bytes=322.95G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[1x64]/Workgroup[64x1x1]/manual_time          50.8 us         11.1 us        12038 Bytes=330.542G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[1x128]/Workgroup[64x1x1]/manual_time         60.9 us         11.0 us        10596 Bytes=275.744G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[2x16]/Workgroup[64x1x1]/manual_time          52.2 us         12.1 us        12118 Bytes=321.931G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[2x32]/Workgroup[64x1x1]/manual_time          54.5 us         12.2 us        12197 Bytes=308.301G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[2x64]/Workgroup[64x1x1]/manual_time          54.3 us         12.5 us        12236 Bytes=309.169G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[2x128]/Workgroup[64x1x1]/manual_time         59.4 us         11.5 us        11041 Bytes=282.758G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[4x16]/Workgroup[64x1x1]/manual_time          52.9 us         12.4 us        12068 Bytes=317.288G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[4x32]/Workgroup[64x1x1]/manual_time          53.0 us         11.6 us        12137 Bytes=317.187G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[4x64]/Workgroup[64x1x1]/manual_time          60.0 us         17.5 us        12092 Bytes=279.866G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[4x128]/Workgroup[64x1x1]/manual_time         62.9 us         15.3 us        11022 Bytes=267.132G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[2x16]/Workgroup[64x2x1]/manual_time          58.7 us         10.5 us        10954 Bytes=286.225G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[2x32]/Workgroup[64x2x1]/manual_time          52.8 us         10.4 us        10776 Bytes=318.315G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[2x64]/Workgroup[64x2x1]/manual_time          53.0 us         10.6 us        10903 Bytes=316.809G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[2x128]/Workgroup[64x2x1]/manual_time         61.2 us         10.6 us         9103 Bytes=274.38G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[4x16]/Workgroup[64x2x1]/manual_time          63.3 us         15.6 us        10332 Bytes=265.485G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[4x32]/Workgroup[64x2x1]/manual_time          49.8 us         11.0 us        10735 Bytes=337.464G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[4x64]/Workgroup[64x2x1]/manual_time          55.1 us         10.4 us        11037 Bytes=305.028G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[4x128]/Workgroup[64x2x1]/manual_time         60.7 us         11.7 us         9199 Bytes=276.889G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[4x16]/Workgroup[64x4x1]/manual_time          60.1 us         11.1 us         9309 Bytes=279.383G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[4x32]/Workgroup[64x4x1]/manual_time          63.9 us         12.7 us         9296 Bytes=262.777G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[4x64]/Workgroup[64x4x1]/manual_time          68.9 us         15.1 us         9552 Bytes=243.732G/s
Radeon RX 7900 XTX/vmt[4096x4096]/i8->i32/Tile[4x128]/Workgroup[64x4x1]/manual_time         81.5 us         10.2 us         7080 Bytes=206.221G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[1x16]/Workgroup[64x1x1]/manual_time          84.9 us         18.5 us         7350 Bytes=790.845G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[1x32]/Workgroup[64x1x1]/manual_time          72.9 us         13.2 us         7557 Bytes=921.461G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[1x64]/Workgroup[64x1x1]/manual_time          72.6 us         11.4 us         7456 Bytes=925.018G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[1x128]/Workgroup[64x1x1]/manual_time         74.4 us         14.2 us         7521 Bytes=902.598G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[2x16]/Workgroup[64x1x1]/manual_time          75.9 us         10.1 us         7211 Bytes=885.057G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[2x32]/Workgroup[64x1x1]/manual_time          76.5 us         11.2 us         7392 Bytes=877.929G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[2x64]/Workgroup[64x1x1]/manual_time          78.3 us         14.9 us         7486 Bytes=857.515G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[2x128]/Workgroup[64x1x1]/manual_time         74.5 us         10.5 us         7314 Bytes=901.18G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[4x16]/Workgroup[64x1x1]/manual_time          73.1 us         10.9 us         7144 Bytes=918.353G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[4x32]/Workgroup[64x1x1]/manual_time          70.2 us         10.6 us         7656 Bytes=956.002G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[4x64]/Workgroup[64x1x1]/manual_time          75.3 us         10.2 us         7480 Bytes=892.07G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[4x128]/Workgroup[64x1x1]/manual_time         76.0 us         10.8 us         6602 Bytes=883.558G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[2x16]/Workgroup[64x2x1]/manual_time           105 us         17.0 us         5917 Bytes=636.596G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[2x32]/Workgroup[64x2x1]/manual_time          91.3 us         10.5 us         6277 Bytes=735.314G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[2x64]/Workgroup[64x2x1]/manual_time          92.7 us         10.2 us         6228 Bytes=724.199G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[2x128]/Workgroup[64x2x1]/manual_time          103 us         15.4 us         5939 Bytes=650.338G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[4x16]/Workgroup[64x2x1]/manual_time          96.8 us         10.6 us         5986 Bytes=693.685G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[4x32]/Workgroup[64x2x1]/manual_time          97.4 us         10.0 us         6361 Bytes=689.412G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[4x64]/Workgroup[64x2x1]/manual_time          93.8 us         10.4 us         6072 Bytes=716.008G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[4x128]/Workgroup[64x2x1]/manual_time         94.3 us         10.5 us         5966 Bytes=712.163G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[4x16]/Workgroup[64x4x1]/manual_time           143 us         10.2 us         4475 Bytes=468.694G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[4x32]/Workgroup[64x4x1]/manual_time           143 us         10.4 us         4431 Bytes=469.864G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[4x64]/Workgroup[64x4x1]/manual_time           148 us         11.3 us         4247 Bytes=453.095G/s
Radeon RX 7900 XTX/vmt[8192x8192]/i8->i32/Tile[4x128]/Workgroup[64x4x1]/manual_time          147 us         10.3 us         4150 Bytes=457.114G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[1x16]/Workgroup[64x1x1]/manual_time         351 us         10.2 us         1968 Bytes=764.448G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[1x32]/Workgroup[64x1x1]/manual_time         342 us         11.0 us         2047 Bytes=785.825G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[1x64]/Workgroup[64x1x1]/manual_time         340 us         10.9 us         2022 Bytes=790.284G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[1x128]/Workgroup[64x1x1]/manual_time        342 us         14.7 us         2001 Bytes=785.858G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[2x16]/Workgroup[64x1x1]/manual_time         352 us         11.2 us         1952 Bytes=762.24G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[2x32]/Workgroup[64x1x1]/manual_time         344 us         10.9 us         2010 Bytes=781.62G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[2x64]/Workgroup[64x1x1]/manual_time         345 us         11.7 us         2020 Bytes=777.404G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[2x128]/Workgroup[64x1x1]/manual_time        343 us         11.7 us         2026 Bytes=782.997G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[4x16]/Workgroup[64x1x1]/manual_time         367 us         11.8 us         1851 Bytes=732.426G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[4x32]/Workgroup[64x1x1]/manual_time         357 us         10.4 us         1926 Bytes=751.48G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[4x64]/Workgroup[64x1x1]/manual_time         348 us         11.9 us         1985 Bytes=772.219G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[4x128]/Workgroup[64x1x1]/manual_time        346 us         11.0 us         2012 Bytes=776.623G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[2x16]/Workgroup[64x2x1]/manual_time         423 us         10.9 us         1304 Bytes=634.515G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[2x32]/Workgroup[64x2x1]/manual_time         416 us         10.4 us         1337 Bytes=645.222G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[2x64]/Workgroup[64x2x1]/manual_time         419 us         10.5 us         1225 Bytes=640.945G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[2x128]/Workgroup[64x2x1]/manual_time        425 us         10.4 us         1262 Bytes=631.165G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[4x16]/Workgroup[64x2x1]/manual_time         415 us         12.5 us         1276 Bytes=646.296G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[4x32]/Workgroup[64x2x1]/manual_time         407 us         10.4 us         1292 Bytes=659.612G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[4x64]/Workgroup[64x2x1]/manual_time         418 us         10.4 us         1285 Bytes=642.076G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[4x128]/Workgroup[64x2x1]/manual_time        427 us         10.9 us         1336 Bytes=629.435G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[4x16]/Workgroup[64x4x1]/manual_time         589 us         12.8 us          883 Bytes=456.136G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[4x32]/Workgroup[64x4x1]/manual_time         589 us         11.2 us          881 Bytes=455.663G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[4x64]/Workgroup[64x4x1]/manual_time         608 us         12.1 us          886 Bytes=441.749G/s
Radeon RX 7900 XTX/vmt[16384x16384]/i8->i32/Tile[4x128]/Workgroup[64x4x1]/manual_time        628 us         11.5 us          916 Bytes=427.491G/s
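
To compare configurations across runs, a small script can pick the best row per problem size from output like the above. This is a minimal sketch that assumes the benchmark-name layout shown in these tables; `ROW_RE` and `best_per_size` are hypothetical names, not part of uVkCompute.

```python
import re

# Pattern for the benchmark-name fields used in the tables above.
ROW_RE = re.compile(
    r"vmt\[(?P<n>\d+)x(?P<k>\d+)\]/.*?"
    r"Tile\[(?P<tile>\d+x\d+)\]/"
    r"Workgroup\[(?P<wg>\d+x\d+x\d+)\]/manual_time"
    r".*?Bytes=(?P<gbps>[\d.]+)G/s"
)

def best_per_size(lines):
    """Return {(n, k): (gbps, tile, workgroup)} with the highest Bytes counter."""
    best = {}
    for line in lines:
        m = ROW_RE.search(line)
        if m is None:
            continue  # skip headers, separators, and comment text
        key = (int(m["n"]), int(m["k"]))
        entry = (float(m["gbps"]), m["tile"], m["wg"])
        if key not in best or entry[0] > best[key][0]:
            best[key] = entry
    return best
```

Applied to the table above, the 8192x8192 winner is Tile[4x32] with Workgroup[64x1x1] at 956.002G/s.
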
kuhar commented 11 months ago

@antiagainst @qedawkins I'm pretty happy with this implementation. Should we merge?

qedawkins commented 11 months ago

> @antiagainst @qedawkins I'm pretty happy with this implementation. Should we merge?

Works for me, can I give it a pass tomorrow first?

oscarbg commented 11 months ago

Hi, sorry to ask here, but what's special about RDNA3 in this test? I can't run this sample on an Nvidia 4070:

~/code/uVkCompute/build/benchmarks/vmt ./vmt_rdna3
2023-11-07T17:08:45+01:00
Running ./vmt_rdna3
Run on (32 X 5881 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 1024 KiB (x16)
  L3 Unified 32768 KiB (x2)
Load Average: 8.08, 5.68, 2.31
WARNING CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
WARNING Library was built as DEBUG. Timings may be affected.
code/uVkCompute/benchmarks/vmt/vmt_main.cc:123: check error: destination buffer element (0) has incorrect value: expected to be 1404 but found -1
^ In shader: Tile[1x16], i8->i32
Abortado (`core' generado)

kuhar commented 11 months ago

@oscarbg nothing as of today; you can see the GLSL compile target here: https://github.com/google/uVkCompute/commit/3049af9a233ab6d49088f2c99e2623f0c2b5be04#diff-62da6f62b4091626b341c9d8333d332aee35c053ff57cacebbb57792b987702aR30

The `rdna3` name is more to communicate that it has been tuned and tested on RDNA3; in the future we may add more target-specific options to the GLSL.

kuhar commented 11 months ago

> code/uVkCompute/benchmarks/vmt/vmt_main.cc:123: check error: destination buffer element (0) has incorrect value: expected to be 1404 but found -1 ^ In shader: Tile[1x16], i8->i32 Abortado (`core' generado)

@oscarbg Also, this indicates that one of the assumptions made in the GLSL does not hold on this target.