Halve uniforms used by PointLights

benaadams commented 9 years ago

The uniforms used by point lights could be halved by combining parts:

"uniform vec3 pointLightColor[ MAX_POINT_LIGHTS ];",
"uniform vec3 pointLightPosition[ MAX_POINT_LIGHTS ];",
"uniform float pointLightDistance[ MAX_POINT_LIGHTS ];",
"uniform float pointLightDecay[ MAX_POINT_LIGHTS ];",

If color and position were changed to vec4, the distance and decay floats could be moved into their w components. That would halve the number of uniforms used by point lights (at the cost of making the code less readable).
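A minimal sketch of the CPU side of this packing, assuming the distance rides in the w component of the color and the decay in the w component of the position (the proposal does not fix which goes where). The function name `packPointLights` and the shape of the light objects are hypothetical, not three.js API.

```javascript
// Pack point light parameters into two vec4 arrays instead of
// two vec3 arrays plus two float arrays.
// Assumed (hypothetical) light shape:
//   { color: [r, g, b], position: [x, y, z], distance: d, decay: k }
function packPointLights(lights) {
  const colorDistance = new Float32Array(lights.length * 4);
  const positionDecay = new Float32Array(lights.length * 4);

  lights.forEach((light, i) => {
    const o = i * 4;
    colorDistance.set(light.color, o);      // xyz = color
    colorDistance[o + 3] = light.distance;  // w   = distance
    positionDecay.set(light.position, o);   // xyz = position
    positionDecay[o + 3] = light.decay;     // w   = decay
  });

  return { colorDistance, positionDecay };
}

const packedLights = packPointLights([
  { color: [1, 0.5, 0.25], position: [0, 10, 0], distance: 100, decay: 2 },
  { color: [0, 0, 1], position: [5, 0, -5], distance: 0, decay: 1 },
]);
```

Each resulting array could then be uploaded with a single `gl.uniform4fv` call to a `uniform vec4 ...[ MAX_POINT_LIGHTS ]` declaration.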

Continuing from #7028

This is unrelated, but another issue we run into constantly on Clara.io is blowing out the space for the light uniforms.

tschw commented 9 years ago

I like this suggestion. Fragment shader limits are very low on some targets. See MAX_FRAGMENT_UNIFORM_VECTORS of smartphones at WebGL Stats.

I think the implementing code should include some comments, so it's not too difficult to follow what's going on. While we're at it, we may also feed reciprocal lengths to save the division when calculating the light attenuation.
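A small sketch of the reciprocal idea, assuming a falloff of the form `pow( clamp( 1.0 - d / cutoff, 0.0, 1.0 ), decay )` and assuming that a cutoff distance of 0 means "no cutoff". Both are assumptions for illustration, and the function names are hypothetical.

```javascript
// Precompute the reciprocal of the cutoff distance on the CPU so the
// shader can multiply instead of divide.
// Assumption: a cutoff distance of 0 means "no cutoff"; mapping it to a
// reciprocal of 0 makes the falloff term below evaluate to 1.
function inverseCutoff(distance) {
  return distance > 0 ? 1 / distance : 0;
}

// JavaScript mirror of the assumed shader falloff:
//   pow( clamp( 1.0 - d * invCutoff, 0.0, 1.0 ), decay )
function attenuation(d, invCutoff, decay) {
  const linear = Math.min(Math.max(1 - d * invCutoff, 0), 1);
  return Math.pow(linear, decay);
}
```

With this convention the "no cutoff" case needs no branch in the shader, since multiplying by a reciprocal of 0 leaves the falloff term at 1.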

gero3 commented 9 years ago

Do we really need this? As far as I can tell from the WebGL specification, we shouldn't need to do this ourselves, as there is a section about uniform packing in the spec. I do not know, though, whether all implementations follow it.

https://www.khronos.org/registry/webgl/specs/latest/1.0/#6.24 states:

The OpenGL ES Shading Language, Version 1.00 [GLES20GLSL], Appendix A, Section 7 "Counting of Varyings and Uniforms" defines a conservative algorithm for computing the storage required for all of the uniform and varying variables in a shader. The GLSL ES specification requires that if the packing algorithm defined in Appendix A succeeds, then the shader must succeed compilation on the target platform. The WebGL API further requires that if the packing algorithm fails either for the uniform variables of a shader or for the varying variables of a program, compilation or linking must fail.

Instead of using a fixed size grid of registers, the number of rows in the target architecture is determined in the following ways:

when counting uniform variables in a vertex shader: getParameter(MAX_VERTEX_UNIFORM_VECTORS)

when counting uniform variables in a fragment shader: getParameter(MAX_FRAGMENT_UNIFORM_VECTORS)

when counting varying variables: getParameter(MAX_VARYING_VECTORS)

https://www.khronos.org/registry/gles/specs/2.0/GLSL_ES_Specification_1.0.17.pdf Appendix A section 7 states:

The resource allocation of variables must succeed for all cases where the following packing algorithm succeeds:

• The target architecture consists of a grid of registers, 8 rows by 4 columns for varying variables and 128 rows by 4 columns for uniform variables. Each register can contain a float value.

• Variables are packed into the registers one at a time so that they each occupy a contiguous subrectangle. No splitting of variables is permitted.

• The orientation of variables is fixed. Vectors always occupy registers in a single row. Elements of an array must be in different rows. E.g. vec4 will always occupy one row; float[8] will occupy one column. Since it is not permitted to split a variable, large arrays, e.g. for varyings float[16], will always fail with this algorithm.

• Variables consume only the minimum space required with the exception that mat2 occupies 2 complete rows. This is to allow implementations more flexibility in how variables are stored.

• Arrays of size N are assumed to take N times the size of the base type.

• Variables are packed in the following order:

Arrays of mat4 and mat4

Arrays of mat2 and mat2 (since they occupy full rows)

Arrays of vec4 and vec4

Arrays of mat3 and mat3

Arrays of vec3 and vec3

Arrays of vec2 and vec2

Arrays of float and float

• For each of the above types, the arrays are processed in order of size, largest first. Arrays of size 1 and the base type are considered equivalent. In the case of varyings, the first type to be packed (successfully) is mat4[2] followed by mat4, mat2[2], mat2, vec4[8], vec4[7], ... vec4[1], vec4, mat3[2], mat3 and so on. The last variables to be packed will be float (and float[1]).

• For 2,3 and 4 component variables packing is started using the 1st column of the 1st row. Variables are then allocated to successive rows, aligning them to the 1st column.

• For 2 component variables, when there are no spare rows, the strategy is switched to using the highest numbered row and the lowest numbered column where the variable will fit. (In practice, this means they will be aligned to the x or z component.) Packing of any further 3 or 4 component variables will fail at this point.

• 1 component variables (i.e. floats and arrays of floats) have their own packing rule. They are packed in order of size, largest first. Each variable is placed in the column that leaves the least amount of space in the column and aligned to the lowest available rows within that column. During this phase of packing, space will be available in up to 4 columns. The space within each column is always contiguous.

• If at any time the packing of a variable fails, the compiler or linker must report an error.
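To make the rules above concrete, here is a simplified sketch of the packing check in JavaScript. It handles only vec4, vec3 and float uniforms and skips the extra rules for matrices and vec2, so it is not the full Appendix A algorithm. The function name and the declaration format are hypothetical.

```javascript
// Simplified sketch of the Appendix A packing check, restricted to
// vec4, vec3 and float uniforms (arrays or single values).
// Not handled: matrices and vec2, which have extra rules in the spec.
// `decls` is a hypothetical list like [{ type: 'vec3', size: 8 }].
function fitsInUniformRows(decls, maxRows) {
  const rowsOf = (type) =>
    decls.filter((d) => d.type === type).reduce((n, d) => n + d.size, 0);

  const vec4Rows = rowsOf('vec4');
  const vec3Rows = rowsOf('vec3');
  if (vec4Rows + vec3Rows > maxRows) return false;

  // Free rows per column: vec4 fills all four columns of its rows,
  // vec3 fills columns 0..2 and leaves column 3 of its rows free.
  const below = maxRows - vec4Rows - vec3Rows;
  const free = [below, below, below, maxRows - vec4Rows];

  // Floats: largest array first, each into the column that is left
  // with the least free space after placement.
  const floats = decls
    .filter((d) => d.type === 'float')
    .map((d) => d.size)
    .sort((a, b) => b - a);

  for (const size of floats) {
    let best = -1;
    for (let c = 0; c < 4; c++) {
      if (free[c] >= size && (best < 0 || free[c] < free[best])) best = c;
    }
    if (best < 0) return false; // no column can hold this array
    free[best] -= size;
  }
  return true;
}

const pointLightUniforms = [
  { type: 'vec3', size: 8 },  // pointLightColor
  { type: 'vec3', size: 8 },  // pointLightPosition
  { type: 'float', size: 8 }, // pointLightDistance
  { type: 'float', size: 8 }, // pointLightDecay
];
```

Under this model, `fitsInUniformRows( pointLightUniforms, 16 )` succeeds: the two vec3 arrays take 16 rows, and both float arrays slot into the fourth column those rows leave free.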

I've also checked if the webgl conformance tests test this but they don't test packing different uniforms into one row (only arrays). I'll create an issue here later today for that.

benaadams commented 9 years ago

I've also checked if the webgl conformance tests test this but they don't test packing different uniforms into one row

So something like:?

shader with uniform array of vec3 with N elements (maximum) and uniform array of float with N elements (maximum) should succeed

gero3 commented 9 years ago

So something like:?

shader with uniform array of vec3 with N elements (maximum) and uniform array of float with N elements (maximum) should succeed

Yes, that is what I mean.

gero3 commented 9 years ago

I've added a test in webgl conformance that tests uniform packing in KhronosGroup/WebGL#1167, which seems to fail for certain devices according to @kenrussell.

This doesn't solve the underlying problem of having too few uniforms anyway. We should also add an option to use textures for the light data, because we never use the 8 samplers that are always present in fragment shaders according to webglstats.

tschw commented 9 years ago

I've also checked if the webgl conformance tests test this but they don't test packing different uniforms into one row

...

I've added a test in webgl conformance that tests uniform packing in KhronosGroup/WebGL#1167, which seems to fail

Wow! Thanks for digging into it. :+1:

We should pack these uniforms, given that broken automatic packing will be out there for another while.

We should also add an option to use textures for the light data, because we never use the 8 samplers that are always present in fragment shaders according to webglstats.

Yes. See #7028. Let's also /ping @bhouston on this thread.

bhouston commented 9 years ago

My only suggestion is that we wrap the accessors to light parameters in functions -- they should inline anyhow. This way we can continue to adjust how these parameters are stored (not packed, packed, data texture, etc.) without changing any core shader code. I'd suggest something like:

float getLightPointDecay( int pointLightIndex );
vec3 getLightPointPosition( int pointLightIndex );
...
vec3 getLightSpotDirection( int spotLightIndex );

Would that be possible? We could even make them defines as I know that @benaadams doesn't like functions.

bhouston commented 9 years ago

My preference is data textures unless there is a performance issue I do not know about -- with a fallback to packed uniforms for older devices. If we use the function/macro approach combined with some defines in WebGLRenderer similar to how we enable/disable data textures in the bones code, this should be relatively straightforward to do.
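One way the function/macro approach could look on the JavaScript side is a helper that emits different GLSL declarations depending on the storage mode, while the lighting code only ever calls the accessors. This is a sketch under assumptions: the mode names, the packed uniform names and the macro names are all hypothetical and not existing three.js code.

```javascript
// Sketch: emit GLSL accessor macros for point light data so the lighting
// code never touches the storage directly. The mode names, uniform names
// and macro names here are hypothetical, not existing three.js code.
function pointLightAccessors(mode) {
  if (mode === 'packed') {
    return [
      'uniform vec4 pointLightColorDistance[ MAX_POINT_LIGHTS ];',
      'uniform vec4 pointLightPositionDecay[ MAX_POINT_LIGHTS ];',
      '#define getPointLightColor( i ) pointLightColorDistance[ i ].xyz',
      '#define getPointLightDistance( i ) pointLightColorDistance[ i ].w',
      '#define getPointLightPosition( i ) pointLightPositionDecay[ i ].xyz',
      '#define getPointLightDecay( i ) pointLightPositionDecay[ i ].w',
    ].join('\n');
  }

  // Default: the separate uniforms quoted at the top of this issue.
  return [
    'uniform vec3 pointLightColor[ MAX_POINT_LIGHTS ];',
    'uniform vec3 pointLightPosition[ MAX_POINT_LIGHTS ];',
    'uniform float pointLightDistance[ MAX_POINT_LIGHTS ];',
    'uniform float pointLightDecay[ MAX_POINT_LIGHTS ];',
    '#define getPointLightColor( i ) pointLightColor[ i ]',
    '#define getPointLightDistance( i ) pointLightDistance[ i ]',
    '#define getPointLightPosition( i ) pointLightPosition[ i ]',
    '#define getPointLightDecay( i ) pointLightDecay[ i ]',
  ].join('\n');
}
```

A data texture mode would add a third branch that defines the same four accessors in terms of texture lookups, leaving the core shader code unchanged.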

The reason I support data textures is that light counts are then theoretically unlimited, which works well with editors like Clara.io where a user can easily add as many lights as they want.

bhouston commented 9 years ago

@benaadams asked in the other thread:

@bhouston what kind of light type distribution are you seeing? e.g. is it mostly point lights?

We see all kinds of light combinations, in part because people can render in V-Ray, so they can use 100 lights in V-Ray if they wanted to. Of course if you use the distance cutoff feature properly, this can be pretty efficient still.

I wonder what type of data texture organization would make sense?

I think that one could make the height a multiple of the number of lights, and the width the number of pixels required for the maximum number of parameters required by a light type. Thus one texture for all lights, organized as one light per row, probably grouped by light type. Thus each light type would have a starting offset into this texture and then you increment for each light of that type.

Something like:


uniform sampler2D lightDataTexture;

// Size of one texel in UV space.
const vec2 lightDataUVIncrement = vec2( 1.0 / float( LIGHT_DATA_TEXTURE_WIDTH ), 1.0 / float( LIGHT_DATA_TEXTURE_HEIGHT ) );

// UV of the center of texel `column` in the row of the given point light
// (one light per row, as described above).
vec2 getPointLightUV( int pointLightIndex, int column ) {
  return ( vec2( column, POINT_LIGHTS_OFFSET + pointLightIndex ) + 0.5 ) * lightDataUVIncrement;
}

vec3 getPointLightPosition( int pointLightIndex ) {
  return texture2D( lightDataTexture, getPointLightUV( pointLightIndex, 0 ) ).xyz;
}

float getPointLightDistance( int pointLightIndex ) {
  return texture2D( lightDataTexture, getPointLightUV( pointLightIndex, 0 ) ).w;
}

float getPointLightDecay( int pointLightIndex ) {
  return texture2D( lightDataTexture, getPointLightUV( pointLightIndex, 1 ) ).x;
}

Although maybe by returning a struct one could simplify the accesses? Something like this (I am just guessing on struct syntax, so it is likely wrong):

struct PointLight {
  vec3 position;
  float distance;
  float decay;
};

PointLight getPointLight( int pointLightIndex ) {
  // Two fetches per light: texel 0 holds position and distance, texel 1 holds decay.
  vec4 data0 = texture2D( lightDataTexture, getPointLightUV( pointLightIndex, 0 ) );
  vec4 data1 = texture2D( lightDataTexture, getPointLightUV( pointLightIndex, 1 ) );
  PointLight pointLight;
  pointLight.position = data0.xyz;
  pointLight.distance = data0.w;
  pointLight.decay = data1.x;
  return pointLight;
}
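The CPU side of the "one light per row" organization described above could be sketched as follows. The texel layout (position and distance in the first texel, decay in the x channel of the second) is an assumption made for illustration, the sketch covers point lights only (so the per-type offset is 0), and `buildPointLightTextureData` is a hypothetical name.

```javascript
// Sketch: build the RGBA float data for a light data texture with one
// light per row. Assumed texel layout per point light (hypothetical):
//   texel 0 = ( position.x, position.y, position.z, distance )
//   texel 1 = ( decay, unused, unused, unused )
const TEXELS_PER_LIGHT = 2;

function buildPointLightTextureData(lights) {
  const width = TEXELS_PER_LIGHT;
  const height = lights.length;
  const data = new Float32Array(width * height * 4);

  lights.forEach((light, row) => {
    const o = row * width * 4;     // first float of this light's row
    data.set(light.position, o);   // texel 0, xyz
    data[o + 3] = light.distance;  // texel 0, w
    data[o + 4] = light.decay;     // texel 1, x
  });

  return { data, width, height };
}

const lightTexture = buildPointLightTextureData([
  { position: [0, 10, 0], distance: 100, decay: 2 },
  { position: [5, 0, -5], distance: 50, decay: 1 },
]);
```

In three.js the result could be wrapped in a `THREE.DataTexture` with `THREE.RGBAFormat` and `THREE.FloatType`, which in turn depends on float texture support being available on the device.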

gero3 commented 9 years ago

I made an example of how we could create a lightTexture in three.js in #7060

tschw commented 9 years ago

My preference is data textures unless there is a performance issue I do not know about

I don't know what you know, of course. But I'll share what I know:

• Uniforms are typically implemented as in-core memory. There is a fair chance that texture units provide less bandwidth. This is reported explicitly in a paper on GPGPU raytracing for NVIDIA Fermi architectures, but could very well hold for others, especially when it comes to floating-point values. The same paper states that texture fetches have a high latency; there's a stall when it can't be hidden by overlapped execution.

• Floating-point textures are not available on 25% of the WebGL-enabled phones out there. Working around it at least doubles the number of fetches, plus needs dependent arithmetic to re-assemble the values.

• There is a potential overhead for effective address calculation, also adding a (possibly narrow) data dependency. Unlike in desktop GL, we don't have a texelFetch command for direct addressing. Depending on how sampling is implemented, there can be additional, hidden instructions. Unrolling loops and folding constants may optimize some of it out, but reportedly drivers of mobile GPUs optimize poorly. It was the motivation for Unity3D to use the Mesa GLSL compiler to pre-optimize the shaders.

• To consistently put lighting info in textures, we'd also have to read them in the vertex shader for the Gouraud shading performed for MeshLambertMaterial. Vertex texture fetch is not available on 15% of the WebGL-enabled phones. These 15% may only partially overlap the 25% mentioned before, and we may end up with a device coverage below 75%.

=> Textures can't be the only way to feed the lighting info.
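To illustrate the kind of workaround needed without floating-point textures, here is a sketch of one possible scheme: a value in [0, 1) is spread over the four 8-bit channels of an RGBA texel and reassembled afterwards. The scheme and the function names are assumptions for illustration, not something proposed in this thread. With it, a single vec4 of light data costs four texels and four fetches.

```javascript
// Sketch: store one value from [0, 1) in the four 8-bit channels of an
// RGBA texel, and reassemble it. Values outside that range (positions,
// distances) would first have to be normalized by some known bound.
function encodeUnitFloatToRGBA8(v) {
  const bytes = new Uint8Array(4);
  let x = v;
  for (let i = 0; i < 4; i++) {
    x *= 256;
    bytes[i] = Math.floor(x); // next 8 bits of the fraction
    x -= bytes[i];
  }
  return bytes;
}

// Mirror of the arithmetic needed after the fetch, written against raw
// bytes. A shader sees each channel as byte / 255.0 and would have to
// rescale accordingly.
function decodeRGBA8ToUnitFloat(bytes) {
  return (
    bytes[0] / 256 +
    bytes[1] / 65536 +
    bytes[2] / 16777216 +
    bytes[3] / 4294967296
  );
}
```

The decode step is the "dependent arithmetic" mentioned above: every value read this way needs several multiply-adds before it can be used in the lighting calculation.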

bhouston commented 9 years ago

This is amazing - trying it out now.

I am still of the opinion that we have to hide the decoding as functions though -- those are scarily huge blocks of code to put into all of the shaders.

Super off topic: it would be so cool if every PR was automatically built and deployed on a temp website, like http://testing.threejs.org/pr7070/examples/ with some strict no-crawl robots.txt. Then I wouldn't have to do the standard check out, build, run webserver, test loop -- and neither would others.

bhouston commented 9 years ago

=> Textures can't be the only way to feed the lighting info.

I agree.

mrdoob commented 9 years ago

I've added a test in webgl conformance that tests uniform packing in KhronosGroup/WebGL#1167, which seems to fail for certain devices according to @kenrussell.

Many thanks for doing that!

Then I wouldn't have to do the standard, checkout, build, run webserver, test loop -- and neither would others.

What I usually do is download the zip, which speeds things up a bit... https://github.com/gero3/three.js/archive/lightTexture.zip

mrdoob commented 9 years ago

Also, @tschw many thanks for sharing what you know. It's super helpful for me!

kenrussell commented 9 years ago

Brief follow-up: please see https://github.com/KhronosGroup/WebGL/pull/1167 for more information on the investigation. We're finding that many desktop OpenGL implementations expand out arrays of scalar values into arrays of 4-vectors. Unfortunately this means that shader authors will have to work around this limitation -- transforming the shaders to try to work around this is infeasible.
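The effect of that expansion on the point light uniforms can be counted with a short sketch. It assumes every array element, whatever its base type (float, vec3 or vec4), ends up in its own 4-component register. The function name, the declaration format and the packed uniform names are hypothetical.

```javascript
// Sketch: count uniform vectors when a driver gives every element of a
// float, vec3 or vec4 array its own 4-component register.
// `decls` is a hypothetical list like [{ type: 'float', size: 8 }].
function expandedUniformVectors(decls) {
  // Under this assumption the base type no longer matters:
  // each array element costs one full vector.
  return decls.reduce((n, d) => n + d.size, 0);
}

const N = 8; // hypothetical MAX_POINT_LIGHTS

const separateUniforms = [
  { type: 'vec3', size: N },  // pointLightColor
  { type: 'vec3', size: N },  // pointLightPosition
  { type: 'float', size: N }, // pointLightDistance
  { type: 'float', size: N }, // pointLightDecay
];

const manuallyPackedUniforms = [
  { type: 'vec4', size: N }, // color.xyz + distance
  { type: 'vec4', size: N }, // position.xyz + decay
];
```

For eight lights the separate declarations cost 32 vectors on such an implementation and the manually packed ones cost 16, which is the halving the issue title refers to.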

tschw commented 9 years ago

many thanks for sharing what you know. It's super helpful for me!

You are welcome :blush:. Don't overrate the stuff on GPU performance, though: it just depends on too many factors in the end. For example, using a data texture to let the GPU process a large amount of work in one piece can easily turn out to be a huge win. Also, in this particular case, the caches should be on our side since there's just little data, and it allows us to overcome limits where the shader otherwise can't run at all.

To state it most clearly: I support uniform compaction, lighting info in textures, and suggested abstraction via functions / macros.

gero3 commented 9 years ago

Do we really need this? As far as I can tell from the WebGL specification, we shouldn't need to do this ourselves, as there is a section about uniform packing in the spec. I do not know, though, whether all implementations follow it.

We really do need this, as stated in https://github.com/KhronosGroup/WebGL/pull/1167#issuecomment-136465010.

mrdoob commented 9 years ago

Is anyone planning on giving a go at this?

tschw commented 9 years ago

It was Ben's call, but I'd pick it up in case he doesn't want to implement it himself.

/ping @benaadams

There's also @gero3's #7060 (which is potentially conflicting) and @bhouston suggested unifying both approaches with functions or macros...

WestLangley commented 9 years ago

IMHO, there is something that is more critical (and a nice weekend project for someone who has the skills to do it) -- getting the shader-specific code out of the renderer.

The benefit: we can start making progress adding new materials.

tschw commented 9 years ago

@WestLangley I'll eventually need this one implemented to please my client, but thanks for reminding me about the importance of

getting the shader-specific code out of the renderer.

Got it in the pipeline, but I have to allocate some time in one piece to finish it.

Oletus commented 4 years ago

I don't think this issue is current anymore: the uniforms are stored in an array of structs now, and at least on most platforms uniform packing works correctly with those. It would be useful to know if there's some specific platform that doesn't pack the uniforms correctly, though; then the proposed change could make a difference.

So maybe close this?

mrdoob / three.js

Halve uniforms used by PointLights #7037