mrdoob / three.js

JavaScript 3D Library.
https://threejs.org/
MIT License
SkinnedMesh bug when far from scene root (0,0,0) #13288

Closed qornflex closed 5 years ago

qornflex commented 6 years ago

I think I found a bug in SkinnedMesh (tested on iPhone 6 & 8).

If this is already reported, I'm sorry, I didn't find it in the issue list :(

It seems the GPU skinning is not working correctly and goes crazy on mobile. When a SkinnedMesh is moving or has been moved to a large position value (e.g. x:0, y:0, z:1000), the skinning is no longer accurate and the mesh starts doing a spider dance.

The scale of the mesh affects the bug: the bigger the scale, the smaller the bug.

It seems the skeleton bone values are not calculated correctly at each frame and the bonesTexture/bonesMatrix in the skinning shader is pushing vertices to the wrong place. This is just my feeling of course.

I ran many tests before posting this... looking for a clue in my animated exports but I found the bug is happening with any kind of format (GLTF, FBX, Collada, JSON, ...) and with models from the ThreeJS repo.

That's very annoying because it means we are unable to develop a simple runner game with a running avatar (avatar.position.z keeps increasing) without hitting this issue :(( I still don't know how I'll manage it, as morphTargets are not an option :(

Hope you guys can help here.

I made clear examples with clean source to expose the problem. It's quite easy to verify it on a smart phone:

Appearing only on mobile (z=10000): http://cornflex.org/webgl/skinning-bug.html

With floatVertexTextures set to false (z=10000): http://cornflex.org/webgl/skinning-bug2.html

Getting worse with distance (z increasing): http://cornflex.org/webgl/skinning-bug3.html

Very very far from center (z=70000000) > bug also appears on desktop, but that is certainly due to a float precision issue: http://cornflex.org/webgl/skinning-bug4.html

Video Preview in my game environment: This is a realistic scale world (1.0m = 1.0 threejs unit). Bug is appearing only after 50-60m from scene root and getting worse with distance: http://cornflex.org/webgl/skin-bug.mp4

VERY IMPORTANT: The mesh used from the ThreeJS repo is way too big. It's about 20m tall. That's why the z value has to be bigger to see the bug. If this mesh is scaled down to a realistic size, the bug starts to appear even at 100m.

Three.js version
Browser
OS
Hardware Requirements (graphics card, VR Device, ...)

iPhone 6, 8

Mugen87 commented 6 years ago

Um, your demo looks good on my Pixel.

donmccurdy commented 6 years ago

Confirmed issues on my iPhone SE:

[image: img_6324]

WebGL precision issues in iOS perhaps

Mugen87 commented 6 years ago

But the iOS devices actually use the float texture based code path of the skinning implementation, right? Can you verify this with the following conformance test?

https://www.khronos.org/registry/webgl/conformance-suites/1.0.3/conformance/extensions/oes-texture-float.html

donmccurdy commented 6 years ago

iPhone SE has five failures on that page. 😕

qornflex commented 6 years ago

I made a small screencast to show you in my project: http://cornflex.org/webgl/skin-bug.mp4

If the avatar goes too far from 0,0,0, the skin starts to get really buggy as you can see. I did deep profiling of the skeleton and bones. All seems fine except the geometry is "pulled back" in some way... I thought about a bad animation root but again, all seems fine in Threejs with that...

The prob is clearly linked to the Skeleton and the bonesTexture of the skinning shader. I also thought about precision but as Mugen said, iOS devices should support that with ease.

... why only on iPhone then :/

I'm quite stuck... I'm switching to PNG sequences as a backup plan ... :(

titansoftime commented 6 years ago

This is most likely due to the iPhone 6 not supporting enough bones.

I had the exact same problem.

Check your bone limit here: https://virtulo.us/2/vr/test

qornflex commented 6 years ago

I don't think so. If the SkinnedMesh stays at 0,0,0, there is no prob at all :/ I'm even thinking about moving all the assets instead of the avatar... (this is my workaround #2) but I'd surely prefer to have the skeleton working as well as on desktop of course.

A friend just reported the same bug on iPhone 8

Mugen87 commented 6 years ago

What results do you get with the following conformance test?

https://www.khronos.org/registry/webgl/conformance-suites/1.0.1/conformance/misc/shader-precision-format.html

qornflex commented 6 years ago

After looking at the ThreeJS sources for 2 days... I think there is a prob when the bonesTexture is updated. It looks like this texture doesn't update correctly and then pushes the vertices back to their original positions... but I could be wrong of course.

qornflex commented 6 years ago

[image: img_1411]

qornflex commented 6 years ago

Hmmm... here I have some FAILED, look:

[image: img_1412]

Mugen87 commented 6 years ago

Maybe the problem is related to these fails. My Pixel for instance passes all tests.

Let's try something out: Go to WebGLCapabilities and set the value of floatVertexTextures to false (this will prevent the usage of the float texture). Make a build and use this three.js version in your app. I'm curious what happens 😊.

https://github.com/mrdoob/three.js/blob/099e4364541502fdf22fc7b1c0f54239d2ba1708/src/renderers/webgl/WebGLCapabilities.js#L105

qornflex commented 6 years ago

Already tried floatVertexTextures = false on the renderer. The skin stops going crazy but there is no animation anymore :/ I'll push a sample.

qornflex commented 6 years ago

Here you are: http://cornflex.org/webgl/skinning-bug2.html

renderer.capabilities.floatVertexTextures = false;

=> No animation running anymore

[image: img_1413]

Mugen87 commented 6 years ago

Yeah, same on my Pixel (Desktop works). I guess we are hitting a uniform limit.

qornflex commented 6 years ago

I made a last sample, with bug appearing progressively: http://cornflex.org/webgl/skinning-bug3.html

Starts from 0,0,0 and goes forward... wait a few seconds... skin starts getting crazy.

It's quite weird that all is OK at 0,0,0, isn't it?

qornflex commented 6 years ago

z: 1255 > still OK
z: 18870 > crazy skin

[image: img_1415]

[image: img_1414]

qornflex commented 6 years ago

As I said, I deep profiled the whole skeleton and the bone positions on the CPU side. All values seem to be quite OK. The only clue I have is the bonesTexture and bone matrices not working correctly on iOS when the mesh gets far from 0,0,0.

I checked the bone positions because this looks like the typical behavior when bones stay stuck at the root node or things like that.... but no...

I checked float precision on the GPU side in your skinning shaders but again... nothing seems to have an impact.

Very strange that nobody has moved a SkinnedMesh inside a scene and reported this before :/

Mugen87 commented 6 years ago

One more question: Do you have the same problem with Chrome?

qornflex commented 6 years ago

Same prob yeah.

[image: img_1416]

mrdoob commented 6 years ago

Sounds like a precision issue. Have you tried scaling the scene down? (11195 is 11 kilometers).

Another option would be to move the scene instead of the character. Tends to be a common solution for big scenes.
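The "move the scene instead of the character" approach is often done as a periodic origin rebase rather than a permanent inversion of the game logic. Below is a minimal sketch in plain JavaScript, assuming objects that expose a `position` with `x`, `y`, `z` fields (as three.js objects do). The helper name `rebaseOrigin` and the threshold value are hypothetical and not part of three.js.

```javascript
// Hypothetical helper: when the avatar drifts beyond `threshold` units from
// the origin, shift the avatar and every other world object back by the same
// offset. Relative positions are unchanged, but absolute values stay small.
function rebaseOrigin(avatar, worldObjects, threshold) {
  const p = avatar.position;
  const dist = Math.hypot(p.x, p.y, p.z);
  if (dist < threshold) return null; // nothing to do yet

  // Copy the offset first, because the avatar itself is shifted in the loop.
  const offset = { x: p.x, y: p.y, z: p.z };
  for (const obj of [avatar, ...worldObjects]) {
    obj.position.x -= offset.x;
    obj.position.y -= offset.y;
    obj.position.z -= offset.z;
  }
  return offset; // the caller can accumulate this to recover "true" coordinates
}

// Usage: the avatar has run 11195 units along z; a coin sits 5 units ahead.
const avatar = { position: { x: 0, y: 0, z: 11195 } };
const coin = { position: { x: 1, y: 0, z: 11200 } };
const shift = rebaseOrigin(avatar, [coin], 1000);
console.log(avatar.position, coin.position, shift);
```

In a real game the same offset would also have to be applied to physics bodies, cameras and anything else that stores absolute coordinates, which is the cost qornflex objects to later in the thread.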

mrdoob commented 6 years ago

Also, I guess the skinning code is in world space. We could investigate this.

titansoftime commented 6 years ago

I see your bone count is 67.

My (wife's) iPhone 6 has a Max Hardware Bones limit of 27.

The deformations I see are identical to those I saw on her phone in my game (and the same problem I had before I had my animator fix the meshes' armatures). I'd be surprised if this wasn't at least part of your issue.

Why it gets worse over time? No idea. Someone else mentioned something similar, not sure if it's related: https://discourse.threejs.org/t/updated-with-fiddle-as-animations-go-faster-there-is-increasing-error-in-accuracy/1707/6

Another annoying thing I found on iPhones (iPhone 6 at least): I MUST use highp shader precision. mediump and lowp don't distort the mesh in that same scary-looking fashion (as my bone issue on the phone), but the textures look weird and garbled/pixelated while animating.

What is your hardware bone limit on the device you are testing?
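For context on the "Max Hardware Bones" figure above: when bone matrices are passed as plain uniforms, the budget is bounded by the number of vertex uniform vectors the GPU exposes. The sketch below is a rough estimate only. It assumes each bone matrix consumes 4 uniform vectors and that about 20 vectors are reserved for other uniforms. That reserve and the function name `estimateMaxBones` are assumptions for illustration, not a documented three.js API.

```javascript
// Rough estimate of how many bone matrices fit into vertex shader uniforms.
// Assumptions (not authoritative): each mat4 bone uses 4 uniform vectors and
// `reserved` vectors are kept for everything else the shader needs.
function estimateMaxBones(maxVertexUniformVectors, reserved = 20) {
  return Math.max(0, Math.floor((maxVertexUniformVectors - reserved) / 4));
}

// In a browser the input would come from
// gl.getParameter(gl.MAX_VERTEX_UNIFORM_VECTORS).
const limits = [128, 256, 1024].map((n) => ({ uniforms: n, bones: estimateMaxBones(n) }));
console.log(limits);
```

Under these assumptions a device reporting 128 vertex uniform vectors would land on 27 bones, which matches the number quoted above. When bones are uploaded through a float texture instead, as discussed earlier in the thread, this uniform budget would not be the limiting factor.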

qornflex commented 6 years ago

@mrdoob I already tried to scale down but the skinning goes crazy much faster in fact :/ If you scale down at 0.1, then the bug is appearing 10 times faster.

The demo I used here to explain comes from the threejs examples. In my game, the scale is not that big but the bug is the same, only appearing faster.

Indeed, the skinning seems to be done in worldspace, or localspace but getting stretched when moved far from center. That's why it's running quite well at 0,0,0.

Moving the whole world instead of the avatar is my workaround #2. You must admit it's a bad choice when you have already developed the whole routine of a runner game in worldspace, with physics and all. And z=11000 is not that far for a runner game :/ ... and it's just a running man, not a ship.

I made plenty of games with ThreeJS but it's the first time I use a SkinnedMesh. This iOS bug is really annoying.

Workaround #1: I'll do 6 PNG sequences as spritesheet animations. Then I can keep the worldspace logic. Deadline is not far :(

@titansoftime The prob is appearing on the iPhone 8 of my wife too. This is not related to iPhone 6 only. Also, if you have a max bones problem, then it will be buggy from start. Here, you can see it's not buggy at 0,0,0. Then the bones limit is not the prob. Worldspace calculation of the bones and the bonesMatrix in skinning shaders is, IMO.

titansoftime commented 6 years ago

What is the Max Hardware Bones of the iPhone 8?

qornflex commented 6 years ago

I do not know. And I'm not sure it's device dependent.

But again... IMO, it's not related to maxbones value at all.

The sample is working correctly on iphone 6,8,X ... even 5... only when the SkinnedMesh stays at 0,0,0 https://threejs.org/examples/webgl_loader_fbx.html

If maxbones was a problem, we would not have good skinning at 0,0,0 either.

Mugen87 commented 6 years ago

Related:

https://www.opengl.org/discussion_boards/archive/index.php/t-159613.html https://www.opengl.org/discussion_boards/archive/index.php/t-159364.html

Mugen87 commented 6 years ago

I'd like to highlight this passage from the user zeoverlord:

The jittering thing is only natural for floats, variables will loose precision the higher the number is, so precision is the best the closer to 1.0 it is. Now this is not really a problem but it does mean that if you are doing a lot of matrix calculations with huge numbers you are going to have a lot of precision degradation(aka jittering). So you need to do all of your math around the origin with reasonable numbers and then translate it away
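The effect described in that quote can be observed directly with 32-bit floats. The sketch below is illustrative only: it measures the gap between adjacent float32 values at several magnitudes, then applies the same small offset near the origin and far away. The helper name `float32Spacing` and the sample values are made up for the demonstration.

```javascript
// Spacing between x and the next representable float32 above it (x > 0).
function float32Spacing(x) {
  const f = new Float32Array([x]);
  const bits = new Uint32Array(f.buffer); // same 4 bytes, viewed as an integer
  const here = f[0];
  bits[0] += 1; // step to the next float32 value
  return f[0] - here;
}

for (const z of [1, 100, 10000, 1e6, 1e8]) {
  console.log(`z=${z}: float32 spacing ~ ${float32Spacing(z)}`);
}

// The same local offset, added near the origin versus far away from it.
const offset = 0.0123;
const nearOrigin = Math.fround(Math.fround(1) + Math.fround(offset)) - 1;
const farAway = Math.fround(Math.fround(1e8) + Math.fround(offset)) - 1e8;
console.log(nearOrigin, farAway);
```

Near the origin the offset survives almost intact, while at z = 1e8 neighbouring float32 values are several whole units apart, so the offset is rounded away entirely. That is consistent with the advice to do the math around the origin and translate afterwards.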

qornflex commented 6 years ago

Mmmh, very interesting @Mugen87! It sounds like you spotted the leak.

RemusMar commented 6 years ago

z: 1255 > still OK
z: 18870 > crazy skin

The error might be on your side. A common mistake: the skin modifier was not properly applied.
Error 1: vertices controlled by too many bones
Error 2: vertices without weight (or with a very small weight).

When the SkinnedMesh is close to origin (0,0,0) you won't notice these errors. But when it is moved far away ... you will blame the mobile device precision. So the first thing to do is to redesign (properly this time) the skin modifier.

qornflex commented 6 years ago

@RemusMar Did you look at the source of the page ? The example I exposed came right from the ThreeJS sources.

Quite well mentioned in my description ;)

I ran many tests before posting this... looking for a clue in my animated exports but I found the bug is happening with any kind of format (GLTF, FBX, Collada, JSON, ...)

I made a clear example with clean source to expose the problem. It's quite easy to verify it on a smart phone:

http://cornflex.org/webgl/skinning-bug.html

RemusMar commented 6 years ago

The example I exposed came right from the ThreeJS sources.

So what?

qornflex commented 6 years ago

How can the error be on my side when this is happening with all the SkinnedMesh examples from ThreeJS :)

You said:

The error might be on your side. A common mistake: the skin modifier was not properly applied.
Error 1: vertices controlled by too many bones
Error 2: vertices without weight (or with a very small weight).

qornflex commented 6 years ago

FBX is from ThreeJS... and you can try with GLTF or any other format. The prob is the same.

var loader = new THREE.FBXLoader();
loader.load( 'https://threejs.org/examples/models/fbx/xsi_man_skinning.fbx', OnFBXLoaded );

@Mugen87 has found a clue. Let's hope the guy who made the gpu skinned mesh can fix it.

RemusMar commented 6 years ago

How can the error be on my side when this is happening with all the SkinnedMesh examples from ThreeJS

I said "The error might be on your side.". The bottom line: Use a simple SkinnedMesh with properly applied skin modifier:

See if you can reproduce the bug.

qornflex commented 6 years ago

Are you mad? :) This bug cannot be on my side... the models (fbx, gltf, dae, whatever) and source code come from ThreeJS.

Please read the @Mugen87 posts.

RemusMar commented 6 years ago

Use a simple SkinnedMesh with properly applied skin modifier:

See if you can reproduce the bug.

I have to go now. cheers

qornflex commented 6 years ago

Hmmm. You know what guys? This is happening on desktop too :(

As the ThreeJS example mesh is huge (100m tall), I placed it very very far: z=10000000. And look, same prob: http://cornflex.org/webgl/skinning-bug4.html

It's definitely something related to worldspace bone positions and bonesMatrix calculations, as exposed by @Mugen87 :

https://www.opengl.org/discussion_boards/archive/index.php/t-159613.html https://www.opengl.org/discussion_boards/archive/index.php/t-159364.html

As explained before to @mrdoob , I already tried to downscale the mesh but the prob is even worse... even on desktop and you encounter the bug not that far from 0,0,0 then.

In my own game (with realistic scales 1:1), the bug is already appearing at z:100.

RemusMar commented 6 years ago

You continue to use badly designed skinned meshes ...
Anyway, to close this subject, here is the 1 to 1,000,000 (one million !!!) case study: http://necromanthus.com/Test/html5/Lara_1000000.html
Click on the stage to switch between Y=0 and Y=1000000.
A minor global lighting change is available per switch.
So if the skinned mesh is properly designed, the result is (close to) perfect.
cheers

mrdoob commented 6 years ago

We'll look into it after this month's release.

qornflex commented 6 years ago

@mrdoob Thank you.

@RemusMar You continue to ignore what was said here. Please read the entire thread again :)

RemusMar commented 6 years ago

I just gave you a working example but you still don't understand. http://necromanthus.com/Test/html5/Lara_1000000.html

And you continue to talk about bad practices. Please keep in mind the following:

Single-precision floating point carries about 7 significant decimal digits. That covers 1234.567 and 123456.7. Most weight maps come with 4 decimals, but even a single one provides decent results. For this reason all the 3D engines recommend a max value of 10,000 for world size: from 0000.001 to 9999.999.

We all want "infinite" or huge 3D worlds, but that's not possible. What you are doing is completely wrong from many points of view: game design, performance, animations and collisions. I hope you got the main picture now.
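The digit-count argument above can be turned into a quick back-of-the-envelope check. The sketch below assumes IEEE 754 single precision (24-bit significand) and computes how far from the origin a coordinate can be before adjacent float32 values are further apart than a chosen resolution. The function name `maxCoordinateForResolution` is hypothetical.

```javascript
// Exclusive upper bound on |coordinate| for which adjacent float32 values are
// still no more than `resolution` apart.
// With a 24-bit significand, the spacing inside [2^e, 2^(e+1)) is 2^(e-23).
function maxCoordinateForResolution(resolution) {
  const e = 23 + Math.floor(Math.log2(resolution));
  return 2 ** (e + 1);
}

for (const res of [1, 0.01, 0.001]) {
  console.log(`resolution ${res} -> |coordinate| below ${maxCoordinateForResolution(res)}`);
}
```

For a resolution of 0.001 this gives a bound of 16384, the same order of magnitude as the 10,000 figure quoted above. It only describes representable spacing for a single value. Chained matrix multiplications accumulate further error, so the practical limit for skinning may be noticeably tighter.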

Mugen87 commented 6 years ago

Maybe related: http://davenewson.com/posts/2013/unity-coordinates-and-scales.html

qornflex commented 6 years ago

@RemusMar I understood quite well what you said but unfortunately your model has the same problem... to a lesser degree but still there: http://cornflex.org/webgl/skin-bug2.mp4

Source: http://cornflex.org/webgl/skinning-bug5.html

And if I give a position higher than z=60000, it's getting worse... and up to 100000 it's completely invisible.

We all want "infinite" or huge 3D worlds, but that's not possible.

LOL, I don't want infinite huge world... I just want to make a runner game, like I did many times before with ThreeJS... but this time I need a skinnedmesh as avatar.

My scene: http://cornflex.org/webgl/skin-bug.mp4

If we are unable to animate an avatar running over several meters ... then what's the point?? (in realistic scale 1m = 1 threejs unit, the bug already appears from 100).

BTW, your model is like 150m tall... this is not realistic scale. EDIT: Here is the same scene with scale(0.01, 0.01, 0.01) on your model, to get it realistically sized (around 1.5m then). As you can see, it's already bugging at z=300 on mobile: http://cornflex.org/webgl/skin-bug3.mp4

What you are doing is completely wrong from many points of view: game design, performance, animations and collisions.

LOL LOL... I suggest to visit my websites and blog about making games (http://quentinlengele.com, http://cornflex.org, http://www.ddrsa.com, http://br9732.com)

EDIT: your scene is not visible on iOS, empty screen (http://necromanthus.com/Test/html5/Lara_1000000.html).

[image: img_1418]

makc commented 6 years ago

The skinning seems to be done in worldspace, or localspace but getting stretched when moved far from center. That's why it's running quite well at 0,0,0

Let's see:

vec3 transformed = vec3( position );
...
#ifdef USE_SKINNING
    vec4 skinVertex = bindMatrix * vec4( transformed, 1.0 );
    vec4 skinned = vec4( 0.0 );
    skinned += boneMatX * skinVertex * skinWeight.x;
    skinned += boneMatY * skinVertex * skinWeight.y;
    skinned += boneMatZ * skinVertex * skinWeight.z;
    skinned += boneMatW * skinVertex * skinWeight.w;
    transformed = ( bindMatrixInverse * skinned ).xyz;
#endif

does not look like world space to me? in regular mesh, transformed is set to position attribute - that is, local coords.

makc commented 6 years ago

so whatever happens does happen on js side, where bind and bone matrices are calculated. perhaps bind matrix does a lot of stretch, and bone matrices could be doctored to operate in other coordinates that do not require that much stretch in bind matrix... or something like that

makc commented 6 years ago

(edit: nvm, I think I've got this one wrong)

[image: me]

qornflex commented 6 years ago

@makc Unfortunately, I don't see enough relevant data in your copy/paste of the skinning shader to tell whether the calculations are done in local or world space coords.

As I said, there is surely something wrong with the bone matrix calculations, or maybe the order of these matrix calculations is not right. As you can see on the OpenGL forum and as @Mugen87 mentioned, this is the most relevant clue, at least to me:

The jittering thing is only natural for floats, variables will loose precision the higher the number is, so precision is the best the closer to 1.0 it is. Now this is not really a problem but it does mean that if you are doing a lot of matrix calculations with huge numbers you are going to have a lot of precision degradation(aka jittering). So you need to do all of your math around the origin with reasonable numbers and then translate it away

Again, as explained, the scale of the object is affecting the bug. The example with ThreeJS sources presents that bug only at z=100000 but it's because the size of the mesh (coming also from ThreeJS source) is huge. The running boy is 150m tall.

When you are doing a game, you don't use that kind of scale of course. You always try to keep your world with 1m = 1unit. This is even mandatory to get good physics and behavior.

The prob here is when you use a SkinnedMesh with realistic size: the bug is already appearing at z=300.

In any engine, you can drop a SkinnedMesh and put it at x:30000, y:60000, z:140000000 and there is no prob with bones and skinning. No jittering.

Surely I understand quite well we are working on a Javascript core here... but still.

Trolls here pretending "we all want infinite worlds but it's impossible" need to get back to school and learn matrix transformations. Or better, look at open world games and ask themselves how their avatars can be "skinnedmeshed" sooo far from scene root 0,0,0 ......

makc commented 6 years ago

I have to say, this bug is fun. Here, I could reproduce it on the desktop, but 60K were not enough to run out of precision: needs more zeroes

makc commented 6 years ago

ok, so at z = 1e8, here is what we have with current shader:

[image: screen shot 2018-02-14 at 1 15 56]

and here is what we have if we change the shader to this (slight variation of my off-the-top-of-my-head test above that I edited out):

#ifdef USE_SKINNING
    vec4 skinVertex = bindMatrix * vec4( transformed, 1.0 );
    vec4 skinned = vec4( 0.0 );
    skinned += ( bindMatrixInverse * boneMatX ) * skinVertex * skinWeight.x;
    skinned += ( bindMatrixInverse * boneMatY ) * skinVertex * skinWeight.y;
    skinned += ( bindMatrixInverse * boneMatZ ) * skinVertex * skinWeight.z;
    skinned += ( bindMatrixInverse * boneMatW ) * skinVertex * skinWeight.w;
    transformed = skinned.xyz;
#endif

[image: screen shot 2018-02-14 at 1 16 54]

still shaky, but looks way better :) now, what does this mean in the context of this issue?

1st, you can't just copy-paste this hack in the shader. I mean, you can, but it replaces 1 matrix multiplication with 4. what I propose instead is either a) merge bone matrix with inverse of bind matrix on js side and pass as a single uniform, or b) create another uniform just for this piece of shader. It would then both solve this issue (somewhat) and decrease the number of matrix multiplications in the shader (by 1).
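Option (a) can be illustrated outside of three.js with a small numeric experiment. The sketch below is not three.js code. It uses plain column-major 4x4 arrays, translation-only matrices, and `Math.fround` to imitate float32 shader arithmetic. The helper names (`multiply`, `translation`, `transformPoint32`) and the scene values are assumptions chosen to make the effect visible. Because JavaScript numbers are doubles, merging on the JS side lets the two huge translations cancel before anything is rounded to float32.

```javascript
// Column-major 4x4 multiply (the layout WebGL uses): returns a * b.
function multiply(a, b) {
  const out = new Array(16).fill(0);
  for (let col = 0; col < 4; col++)
    for (let row = 0; row < 4; row++)
      for (let k = 0; k < 4; k++)
        out[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
  return out;
}

function translation(x, y, z) {
  return [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, x, y, z, 1];
}

// Apply a mat4 to a point (w = 1), rounding every intermediate result to
// float32 to mimic what a GPU shader would compute.
function transformPoint32(m, [x, y, z]) {
  const f = Math.fround;
  const p = [f(x), f(y), f(z), 1];
  const out = [0, 0, 0];
  for (let row = 0; row < 3; row++) {
    let acc = 0;
    for (let k = 0; k < 4; k++) acc = f(acc + f(f(m[k * 4 + row]) * p[k]));
    out[row] = acc;
  }
  return out;
}

// Hypothetical setup: the mesh sits at z = 1e8, and one bone has moved the
// skin up by 0.25 relative to the bind pose.
const bindMatrixInverse = translation(0, 0, -1e8);
const boneMatrix = translation(0, 0.25, 1e8);
const v = [0.1, 1.0, 0.0123]; // vertex in local space

// Shader-style order: the bone is applied in world space, then brought back.
const shaderPath = transformPoint32(bindMatrixInverse, transformPoint32(boneMatrix, v));

// Merged on the JS side in double precision, then applied once in float32.
const merged = multiply(bindMatrixInverse, boneMatrix);
const mergedPath = transformPoint32(merged, v);

console.log("shader-style:", shaderPath);
console.log("pre-merged:  ", mergedPath);
```

In this toy setup the shader-style path loses the vertex's z offset completely, while the pre-merged matrix keeps it. A real skeleton also involves rotations, scales and blending across up to four bones, so this shows the direction of the improvement rather than a guarantee.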