vye16 / slahmr


ValueError: Expected value argument (Tensor of shape (1, 138)) #33

Closed: carlosedubarreto closed this issue 1 year ago

carlosedubarreto commented 1 year ago

I ran it several times without problems, but sometimes it gives this sort of error in the middle of processing:

```
ValueError: Expected value argument (Tensor of shape (1, 138)) to be within the support
(IndependentConstraint(Real(), 1)) of the distribution MixtureSameFamily(
    Categorical(probs: torch.Size([12]), logits: torch.Size([12])),
    MultivariateNormal(loc: torch.Size([12, 138]), covariance_matrix: torch.Size([12, 138, 138]))),
but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0', grad_fn=)
```

Here is a screenshot of the error.
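For context, this error is raised by torch.distributions input validation: the 138-dim latent being scored by the learned motion prior has gone NaN, i.e. the optimization diverged. A minimal sketch that reproduces the same ValueError with the shapes from the traceback (the parameters here are placeholders, not slahmr's learned prior):

```python
import torch
from torch.distributions import Categorical, MixtureSameFamily, MultivariateNormal

# 12 mixture components over a 138-dim latent, matching the traceback shapes.
mix = Categorical(logits=torch.zeros(12))
components = MultivariateNormal(
    loc=torch.zeros(12, 138),
    covariance_matrix=torch.eye(138).repeat(12, 1, 1),
)
prior = MixtureSameFamily(mix, components, validate_args=True)

latent = torch.full((1, 138), float("nan"))  # a diverged (all-NaN) latent
prior.log_prob(latent)  # ValueError: Expected value argument ... invalid values
```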

carlosedubarreto commented 1 year ago

From what I searched about this problem, it could be an issue with the learning rate. Where can I find this setting to change it? Thanks.

vye16 commented 1 year ago

This happens in general when optimization struggles to converge. You can change the learning rate in slahmr/confs/optim.yaml. We also had a recent update to the pre-processing pipeline, so double-check that all the preprocessing inputs (cameras, tracking) are in the right place and are being accessed.
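For example, assuming the file is plain YAML with the layout quoted later in this thread, a scripted tweak might look like the sketch below (hypothetical; note that round-tripping with PyYAML discards the file's comments, so editing by hand is safer):

```python
import yaml  # assumes PyYAML is installed

path = "slahmr/confs/optim.yaml"
with open(path) as f:
    cfg = yaml.safe_load(f)

print("current lr:", cfg["optim"]["options"]["lr"])  # 1.0 in the file quoted below
cfg["optim"]["options"]["lr"] = 0.5  # try a smaller step size
with open(path, "w") as f:
    yaml.safe_dump(cfg, f)  # warning: drops the file's comments
```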


carlosedubarreto commented 1 year ago

@vye16 thanks a lot for the answer. I think it might be the learning rate, because I have already run it more than 7 times and only had problems with 2 videos.

BTW, I'm showing the results on Twitter and people are loving them. I'm so glad Georgious suggested this repo. It's amazing!!!!

I'll test changing the learning rate and come back to report the results.

carlosedubarreto commented 1 year ago

Oh, sorry to bother you with this, but can you suggest what I should change? I went to the optim file, but there are so many options that I don't know which to change.

Or can I just pick a new value for any option? (I'm wondering whether changing one value will affect all the others.)

Here is the file I was looking at:

```yaml
optim:
  options:
    robust_loss_type: "bisquare"
    robust_tuning_const: 4.6851
    joints2d_sigma: 100.0
    lr: 1.0
    lbfgs_max_iter: 20
    save_every: 20
    vis_every: -1
    max_chunk_steps: 20
    save_meshes: False

  root:
    num_iters: 30

  smpl:
    num_iters: 0

  smooth:
    opt_scale: False
    num_iters: 60

  motion_chunks:
    chunk_size: 10
    init_steps: 20
    chunk_steps: 20
    opt_cams: True

  loss_weights:
    joints2d: [0.001, 0.001, 0.001]
    bg2d: [0.0, 0.000, 0.000]
    cam_R_smooth : [0.0, 0.0, 0.0]
    cam_t_smooth : [0.0, 0.0, 0.0]
#    bg2d: [0.0, 0.0001, 0.0001]
#    cam_R_smooth : [0.0, 1000.0, 1000.0]
#    cam_t_smooth : [0.0, 1000.0, 1000.0]
    joints3d: [0.0, 0.0, 0.0]
    joints3d_smooth: [1.0, 10.0, 0.0]
    joints3d_rollout: [0.0, 0.0, 0.0]
    verts3d: [0.0, 0.0, 0.0]
    points3d: [0.0, 0.0, 0.0]
    pose_prior: [0.04, 0.04, 0.04]
    shape_prior: [0.05, 0.05, 0.05]
    motion_prior: [0.0, 0.0, 0.075]
    init_motion_prior: [0.0, 0.0, 0.075]
    joint_consistency: [0.0, 0.0, 100.0]
    bone_length: [0.0, 0.0, 2000.0]
    contact_vel: [0.0, 0.0, 100.0]
    contact_height: [0.0, 0.0, 10.0]
    floor_reg: [0.0, 0.0, 0.0]
#    floor_reg: [0.0, 0.0, 0.167]
```

vye16 commented 1 year ago

Hi, yes, I'd suggest changing the motion_chunks chunk_size (controls how many frames to successively optimize), init_steps (the number of optimization steps to perform on the first chunk), and/or chunk_steps (the number of optimization steps to perform per chunk). Reducing the chunk size and/or increasing the number of steps per chunk will make optimization slower, but will guide optimization toward a better part of the state space before adding more frames, so I'd suggest trying that. If it still diverges, could you attach the video you're trying to process?
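To make the trade-off concrete, here is a rough back-of-the-envelope sketch (illustrative arithmetic only, not slahmr's actual chunk scheduler):

```python
def total_chunk_steps(num_frames: int, chunk_size: int,
                      init_steps: int, chunk_steps: int) -> int:
    """Approximate optimization steps summed over all motion chunks."""
    num_chunks = max(1, -(-num_frames // chunk_size))  # ceil(num_frames / chunk_size)
    return init_steps + (num_chunks - 1) * chunk_steps

# Defaults from optim.yaml vs. a smaller chunk size, on a 100-frame video:
print(total_chunk_steps(100, chunk_size=10, init_steps=20, chunk_steps=20))  # 200
print(total_chunk_steps(100, chunk_size=7, init_steps=20, chunk_steps=20))   # 300: slower, steadier
```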


carlosedubarreto commented 1 year ago

Wow, great, I knew it would be much simpler than I was thinking :)

Yep, I can share it. Here is the video:

https://github.com/vye16/slahmr/assets/4061130/39f89c14-f35a-4823-a94a-04e4fdf65877

The error shows up at iteration 76, I think. I'll make some changes and try again.

carlosedubarreto commented 1 year ago

@vye16, out of curiosity, is it possible to change the learning rate mid-process?

I was thinking the process could be monitored to see whether it is heading toward an error, and if so, it could automatically reduce the learning rate so it won't lose all the progress.

Is that possible in machine learning in general?

I was thinking of trying to implement that, but if it is an absurd idea there is no reason for me to start trying it.

And I have almost no experience with ML (coding it).
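For what it's worth, this kind of guard is a well-known pattern, sometimes called learning-rate backoff: keep a snapshot of the last good state, and when the loss goes non-finite, roll back and take smaller steps. A generic sketch, not slahmr-specific, assuming a plain first-order optimizer rather than the L-BFGS closure interface slahmr uses:

```python
import copy

import torch

def optimize_with_backoff(model, optimizer, compute_loss, num_steps,
                          lr_decay=0.5, max_backoffs=5):
    """Roll back to the last finite step and shrink the lr when loss goes NaN."""
    good_state, backoffs, step = None, 0, 0
    while step < num_steps:
        optimizer.zero_grad()
        loss = compute_loss()
        if not torch.isfinite(loss):
            backoffs += 1
            if good_state is None or backoffs > max_backoffs:
                raise RuntimeError("optimization diverged")
            # Restore the last good state instead of losing all progress.
            model.load_state_dict(good_state[0])
            optimizer.load_state_dict(good_state[1])
            for group in optimizer.param_groups:
                group["lr"] *= lr_decay  # take smaller steps from here on
            continue
        loss.backward()
        optimizer.step()
        good_state = (copy.deepcopy(model.state_dict()),
                      copy.deepcopy(optimizer.state_dict()))
        step += 1
    return model
```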

carlosedubarreto commented 1 year ago

I reduced the chunk_size from 10 to 7 and it worked. Thanks a lot!!!!

carlosedubarreto commented 1 year ago

Just out of curiosity, and I don't know if it was a coincidence, but this result was one of the worst I have had from SLAHMR (the result from the video I sent).