lukashermann / hulc

Hierarchical Universal Language Conditioned Policies
http://hulc.cs.uni-freiburg.de
MIT License

Training Time #7

Closed hk-zh closed 1 year ago

hk-zh commented 1 year ago

Hi, could I also ask how long it takes to train your HULC model per epoch with 8x NVIDIA RTX 2080 Ti? Training consumes a lot of memory (200+ GB) under my setup.

lukashermann commented 1 year ago

One epoch took around 1.5 h in our setup, so the whole training of 30 epochs finished in 45 hours (using the shm_dataset). Yes, it is very memory hungry; in our case it used around 280 GB of RAM. This could probably be reduced at the expense of training speed.

hk-zh commented 1 year ago

Thank you for your quick answer. Another short question (no need to open another issue): the parameter perceptual_emb_slice in your logistic_decoder_rnn.py is [64, 128]. I checked the perceptual embedding and found that this slice corresponds to the gripper-image embedding. Could I ask why you only use gripper images to train your model, since the global (static camera) images are also important?

mees commented 1 year ago

In HULC the policy decoder receives continuous gripper camera observations, and we also use relative actions transformed into the gripper camera frame, as we found this to work better. The RGB images of the static camera are used together with the gripper camera images to generate the latent plans. Since the policy is conditioned on the latent plan, it also implicitly receives the static camera observations. Does this answer your question?
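
To illustrate, here is a minimal sketch (not the actual HULC code) of what a perceptual_emb_slice of [64, 128] would do to the decoder input. The feature layout (static-camera features in 0:64, gripper-camera features in 64:128) and all dimensions are assumptions for illustration only:

```python
import torch

batch_size, seq_len, emb_dim = 2, 8, 128
# Assumed layout: [static-camera features (0:64) | gripper-camera features (64:128)]
perceptual_emb = torch.randn(batch_size, seq_len, emb_dim)
latent_plan = torch.randn(batch_size, 32)  # latent plan encodes both camera views

# Slice so the policy decoder only sees the gripper-camera part directly
perceptual_emb_slice = (64, 128)
decoder_input = perceptual_emb[..., perceptual_emb_slice[0]:perceptual_emb_slice[1]]

# The static-camera information still reaches the decoder implicitly via the plan,
# e.g. by broadcasting the plan over time and concatenating it to the sliced features
plan_expanded = latent_plan.unsqueeze(1).expand(-1, seq_len, -1)
rnn_input = torch.cat([decoder_input, plan_expanded], dim=-1)
print(decoder_input.shape, rnn_input.shape)  # (2, 8, 64) (2, 8, 96)
```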

hk-zh commented 1 year ago

Thanks, this perfectly answers my question. By the way, could I ask a question about the actions in your dataset? I used your default setup, namely relative gripper actions. I would like to know the scale of each relative action; it seems pretty large if the scale is in meters (sometimes over 20 cm per action). Also, could I ask at what frequency your dataset was sampled?

lukashermann commented 1 year ago

The relative actions are normalized to the interval (-1, 1). To convert them back to metric space, the position component (x, y, z) is multiplied by 0.02 and the orientation component (Euler angles) by 0.05. This happens in calvin_env (forget the maxing_scaling_factor, it is always 1). The control frequency we recorded with was 30 Hz.
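
As a small, hedged sketch of the de-normalization described above (the exact implementation lives in calvin_env; the 7-dim action layout with a trailing gripper command is assumed here for illustration):

```python
import numpy as np

def relative_action_to_metric(action: np.ndarray) -> np.ndarray:
    """action: 7-dim [dx, dy, dz, d_euler_x, d_euler_y, d_euler_z, gripper], each in (-1, 1)."""
    out = action.copy()
    out[0:3] *= 0.02   # position offsets in meters (at most 2 cm per step)
    out[3:6] *= 0.05   # orientation offsets in radians
    # out[6] is assumed to be the binary gripper command and is left unchanged
    return out

# At 30 Hz, a maximum position offset of 0.02 m per step corresponds to roughly 0.6 m/s.
example = np.array([1.0, -0.5, 0.25, 0.1, 0.0, -1.0, 1.0])
print(relative_action_to_metric(example))
```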

hk-zh commented 1 year ago

Thanks! Found the code in calvin_env.