wyhuai / SkillMimic

Official code release for the paper "SkillMimic: Learning Reusable Basketball Skills from Demonstrations"
Apache License 2.0

The problem of KL divergence being inf in the late stage of training #4

Open Hellod035 opened 2 months ago

Hellod035 commented 2 months ago

Steps to reproduce: Increase max_epochs in skillmimic/data/cfg/train/rlg/hrl_humanoid_discrete_layupscore.yaml and run

python skillmimic/run.py --task HRLScoringLayup --cfg_env skillmimic/data/cfg/skillmimic_hlc.yaml \
--cfg_train skillmimic/data/cfg/train/rlg/hrl_humanoid_discrete_layupscore.yaml \
--motion_file skillmimic/data/motions/BallPlay-M/run \
--llc_checkpoint skillmimic/data/models/mixedskills/nn/skillmimic_llc.pth \
--resume_from skillmimic/data/models/hlc_scoring/nn/SkillMimic.pth \
--headless

Then you will see "NaN or Inf found in input tensor" in the terminal; this is actually caused by some of the KL divergence values being inf. I would like to ask whether this phenomenon has been noticed, whether it is acceptable as-is, or whether the hyperparameters need further adjustment.
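For context, a minimal sketch of how the KL can become inf (this is an illustration, not the repo's actual KL code; it assumes a diagonal-Gaussian policy KL as in rl_games-style PPO, with hypothetical names):

```python
import torch

def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    # KL(p || q) for diagonal Gaussians, summed over action dims
    return (torch.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5).sum(dim=-1)

mu = torch.zeros(1, 3)

# Healthy case: identical Gaussians -> KL is 0
kl_ok = gaussian_kl(mu, torch.ones(1, 3), mu, torch.ones(1, 3))

# If the old policy's std underflows to 0 (stds often shrink late in
# training), log(sigma_q / sigma_p) -> log(inf) = inf, so KL -> inf.
kl_bad = gaussian_kl(mu, torch.zeros(1, 3), mu, torch.ones(1, 3))
print(torch.isinf(kl_bad).any())  # True
```

Once such an inf reaches the TensorBoard logger, it triggers exactly the "NaN or Inf found in input tensor" warning above.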

wyhuai commented 2 months ago

Hi, this does occur during the training of the high-level policy, but it currently doesn't seem to affect the results. We plan to address this issue later, so for now, you can consider it acceptable.
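Until it is addressed upstream, one possible workaround (a sketch of my own, not code from this repo) is to sanitize the KL tensor before it reaches the logger, which silences the warning while keeping the logged statistics bounded:

```python
import torch

def safe_kl_for_logging(kl, max_kl=1e4):
    # Replace non-finite entries (inf/nan) with a large finite cap and
    # clamp the rest, so TensorBoard never sees inf or NaN. The cap value
    # max_kl is an arbitrary choice, not a repo hyperparameter.
    kl = torch.nan_to_num(kl, nan=max_kl, posinf=max_kl, neginf=0.0)
    return kl.clamp(max=max_kl)

kl = torch.tensor([0.1, float('inf'), float('nan')])
print(safe_kl_for_logging(kl))  # all entries finite, capped at 1e4
```

Note this only affects logging; if the KL also feeds an adaptive learning-rate rule, the raw values would still need a `torch.isfinite` mask there.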

Hellod035 commented 2 months ago

Thank you very much for your reply :)