Hi, thanks for your nice work! The loss becomes NaN when I set the batch size to 16 and train the model on 8 RTX 2080Ti GPUs via the command `python train.py`. How can I solve this problem?

@qinghuannn Thanks for your interest in this work! Could you let me know the configs you used for training and the versions of your installed packages?
I trained the model via the command `python train.py` without any additional parameters, so the default config `configs/train.yml` should have been used. The installed package versions are listed below. Could an incorrect package version cause this problem?

Also, I get an overflow warning when preprocessing the data via `python scripts/prepare_humanml3d.py`. Does this matter?
```
Package                 Version            Editable project location
----------------------- ------------------ --------------------------------------------------
absl-py                 1.3.0
aiohttp                 3.8.3
aiosignal               1.3.1
alembic                 1.9.0
antlr4-python3-runtime 4.9.3
async-timeout 4.0.2
attrs 22.1.0
autopage 0.5.1
black 22.12.0
body-visualizer 1.1.0
brotlipy 0.7.0
cachetools 5.2.0
certifi 2022.12.7
cffi 1.15.0
cfgv 3.3.1
charset-normalizer 2.1.1
click 8.1.3
cliff 4.1.0
cmaes 0.9.0
cmd2 2.4.2
colorlog 6.7.0
colour 0.1.5
commonmark 0.9.1
configparser 5.3.0
contourpy 1.0.6
cryptography 37.0.2
cycler 0.11.0
decorator 4.4.2
distlib 0.3.6
dotmap 1.3.30
exceptiongroup 1.0.4
fastjsonschema 2.16.2
filelock 3.8.2
flake8 6.0.0
fonttools 4.38.0
freetype-py 2.3.0
frozenlist 1.3.3
fsspec 2022.11.0
ftfy 6.1.1
fvcore 0.1.5.post20221213
google-auth 2.15.0
google-auth-oauthlib 0.4.6
greenlet 2.0.1
grpcio 1.51.1
huggingface-hub 0.11.1
human-body-prior 2.2.2.0 /home/xxx/workspace/tools/human_body_prior/src
hydra-colorlog 1.2.0
hydra-core 1.3.0
hydra-optuna-sweeper 1.2.0
identify 2.5.10
idna 3.4
imageio 2.23.0
imageio-ffmpeg 0.4.7
importlib-metadata 5.2.0
importlib-resources 5.10.1
iniconfig 1.1.1
iopath 0.1.10
isort 5.11.3
jedi 0.18.2
jsonschema 4.17.3
jupyter_core 5.1.0
kiwisolver 1.4.4
lightning-utilities 0.4.2
loguru 0.6.0
Mako 1.2.4
Markdown 3.4.1
MarkupSafe 2.1.1
matplotlib 3.2.2
mccabe 0.7.0
mkl-fft 1.3.1
mkl-random 1.2.2
mkl-service 2.4.0
moviepy 1.0.3
multidict 6.0.3
mypy-extensions 0.4.3
nbformat 5.7.0
nbstripout 0.6.1
networkx 2.8.8
nodeenv 1.7.0
numpy 1.24.0
oauthlib 3.2.2
olefile 0.46
omegaconf 2.3.0
opencv-python 4.5.1.48
optuna 2.10.1
packaging 22.0
pandas 1.5.2
parso 0.8.3
pathspec 0.10.3
pbr 5.11.0
Pillow 9.3.0
pip 22.3.1
pkgutil_resolve_name 1.3.10
platformdirs 2.6.0
pluggy 1.0.0
portalocker 2.6.0
pre-commit 2.20.0
prettytable 3.5.0
proglog 0.1.10
protobuf 3.20.1
psbody-mesh 0.4
pudb 2022.1.3
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycodestyle 2.10.0
pycparser 2.21
pyflakes 3.0.1
pyglet 2.0.2.1
Pygments 2.13.0
PyOpenGL 3.1.0
PyOpenGL-accelerate 3.1.5
pyOpenSSL 22.0.0
pyparsing 3.0.9
pyperclip 1.8.2
pyrender 0.1.43
pyrsistent 0.19.2
PySocks 1.7.1
pytest 7.2.0
python-dateutil 2.8.2
python-dotenv 0.21.0
pytorch-lightning 1.8.5.post0
pytorch3d 0.7.2
pytz 2022.7
PyYAML 6.0
pyzmq 24.0.1
regex 2022.10.31
requests 2.28.1
requests-oauthlib 1.3.1
rich 12.6.0
rsa 4.9
scipy 1.9.3
setuptools 65.6.3
sh 1.14.3
six 1.16.0
SQLAlchemy 1.4.45
stevedore 4.1.1
tabulate 0.9.0
tensorboard 2.11.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorboardX 2.5.1
termcolor 2.1.1
tokenizers 0.12.1
toml 0.10.2
tomli 2.0.1
torch 1.12.0
torchaudio 0.12.0
torchmetrics 0.11.0
torchvision 0.13.0
tqdm 4.64.1
traitlets 5.7.1
transformers 4.21.1
transforms3d 0.3.1
trimesh 3.9.5
typing_extensions 4.4.0
urllib3 1.26.13
urwid 2.1.2
urwid-readline 0.13
virtualenv 20.17.1
wcwidth 0.2.5
Werkzeug 2.2.2
wheel 0.37.1
yacs 0.1.8
yarl 1.8.2
zipp 3.11.0
```
When tracking down the overflow in preprocessing, I found that the sample `P04G01R03F0343T0437A0501.npy` from the HumanAct12 data you provided causes the error, as shown in the following code.

```python
>>> import numpy as np
>>> data = np.load('humanact12_processed.pkl', allow_pickle=True)
>>> tmp = data['P04G01R03F0343T0437A0501.npy']
>>> tmp['joints3D'].min(), tmp['joints3D'].max()
(-0.806142492005127, 4.837814713475272e+276)
>>> tmp['joints3D'].mean(), tmp['joints3D'].std()
(7.07282852847262e+272, inf)
```
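For reference, a minimal sketch to scan the whole pickle for other abnormal samples, assuming the same dict-of-arrays layout as above (the `1e3` threshold is an arbitrary choice on my side):

```python
import numpy as np

# np.load falls back to pickle when the file is not in .npy format.
data = np.load('humanact12_processed.pkl', allow_pickle=True)

THRESHOLD = 1e3  # joint coordinates beyond this are clearly corrupted
for name, sample in data.items():
    joints = np.asarray(sample['joints3D'])
    if not np.isfinite(joints).all() or np.abs(joints).max() > THRESHOLD:
        print(name, joints.min(), joints.max())
```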
@qinghuannn Thanks for finding this! I think it is a good idea to exclude the samples with abnormally large values here. I will update the code soon to make it fail-safe.
@qinghuannn However, there is still a possibility that some long motion clips (especially in HumanAct12) take a large amount of memory, which can cause a runtime overflow error unless your machine has abundant memory.
@jihoonerd I'm sure it's not caused by limited memory, since I ran all the code on a machine with large memory (256 GB RAM).
After excluding the abnormal samples `P07G01R02F0401T0607A0201.npy`, `P10G01R01F1418T1500A0604.npy`, and `P04G01R03F0343T0437A0501.npy`, the training loss looks normal.
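For anyone who wants to reproduce this workaround, a minimal sketch that drops those three samples and re-saves the pickle (the output filename is my own choice, not part of the repo):

```python
import pickle
import numpy as np

BAD_SAMPLES = {
    'P07G01R02F0401T0607A0201.npy',
    'P10G01R01F1418T1500A0604.npy',
    'P04G01R03F0343T0437A0501.npy',
}

data = np.load('humanact12_processed.pkl', allow_pickle=True)
cleaned = {name: sample for name, sample in data.items() if name not in BAD_SAMPLES}

with open('humanact12_processed_clean.pkl', 'wb') as f:
    pickle.dump(cleaned, f)
```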
@jihoonerd The test pipeline code has several bugs. In `eval_util.py`, some non-existent keys are accessed, such as `meta["gt_translation"]`, `meta["clip_score_norm"]`, `meta["mm_distance_norm"]`, and so on. I hope the author can check and update it. Thank you very much!
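As a stopgap until this is fixed, one could guard those lookups with `dict.get`-style access so a missing key degrades gracefully instead of raising a `KeyError` (a generic workaround sketch, not the intended fix; `meta` below is a stand-in for the dict used in `eval_util.py`):

```python
def safe_get(meta: dict, key: str, default=None):
    """Return meta[key] if present; otherwise warn and return a default."""
    if key not in meta:
        print(f"[eval] missing key '{key}', falling back to {default!r}")
        return default
    return meta[key]

# Hypothetical usage with the keys reported above:
meta = {}
gt_translation = safe_get(meta, "gt_translation")
clip_score = safe_get(meta, "clip_score_norm", default=0.0)
```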