open-mmlab / mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
https://mmaction2.readthedocs.io
Apache License 2.0

[Bug] #2886

Open Bradly-s opened 1 week ago

Bradly-s commented 1 week ago

Branch

main branch (1.x version, such as v1.0.0, or dev-1.x branch)

Prerequisite

Environment

Package Version Location

absl-py 1.1.0
addict 2.4.0
aliyun-python-sdk-core 2.16.0
aliyun-python-sdk-kms 2.16.5
anyio 3.6.1
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
asttokens 2.0.5
attrs 21.4.0
Babel 2.10.3
backcall 0.2.0
beautifulsoup4 4.11.1
bleach 5.0.0
brotlipy 0.7.0
cachetools 5.2.0
certifi 2021.5.30
cffi 1.14.6
chardet 4.0.0
charset-normalizer 3.4.0
click 8.1.7
colorama 0.4.6
conda 4.10.3
conda-package-handling 1.7.3
crcmod 1.7
cryptography 3.4.7
cycler 0.11.0
debugpy 1.6.0
decorator 5.1.1
decord 0.6.0
defusedxml 0.7.1
einops 0.8.0
entrypoints 0.4
executing 0.8.3
fastjsonschema 2.15.3
filelock 3.14.0
fonttools 4.33.3
google-auth 2.8.0
google-auth-oauthlib 0.4.6
grpcio 1.46.3
idna 2.10
importlib-metadata 8.5.0
importlib-resources 5.8.0
ipykernel 6.15.0
ipython 8.4.0
ipython-genutils 0.2.0
ipywidgets 7.7.0
jedi 0.18.1
Jinja2 3.1.2
jmespath 0.10.0
json5 0.9.8
jsonschema 4.6.0
jupyter-client 7.3.4
jupyter-core 4.10.0
jupyter-server 1.17.1
jupyterlab 3.4.3
jupyterlab-language-pack-zh-CN 3.4.post1
jupyterlab-pygments 0.2.2
jupyterlab-server 2.14.0
jupyterlab-widgets 1.1.0
kiwisolver 1.4.3
Markdown 3.3.7
markdown-it-py 3.0.0
MarkupSafe 2.1.1
matplotlib 3.5.2
matplotlib-inline 0.1.3
mdurl 0.1.2
mistune 0.8.4
mmaction2 1.2.0 /root/autodl-tmp/mmaction2
mmcv 2.1.0
mmengine 0.10.5
model-index 0.1.11
nbclassic 0.3.7
nbclient 0.6.4
nbconvert 6.5.0
nbformat 5.4.0
nest-asyncio 1.5.5
notebook 6.4.12
notebook-shim 0.1.0
numpy 1.22.4
oauthlib 3.2.0
opencv-contrib-python 4.10.0.84
opencv-python 4.10.0.84
opendatalab 0.0.10
openmim 0.3.9
openxlab 0.1.2
ordered-set 4.1.0
oss2 2.17.0
packaging 24.2
pandas 2.0.3
pandocfilters 1.5.0
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.1.1
pip 21.1.3
platformdirs 4.3.6
prometheus-client 0.14.1
prompt-toolkit 3.0.29
protobuf 3.19.4
psutil 5.9.1
ptyprocess 0.7.0
pure-eval 0.2.2
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycosat 0.6.3
pycparser 2.20
pycryptodome 3.21.0
pygments 2.18.0
pyOpenSSL 20.0.1
pyparsing 3.0.9
pyrsistent 0.18.1
PySocks 1.7.1
python-dateutil 2.8.2
pytz 2023.4
PyYAML 6.0.2
pyzmq 23.2.0
requests 2.28.2
requests-oauthlib 1.3.1
rich 13.4.2
rsa 4.8
ruamel-yaml-conda 0.15.100
scipy 1.10.1
Send2Trash 1.8.0
setuptools 60.2.0
six 1.16.0
sniffio 1.2.0
soupsieve 2.3.2.post1
stack-data 0.3.0
supervisor 4.2.4
tabulate 0.9.0
tensorboard 2.9.1
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
termcolor 2.4.0
terminado 0.15.0
tinycss2 1.1.1
tomli 2.1.0
torch 1.11.0+cu113
torchvision 0.12.0+cu113
tornado 6.1
tqdm 4.65.2
traitlets 5.3.0
typing-extensions 4.2.0
tzdata 2024.2
urllib3 1.26.6
wcwidth 0.2.5
webencodings 0.5.1
websocket-client 1.3.3
Werkzeug 2.1.2
wheel 0.36.2
widgetsnbextension 3.6.0
yapf 0.40.2
zipp 3.20.2

Describe the bug

/root/miniconda3/lib/python3.8/site-packages/mmengine/model/weight_init.py:649: UserWarning: Specified kernel cache directory could not be created! This disables kernel caching. Specified directory is /root/.cache/torch/kernels. This warning will appear only once per process. (Triggered internally at ../aten/src/ATen/native/cuda/jitutils.cpp:860.)
  tensor.erfinv_()
Traceback (most recent call last):
  File "tools/train.py", line 151, in <module>
    main()
  File "tools/train.py", line 147, in main
    runner.train()
  File "/root/miniconda3/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1748, in train
    self._init_model_weights()
  File "/root/miniconda3/lib/python3.8/site-packages/mmengine/runner/runner.py", line 923, in _init_model_weights
    model.init_weights()
  File "/root/autodl-tmp/mmaction2/mmaction/models/recognizers/base.py", line 154, in init_weights
    super().init_weights()
  File "/root/miniconda3/lib/python3.8/site-packages/mmengine/model/base_module.py", line 136, in init_weights
    m.init_weights()
  File "/root/autodl-tmp/mmaction2/mmaction/models/backbones/mvit.py", line 855, in init_weights
    super().init_weights()
  File "/root/miniconda3/lib/python3.8/site-packages/mmengine/model/base_module.py", line 129, in init_weights
    initialize(self, other_cfgs)
  File "/root/miniconda3/lib/python3.8/site-packages/mmengine/model/weight_init.py", line 610, in initialize
    _initialize(module, cp_cfg)
  File "/root/miniconda3/lib/python3.8/site-packages/mmengine/model/weight_init.py", line 518, in _initialize
    func(module)
  File "/root/miniconda3/lib/python3.8/site-packages/mmengine/model/weight_init.py", line 330, in __call__
    module.apply(init)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 667, in apply
    module.apply(fn)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 667, in apply
    module.apply(fn)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 668, in apply
    fn(self)
  File "/root/miniconda3/lib/python3.8/site-packages/mmengine/model/weight_init.py", line 327, in init
    trunc_normal_init(m, self.mean, self.std, self.a, self.b,
  File "/root/miniconda3/lib/python3.8/site-packages/mmengine/model/weight_init.py", line 79, in trunc_normal_init
    trunc_normal_(module.weight, mean, std, a, b)  # type: ignore
  File "/root/miniconda3/lib/python3.8/site-packages/mmengine/model/weight_init.py", line 682, in trunc_normal_
    return _no_grad_trunc_normal_(tensor, mean, std, a, b)
  File "/root/miniconda3/lib/python3.8/site-packages/mmengine/model/weight_init.py", line 649, in _no_grad_trunc_normal_
    tensor.erfinv_()
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)
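The traceback ends in an NVRTC `--gpu-architecture` error raised while `erfinv_()` is compiled at runtime. One possible cause, which this report does not confirm, is that the GPU's compute capability is newer than what the installed `torch 1.11.0+cu113` build knows about. The sketch below is only a diagnostic under that assumption: `arch_supported` is a hypothetical helper, not part of PyTorch or MMAction2, and it compares the device capability with the architectures the PyTorch build was compiled for.

```python
def arch_supported(capability, arch_list):
    """Return True if the 'sm_XY' name for (major, minor) appears in arch_list.

    Hypothetical helper for illustration; not a PyTorch or MMAction2 API.
    """
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:  # keep the sketch runnable without PyTorch
        torch = None

    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)   # e.g. (8, 6)
        archs = torch.cuda.get_arch_list()          # e.g. ['sm_37', ..., 'sm_86']
        print("torch:", torch.__version__, "CUDA:", torch.version.cuda)
        print("device capability:", cap)
        print("compiled architectures:", archs)
        print("capability in compiled list:", arch_supported(cap, archs))
    else:
        print("PyTorch with a visible CUDA device is required for this check.")
```

If the last line prints `False`, the mismatch is worth ruling out first, for example by trying a PyTorch build made for a newer CUDA toolkit. A `True` result does not prove the build is fine, since the error may have another cause.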

Reproduces the problem - code sample

No response

Reproduces the problem - command or script

No response

Reproduces the problem - error message

No response

Additional information

No response