RTX 4090, Ubuntu 22.04, PyTorch 2.0: complete setup log

chenkaiC4 commented 1 year ago

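Thanks to the author for such an excellent project. I have a machine with an RTX 4090 and CUDA 12.0. During installation, the step bash docs/prepare_env/install_ext.sh fails because of the CUDA version, and I cannot continue.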

RuntimeError:
      The detected CUDA version (12.0) mismatches the version that was used to compile
      PyTorch (11.3). Please make sure to use the same CUDA versions.

CUDA 11.3 does not support the 4090 well. How should I handle this?
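
The error above compares the CUDA toolkit found on the machine with the CUDA version PyTorch was built with. Below is a minimal sketch for checking the two values by hand before running install_ext.sh. The helper names are hypothetical and not part of GeneFace; the only assumption about nvcc is that its `--version` output contains a "release X.Y" string.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract 'major.minor' from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def detected_cuda_version():
    """Ask the nvcc on PATH for its version; None if nvcc is not installed."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


def versions_agree(detected, compiled):
    """True when both version strings share the same major.minor prefix."""
    return (
        detected is not None
        and compiled is not None
        and detected.split(".")[:2] == compiled.split(".")[:2]
    )
```

In an environment with PyTorch installed, `torch.version.cuda` gives the build-time version to pass as `compiled`, for example `versions_agree(detected_cuda_version(), torch.version.cuda)`.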

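[Update 2023-05-04] This issue will record the problems and solutions I run into while using the project in depth. Thanks again to @yerfor; looking forward to the next release.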

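[x] Set up the environment
[x] Run the demo
[ ] Make a personal 512*512 video
[ ] Get familiar with the training workflow
[ ] Run a real training job
[ ] Validate model quality
[ ] Next optimization steps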

yerfor commented 1 year ago

Hi, you can install PyTorch 2.0 together with a newer CUDA.

chenkaiC4 commented 1 year ago

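Testing now. I switched CUDA to 11.8, the highest version PyTorch 2.0 currently supports; pytorch3d supports CUDA 11.8 as well. I'll try it first, and if everything works I can contribute docs/prepare_env/geneface_rtx4090.yaml.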

chenkaiC4 commented 1 year ago

On the 4090, I installed CUDA 11.8 and then installed PyTorch 2.0 with conda, following the official PyTorch instructions. That worked.

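I'm now on the step below, run with export VIDEO_ID=May. It is fairly slow: GPU memory usage on the 4090 stays under 6 GB and the run takes about 30 minutes. I'm not sure whether that is normal: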

CUDA_VISIBLE_DEVICES=0 bash data_gen/nerf/process_data.sh $VIDEO_ID

An error appeared partway through, but the script kept running:

37 of 122 done
38 of 122 done
39 of 122 done
40 of 122 done
41 of 122 done
42 of 122 done
43 of 122 done
44 of 122 done
45 of 122 done
Traceback (most recent call last):
  File "/home/ubuntu/code/GeneFace/data_util/deepspeech_features/extract_ds_features.py", line 129, in <module>
    main()
  File "/home/ubuntu/code/GeneFace/data_util/deepspeech_features/extract_ds_features.py", line 103, in main
    deepspeech_pb_path = get_deepspeech_model_file()
  File "/home/ubuntu/code/GeneFace/data_util/deepspeech_features/deepspeech_store.py", line 54, in get_deepspeech_model_file
    with zipfile.ZipFile(zip_file_path) as zf:
  File "/home/ubuntu/miniconda3/envs/geneface/lib/python3.9/zipfile.py", line 1266, in __init__
    self._RealGetContents()
  File "/home/ubuntu/miniconda3/envs/geneface/lib/python3.9/zipfile.py", line 1333, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
[INFO] ===== extracted deepspeech =====
[INFO] ===== extracted all audio labels =====
46 of 122 done
47 of 122 done
48 of 122 done
49 of 122 done
50 of 122 done
51 of 122 done
52 of 122 done
53 of 122 done
54 of 122 done
55 of 122 done
56 of 122 done
57 of 122 done
58 of 122 done
59 of 122 done
60 of 122 done
61 of 122 done
62 of 122 done
63 of 122 done

yerfor commented 1 year ago

Hi, it looks like the DeepSpeech model file was not downloaded correctly. The script then simply moved on to the next step.
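
A BadZipFile error like the one above usually means the cached archive is truncated or is not a zip at all. Here is a minimal sketch, assuming the cached file lives where the download log in this thread shows it (~/.tensorflow/models/deepspeech-0_1_0-b90017e8.pb.zip), that removes a corrupt archive so the next run has to download it again. The function name is hypothetical and not part of GeneFace.

```python
import zipfile
from pathlib import Path


def remove_if_not_zip(zip_path):
    """Delete zip_path when it exists but is not a readable zip archive.

    Returns True if a file was removed, so the caller knows a re-download is needed.
    """
    path = Path(zip_path).expanduser()
    if path.exists() and not zipfile.is_zipfile(path):
        path.unlink()
        return True
    return False
```

Calling `remove_if_not_zip("~/.tensorflow/models/deepspeech-0_1_0-b90017e8.pb.zip")` before re-running process_data.sh leaves a valid archive untouched and only deletes one that fails the zip check.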

chenkaiC4 commented 1 year ago

It did fail in the end. It still looks like DeepSpeech was not downloaded properly.

loading deepspeech ...
Traceback (most recent call last):
  File "/home/ubuntu/code/GeneFace/data_gen/nerf/binarizer.py", line 277, in <module>
    binarizer.parse(hparams['video_id'])
  File "/home/ubuntu/code/GeneFace/data_gen/nerf/binarizer.py", line 267, in parse
    ret = load_processed_data(processed_dir)
  File "/home/ubuntu/code/GeneFace/data_gen/nerf/binarizer.py", line 86, in load_processed_data
    deepspeech_features = np.load(deepspeech_npy_name)
  File "/home/ubuntu/miniconda3/envs/geneface/lib/python3.9/site-packages/numpy/lib/npyio.py", line 390, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'data/processed/videos/May/aud_deepspeech.npy'
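
Because the preprocessing script keeps going after a failed step, the missing file only surfaces later in the binarizer. A small sketch of a pre-check that could be run on the processed directory first. Only aud_deepspeech.npy is confirmed by the traceback above; the helper and the list of required files are assumptions to extend as needed.

```python
from pathlib import Path

# Only aud_deepspeech.npy is confirmed by the traceback; extend this list as needed.
REQUIRED_FILES = ["aud_deepspeech.npy"]


def missing_outputs(processed_dir, required=REQUIRED_FILES):
    """Return the names in `required` that are not present in processed_dir."""
    base = Path(processed_dir)
    return [name for name in required if not (base / name).is_file()]
```

For the run in this thread, `missing_outputs("data/processed/videos/May")` would list aud_deepspeech.npy until the DeepSpeech step completes.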

chenkaiC4 commented 1 year ago

Ubuntu 22.04
RTX 4090, driver: 525.89.02
CUDA 11.8

conda environment:

name: geneface
channels:
  - pytorch3d
  - pytorch
  - iopath
  - nvidia
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - blas=1.0=mkl
  - brotlipy=0.7.0=py39h27cfd23_1003
  - bzip2=1.0.8=h7b6447c_0
  - ca-certificates=2023.01.10=h06a4308_0
  - certifi=2022.12.7=py39h06a4308_0
  - cffi=1.15.1=py39h5eee18b_3
  - charset-normalizer=2.0.4=pyhd3eb1b0_0
  - colorama=0.4.6=pyhd8ed1ab_0
  - cryptography=39.0.1=py39h9ce1e76_0
  - cuda-cudart=11.8.89=0
  - cuda-cupti=11.8.87=0
  - cuda-libraries=11.8.0=0
  - cuda-nvrtc=11.8.89=0
  - cuda-nvtx=11.8.86=0
  - cuda-runtime=11.8.0=0
  - ffmpeg=4.2.2=h20bf706_0
  - filelock=3.9.0=py39h06a4308_0
  - freetype=2.12.1=h4a9f257_0
  - fvcore=0.1.5.post20221221=pyhd8ed1ab_0
  - giflib=5.2.1=h5eee18b_3
  - gmp=6.2.1=h295c915_3
  - gmpy2=2.1.2=py39heeb90bb_0
  - gnutls=3.6.15=he1e5248_0
  - idna=3.4=py39h06a4308_0
  - intel-openmp=2021.4.0=h06a4308_3561
  - iopath=0.1.9=py39
  - jinja2=3.1.2=py39h06a4308_0
  - jpeg=9e=h5eee18b_1
  - lame=3.100=h7b6447c_0
  - lcms2=2.12=h3be6417_0
  - ld_impl_linux-64=2.38=h1181459_1
  - lerc=3.0=h295c915_0
  - libcublas=11.11.3.6=0
  - libcufft=10.9.0.58=0
  - libcufile=1.6.1.9=0
  - libcurand=10.3.2.106=0
  - libcusolver=11.4.1.48=0
  - libcusparse=11.7.5.86=0
  - libdeflate=1.17=h5eee18b_0
  - libffi=3.4.2=h6a678d5_6
  - libgcc-ng=11.2.0=h1234567_1
  - libgomp=11.2.0=h1234567_1
  - libiconv=1.16=h7f8727e_2
  - libidn2=2.3.2=h7f8727e_0
  - libnpp=11.8.0.86=0
  - libnvjpeg=11.9.0.86=0
  - libopus=1.3.1=h7b6447c_0
  - libpng=1.6.39=h5eee18b_0
  - libstdcxx-ng=11.2.0=h1234567_1
  - libtasn1=4.19.0=h5eee18b_0
  - libtiff=4.5.0=h6a678d5_2
  - libunistring=0.9.10=h27cfd23_0
  - libvpx=1.7.0=h439df22_0
  - libwebp=1.2.4=h11a3e52_1
  - libwebp-base=1.2.4=h5eee18b_1
  - lz4-c=1.9.4=h6a678d5_0
  - markupsafe=2.1.1=py39h7f8727e_0
  - mkl=2021.4.0=h06a4308_640
  - mkl-service=2.4.0=py39h7f8727e_0
  - mkl_fft=1.3.1=py39hd3c417c_0
  - mkl_random=1.2.2=py39h51133e4_0
  - mpc=1.1.0=h10f8cd9_1
  - mpfr=4.0.2=hb69a4c5_1
  - mpmath=1.2.1=py39h06a4308_0
  - ncurses=6.4=h6a678d5_0
  - nettle=3.7.3=hbbd107a_1
  - networkx=2.8.4=py39h06a4308_1
  - openh264=2.1.1=h4ff587b_0
  - openssl=1.1.1t=h7f8727e_0
  - pillow=9.4.0=py39h6a678d5_0
  - pip=23.0.1=py39h06a4308_0
  - portalocker=2.7.0=py39hf3d152e_0
  - pycparser=2.21=pyhd3eb1b0_0
  - pyopenssl=23.0.0=py39h06a4308_0
  - pysocks=1.7.1=py39h06a4308_0
  - python=3.9.16=h7a1cb2a_2
  - python_abi=3.9=2_cp39
  - pytorch=2.0.0=py3.9_cuda11.8_cudnn8.7.0_0
  - pytorch-cuda=11.8=h7e8668a_3
  - pytorch-mutex=1.0=cuda
  - pytorch3d=0.7.3=py39_cu118_pyt200
  - pyyaml=6.0=py39hb9d737c_4
  - readline=8.2=h5eee18b_0
  - requests=2.29.0=py39h06a4308_0
  - setuptools=66.0.0=py39h06a4308_0
  - sqlite=3.41.2=h5eee18b_0
  - sympy=1.11.1=py39h06a4308_0
  - tabulate=0.9.0=pyhd8ed1ab_1
  - tk=8.6.12=h1ccaba5_0
  - torchaudio=2.0.0=py39_cu118
  - torchtriton=2.0.0=py39
  - torchvision=0.15.0=py39_cu118
  - tqdm=4.65.0=pyhd8ed1ab_1
  - tzdata=2023c=h04d1e81_0
  - urllib3=1.26.15=py39h06a4308_0
  - wheel=0.38.4=py39h06a4308_0
  - x264=1!157.20191217=h7b6447c_0
  - xz=5.2.10=h5eee18b_1
  - yacs=0.1.8=pyhd8ed1ab_0
  - yaml=0.2.5=h7f98852_2
  - zlib=1.2.13=h5eee18b_0
  - zstd=1.5.5=hc292b87_0
  - pip:
      - absl-py==0.15.0
      - astunparse==1.6.3
      - attrs==23.1.0
      - audioread==3.0.0
      - cachetools==5.3.0
      - clang==5.0
      - configargparse==1.5.3
      - contourpy==1.0.7
      - cycler==0.11.0
      - dearpygui==1.9.0
      - decorator==4.4.2
      - dominate==2.7.0
      - face-alignment==1.3.5
      - ffmpeg-python==0.2.0
      - flatbuffers==1.12
      - fonttools==4.39.3
      - freqencoder==0.0.0
      - fsspec==2023.4.0
      - future==0.18.3
      - gast==0.4.0
      - google-auth==2.17.3
      - google-auth-oauthlib==1.0.0
      - google-pasta==0.2.0
      - gridencoder==0.0.0
      - grpcio==1.54.0
      - h5py==3.1.0
      - huggingface-hub==0.14.1
      - imageio==2.28.1
      - imageio-ffmpeg==0.4.8
      - importlib-metadata==6.6.0
      - joblib==1.2.0
      - keras==2.12.0
      - keras-preprocessing==1.1.2
      - kiwisolver==1.4.4
      - kornia==0.5.0
      - librosa==0.9.2
      - llvmlite==0.39.1
      - lpips==0.1.4
      - markdown==3.4.3
      - matplotlib==3.6.3
      - mediapipe==0.8.11
      - moviepy==1.0.3
      - ninja==1.11.1
      - numba==0.56.4
      - numpy==1.23.0
      - oauthlib==3.2.2
      - opencv-contrib-python==4.7.0.72
      - opencv-python==4.7.0.72
      - opt-einsum==3.3.0
      - packaging==23.1
      - pandas==1.4.4
      - platformdirs==3.5.0
      - pooch==1.7.0
      - praat-parselmouth==0.4.3
      - proglog==0.1.10
      - protobuf==3.20.3
      - pyasn1==0.5.0
      - pyasn1-modules==0.3.0
      - pyaudio==0.2.13
      - pymcubes==0.1.4
      - pyparsing==3.0.9
      - python-dateutil==2.8.2
      - python-speech-features==0.6
      - pytz==2023.3
      - pywavelets==1.4.1
      - raymarching-face==0.0.0
      - regex==2023.5.4
      - requests-oauthlib==1.3.1
      - resampy==0.4.2
      - rsa==4.9
      - scikit-image==0.19.3
      - scikit-learn==1.2.2
      - scipy==1.10.1
      - shencoder==0.0.0
      - six==1.15.0
      - soundfile==0.12.1
      - tensorboard==2.12.3
      - tensorboard-data-server==0.7.0
      - tensorboardx==2.6
      - tensorflow==2.6.0
      - tensorflow-estimator==2.12.0
      - termcolor==1.1.0
      - threadpoolctl==3.1.0
      - tifffile==2023.4.12
      - tokenizers==0.13.3
      - transformers==4.28.1
      - trimesh==3.21.5
      - typing-extensions==3.7.4.3
      - werkzeug==2.3.3
      - wrapt==1.12.1
      - zipp==3.15.0
prefix: /home/ubuntu/miniconda3/envs/geneface

chenkaiC4 commented 1 year ago

Judging from the log, deepspeech-0_1_0-b90017e8.pb.zip was downloaded, but only ===== extract deepspeech ===== appears; ===== extracted deepspeech ===== never shows up.

[INFO] ===== extracted esperanto =====

[INFO] ===== extract deepspeech =====
61.68it/s]Downloading /home/ubuntu/.tensorflow/models/deepspeech-0_1_0-b90017e8.pb.zip from https://github.com/osmr/deepspeech_features/releases/download/v0.0.1/deepspeech-0_1_0-b90017e8.pb.zip...
100%|| 6073/6073 [01:30<00:00, 67.11it/s] [download succeeded, but ===== extracted deepspeech ===== never appeared]

[INFO] ===== extracted face landmarks =====
[INFO] ===== perform face tracking =====
[INFO] ===== extract semantics from data/processed/videos/May/ori_imgs to data/processed/videos/May/parsing =====

chenkaiC4 commented 1 year ago

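I ran CUDA_VISIBLE_DEVICES=0 bash data_gen/nerf/process_data.sh $VIDEO_ID once more, and this time it succeeded 🍻.

I ran the zozo demo with May. The video was generated, but it has no sound. Is that expected? @yerfor

[False alarm: audio output was disabled in my VS Code remote session. The generated video does have sound.]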

chenkaiC4 commented 1 year ago

Following the author's documentation, I now have the demo running. During inference I hit:

ModuleNotFoundError: No module named 'utils.commons'

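Running export PYTHONPATH=. from the project root had no effect. What finally fixed it was adding two __init__.py files (see the git screenshot attached to this comment).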
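
The screenshot does not survive in text form, so the exact directories are not recorded here. As a rough sketch, assuming the two files went into utils/ and utils/commons/ (a guess based on the module name in the error, not confirmed by the thread), a hypothetical helper could create them like this:

```python
from pathlib import Path


def ensure_init_files(repo_root, package_dirs):
    """Create an empty __init__.py in each existing directory that lacks one.

    Returns the list of files that were created.
    """
    created = []
    for rel in package_dirs:
        pkg = Path(repo_root) / rel
        init_file = pkg / "__init__.py"
        if pkg.is_dir() and not init_file.exists():
            init_file.touch()
            created.append(init_file)
    return created
```

From the project root, `ensure_init_files(".", ["utils", "utils/commons"])` would apply that guess; directories that do not exist are skipped.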

Next, my goal is to train on my own video and use the project in more depth 😄. I'll record any problems I hit in this issue, so please don't close it yet.

Awj2021 commented 1 year ago

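Hi, what if the CUDA version on the 4090 machine cannot be changed (a shared server with no root access)? Even after installing a different cudatoolkit, the same error still occurs. Is there a good workaround?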

chenkaiC4 commented 1 year ago

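@Awj2021 The key is to keep the CUDA, PyTorch, and pytorch3d versions consistent with one another; no particular version is required. That said, the official prebuilt PyTorch packages currently go only up to CUDA 11.8, and the same holds for pytorch3d. So in principle any version no higher than 11.8 should work.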
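
One way to check this consistency is to read the CUDA tag out of the conda build strings, such as the ones in the environment listing earlier in this thread. A minimal sketch follows; the helper names are hypothetical, and the regular expression assumes tags of the form cuda11.8 or cu118.

```python
import re


def cuda_from_build(build):
    """Return 'major.minor' from a build string like 'py39_cu118_pyt200', else None."""
    match = re.search(r"cu(?:da)?(\d{2})\.?(\d)", build)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def consistent_cuda(builds):
    """Map each package to its CUDA tag and report whether they all agree."""
    versions = {name: cuda_from_build(build) for name, build in builds.items()}
    values = set(versions.values())
    return versions, (None not in values and len(values) == 1)


# Build strings copied from the conda environment posted above.
builds = {
    "pytorch": "py3.9_cuda11.8_cudnn8.7.0_0",
    "pytorch3d": "py39_cu118_pyt200",
    "torchvision": "py39_cu118",
}
versions, ok = consistent_cuda(builds)
print(versions, ok)
```

The build strings come from `conda list` or an exported environment file, so the check works without importing any of the packages.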

xk-huang commented 1 year ago

You can also use Docker: https://github.com/xk-huang/GeneFace/tree/main/docker

Pythonpa commented 1 year ago

Hmm... the Docker link above returns "page not found".

ChengsongLu commented 8 months ago

Ubuntu 20.04, CUDA 11.8, torch 2.0.0

I reached the last step of the author's demo, "bash scripts/infer_lm3d_radnerf.sh", and then got this error:

Traceback (most recent call last):
  File "/home/lcs/OpenDCO/GeneFace/modules/radnerfs/encoders/gridencoder/grid.py", line 10, in <module>
    import _gridencoder as _backend
ImportError: /home/lcs/anaconda3/envs/geneface/lib/python3.9/site-packages/_gridencoder.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1010TensorImpl18compute_contiguousEv

File "/home/lcs/OpenDCO/GeneFace/modules/radnerfs/encoders/encoding.py", line 29, in get_encoder
    from modules.radnerfs.encoders.gridencoder import GridEncoder
  File "/home/lcs/OpenDCO/GeneFace/modules/radnerfs/encoders/gridencoder/__init__.py", line 1, in <module>
    from .grid import GridEncoder
  File "/home/lcs/OpenDCO/GeneFace/modules/radnerfs/encoders/gridencoder/grid.py", line 12, in <module>
    from .backend import _backend
  File "/home/lcs/OpenDCO/GeneFace/modules/radnerfs/encoders/gridencoder/backend.py", line 32, in <module>
    _backend = load(name='_grid_encoder',
  File "/home/lcs/anaconda3/envs/geneface/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/home/lcs/anaconda3/envs/geneface/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1535, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/home/lcs/anaconda3/envs/geneface/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1929, in _import_module_from_library
    module = importlib.util.module_from_spec(spec)
ImportError: /home/lcs/.cache/torch_extensions/py39_cu118/_grid_encoder/_grid_encoder.so: cannot open shared object file: No such file or directory
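
As general background rather than the fix confirmed in this thread: an undefined c10 symbol in a prebuilt extension usually indicates that the .so was compiled against a different PyTorch build than the one now installed, so rebuilding the extension in the current environment is the usual remedy. The second error points at the JIT build cache. Below is a sketch, under the assumption that deleting the cached directory is safe on this machine, that clears it so the next import attempts a fresh build. The helper name is hypothetical.

```python
import shutil
from pathlib import Path


def clear_extension_cache(build_dir):
    """Remove a cached JIT build directory; return True if something was deleted."""
    path = Path(build_dir).expanduser()
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False
```

With the path from the traceback, the call would be `clear_extension_cache("~/.cache/torch_extensions/py39_cu118/_grid_encoder")`.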

Could anyone help me figure out how to fix this?

ChengsongLu commented 8 months ago

Solved: see https://github.com/yerfor/GeneFace/issues/264#issuecomment-1895280819

yerfor / GeneFace

RTX 4090, Ubuntu 22.04, PyTorch 2.0: complete setup log #124