rosinality / stylegan2-pytorch

Implementation of Analyzing and Improving the Image Quality of StyleGAN (StyleGAN 2) in PyTorch
MIT License

error in converting .pkl to .pt #250

Open zjgt opened 3 years ago

zjgt commented 3 years ago

I used Google Colab to convert a model trained on a custom dataset with stylegan2-ADA-PyTorch (all default settings) and got this error message:

Traceback (most recent call last):
  File "/content/stylegan2-pytorch/convert_weight.py", line 236, in <module>
    generator, discriminator, g_ema = pickle.load(f)
ModuleNotFoundError: No module named 'torch_utils'

After I ran pip install torch_utils, I got the following message:

Traceback (most recent call last):
  File "/content/stylegan2-pytorch/closed_form_factorization.py", line 18, in <module>
    ckpt = torch.load(args.ckpt)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 608, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 777, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
ModuleNotFoundError: No module named 'torch_utils.persistence'

What should I do?

rosinality commented 3 years ago

convert_weight.py does not support stylegan2-ada-pytorch.

zjgt commented 3 years ago

I see. Thanks!

On Thu, Aug 19, 2021 at 8:49 PM Kim Seonghyeon @.***> wrote:

convert_weight.py does not supports stylegan2-ada-pytorch.


AlexTitovWork commented 2 years ago

Hello! I ran

$ python convert_weight.py --repo ~/stylegan2 stylegan2-ffhq-config-f.pkl

with a .pkl file downloaded from https://nvlabs-fi-cdn.nvidia.com/stylegan2/networks/ . The conversion stalls on my machine without any output. Why could I be getting this issue? Do you have any ideas? Also, how can I get a clean TorchScript .pt model to use from C++? I need the format produced by torch.jit.save, e.g. model.pt.

gzhhhere commented 2 years ago

> I used Google Colab to convert a model trained on a custom dataset with stylegan2-ADA-PyTorch with all default settings and got: ModuleNotFoundError: No module named 'torch_utils'. After pip install torch_utils, I got: ModuleNotFoundError: No module named 'torch_utils.persistence'. What should I do?

I got the same problem. Have you solved it?

AlexTitovWork commented 2 years ago

No, not yet. There are problems converting the GAN from .pkl to the PyTorch file format.

rosinality commented 2 years ago

Compiling the custom operations can take a long time, and it is required both for the official stylegan2 code and for this repository. I think you can try to spot the code that causes the stall.

Regarding PyTorch JIT, I haven't tried it, but I think it is possible if you trace the model without the custom operators.
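A rough, untested sketch of what that could look like, assuming a checkpoint converted by convert_weight.py (with a "g_ema" entry) and that the pure-PyTorch fallbacks of the custom ops are used, e.g. by tracing on CPU:

import torch
from model import Generator  # this repository's model.py


class GeneratorWrapper(torch.nn.Module):
    # wrap the generator so forward() takes a single latent tensor,
    # which makes it straightforward to trace
    def __init__(self, g):
        super().__init__()
        self.g = g

    def forward(self, z):
        img, _ = self.g([z], randomize_noise=False)
        return img


g = Generator(1024, 512, 8)  # size / style_dim / n_mlp for the ffhq config-f model
g.load_state_dict(torch.load("stylegan2-ffhq-config-f.pt")["g_ema"])
g.eval()

traced = torch.jit.trace(GeneratorWrapper(g), torch.randn(1, 512))
traced.save("traced_generator.pt")  # loadable from C++ via torch::jit::load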

AlexTitovWork commented 2 years ago

I use the Docker image from NVlabs and added the project to its internal structure, but the original NVlabs Docker image does not include TF, cuBLAS, and some other dependencies. Do you have a Docker configuration or a dependency list to start this project correctly?

AlexTitovWork commented 2 years ago

Hello, here is my stack trace. What Python interpreter version do you use? I use 3.6 with CUDA 10.2. Why does this project try to find CUDA 10.0 on my machine? I use CUDA 10.2 and it is installed correctly.

stylegan2-pytorch$ python convert_weight.py --repo ../stylegan2 ./stylegan2-ffhq-config-f.pkl
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "convert_weight.py", line 14, in from dnnlib import tflib File "/media/interceptor/SSDStorage/Git_Medium_repo/Binary_search_engine_CUDA/converterGAN/stylegan2-pytorch/dnnlib/tflib/init.py", line 7, in from . import autosummary File "/media/interceptor/SSDStorage/Git_Medium_repo/Binary_search_engine_CUDA/converterGAN/stylegan2-pytorch/dnnlib/tflib/autosummary.py", line 26, in import tensorflow as tf File "/usr/local/lib/python3.6/dist-packages/tensorflow/init.py", line 24, in from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/init.py", line 49, in from tensorflow.python import pywrap_tensorflow File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in raise ImportError(msg) ImportError: Traceback (most recent call last): File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in from tensorflow.python.pywrap_tensorflow_internal import * File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in _pywrap_tensorflow_internal = swig_import_helper() File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description) File "/usr/lib/python3.6/imp.py", line 243, in load_module return load_dynamic(name, filename, file) File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic return _load(spec) ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions. Include the entire stack trace above this error message when asking for help.

deviceQuery$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce GTX 850M"
  CUDA Driver Version / Runtime Version          11.5 / 10.2
  CUDA Capability Major/Minor version number:    5.0
  Total amount of global memory:                 4046 MBytes (4242604032 bytes)
  ( 5) Multiprocessors, (128) CUDA Cores/MP:     640 CUDA Cores
  GPU Max Clock rate:                            902 MHz (0.90 GHz)
  Memory Clock rate:                             1001 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size   (x,y,z):   (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.5, CUDA Runtime Version = 10.2, NumDevs = 1

My pip list (excerpt):

stylegan2-pytorch$ pip list
Package                 Version
----------------------  ---------
systemd-python 234
tensorboard 1.11.0
tensorflow 1.11.0
tensorflow-estimator 1.13.0
tensorflow-gpu 1.13.1

termcolor 1.1.0
terminado 0.8.1
tesserocr 2.4.0
testpath 0.4.2
tifffile 2020.9.3
toolz 0.10.0
torch 1.8.0
torchaudio 0.8.0
torchvision 0.9.0
tornado 5.1.1
traitlets 4.3.2
typing-extensions 3.7.4.3

AlexTitovWork commented 2 years ago

Maybe the problem is here? I use CUDA 10.2, but TF 1.13-1.15 does not support it, only CUDA 10.0? https://stackoverflow.com/questions/50622525/which-tensorflow-and-cuda-version-combinations-are-compatible

rosinality commented 2 years ago

@AlexTitovWork Yes, the TF 1.x binaries only support CUDA 10.0. It would be easier to use CUDA 10.0 and the PyTorch releases that support it. https://github.com/rosinality/alias-free-gan-pytorch/blob/main/Dockerfile

AlexTitovWork commented 2 years ago

Thanks @rosinality for the Docker config. I built the Docker image successfully, and it works well with NVIDIA Docker. But I have some questions:

  1. In the Docker image I used CUDA 10.0 and installed tensorflow-gpu==1.4.0, and the image also uses torch 1.7.1+cu92. But the project is tested on PyTorch 1.3.1 with CUDA 10.1/10.2.
  2. Can this affect the execution of the .pkl converter?
  3. After running the script, nothing happens... What could be the reason? Maybe I am using an old architecture with compute capability 5.0?

This is the device configuration I used:

CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA GeForce GTX 850M"
  CUDA Capability Major/Minor version number:    5.0
  Total amount of global memory:                 4046 MBytes (4242604032 bytes)
  ( 5) Multiprocessors, (128) CUDA Cores/MP:     640 CUDA Cores
  GPU Max Clock rate:                            902 MHz (0.90 GHz)
  Memory Clock rate:                             1001 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)

rosinality commented 2 years ago

@AlexTitovWork How long did it stall? It can take some time. As it does not show an error message, it is hard to guess what the problem could be. You can use Ctrl+C to show the stack trace.

AlexTitovWork commented 2 years ago

Hello @rosinality! It stalls for a very long time, more than 8 hours. Thanks for the support. I got the stack trace with Ctrl+C:

root@1da8e0e33d03:/home/git_repos/stylegan2-pytorch# python convert_weight.py --repo ../stylegan2 stylegan2-ffhq-config-f.pkl
^CTraceback (most recent call last):
  File "convert_weight.py", line 11, in <module>
    from model import Generator, Discriminator
  File "/home/git_repos/stylegan2-pytorch/model.py", line 11, in <module>
    from op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d, conv2d_gradfix
  File "/home/git_repos/stylegan2-pytorch/op/__init__.py", line 1, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "/home/git_repos/stylegan2-pytorch/op/fused_act.py", line 15, in <module>
    os.path.join(module_path, "fused_bias_act_kernel.cu"),
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 997, in load
    keep_intermediates=keep_intermediates)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1206, in _jit_compile
    baton.wait()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/file_baton.py", line 48, in wait
    time.sleep(self.wait_seconds)
KeyboardInterrupt
root@1da8e0e33d03:/home/git_repos/stylegan2-pytorch#

AlexTitovWork commented 2 years ago

I got a message like the one in this post: https://github.com/zhou13/neurvps/issues/1

AlexTitovWork commented 2 years ago

It works: go to your .cache directory, delete the lock file for your cpp extension (it is likely under a directory like ~/.cache/torch_extensions/something), and you should be able to run it again. If you can't find your cache directory, you can run python -m pdb your_program.py and break at your .../lib/python3.X/site-packages/torch/utils/cpp_extension.py line 1179 (specifically the line containing "baton = FileBaton(os.path.join(build_directory, 'lock'))") and then print "build_directory". That should be the cache directory for your program's lock file.
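In other words, something like this (a sketch that assumes the default cache location; torch also honours the TORCH_EXTENSIONS_DIR environment variable if you have set it):

import os
from pathlib import Path

# default cache used by torch.utils.cpp_extension for JIT-compiled ops
cache = Path(os.environ.get("TORCH_EXTENSIONS_DIR",
                            Path.home() / ".cache" / "torch_extensions"))

# remove any stale lock files left behind by an interrupted build
for lock in cache.glob("**/lock"):
    print("removing stale lock:", lock)
    lock.unlink()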

AlexTitovWork commented 2 years ago

@rosinality I started the model converter, but I do not have enough memory. How much memory is needed to convert stylegan2-ffhq-config-f.pkl, for example? I have 4 GB of video RAM, and the project crashes when the memory fills up. @rosinality
We can close the issue!

rosinality commented 2 years ago

@AlexTitovWork Converting itself does not require GPU memory; the GPU memory consumption comes from sample generation. The converted .pt file should be generated anyway.

AlexTitovWork commented 2 years ago

nvidia-smi shows GPU usage at 100% and above (see the memory_end and memory_end_error screenshots). I will try an RTX 3090...

rosinality commented 2 years ago

@AlexTitovWork I think you can use CUDA_VISIBLE_DEVICES=-1 to prevent TensorFlow from using the GPU.
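For example, a minimal sketch assuming it is placed at the very top of convert_weight.py, before tensorflow is imported (prepending CUDA_VISIBLE_DEVICES=-1 to the command line has the same effect):

import os

# hide all GPUs from TensorFlow; must run before `import tensorflow`
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"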

sscalvo commented 2 years ago

> I used Google Colab to convert a model trained on a custom dataset with stylegan2-ADA-PyTorch with all default settings and got: ModuleNotFoundError: No module named 'torch_utils'. After pip install torch_utils, I got: ModuleNotFoundError: No module named 'torch_utils.persistence'. What should I do?

Hi Sam, did you ever manage to do the model conversion? Looking forward to hearing good news from you. Thanks!

zjgt commented 2 years ago

> I used Google Colab to convert a model trained on a custom dataset with stylegan2-ADA-PyTorch with all default settings and got: ModuleNotFoundError: No module named 'torch_utils'. After pip install torch_utils, I got: ModuleNotFoundError: No module named 'torch_utils.persistence'. What should I do?

> Hi Sam, did you ever manage to do the model conversion? Looking forward to hearing good news from you. Thanks!

I have been working on other projects lately and have not tried it again after rosinality said it was impossible.

49xxy commented 2 years ago

> I used Google Colab to convert a model trained on a custom dataset with stylegan2-ADA-PyTorch with all default settings and got: ModuleNotFoundError: No module named 'torch_utils'. After pip install torch_utils, I got: ModuleNotFoundError: No module named 'torch_utils.persistence'. What should I do?

> Hi Sam, did you ever manage to do the model conversion? Looking forward to hearing good news from you. Thanks!

Hi, has your problem been solved?

AlexTitovWork commented 2 years ago

Hello! Yes, I solved all the issues. If you work with big images on a weak GPU, you can crop the images down to, for example, 400x400 and test your algorithm. On an RTX 3090 with 24 GB, I can convert any model. The internal structure of the model uses GPU tensors, and that takes GPU memory during the conversion process.

NaveenM12 commented 1 year ago

> I used Google Colab to convert a model trained on a custom dataset with stylegan2-ADA-PyTorch with all default settings and got: ModuleNotFoundError: No module named 'torch_utils'. After pip install torch_utils, I got: ModuleNotFoundError: No module named 'torch_utils.persistence'. What should I do?

Were you able to figure out the conversion in the end? I have been struggling with this for a while now too, so I wanted to see if you found a solution. @zjgt

AggelosMargkas commented 1 year ago

To whom it may concern,

The StyleGAN3 documentation specifies that torch_utils and dnnlib must be accessible via PYTHONPATH when the networks are unpickled. So basically make sure that, when you load the model you want to convert, the StyleGAN source code is on the same path / PYTHONPATH.
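For example, a minimal sketch (the repository path and snapshot filename below are placeholders for your own):

import pickle
import sys

# make the official torch_utils and dnnlib packages importable so the
# pickled persistence classes can be resolved
sys.path.insert(0, "/path/to/stylegan2-ada-pytorch")

with open("network-snapshot.pkl", "rb") as f:
    data = pickle.load(f)

print(data.keys())  # typically 'G', 'D', 'G_ema', ...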