rinongal / StyleGAN-nada

http://stylegan-nada.github.io/
MIT License

Black images on prediction #35

Closed MAGLeb closed 2 years ago

MAGLeb commented 2 years ago

The network returns only NaN for me. [image]

Maybe it is connected with converting the sg2 model? Also, it is really hard for me to set up the model locally. I have tried a lot of different ways.

rinongal commented 2 years ago

A few quick questions:

Are you using a custom StyleGAN2 model? Was it trained with StyleGAN2 or StyleGAN-ADA? And if it was trained with ADA, did you use the tensorflow or the pytorch version?

The conversion script also produces some images from both the original and the converted models, so you should be able to look at those images in the pretrained model directory and see if the conversion happened correctly.

About setting up locally, I'm going to need a few more details :) What OS are you trying to run on? And what sorts of issues did you encounter? We also have a docker version in the readme that you can use to avoid having to set up an environment (though you'll have to set up nvidia-docker instead).

MAGLeb commented 2 years ago

1. "Are you using a custom StyleGAN2 model?" I cloned the repositories from GitHub:

!git clone https://github.com/NVlabs/stylegan2-ada/ $stylegan_ada_dir
!git clone https://github.com/rinongal/stylegan-nada.git $stylegan_nada_dir

and also loaded the ffhq.pt model with this part of the code:

if not os.path.isfile(os.path.join(pretrained_model_dir, file_name)):
    print("Downloading chosen model...")

    if download_string.endswith(".pkl"):
        !wget $download_string -O $pretrained_model_dir/$file_name
    else:
        downloader.download_file(download_string, os.path.join(pretrained_model_dir, file_name))

In addition, I did every step in the notebook.

2. Environment: I use the Docker image nvcr.io/nvidia/pytorch:21.10-py3

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000001:00:00.0 Off |                    0 |
| N/A   49C    P0    57W / 149W |      0MiB / 11441MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

3. "We also have a docker version in the readme that you can use to avoid having to set up an environment (though you'll have to set up nvidia-docker instead)." Thank you. I will try Docker later. If you could help me with the notebook, I would appreciate it.

MAGLeb commented 2 years ago

"The conversion script also produces some images from both the original and the converted models, so you should be able to look at those images in the pretrained model directory and see if the conversion happened correctly."

Conversion script? Do you mean this part of the code?

if not os.path.isfile(os.path.join(pretrained_model_dir, pt_file_name)):
    print("Converting sg2 model. This may take a few minutes...")

    tf_path = next(filter(lambda x: "tensorflow" in x, sys.path), None)
    py_path = tf_path + f":{stylegan_nada_dir}/ZSSGAN"
    convert_script = os.path.join(stylegan_nada_dir, "convert_weight.py")
    !PYTHONPATH=$py_path python $convert_script --repo $stylegan_ada_dir --gen $pretrained_model_dir/$file_name

This part of the code did not execute because of the 'if' condition.

rinongal commented 2 years ago

Conversion script: I was asking because you asked:

Maybe it is connected with converting sg2 model?

in the original post. If you're using FFHQ then there's indeed no need to convert anything and that part of the code shouldn't execute.
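The skip behaviour described above can be sketched in isolation. This is a minimal illustration, assuming the notebook decides purely on whether the `.pt` file already exists; `needs_conversion` is a hypothetical helper name, not part of the repository:

```python
import os
import tempfile


def needs_conversion(pretrained_model_dir, pt_file_name):
    """Mirror the notebook's guard: convert only if the .pt file is missing."""
    return not os.path.isfile(os.path.join(pretrained_model_dir, pt_file_name))


with tempfile.TemporaryDirectory() as model_dir:
    # Nothing has been downloaded yet, so the conversion branch would run.
    before = needs_conversion(model_dir, "ffhq.pt")

    # Simulate an already downloaded ffhq.pt (an empty placeholder file).
    open(os.path.join(model_dir, "ffhq.pt"), "wb").close()

    # The file now exists, so the conversion branch is skipped.
    after = needs_conversion(model_dir, "ffhq.pt")

print(before, after)
```

Note that an existence check like this does not validate the file's contents, so a truncated or corrupt download would also skip the conversion step.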

About the notebook:

Could you try generating images without fine-tuning the model? You should be able to do that by setting the number of training iterations to 0 and then running both the training block and the generation block (steps 3 and 4).

Did you change anything about the code itself in section 3?

MAGLeb commented 2 years ago

1. Could you try generating images without fine-tuning the model? You should be able to do that by setting the number of training iterations to 0 and then running both the training block and the generation block (steps 3 and 4).

I set the number of training iterations to 0 and ran both steps. [image]

2. Did you change anything about the code itself in section 3?

Nothing. As I understand it, 'ZSSGAN' cannot load the 'ffhq.pt' model, right? Therefore it cannot load samples.

3. Also, I opened the image from Docker Hub and the UI for training and predicting. But I want to work with the notebook.

rinongal commented 2 years ago

Alright. So it looks like you're having issues already at the generation step and before we even do any training. I'm not sure what the problem could be, but I suspect it might just be some pytorch / cuda version issue.

I tried looking up similar issues in Rosinality's repo. There are two that might be related:

https://github.com/rosinality/stylegan2-pytorch/issues/214
https://github.com/rosinality/stylegan2-pytorch/issues/60

However, if you didn't modify my code then you're using 'cuda' as the device and not 'cuda:0', and this issue should not arise.

Something else that seems strange in your screenshots: your warning output says you're using PyTorch 1.10.2+cu102, but your nvidia-smi screenshot lists CUDA 11.4, so there might be some mismatch there.
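One way to compare the two numbers is sketched below. It assumes the PyTorch version string carries a `+cuXYZ` build tag as in the warning output, and that the driver's CUDA version is copied by hand from nvidia-smi; the helper names `cuda_tag_to_version` and `driver_supports_build` are hypothetical. The version nvidia-smi prints is the highest CUDA version the driver supports, so a lower build version is not necessarily a conflict by this check alone.

```python
import re


def cuda_tag_to_version(torch_version):
    """Extract (major, minor) from a build tag such as '1.10.2+cu102'."""
    match = re.search(r"\+cu(\d+)$", torch_version)
    if match is None:
        return None  # no CUDA tag, e.g. a CPU-only build
    digits = match.group(1)
    # The tag is the major version followed by one minor digit: cu102 is 10.2.
    return int(digits[:-1]), int(digits[-1])


def driver_supports_build(torch_version, driver_cuda):
    """True if the driver's maximum CUDA version covers the build's version."""
    build = cuda_tag_to_version(torch_version)
    if build is None:
        return None
    major, minor = (int(part) for part in driver_cuda.split("."))
    return build <= (major, minor)


# Values taken from the thread: the PyTorch warning and the nvidia-smi header.
build_version = cuda_tag_to_version("1.10.2+cu102")
compatible = driver_supports_build("1.10.2+cu102", "11.4")
print(build_version, compatible)
```

In a live session, `torch.__version__` and `torch.version.cuda` report the build side of this comparison. If the check passes and the output is still NaN, the cause probably lies elsewhere, for example in GPU architecture support of the installed build.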

Did the UI in our docker work for you? If it did, then the environment there is working fine and you can probably just set up your notebook inside that docker instead.

MAGLeb commented 2 years ago

I appreciate your help; I didn't expect to get such full responses.

I was able to start jupyter and execute all the code in your docker container. Thank you very much.