mikael-alafriz-deel / lucid-sonic-dreams

MIT License
771 stars 157 forks

Switch to PyTorch #18

Open NotNANtoN opened 3 years ago

NotNANtoN commented 3 years ago

There are many issues raised here because of the use of TensorFlow, and I also struggled to get it to work. Hence I switched from the TF StyleGAN repo to https://github.com/NVlabs/stylegan2-ada-pytorch. This should make this repo much easier to use. It works quite nicely in my tests and it might even be faster (the NVIDIA people claim the PyTorch version is faster; I did not benchmark it).

The only issue I see is that the wikiart model is not compatible with the PyTorch repo - at least it throws an error when trying to convert it from TF to torch. I "solved" this by just defaulting back to the TF repo when using wikiart. This is not beautiful, but it should work. Unfortunately, I cannot test this myself, as I am too stupid to set up TF properly with my GPU. So if anyone can test whether wikiart still works in this PR with TF, or knows how to convert it to torch, that would be great.
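The fallback described above amounts to a small loader dispatch. Here is a minimal, self-contained sketch of the idea - the function and loader names are illustrative, not the repo's actual API; the dummy lambdas stand in for the real PyTorch and TF loading code:

```python
def load_model(name, load_torch, load_tf):
    """Try the PyTorch loader first; fall back to the TF loader for
    models (like wikiart) whose weights fail to convert."""
    TF_ONLY = {"wikiart"}  # known to fail TF -> torch conversion
    if name in TF_ONLY:
        return load_tf(name), "tf"
    try:
        return load_torch(name), "torch"
    except Exception:  # conversion/loading failed, fall back to TF
        return load_tf(name), "tf"

# Dummy loaders just to demonstrate the dispatch:
model, backend = load_model(
    "wikiart",
    load_torch=lambda n: f"torch:{n}",
    load_tf=lambda n: f"tf:{n}",
)
print(backend)  # tf
```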

julienbeisel commented 3 years ago

Hi @NotNANtoN, thanks for this work! I wanted to implement it myself, but I see you got there first :)

Which script did you use to convert the pre-trained TF models to PyTorch? This one looks promising and is maybe the one you used: https://github.com/rosinality/stylegan2-pytorch#convert-weight-from-official-checkpoints

NotNANtoN commented 3 years ago

Hi @julienbeisel! I used https://github.com/NVlabs/stylegan2-ada-pytorch/blob/main/legacy.py. We could try the rosinality weight conversion for the wikiart weights (or for all weights where the legacy.py loading does not work).

julienbeisel commented 3 years ago

> Hi @julienbeisel! I used https://github.com/NVlabs/stylegan2-ada-pytorch/blob/main/legacy.py. We could try out to use the rosinality weight conversion for the wikiart weights (or all weights where the legacy.py loading does not work).

Ok thanks! I will try to work on it :)

I forked the repo to try to make it work on my laptop. I will make some comments if I think some parts can be improved, I can also make a branch and do a PR later if you want!

NotNANtoN commented 3 years ago

Thanks for your feedback! Please do a PR on this PR ;)

It would be best to just convert all TF models to Torch, but I'm not sure if this is easily doable for conditional GANs.

julienbeisel commented 3 years ago

Alright, I will submit a PR once it's done, and I'll try to convert the models. I will also try to address the comments on the PR.

julienbeisel commented 3 years ago

@NotNANtoN I spent some time working on it and I really couldn't figure out how to convert these models to PyTorch (nothing is working). I guess the only way to make it work would be to re-train them, but that's a lot of work...

NotNANtoN commented 3 years ago

@julienbeisel I assume it's simply not possible with some models unless we know their exact architecture. Maybe you can just make a PR with your changes, and then we'll use PyTorch wherever possible and TF for the remaining models?

At least this is working, and some PyTorch is better than none imo. Have you tried the batch_size argument? When I tried it, it just slowed things down, which I don't really understand. But I'll look into it.

NotNANtoN commented 3 years ago

Major update: speed increased massively, by a factor of 5-7. Generation for a 90-second piece of music now takes 4:30 minutes at 60 fps, whereas it previously took over 20 minutes at 43 fps.
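The main lever behind a speedup like this is generating frames in batches rather than one forward pass per frame. A minimal sketch of the idea, with `synthesize` standing in for the GPU forward pass (names are illustrative, not the repo's actual API):

```python
def generate_frames_batched(latents, synthesize, batch_size=16):
    """Run the generator on chunks of latents instead of single frames,
    amortizing per-call overhead (kernel launches, Python loop cost)
    across the whole batch."""
    frames = []
    for i in range(0, len(latents), batch_size):
        frames.extend(synthesize(latents[i:i + batch_size]))
    return frames

# Dummy "generator" mapping each latent to a frame, to show the call shape:
latents = list(range(100))
frames = generate_frames_batched(
    latents, synthesize=lambda batch: [f"frame{z}" for z in batch]
)
print(len(frames))  # 100
```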

MoemaMike commented 3 years ago

Is the objective of this branch to allow StyleGAN TF models to work with PyTorch? I already have a model generated with stylegan2-ada-pytorch. Is there a ready-made solution for that use case?

NotNANtoN commented 3 years ago

@MoemaMike The objective is to completely switch to PyTorch. All PyTorch models trained using the NVIDIA PyTorch repositories work with this branch.

There are some TF models that cannot be converted to PyTorch at the moment; in those cases the library just loads them with TensorFlow.

Maybe @mikaelalafriz can review and/or merge this branch soonish.

MoemaMike commented 3 years ago

ok, thanks, looking forward to trying it out on Colab

Breeze-Zero commented 3 years ago

Unfortunately, my process sometimes gets killed. How can I solve this?

Breeze-Zero commented 3 years ago

I also get a 'Setting up PyTorch plugin "upfirdn2d_plugin"... Failed' warning.

NotNANtoN commented 3 years ago

Your process was killed because your RAM is full. Either get more RAM, or some intermediate saves need to be added to the code.

As for the plugin: either install CUDA drivers and nvcc ("nvidia-cuda-toolkit") compatible with your PyTorch version, or add a "return False" at approximately line 27 of upfirdn2d.py in stylegan2/torch_utils/ops/ to disable the plugin initialization. Not exactly beautiful, I know.

timelf123 commented 3 years ago

Can't get this going on wildlife or my own pytorch trained models

ValueError                                Traceback (most recent call last)

<ipython-input-9-2b5a7926efc3> in <module>()
      8 L.hallucinate(file_name = 'x.mp4',
---> 9               fps = 60)
     10 
     11 files.download("x.mp4")

6 frames

/usr/local/lib/python3.7/dist-packages/PIL/Image.py in frombytes(self, data, decoder_name, *args)
    798 
    799         if s[0] >= 0:
--> 800             raise ValueError("not enough image data")
    801         if s[1] != 0:
    802             raise ValueError("cannot decode image data")

ValueError: not enough image data

MaxJohnsen commented 3 years ago

> Can't get this going on wildlife or my own pytorch trained models ... ValueError: not enough image data

This happens when batch_size is set to 1 (the error is at line 600 of main.py). Try increasing the batch size.

chloebubble commented 3 years ago

> Can't get this going on wildlife or my own pytorch trained models ... ValueError: not enough image data
>
> This happens when batch_size is set to 1 (error at line 600 of main.py). Try increasing the batch size.

I also ran into the same issue; I can confirm that increasing the batch size resolves it.

etrh commented 2 years ago

I've tried everything I possibly could, and for the past five days I haven't been able to get this to work at all (both the original lucidsonicdreams and @NotNANtoN 's clmr_clip branch) on A100, V100, and RTX 3090 GPUs. Does anyone here have suggestions to get this to work? I'm seriously out of ideas, and I'm really confused as to why it's so hard to set this up on the Ampere architecture. I've tinkered a lot with dnnlib/tflib/custom_ops.py as well, and have reinstalled CUDA, etc. Nothing helps!

It works well on Google Colab and on 1070 Ti, but seems impossible to install on RTX 3090, V100, A100.

I've followed this link step-by-step: https://developer.nvidia.com/blog/accelerating-tensorflow-on-a100-gpus/

I've also tried nvcr.io/nvidia/tensorflow:20.06-tf1-py3 and nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04 Docker images and they also haven't been helpful.

MaxJohnsen commented 2 years ago

> I've tried everything possible and for the past five days I haven't been able to get this to work ... It works well on Google Colab and on 1070 Ti, but impossible to install on RTX 3090, V100, A100.

Frustrating. I currently have a working setup on my RTX 3090 running Ubuntu 20.04.2. Here is some info about my environment; maybe it can point you in the right direction.

Python 3.7

PyTorch: torch 1.8.1+cu111 torchaudio 0.8.1 torchvision 0.9.1+cu111

Nvidia: NVIDIA-SMI 460.91.03 Driver Version: 460.91.03 CUDA Version: 11.2
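For reference, the torch versions above correspond to an install command along these lines - the +cu111 builds are what make Ampere cards work, and the exact pins are one known-good combination, not a requirement:

```shell
# CUDA 11.1 builds of torch/torchvision/torchaudio, matching the environment above
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 \
    -f https://download.pytorch.org/whl/torch_stable.html
```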

Michipulatos commented 2 years ago

Long shot, but figured I'd share - I'm having trouble using LSD with a custom class-conditioned .pkl trained via the PyTorch variant of StyleGAN2.

Hallucinating...

Generating frames:   0%|                                                                       | 0/7296 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    L.hallucinate(file_name = 'song.mp4')
  File "/home/bethos/tryagain/lucid-sonic-dreams/lucidsonicdreams/main.py", line 754, in hallucinate
    self.generate_frames()
  File "/home/bethos/tryagain/lucid-sonic-dreams/lucidsonicdreams/main.py", line 596, in generate_frames
    w_batch = self.Gs.mapping(noise_batch, class_batch.to(device), truncation_psi=self.truncation_psi)
  File "/home/bethos/anaconda3/envs/sonicstylegan-classes/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "<string>", line 222, in forward
  File "stylegan2/torch_utils/misc.py", line 93, in assert_shape
    raise AssertionError(f'Wrong size for dimension {idx}: got {size}, expected {ref_size}')
AssertionError: Wrong size for dimension 1: got 0, expected 20
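The `got 0, expected 20` shape error suggests the mapping network received an empty label tensor: a class-conditional StyleGAN2-ADA checkpoint expects a `(batch, c_dim)` one-hot label alongside each latent batch, and `c_dim` is 20 for this model. A hypothetical sketch of building such labels, in plain NumPy so it stands alone (in the real call, `class_batch` would be a torch tensor of this shape):

```python
import numpy as np

def one_hot_labels(class_idx, batch_size, c_dim=20):
    """Build a (batch_size, c_dim) one-hot label array for a conditional
    mapping network; c_dim=20 matches the error above."""
    labels = np.zeros((batch_size, c_dim), dtype=np.float32)
    labels[:, class_idx] = 1.0  # mark the chosen class in every row
    return labels

labels = one_hot_labels(class_idx=3, batch_size=8)
print(labels.shape)  # (8, 20)
```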

fractaldna22 commented 2 years ago

Isn't wikiart fundamentally a VQGAN model? That's trained with PyTorch out of the box. Config: http://eaidata.bmk.sh/data/Wikiart_16384/wikiart_f16_16384_8145600.yaml Checkpoint: http://eaidata.bmk.sh/data/Wikiart_16384/wikiart_f16_16384_8145600.ckpt

Chauban commented 2 years ago

How do I use this, please? I went to @NotNANtoN 's repo and the introduction still says pip install lucidsonicdreams, but that cannot run PyTorch models yet. What should I do? My test environment is Colab. Thanks so much.