ygtxr1997 / ReliableSwap

Official Implementation of 'ReliableSwap: Boosting General Face Swapping Via Reliable Supervision'

compare & 512 #1

Closed · Inferencer closed this issue 1 year ago

Inferencer commented 1 year ago

How does this compare with the recently developed roop? Also, if you are able to train to 512 as stated in your to-do list, let me know the requirements, such as the time it would take and the dataset you would choose, and I could look at sourcing funding to expedite the process.

ygtxr1997 commented 1 year ago
  1. As for roop, our baselines are FaceShifter and SimSwap. We haven't compared with roop yet, but we will consider comparing with it in the future.
  2. For 512 resolution, we are training a 512x512 model based on the FaceShifter baseline, whose training set is a super-resolved VGGFace2; it takes about 7~10 days to converge. However, the original FaceShifter architecture is designed for 256x256 resolution, so we are trying to finetune the FaceShifter model to support 512x512. We may need to adjust the architecture of the vanilla FaceShifter model, or the vanilla model may turn out to be fine for 512x512 resolution (see the sketch below).
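
A minimal sketch of one possible way to lift a 256x256 FaceShifter-style generator to 512x512, for illustration only: the wrapper, its module names, and the generator's `(target, id_embedding)` call signature are assumptions, not the authors' actual code.

```python
import torch
import torch.nn as nn

class Lift256To512(nn.Module):
    """Hypothetical wrapper: adds a learned 512->256 entry block and a
    256->512 exit block around an unchanged 256x256 swapping generator."""

    def __init__(self, generator_256: nn.Module, ch: int = 3, feat: int = 32):
        super().__init__()
        self.generator_256 = generator_256  # pretrained 256x256 trunk
        # 512 -> 256 learned downsampling of the target image
        self.down = nn.Sequential(
            nn.Conv2d(ch, feat, 3, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(feat, ch, 3, padding=1),
        )
        # 256 -> 512 learned upsampling of the swapped result
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(ch, feat, 3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(feat, ch, 3, padding=1),
        )

    def forward(self, target_512: torch.Tensor, id_embedding: torch.Tensor):
        target_256 = self.down(target_512)                          # 512 -> 256
        swapped_256 = self.generator_256(target_256, id_embedding)  # assumed signature
        return self.up(swapped_256)                                 # 256 -> 512
```

The other option mentioned above, feeding 512x512 crops to the vanilla model unchanged, would only work if every module is resolution-agnostic (fully convolutional, with no fixed-size layers).
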
Inferencer commented 1 year ago

Ok, roop is just based on InsightFace's now-closed inswapper and has many active, talented developers. The only issue is that we are unable to train to 512 due to the lack of training details from the original source. I think if you can give a timeline for your code release, I can convince those developers to help you finetune. We are currently retaining similar fidelity on video, but fixing memory leaks, adding multi-threading, upscaling, etc. will only get us so far if we are stuck with a 128x128 model. The code doesn't need to be perfect; just put up a work-in-progress notice and write a more detailed to-do list, and I'll see what we and the rest of the open source community can do to help.

ygtxr1997 commented 1 year ago

Thanks for your suggestion! Our method takes two face swapping methods as baselines: FaceShifter (256x256) and SimSwap (256x256). Maybe I need some more time to compare roop with our re-implemented baselines (FaceShifter, SimSwap) and our proposed method (ReliableSwap). As for the 512x512 model, we now find that our adjusted FaceShifter model does not perform as well as we expected (trained on 8 A100 40G GPUs with batch_size 6, 400k steps). It may take more time to converge. We plan to first release our 256x256 model code (training and testing); I'm not sure whether that helps you. After that, we will focus on getting a good 512x512 model.
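
As a quick sanity check on those numbers, a back-of-the-envelope calculation (assuming batch_size 6 is per GPU, which the comment does not state, and using the commonly cited ~3.31M images for VGGFace2):

```python
# Rough data-exposure estimate for the reported 512x512 run.
gpus = 8                      # 8x A100 40G, as reported
batch_per_gpu = 6             # as reported; assumed to be per-GPU
steps = 400_000               # as reported
vggface2_images = 3_310_000   # approximate size of the VGGFace2 training set

images_seen = gpus * batch_per_gpu * steps   # 19,200,000 images
epochs = images_seen / vggface2_images       # roughly 5.8 passes over the data
print(f"~{images_seen / 1e6:.1f}M images seen, ~{epochs:.1f} epochs")
```

If the batch size were global rather than per-GPU, the run would have seen closer to 0.7 epochs, which would be consistent with the model needing more time to converge.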

FlowDownTheRiver commented 1 year ago

Haven't tested your code yet, but it is always great to see open-minded developers. Thanks for the research.