There are multiple things you could do:

1. Lower the size of your input videos.
2. Split the video into separate chunk files, then loop over them or process them one by one (painful).
3. Modify videoswap.py using the below as a starting point: https://github.com/neuralchen/SimSwap/blob/fc4b7013547f023223c83097923aa255a8dd05e7/util/videoswap.py#L38
4. Use a subprocess and ffmpeg to split the video into chunks, do a for loop over each video chunk using the Python script, then merge them after the fact in a video editor or with ffmpeg. For example:
~pseudo code~

```
for video_file in video_file_directory:
    python test_video_swapsingle.py video_file ...
```
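For illustration, a rough sketch of that ffmpeg-chunking idea might look like the following (untested; the chunk length, file names, and any additional SimSwap arguments are placeholders, not part of the repo):

```python
import glob
import subprocess

# 1. Split the input into ~10-second chunks without re-encoding.
subprocess.run(["ffmpeg", "-i", "input.mp4", "-f", "segment", "-segment_time", "10",
                "-c", "copy", "chunk_%03d.mp4"], check=True)

# 2. Run SimSwap on each chunk (add the rest of the usual arguments).
for chunk in sorted(glob.glob("chunk_*.mp4")):
    subprocess.run(["python", "test_video_swapsingle.py",
                    "--video_path", chunk,
                    "--output_path", f"swapped_{chunk}"], check=True)

# 3. Concatenate the swapped chunks back into one video.
with open("chunks.txt", "w") as f:
    for swapped in sorted(glob.glob("swapped_chunk_*.mp4")):
        f.write(f"file '{swapped}'\n")
subprocess.run(["ffmpeg", "-f", "concat", "-safe", "0", "-i", "chunks.txt",
                "-c", "copy", "output.mp4"], check=True)
```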
I would go with number 3, with the pseudo code being something like:
*this isn't tested, it's just to give you an idea*
```python
current_frame = 0
max_frame = 14

for frame_index in tqdm(range(frame_count)):
    ret, frame = video.read()
    if ret:
        current_frame += 1
        if current_frame == max_frame:
            # Do something to empty video memory here
            current_frame = 0

        detect_results = detect_model.get(frame, crop_size)
        if detect_results is not None:
            ...
```
Like I said, I haven't tested it, and it could be a bit of work to implement from scratch since I haven't looked into how the models are loaded into memory yet. But the above should be enough for you to work out your own solution without messing with torch.
OK. Added at line 50 in util/videoswap.py: torch.cuda.empty_cache(). This lets me process 99 frames before running out of memory... I'll also try to free memory in the second for loop.
Great to hear. I would try to create a little wrapper function where you can tune your own parameters (max frame count) and plug it in at line 50, so that it executes torch.cuda.empty_cache() every nth frame.
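For illustration, such a wrapper might look like the sketch below (untested; the function name, signature, and default value are just placeholders, not part of SimSwap):

```python
import torch

def tick_and_empty_cache(current_frame, max_frame=14):
    """Increment the frame counter and empty the CUDA cache every `max_frame` frames."""
    current_frame += 1
    if current_frame >= max_frame:
        torch.cuda.empty_cache()
        current_frame = 0
    return current_frame

# Inside the frame loop in util/videoswap.py (around line 50):
# current_frame = tick_and_empty_cache(current_frame)
```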
Yes. But why does it run out of memory after 99 frames even if I call "empty_cache" after each frame? I cannot find anything filling the cache; I searched all the other scripts in SimSwap. BTW: I don't know much about Python... just a beginner, after 30 years of coding in (Visual) Basic and a little C++.
So I need a little bit of help. I've inserted the following code:
```python
for frame_index in tqdm(range(frame_count)):
    torch.cuda.empty_cache()
    ret, frame = video.read()
    if frame_index == 98:
        print(frame_index)
        input("Press Enter to continue...")
        break
```
Then it begins to write video_file(1), containing the first 98 frames.
Is there a way to jump back into video_swap but continue with frame 99 for the next 98 frames? Call video_swap again, or something like a goto to video_swap? After the break it would just have to write video_file(2), and so on... I don't know if this would work or how to do it...
EDIT: Got it to work as written above, but when calling video_swap again (break after 10 frames) it runs out of memory immediately.
The way torch.cuda.empty_cache() works is that it only frees the memory that it's able to. Remember what I said about not being aware of how the models are loaded into memory in this project? This is what I was referring to. They may be instantiated in different parts of the script, so it may be a bit more work, but you can try what I suggest below.
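To illustrate what "only frees the memory that it's able to" means, here is a tiny standalone snippet (not SimSwap code): the caching allocator only returns blocks to the GPU once no live tensor references them.

```python
import torch

# Allocate a tensor; the caching allocator reserves a block for it.
x = torch.randn(1024, 1024, device="cuda")
print(torch.cuda.memory_allocated())  # bytes held by live tensors
print(torch.cuda.memory_reserved())   # bytes held by the caching allocator

# empty_cache() can only release cached blocks that nothing references,
# so while `x` is alive its block stays reserved.
torch.cuda.empty_cache()
print(torch.cuda.memory_reserved())

# Drop the last reference; now the block can actually be freed.
del x
torch.cuda.empty_cache()
print(torch.cuda.memory_reserved())
```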
Also, you're running that torch call every frame, which isn't necessary and can lead to some issues. And you don't need to use user input to go to the next iteration; it probably runs out of memory because the script is still executing in the background while waiting for your input. Try this instead (untested, as I'm away from my machine):
```python
# Add these two lines above the for loop.
current_frame = 0
max_frame = 14

for frame_index in tqdm(range(frame_count)):
    ret, frame = video.read()
    if ret:
        # If ret is True, increment the current_frame counter by 1.
        current_frame += 1
        # If the current frame count equals the max frame count, do something.
        if current_frame == max_frame:
            # Empty the cache.
            torch.cuda.empty_cache()
            # Reset the counter back to 0.
            current_frame = 0
```
The user input was just for testing purposes after 98 frames. Resetting the frame index to 0 would process the same part of the input video and overwrite the temporary image sequence... I think I've found a solution for processing the whole input via batch and some additional parameters in video_swapsingle.py (start frame, end frame), without the need to split it into shorter parts. But I have a daytime job now...
> Resetting frame index to 0 would process the same part of the input video and overwrite the temporary image sequence...
Please read my proposed code again. It's not about setting the frame index to 0; it's about creating a separate counter variable that is incremented inside the loop, and once it hits a certain limit (max_frame), the counter resets.
You said that your GPU runs out of memory every 15th frame or so. In theory, clearing your GPU cache via torch's methods (which may not work) or doing something else to free up GPU resources every 15 frames would save you all of the extra steps you've mentioned.
Looks like I accidentally got it to work on 2 GB of GPU VRAM...
Not the way I initially planned... but it works. No problem processing a test video of 23 seconds / 1396 frames.
As soon as I have cleaned up the code I will post the changes I've made (test_video_swapsingle.py / videoswap.py / test_options.py).
EDIT: I will write a simple GUI (VB6 :-)
Here are the changes I made to run SimSwap on 2GB VRAM:
./options/test_options.py

```python
self.parser.add_argument("--first_frame", dest="first_frame", type=int, default=0, help="Set frame to start from.")
```
./util/videoswap.py

```python
...
from util.add_watermark import watermark_image
#frame_index = 0
first_frame = 0
...
def video_swap(first_frame, video_path, id_vetor, swap_model, detect_model, save_path, temp_results_dir='./temp_results', crop_size=224, no_simswaplogo=False):
    ...
    for frame_index in tqdm(range(first_frame, frame_count)):
        torch.cuda.empty_cache()
        ret, frame = video.read()
        if frame_index == 1:
            break
        ...
    video.release()

    if frame_index > 1:
        image_filename_list = []
        path = os.path.join(temp_results_dir, '*.jpg')
        image_filenames = sorted(glob.glob(path))
        clips = ImageSequenceClip(image_filenames, fps=fps)
```
./test_video_swapsingle.py

```python
...
first_frame = 0
video_swap(first_frame, opt.video_path, latend_id, model, app, opt.output_path, temp_results_dir=opt.temp_path, no_simswaplogo=opt.no_simswaplogo)

first_frame = 2
video_swap(first_frame, opt.video_path, latend_id, model, app, opt.output_path, temp_results_dir=opt.temp_path, no_simswaplogo=opt.no_simswaplogo)
```
test_video_swapsingle.py calls video_swap in ./util/videoswap.py and processes the first 2 frames before the break; then it calls video_swap again, starting at frame 2, and runs until the end of the input file. Tested so far on 2600 frames, but there seems to be no limit. torch.cuda.empty_cache() clears the VRAM before processing every single frame...
Not perfect, but it's working for me...
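For anyone who wants to generalize the two-call pattern above into fixed-size chunks, a purely hypothetical driver loop could look like the sketch below. It assumes video_swap also accepted a last_frame argument in place of the hard-coded `if frame_index == 1: break`, which the patch above does not add:

```python
import cv2

# Hypothetical: chunked calls to video_swap; assumes a last_frame parameter exists.
chunk_size = 98
frame_count = int(cv2.VideoCapture(opt.video_path).get(cv2.CAP_PROP_FRAME_COUNT))

for first_frame in range(0, frame_count, chunk_size):
    last_frame = min(first_frame + chunk_size, frame_count)
    video_swap(first_frame, last_frame, opt.video_path, latend_id, model, app,
               opt.output_path, temp_results_dir=opt.temp_path,
               no_simswaplogo=opt.no_simswaplogo)
```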
Just found a simpler solution for the "CUDA out of memory" problem while running SimSwap on a 2 GB GPU:
I only insert a with torch.no_grad(): statement in ../util/videoswap.py between lines 48 and 49 (and add 4 more spaces of indentation to every following line from 49 to 84),
and it works perfectly.
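In other words, the whole frame loop ends up inside a no_grad context, so PyTorch stops keeping per-frame autograd state around. Schematically it looks roughly like this (a sketch only; the elided lines are the original loop body):

```python
with torch.no_grad():
    for frame_index in tqdm(range(frame_count)):
        ret, frame = video.read()
        if ret:
            detect_results = detect_model.get(frame, crop_size)
            # ... rest of the original loop body, indented one extra level ...
```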
OMG, I forgot to add this. I will get it done in the next update.
:-) I came across it while making some more mods to First Order Motion Model and co-part segmentation.
Since I got it to work on my GeForce GTX 1050 / 2 GB, at least for videos not longer than ~16 frames before the GPU runs out of memory, I wonder if there is also a limitation when using an 8 GB GPU?
I had the same problem using Wav2Lip, but it could be solved by setting the chunk size to 1.
Would it (theoretically) be possible to process videos in SimSwap in smaller parts or chunks by releasing GPU memory every 15 frames?