iperov / DeepFaceLab

DeepFaceLab is the leading software for creating deepfakes.
GNU General Public License v3.0
46.3k stars 10.39k forks

Solved for me: 2080 ti crashing on Train or Merge #5486

Open samfisherirl opened 2 years ago

samfisherirl commented 2 years ago

Expected behavior

Taking a 20-second video and mapping an individual's face from another 20-second video onto it, then training the AI.

Actual behavior

Crashing when attempting to train or merge: a BSOD with "DPC WATCHDOG VIOLATION", similar to about a quarter of the issues posted just before mine.

Steps to reproduce

Take two videos and go through steps 1-5, which complete without issue. Extract faces, manually or automatically. Any attempt to run a train or merge step, Quick or any other, ended in a BSOD.

Other relevant information

- 2080 Ti ROG STRIX Asus
- i7 8700K at 5 GHz
- 32 GB RAM
- Windows 10, fresh install 1 month ago

After an hour of reading through the git and forums, I found a workaround: underclocking my card and setting limits within the application.

My card was overclocked in MSI Afterburner, but the system would crash even without the overclock. I reset the CPU to its standard clock, and that didn't help. I didn't test running on CPU only because of time constraints.

After lowering the memory clock, core clock, and power limit, the system stayed stable.
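For anyone who wants to try the power part of this without Afterburner, here is a minimal Python sketch that builds an `nvidia-smi -pl` command for a reduced power limit. It is not what I actually ran. The function name `power_limit_command` and every wattage in it are hypothetical, the real minimum and maximum have to come from your own card, setting the limit usually needs administrator rights, and not every card or driver allows it. It only covers the power limit, not the core or memory clocks.

```python
def power_limit_command(default_watts, fraction, min_watts, max_watts):
    """Build an nvidia-smi command that lowers the board power limit.

    All four numbers are supplied by the caller; nothing here reads
    them from the card.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    target = default_watts * fraction
    # Clamp to the range the driver reports as allowed for this board.
    target = max(min_watts, min(max_watts, target))
    return ["nvidia-smi", "-pl", str(int(round(target)))]


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not measured values.
    print(" ".join(power_limit_command(260.0, 0.8, 100.0, 325.0)))
```

The sketch only prints the command so you can review it before running it yourself in an elevated prompt.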

I also kept taskers at 8 out of 12. I assume that prompt relates to either CPU cores or VRAM.

I can only assume DeepFaceLab isn't accounting for my limited VRAM and is using all of it. That seems the most likely cause. If that's right, a patch should cap VRAM usage, or whatever is maxing out the GPU. Failing that, there should be prominent warnings when usage approaches the upper bound.
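One way to check the VRAM theory is to watch memory usage from a second terminal while training starts. The sketch below is an assumption-laden helper, not part of DeepFaceLab: it calls `nvidia-smi` with its CSV query output and flags a GPU whose used memory is close to the total. The helper names and the 95% threshold are made up for illustration, and it assumes the driver reports plain integers in MiB.

```python
import subprocess

# nvidia-smi query that prints "used, total" in MiB, one line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_line(line):
    """Turn a line like '10240, 11264' into (used_mib, total_mib)."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total


def is_near_ceiling(used_mib, total_mib, threshold=0.95):
    """True when used memory is at or above the given fraction of total."""
    return used_mib / total_mib >= threshold


def read_gpu_memory():
    """Run nvidia-smi once and return a list of (used, total) per GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_memory_line(line) for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    for index, (used, total) in enumerate(read_gpu_memory()):
        flag = "NEAR CEILING" if is_near_ceiling(used, total) else "ok"
        print(f"GPU {index}: {used}/{total} MiB ({flag})")
```

If usage sits well below the ceiling right up to the crash, VRAM exhaustion is probably not the cause and the clocks or power are the better suspects.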

samfisherirl commented 2 years ago

After further investigation:

My i7 8700K is delidded and water cooled by an iCUE 150, and it stays under 70 °C at 1.35 V.

I had crashes during denoise_data_dst_images after ffmpeg.exe maxed out my CPU. After raising the CPU voltage to 1.36 V, I was able to prevent the crash and keep temps roughly the same, while overclocking my GPU to 125%.

I still can't get it to train, although I got it past the denoise phase. I have had it train before, but I'm still unsure how to get it working consistently.

joolstorrentecalo commented 1 year ago

Did you ever find the answer? If so, would you mind sharing it and closing this issue?