I'm using a modified version of this project as part of another project where talking head synthesis is one component. I'm curious whether anyone here (author included) has recommendations for speeding up the talking head synthesis itself, especially when it's used to augment, say, tens of thousands of videos. I've had moderate success splitting the computation across multiple processes (running on GPUs with a larger memory pool, such as 48GB), but this is clunky and also appears to introduce choppiness at the segment boundaries in some videos once the per-process outputs are stitched back together. Perhaps I missed it, but I don't see any parameter in the code to adjust an inference-time batch size, or anything along those lines.
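For reference, here's roughly how I'm splitting the work, as a simplified sketch (the function names are my own, nothing from this repo). The idea I've been experimenting with to reduce boundary choppiness is to give each worker a few overlapping warm-up frames and then drop the duplicated frames when stitching:

```python
# Sketch: split a video into per-process chunks with a small overlap, then
# drop the duplicated frames at stitch time. The overlap gives each worker
# warm-up context at its left boundary, which seems to reduce choppiness
# where chunks meet. All names here are illustrative, not from this project.

def make_chunks(n_frames, n_workers, overlap=8):
    """Split [0, n_frames) into n_workers contiguous ranges; every chunk
    except the first is extended backwards by `overlap` warm-up frames."""
    base = n_frames // n_workers
    chunks, start = [], 0
    for i in range(n_workers):
        end = n_frames if i == n_workers - 1 else start + base
        # extend backwards so the worker re-synthesises a few warm-up frames
        ext_start = max(0, start - overlap) if i > 0 else start
        chunks.append((ext_start, end, start))  # (read_from, read_to, keep_from)
        start = end
    return chunks

def stitch(chunks, frames_per_chunk):
    """Discard each chunk's warm-up frames (index < keep_from), concatenate."""
    out = []
    for (_, _, keep_from), frames in zip(chunks, frames_per_chunk):
        out.extend(f for f in frames if f >= keep_from)
    return out

if __name__ == "__main__":
    chunks = make_chunks(100, 4, overlap=8)
    # simulate each worker producing frame indices for its extended range
    produced = [list(range(s, e)) for (s, e, _) in chunks]
    assert stitch(chunks, produced) == list(range(100))  # no gaps, no duplicates
```

This doesn't fully fix the choppiness for me (a few warm-up frames may not be enough for models that carry longer temporal state), so I'd still prefer a proper inference-time batching option if one exists.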
Any other ideas?