From the original repo, it looks like there is a condition for running inference without a vkps_path. Is there a way to do this without your comfy implementation? What would be the best approach to supply audio and an image, but without any kind of input video?
That's my use case too: just use a generated portrait and generated TTS audio as inputs, with no video, like with SadTalker.
V-Express seems to allow it in its Scenario 2 example (the one with the Taylor Swift result).