Lethobenthos20 opened 2 months ago
Thanks for your attention! Please refer to https://github.com/zhangguiwei610/CAMEL/blob/c8fe6f9c7240870ee7af2f41f667ef769026a25a/train_camel.py#L438
OK, thank you. I had noticed that, but can you point me to the code that shows how `ddim_inv_latent` is actually used?
In the inference phase, why does sampling in the code appear to start directly from noise rather than from the inversion result? I don't see `ddim_inv_latent` being used during inference. Or is my understanding wrong?
In https://github.com/zhangguiwei610/CAMEL/blob/c8fe6f9c7240870ee7af2f41f667ef769026a25a/train_camel.py#L444, you can see that `ddim_inv_latent` is passed to the `validation_pipeline` call, and in https://github.com/zhangguiwei610/CAMEL/blob/c8fe6f9c7240870ee7af2f41f667ef769026a25a/tuneavideo/pipelines/pipeline_tuneavideo.py#L298, since `latents` is not None, sampling starts directly from `ddim_inv_latent` instead of fresh noise.
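The branching described above can be sketched as follows. This is a minimal illustration of the latent-initialization logic, not CAMEL's actual pipeline code; the function name `prepare_latents` and the fixed VAE downscale factor of 8 are assumptions following the common diffusers convention:

```python
import torch

def prepare_latents(batch_size, num_channels, video_length, height, width,
                    generator=None, latents=None):
    """Sketch: if `latents` (e.g. the DDIM-inverted latent) is provided,
    denoising starts from it; otherwise it starts from Gaussian noise."""
    shape = (batch_size, num_channels, video_length, height // 8, width // 8)
    if latents is None:
        # No inversion result supplied: start denoising from pure noise.
        latents = torch.randn(shape, generator=generator)
    else:
        # ddim_inv_latent was passed in: denoising starts from the inversion.
        assert latents.shape == shape, "latent shape must match the target shape"
    return latents
```

So the same sampling loop serves both cases; passing `ddim_inv_latent` simply replaces the random starting point.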
OK, thank you very much for your answer, I've got it now.
I didn't find the code for DDIM inversion; DDIM sampling seems to start directly from noise. Why is that?
```python
@torch.no_grad()
def __call__(
    self,
    prompt: Union[str, List[str]],
    motion_prompt: None,
    video_length: Optional[int],
    height: Optional[int] = None,
    width: Optional[int] = None,
    num_inference_steps: int = 50,
    guidance_scale: float = 7.5,
    negative_prompt: Optional[Union[str, List[str]]] = None,
    num_videos_per_prompt: Optional[int] = 1,
    eta: float = 0.0,
    generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
    latents: Optional[torch.FloatTensor] = None,
    output_type: Optional[str] = "tensor",
    return_dict: bool = True,
    callback: Optional[Callable[[int, int, torch.FloatTensor], None]] = None,
    callback_steps: Optional[int] = 1,
    **kwargs,
):
    # Default height and width to unet
```
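For context, DDIM inversion is just the deterministic DDIM update run in reverse: instead of denoising from t to t-1, it re-noises a latent from t to t+1 using the model's predicted noise. Below is a generic sketch of one inversion step (eta = 0), not CAMEL's exact implementation; `alpha_t` and `alpha_next` stand for the scheduler's cumulative alpha products at the current and next timesteps:

```python
import torch

def ddim_inversion_step(x_t, eps, alpha_t, alpha_next):
    """One deterministic DDIM inversion step: map a latent at timestep t
    to the *noisier* timestep t+1, given the predicted noise `eps`."""
    # Predicted clean sample implied by the current latent and noise estimate.
    x0_pred = (x_t - (1 - alpha_t) ** 0.5 * eps) / alpha_t ** 0.5
    # Re-noise toward the next (higher-noise) timestep.
    return alpha_next ** 0.5 * x0_pred + (1 - alpha_next) ** 0.5 * eps
```

Running this over all timesteps produces the `ddim_inv_latent` that the training script later hands to the validation pipeline as its starting latent.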