open-mmlab / PIA

[CVPR 2024] PIA, your Personalized Image Animator. Animate your images by text prompt, combining with Dreambooth, achieving stunning videos.
https://pi-animator.github.io/
Apache License 2.0
913 stars, 75 forks

about statistics in 'prepare_mask_coef_by_score' #42

Closed: yuminnko closed this issue 6 months ago

yuminnko commented 6 months ago

https://github.com/open-mmlab/PIA/blob/main/animatediff/utils/util.py#L262

Hi, thanks for the work! How can I get more information about the 'statistics' used in the function 'prepare_mask_coef_by_score'? I only found this sentence in the paper.

Can you provide some more details about the statistics?

ymzhang0319 commented 6 months ago

Hi @yuminnko.

As mentioned in the paper, the statistics on WebVid represent the L1 distance between each frame and the conditional frame in the HSV color space.

In practice, the statistics have the shape [frame_nums, 2], holding the frame-wise minimum and maximum distances from WebVid.

You can run this function on WebVid to get the statistics.
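
As a rough illustration of the description above (not the repository's actual code), the sketch below computes an L1 distance in HSV space between every frame and a conditional frame, then takes the per-frame minimum and maximum over a set of videos. The names `rgb_to_hsv`, `frame_distances` and `collect_statistics` are hypothetical. It also assumes the conditional frame is frame 0, that the distance is averaged over pixels and channels, and that hue is treated as a plain number with no wraparound; none of these details are confirmed in this thread.

```python
import numpy as np


def rgb_to_hsv(rgb: np.ndarray) -> np.ndarray:
    """Convert float RGB in [0, 1] with shape (..., 3) to HSV in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    maxc = rgb.max(axis=-1)
    minc = rgb.min(axis=-1)
    delta = maxc - minc
    # Replace zeros by 1 so the divisions below never divide by zero.
    safe_delta = np.where(delta > 0, delta, 1.0)
    safe_max = np.where(maxc > 0, maxc, 1.0)
    s = np.where(maxc > 0, delta / safe_max, 0.0)
    h = np.where(
        maxc == r,
        ((g - b) / safe_delta) % 6.0,
        np.where(maxc == g, (b - r) / safe_delta + 2.0, (r - g) / safe_delta + 4.0),
    )
    h = np.where(delta > 0, h / 6.0, 0.0)  # gray pixels get hue 0
    return np.stack([h, s, maxc], axis=-1)


def frame_distances(video: np.ndarray, cond_idx: int = 0) -> np.ndarray:
    """Mean L1 distance in HSV between each frame and the conditional frame.

    video: float RGB array of shape (frame_nums, height, width, 3) in [0, 1].
    Returns an array of shape (frame_nums,).
    Assumption: mean over pixels and channels; the real reduction may differ.
    """
    hsv = rgb_to_hsv(video)
    return np.abs(hsv - hsv[cond_idx]).mean(axis=(1, 2, 3))


def collect_statistics(videos, cond_idx: int = 0) -> np.ndarray:
    """Frame-wise [min, max] distance over a collection of videos.

    All videos must have the same number of frames.
    Returns an array of shape (frame_nums, 2).
    """
    dists = np.stack([frame_distances(v, cond_idx) for v in videos])  # (n_videos, f)
    return np.stack([dists.min(axis=0), dists.max(axis=0)], axis=-1)  # (f, 2)
```

Calling `collect_statistics` on a list of equally long clips returns one `[min, max]` row per frame index. The row for the conditional frame itself is `[0, 0]`, since its distance to itself is zero in every video.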

yuminnko commented 6 months ago

Hi, @ymzhang0319 Thanks for the answer.

But I'm still curious about the statistics. For them to have the shape [frame_nums, 2], is the same [min, max] pair repeated once for every frame of the video?

ymzhang0319 commented 6 months ago

@yuminnko

It is not repeated. When computed, each frame gets its own distance statistics relative to the condition frame.

yuminnko commented 6 months ago

@ymzhang0319

How can each frame have a different maximum and minimum value? I thought the minimum and maximum values were the min / max of the L1 distances between each frame and the conditional frame.

ymzhang0319 commented 6 months ago

@yuminnko

The maximum and minimum are taken over all videos in WebVid (e.g., videos with large variations have a larger distance at each frame).
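
A toy example may make this concrete. The distances below are made up for illustration (three hypothetical videos, four frames each, conditional frame at index 0); they are not WebVid values. The minimum and maximum are taken across videos, separately for each frame index, which is why every row of the result differs.

```python
import numpy as np

# Rows are videos, columns are frame indices.
# Entry [v, f] is the distance between frame f and the conditional frame of video v.
dists = np.array([
    [0.0, 0.1, 0.2, 0.3],  # nearly static video
    [0.0, 0.4, 0.8, 1.2],  # video with large motion
    [0.0, 0.2, 0.5, 0.6],  # in between
])

# Reduce over the video axis only, so one [min, max] pair remains per frame index.
statistics = np.stack([dists.min(axis=0), dists.max(axis=0)], axis=1)

print(statistics.shape)  # (4, 2), i.e. [frame_nums, 2]
print(statistics)
```

Here the row for frame 2 is `[0.2, 0.8]`: the smallest and largest distance that any of the three videos shows at that frame index.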

yuminnko commented 6 months ago

Thanks!