yuminnko closed this issue 6 months ago
Hi @yuminnko.
As mentioned in the paper, the statistics on WebVid are the L1 distances between each frame and the conditional frame in HSV color space.
In practice, the statistics have shape [frame_nums, 2]: for each frame index, they hold the minimum and maximum distance over WebVid.
You can run this function on WebVid to compute the statistics.
Hi @ymzhang0319, thanks for the answer.
But I'm still curious about the statistics. For them to have shape [frame_nums, 2], is the same [min, max] pair simply repeated for every frame of the video?
@yuminnko
It is not repeated. When you compute it, each frame gets its own distance statistic relative to the condition frame.
@ymzhang0319
How can each frame have a different maximum and minimum value? I thought the minimum and maximum were the min / max of the L1 distances between each frame and the conditional frame.
@yuminnko
The maximum and minimum are taken over all videos in WebVid, separately for each frame index (e.g., videos with large variations have larger per-frame distances).
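For readers who want to reproduce this, here is a minimal sketch of how such statistics could be computed. It is not the repository's implementation. It assumes each video is a float RGB array of shape (frames, H, W, 3) in [0, 1], that the conditional frame is frame 0, and that the distance is the mean absolute difference over pixels and channels. The names `rgb_to_hsv`, `frame_distances` and `dataset_statistics` are hypothetical, and random arrays stand in for WebVid clips.

```python
import numpy as np


def rgb_to_hsv(rgb: np.ndarray) -> np.ndarray:
    """Convert float RGB in [0, 1] with shape (..., 3) to HSV in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    maxc = rgb.max(axis=-1)
    delta = maxc - rgb.min(axis=-1)
    safe_delta = np.where(delta == 0, 1.0, delta)  # avoid division by zero
    safe_max = np.where(maxc == 0, 1.0, maxc)
    # Hue depends on which channel holds the maximum.
    hue = np.where(
        maxc == r,
        ((g - b) / safe_delta) % 6.0,
        np.where(maxc == g, (b - r) / safe_delta + 2.0, (r - g) / safe_delta + 4.0),
    )
    hue = np.where(delta == 0, 0.0, hue / 6.0)
    sat = np.where(maxc == 0, 0.0, delta / safe_max)
    return np.stack([hue, sat, maxc], axis=-1)


def frame_distances(video: np.ndarray, cond_idx: int = 0) -> np.ndarray:
    """Mean L1 distance in HSV between every frame and the conditional frame.

    Assumption: hue is treated as a plain number, so wrap-around is ignored.
    Returns an array of shape (frames,).
    """
    hsv = rgb_to_hsv(video)
    return np.abs(hsv - hsv[cond_idx]).mean(axis=(1, 2, 3))


def dataset_statistics(videos) -> np.ndarray:
    """Per-frame-index min and max distance over all videos, shape (frames, 2)."""
    dists = np.stack([frame_distances(v) for v in videos])  # (num_videos, frames)
    return np.stack([dists.min(axis=0), dists.max(axis=0)], axis=1)


# Toy stand-in for the dataset: 4 random clips of 16 frames each.
rng = np.random.default_rng(0)
videos = [rng.random((16, 8, 8, 3)) for _ in range(4)]
stats = dataset_statistics(videos)
print(stats.shape)  # (16, 2)
```

The min and max are reduced over the video axis only, which is why every frame index ends up with its own [min, max] pair rather than one pair repeated. Row 0 is [0, 0] here because the conditional frame is compared with itself.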
Thanks!
https://github.com/open-mmlab/PIA/blob/main/animatediff/utils/util.py#L262
Hi, thanks for the work! How can I get more information about the 'statistics' used in the function 'prepare_mask_coef_by_score'? I only found this sentence in the paper.
Can you provide some more details about the statistics?