johndpope closed this 3 months ago.
B.3 Datasets preprocessing
We obtain the VoxCeleb2HQ dataset by first downloading the original videos from the VoxCeleb2 [4] dataset. These videos are processed using an off-the-shelf face [41] and keypoints [2] detectors and cropped frame-by-frame around the head regions. Then, the obtained cropped frames are first filtered by their resolution, t
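The excerpt describes cropping each frame around the head using detected keypoints and then filtering the crops by resolution. A minimal sketch of that per-frame step follows. It assumes the keypoints for a frame are already available as a `(K, 2)` array of pixel coordinates. The margin factor `scale=1.8`, the threshold `min_size=256`, and the function names `head_crop_box` and `crop_frame` are all hypothetical and do not come from the paper.

```python
import numpy as np

def head_crop_box(keypoints: np.ndarray, img_h: int, img_w: int, scale: float = 1.8):
    """Square crop box (x0, y0, x1, y1) centred on a frame's 2D keypoints.

    keypoints: (K, 2) array of (x, y) pixel coordinates for one frame.
    scale: hypothetical margin factor applied to the keypoint extent.
    """
    x_min, y_min = keypoints.min(axis=0)
    x_max, y_max = keypoints.max(axis=0)
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    half = scale * max(x_max - x_min, y_max - y_min) / 2.0
    # Clamp to the image; near a border the box can become non-square.
    x0 = int(max(0, round(cx - half)))
    y0 = int(max(0, round(cy - half)))
    x1 = int(min(img_w, round(cx + half)))
    y1 = int(min(img_h, round(cy + half)))
    return x0, y0, x1, y1

def crop_frame(frame: np.ndarray, keypoints: np.ndarray, min_size: int = 256, scale: float = 1.8):
    """Crop one frame around the head; return None if the crop fails the resolution filter."""
    h, w = frame.shape[:2]
    x0, y0, x1, y1 = head_crop_box(keypoints, h, w, scale)
    if min(x1 - x0, y1 - y0) < min_size:
        return None  # hypothetical resolution filter: drop crops smaller than min_size
    return frame[y0:y1, x0:x1]
```

For example, with three keypoints spanning 100 x 120 px in a 1280 x 720 frame, the crop is 216 x 216 px, so it is kept at `min_size=128` and dropped at `min_size=256`.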
This data-prep step feels out of date. Were the keypoints calculated per frame?
Could a more modern approach be swapped in, such as the landmark (`lmk`) code in AniPortrait? https://github.com/search?q=repo%3AZejun-Yang%2FAniPortrait%20lmk&type=code
Does VASA follow the same approach?
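If the keypoints are detected independently on every frame, as "cropped frame-by-frame" suggests, the crop boxes tend to jitter from frame to frame. One common mitigation, which the excerpt does not mention, is to smooth the per-frame boxes over time before cropping. The sketch below applies a centred moving average to a `(T, 4)` array of boxes; the function name `smooth_boxes` and the default window of 5 frames are hypothetical choices.

```python
import numpy as np

def smooth_boxes(boxes: np.ndarray, window: int = 5) -> np.ndarray:
    """Centred moving average over per-frame crop boxes.

    boxes: (T, 4) array of (x0, y0, x1, y1), one row per frame.
    window: hypothetical odd window length in frames.
    The sequence is edge-padded by repeating the first/last box, so the output has T rows.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    pad = window // 2
    padded = np.pad(boxes.astype(float), ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    # Average each coordinate column independently; 'valid' trims the padding back to T rows.
    return np.stack(
        [np.convolve(padded[:, j], kernel, mode="valid") for j in range(boxes.shape[1])],
        axis=1,
    )
```

A larger window removes more jitter but makes the crop lag behind fast head motion. The smoothed boxes would then be rounded before slicing frames.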
I upgraded the video preprocessing to remove the background and apply other transforms. The FFHQ data is OK for now.
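The comment does not say which segmentation or matting model is used for background removal. The sketch below therefore assumes a per-frame foreground matte (`alpha`, 1 for the person and 0 for the background) has already been produced by some model. It only shows the compositing step that replaces the background with a flat colour. The function name `replace_background` and the white default `bg_color` are hypothetical.

```python
import numpy as np

def replace_background(frame: np.ndarray, alpha: np.ndarray, bg_color=(255, 255, 255)) -> np.ndarray:
    """Composite a frame over a flat background using a foreground matte.

    frame: (H, W, 3) uint8 image.
    alpha: (H, W) float matte in [0, 1]; 1 = foreground (person), 0 = background.
    bg_color: hypothetical fill colour for the removed background.
    """
    a = np.clip(alpha.astype(np.float32), 0.0, 1.0)[..., None]  # (H, W, 1) for broadcasting
    bg = np.asarray(bg_color, dtype=np.float32).reshape(1, 1, 3)
    out = a * frame.astype(np.float32) + (1.0 - a) * bg  # standard alpha blend
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

A soft (non-binary) matte blends hair and edge pixels with the background colour instead of leaving a hard cut-out.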
I think the original codebase is closely related to SamsungLabs' ROME: https://github.com/SamsungLabs/rome?tab=readme-ov-file
Existing `LMDBDataset` implementations for reference: https://github.com/search?q=LMDBDataset&type=code