Open zhangvia opened 1 month ago
Do you just detect the face in the image and crop it out?
The paper's Section 3.2 does not mention using face data as a reference image; it states that the reference image input is randomly sampled from the video sequence. So what is the face folder in your script?
The author used his own dataset and tested on TikTok. I train on TikTok, and I found that some frames never contain a clear person, so I added face detection. The face folder only contains the full frames in which a face was detected, used as reference images; the faces are not cropped out.
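A minimal sketch of the filtering step described above: keep only frames where a face detector fires, then randomly sample the reference image from that subset (frames are kept whole, not cropped). `detect_face` is an assumed stand-in for whatever detector is used (e.g. an OpenCV or RetinaFace wrapper); none of these names come from the repo.

```python
import random
from typing import Callable, Optional, Sequence

def build_face_folder(frames: Sequence, detect_face: Callable) -> list:
    """Return the subset of frames in which a face was detected (uncropped)."""
    return [f for f in frames if detect_face(f)]

def sample_reference(face_frames: Sequence, rng: Optional[random.Random] = None):
    """Randomly sample one full frame (not a face crop) as the reference image."""
    rng = rng or random.Random()
    if not face_frames:
        raise ValueError("no frames with a detectable face")
    return rng.choice(face_frames)
```

This keeps the paper's "randomly sampled reference image" behavior, just restricted to frames that actually show a face.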
Thank you. By the way, I use the UBC Fashion dataset, and the tensor shapes are: torch.Size([2, 25, 3, 1024, 576]), torch.Size([2, 25, 3, 1024, 576]), torch.Size([2, 3, 224, 224]), torch.Size([2, 3, 1024, 576]), torch.Size([50, 320, 128, 72]).
I changed nothing in train.sh and use an A800, which has 80 GB of VRAM, but I still get OOM. Is that normal?
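For scale, a back-of-envelope check of the tensor shapes above (assuming fp32, 4 bytes per element). Note these are only the input tensors; activations inside the denoising UNet dominate VRAM, which is why OOM at 80 GB is plausible even though the inputs themselves are under a gigabyte.

```python
from math import prod

def tensor_mib(shape, bytes_per_elem=4):
    """Size of a dense tensor in MiB for the given element width."""
    return prod(shape) * bytes_per_elem / 2**20

video_mib = tensor_mib((2, 25, 3, 1024, 576))   # pixel-space video batch
latent_mib = tensor_mib((50, 320, 128, 72))     # per-frame latent features
```

So the video batch is 337.5 MiB and the latent tensor 562.5 MiB at fp32; halving the frame count shrinks every per-frame activation roughly proportionally, consistent with the numbers reported below.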
After changing the frame count to 16 at an image size of 1024 × 576, it only uses 69 GB of VRAM; with 25 frames, it goes OOM...
Did you activate gradient checkpointing? In the paper, the author trained the model on an A100 (40 GB), and the frame count seems pretty big. I'm curious how he did that.
Trading time for space: gradient checkpointing recomputes activations during the backward pass instead of storing them all, so training is slower but fits in much less memory.
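The time-for-space trade can be illustrated framework-agnostically (a plain-Python sketch, not the actual training code): keep only every k-th intermediate of a chain of steps as a "checkpoint", and rebuild any other intermediate on demand from the nearest earlier checkpoint, which is exactly what activation checkpointing does for layer activations.

```python
def run_chain(x, steps, k=4):
    """Apply `steps` in order, retaining only every k-th intermediate."""
    checkpoints = {0: x}
    for i, step in enumerate(steps, start=1):
        x = step(x)
        if i % k == 0:
            checkpoints[i] = x  # store a sparse subset instead of everything
    return x, checkpoints

def recompute(i, steps, checkpoints):
    """Rebuild intermediate i by replaying steps from the nearest checkpoint."""
    j = max(c for c in checkpoints if c <= i)
    x = checkpoints[j]
    for step in steps[j:i]:
        x = step(x)
    return x
```

With k checkpoints the resident memory drops by roughly a factor of k, at the cost of one extra forward pass worth of recomputation, which matches the roughly 20-30% slowdown typically seen with PyTorch's `torch.utils.checkpoint`.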