Open zhangvia opened 1 month ago
Do you just detect the face in the image and crop it out?
The paper's Section 3.2 does not mention using face data as a reference image; it states that the reference image input is randomly sampled from the video sequence. So what is the face folder in your script?
The author used his own dataset and tested on TikTok. I train on TikTok, and I found that some frames never contain a clear person, so I added face detection. The face folder only contains the full frames in which a face was detected, used as reference images; the faces are not cropped out.
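A minimal sketch of the filtering step described above: keep only frames where a face detector fires, then randomly sample the reference image from that subset (frames are kept whole, not cropped). `detect_face` is an assumed stand-in for whatever detector is used (e.g. an OpenCV or RetinaFace wrapper); none of these names come from the repo.

```python
import random
from typing import Callable, Optional, Sequence

def build_face_folder(frames: Sequence, detect_face: Callable) -> list:
    """Return the subset of frames in which a face was detected (uncropped)."""
    return [f for f in frames if detect_face(f)]

def sample_reference(face_frames: Sequence, rng: Optional[random.Random] = None):
    """Randomly sample one full frame (not a face crop) as the reference image."""
    rng = rng or random.Random()
    if not face_frames:
        raise ValueError("no frames with a detectable face")
    return rng.choice(face_frames)
```

This keeps the paper's "randomly sampled reference image" behavior, just restricted to frames that actually show a face.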
Thank you. By the way, I use the UBC Fashion dataset, and the tensor shapes are: torch.Size([2, 25, 3, 1024, 576]), torch.Size([2, 25, 3, 1024, 576]), torch.Size([2, 3, 224, 224]), torch.Size([2, 3, 1024, 576]), torch.Size([50, 320, 128, 72]).
I changed nothing in train.sh and use an A800, which has 80 GB of VRAM, but I still get OOM. Is that normal?
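For scale, a back-of-envelope check of the tensor shapes above (assuming fp32, 4 bytes per element). Note these are only the input tensors; activations inside the denoising UNet dominate VRAM, which is why OOM at 80 GB is plausible even though the inputs themselves are under a gigabyte.

```python
from math import prod

def tensor_mib(shape, bytes_per_elem=4):
    """Size of a dense tensor in MiB for the given element width."""
    return prod(shape) * bytes_per_elem / 2**20

video_mib = tensor_mib((2, 25, 3, 1024, 576))   # pixel-space video batch
latent_mib = tensor_mib((50, 320, 128, 72))     # per-frame latent features
```

So the video batch is 337.5 MiB and the latent tensor 562.5 MiB at fp32; halving the frame count shrinks every per-frame activation roughly proportionally, consistent with the numbers reported below.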
After changing the frame count to 16 at an image size of 1024 × 576, it only uses 69 GB of VRAM; with 25 frames, it goes OOM...
Did you activate gradient checkpointing? In the paper, the author trained the model on an A100 (40 GB), and the frame count seems pretty big. I'm curious how he did that.
Trading time for space: gradient checkpointing recomputes activations during the backward pass instead of storing them all, so training is slower but fits in much less memory.
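The time-for-space trade can be illustrated framework-agnostically (a plain-Python sketch, not the actual training code): keep only every k-th intermediate of a chain of steps as a "checkpoint", and rebuild any other intermediate on demand from the nearest earlier checkpoint, which is exactly what activation checkpointing does for layer activations.

```python
def run_chain(x, steps, k=4):
    """Apply `steps` in order, retaining only every k-th intermediate."""
    checkpoints = {0: x}
    for i, step in enumerate(steps, start=1):
        x = step(x)
        if i % k == 0:
            checkpoints[i] = x  # store a sparse subset instead of everything
    return x, checkpoints

def recompute(i, steps, checkpoints):
    """Rebuild intermediate i by replaying steps from the nearest checkpoint."""
    j = max(c for c in checkpoints if c <= i)
    x = checkpoints[j]
    for step in steps[j:i]:
        x = step(x)
    return x
```

With k checkpoints the resident memory drops by roughly a factor of k, at the cost of one extra forward pass worth of recomputation, which matches the roughly 20-30% slowdown typically seen with PyTorch's `torch.utils.checkpoint`.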