magic-research / magic-animate

[CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
https://showlab.github.io/magicanimate/
BSD 3-Clause "New" or "Revised" License

Where were your released checkpoints trained? #136

Closed: yyyouy closed this issue 10 months ago

yyyouy commented 10 months ago

Thanks for your excellent work. I would like to ask: on what data were your released checkpoints trained? My evaluation results (FID, FVD) on the TikTok dataset are quite different from the results you published.

I also wanted to ask whether the released checkpoint is different from the one you used for the TikTok results in your paper.

zcxu-eric commented 10 months ago

Hi, yes, the checkpoint we release in this repo is a different one, trained with different hyperparameters, and it generalizes better.

forechoandlook commented 10 months ago

> Hi, yes, the checkpoint we release in this repo is a different one, trained with different hyperparameters, and it generalizes better.

We find that it is not "better" but worse.

zcxu-eric commented 10 months ago

> Hi, yes, the checkpoint we release in this repo is a different one, trained with different hyperparameters, and it generalizes better.

> We find that it is not "better" but worse.

I mean the checkpoint we released works better on out-of-distribution (OOD) reference images, but its numerical results on the TikTok benchmark can be worse.

Worromots commented 10 months ago

> Hi, yes, the checkpoint we release in this repo is a different one, trained with different hyperparameters, and it generalizes better.

> We find that it is not "better" but worse.

> I mean the checkpoint we released works better on out-of-distribution (OOD) reference images, but its numerical results on the TikTok benchmark can be worse.

Thanks for your excellent work. Will you release the supplementary material soon?

Worromots commented 10 months ago

> Thanks for your excellent work. I would like to ask: on what data were your released checkpoints trained? My evaluation results (FID, FVD) on the TikTok dataset are quite different from the results you published.

> I also wanted to ask whether the released checkpoint is different from the one you used for the TikTok results in your paper.

Hi, I am also working on reproducing the FVD. How do you process the TikTok dataset? I find that only about 15 videos are used as the test split in other papers, for example DisCo. Could you share the details of your test pipeline? Looking forward to your reply!

zcxu-eric commented 10 months ago

@Worromots Hi, I believe issue #133 is also related. We used the evaluation code released by DisCo and followed their train/test split for the TikTok dataset.
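For concreteness, a minimal sketch of gathering test frames from such a split; the split-file name (`tiktok_test_split.txt`) and the per-video folder layout below are assumptions for illustration, not DisCo's actual release layout:

```python
from pathlib import Path

def load_test_frames(dataset_root: str, split_file: str = "tiktok_test_split.txt"):
    """Return {video_id: [frame paths]} for the videos listed in the split file."""
    root = Path(dataset_root)
    test_ids = [line.strip() for line in (root / split_file).read_text().splitlines() if line.strip()]
    frames = {}
    for vid in test_ids:
        # Assumed layout: <dataset_root>/<video_id>/images/*.png
        frames[vid] = sorted((root / vid / "images").glob("*.png"))
    return frames
```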

Worromots commented 10 months ago

> @Worromots Hi, I believe issue #133 is also related. We used the evaluation code released by DisCo and followed their train/test split for the TikTok dataset.

Thank you for your answer; it has resolved a question that has puzzled me for many days. I have another question: for the test set of the TED dataset, did you generate only the first n frames during testing, or did you generate all the frames? For example, for a clip with 200 frames, do you need to generate all 200 frames during testing, or just the first 32 frames?

zcxu-eric commented 10 months ago

> @Worromots Hi, I believe issue #133 is also related. We used the evaluation code released by DisCo and followed their train/test split for the TikTok dataset.

> Thank you for your answer; it has resolved a question that has puzzled me for many days. I have another question: for the test set of the TED dataset, did you generate only the first n frames during testing, or did you generate all the frames? For example, for a clip with 200 frames, do you need to generate all 200 frames during testing, or just the first 32 frames?

We generate all the frames for testing.
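A minimal sketch of what generating all frames of a clip in fixed-size temporal windows could look like; the window size of 16 and the `animate_chunk` callable are placeholders, not the repo's actual API:

```python
def animate_full_clip(reference_image, pose_frames, animate_chunk, window: int = 16):
    """Generate one output frame per driving pose frame by processing fixed-size windows."""
    outputs = []
    for start in range(0, len(pose_frames), window):
        chunk = pose_frames[start:start + window]              # e.g. poses 0-15, 16-31, ...
        outputs.extend(animate_chunk(reference_image, chunk))  # placeholder generation call
    return outputs  # a 200-frame pose sequence yields 200 output frames
```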

yyyouy commented 10 months ago

> I mean the checkpoint we released works better on out-of-distribution (OOD) reference images, but its numerical results on the TikTok benchmark can be worse.

Sorry, I found that the numerical results on the TikTok benchmark are far from those reported in your paper, and this has puzzled me for many days. Could you release the TikTok or TED-talks results achieved by your released checkpoint?

zcxu-eric commented 10 months ago

> I mean the checkpoint we released works better on out-of-distribution (OOD) reference images, but its numerical results on the TikTok benchmark can be worse.

> Sorry, I found that the numerical results on the TikTok benchmark are far from those reported in your paper, and this has puzzled me for many days. Could you release the TikTok or TED-talks results achieved by your released checkpoint?

Hi, I'm not sure whether your evaluation codebase is the same as ours, since we haven't released that part yet. We will release all the checkpoints and the codebase in the future; please stay tuned, thanks.

Delicious-Bitter-Melon commented 10 months ago

> I mean the checkpoint we released works better on out-of-distribution (OOD) reference images, but its numerical results on the TikTok benchmark can be worse.

> Sorry, I found that the numerical results on the TikTok benchmark are far from those reported in your paper, and this has puzzled me for many days. Could you release the TikTok or TED-talks results achieved by your released checkpoint?

> Hi, I'm not sure whether your evaluation codebase is the same as ours, since we haven't released that part yet. We will release all the checkpoints and the codebase in the future; please stay tuned, thanks.

Hi, at what resolution do you evaluate the TED/TikTok datasets: 512px, 256px, or 384px?

zcxu-eric commented 10 months ago

We use 512x512 for evaluation.
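A minimal sketch of resizing generated and ground-truth frames to 512x512 before metric computation, assuming frames are stored as PNGs in a folder; the bicubic interpolation choice and directory names are assumptions:

```python
from pathlib import Path
from PIL import Image

def resize_frames(src_dir: str, dst_dir: str, size: int = 512):
    """Resize every PNG frame in src_dir to size x size and write it to dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for frame_path in sorted(Path(src_dir).glob("*.png")):
        img = Image.open(frame_path).convert("RGB")
        img.resize((size, size), Image.BICUBIC).save(dst / frame_path.name)
```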

Delicious-Bitter-Melon commented 10 months ago

> We use 512x512 for evaluation.

Thanks for your reply. Do you only use the first 100 frames of each video in the TED/TikTok test sets to extract poses and perform inference?

Worromots commented 10 months ago

{'FVD-3DRN50': 44.1599514373319, 'FVD-3DInception': 305.5606416223029}, FID: 48.077198307792685

These are the results I reproduced using the gen_val.sh script from the DisCo codebase, generating all frames for evaluation.
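For reference, a hedged sketch of a frame-level FID check using torchmetrics (this is not DisCo's gen_val.sh pipeline; FVD additionally needs a video feature extractor such as 3D-ResNet50 or I3D, as in DisCo's evaluation code). The folder names below are illustrative, and torchmetrics' FID backend requires the torch-fidelity package:

```python
import numpy as np
import torch
from pathlib import Path
from PIL import Image
from torchmetrics.image.fid import FrechetInceptionDistance

def folder_to_uint8_batch(folder: str, size: int = 512) -> torch.Tensor:
    """Load all PNG frames in a folder as an (N, 3, H, W) uint8 tensor."""
    frames = []
    for p in sorted(Path(folder).glob("*.png")):
        img = Image.open(p).convert("RGB").resize((size, size))
        frames.append(torch.from_numpy(np.array(img)).permute(2, 0, 1))
    return torch.stack(frames)

fid = FrechetInceptionDistance(feature=2048)                  # InceptionV3 pooled features
fid.update(folder_to_uint8_batch("gt_frames"), real=True)     # ground-truth frames
fid.update(folder_to_uint8_batch("gen_frames"), real=False)   # generated frames
print("FID:", fid.compute().item())
```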