luxiaolili / MimicMotion_train


details of loss #2

Open xiaohutongxue-sunny opened 1 month ago

xiaohutongxue-sunny commented 1 month ago

Hello friends, I appreciate your work; it gives me motivation to do research again. There is a question I hope you could answer: what does the following content mean? Would you mind explaining the details of the symbols (c_skip, c_out) and the meaning of the equation? [screenshot: 2024-08-14 10-29-05] Thank you very much.
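For reference, a minimal sketch of the EDM-style preconditioning (Karras et al., 2022) that SVD-based training code typically uses; c_skip scales the noisy input and c_out scales the raw network output when forming the denoised prediction. The choice sigma_data = 1.0, the signs, and the loss weighting below are assumptions, since the exact code in the screenshot may differ:

```python
import torch

sigma_data = 1.0  # assumed; EDM's data-std hyperparameter

def precondition(sigmas):
    # sigmas is assumed to broadcast over the latent dims, e.g. shape (B, 1, 1, 1, 1)
    c_skip = sigma_data**2 / (sigmas**2 + sigma_data**2)               # passes the noisy input through
    c_out = sigmas * sigma_data / (sigmas**2 + sigma_data**2) ** 0.5   # scales the network output
    c_in = 1 / (sigmas**2 + sigma_data**2) ** 0.5                      # scales the network input
    return c_skip, c_out, c_in

def edm_loss(model_output, noisy_latents, clean_latents, sigmas):
    c_skip, c_out, _ = precondition(sigmas)
    # Denoised prediction = c_skip * (noisy input) + c_out * (network output);
    # the loss compares it with the clean latents under the EDM weighting.
    denoised = c_skip * noisy_latents + c_out * model_output
    weighting = (sigmas**2 + sigma_data**2) / (sigmas * sigma_data) ** 2
    return (weighting * (denoised - clean_latents) ** 2).mean()
```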

xiaohutongxue-sunny commented 1 month ago

According to the URL, I understand the meaning now! But I find that your code doesn't implement the "confidence-aware pose guidance".

luxiaolili commented 1 month ago

According to the URL, I understand the meaning now! But I find that your code doesn't implement the "confidence-aware pose guidance".

It is not hard to do: just generate a hand mask above some confidence threshold and use it to weight the loss. The only thing needed is to pick a threshold that handles blurry hands; I will implement it in the next few days and update the code.

xiaohutongxue-sunny commented 1 month ago

According to the URL, I understand the meaning now! But I find that your code doesn't implement the "confidence-aware pose guidance".

It is not hard to do: just generate a hand mask above some confidence threshold and use it to weight the loss. The only thing needed is to pick a threshold that handles blurry hands; I will implement it in the next few days and update the code.

OK! Thank you, friend! I will try it. Thanks for sharing.

trouble-maker007 commented 1 month ago

@luxiaolili @xiaohutongxue-sunny In the paper, the term "Confidence-aware Pose Guidance" seems to involve multiplying the threshold by the corresponding body part colors. Is this also uniformly processed as a mask, or is it done by increasing the brightness of the corresponding body part colors through the threshold, for example, by performing such processing in the dataset?

I am also replicating this paper, and I have some doubts about this part.

zhangvia commented 1 month ago

Hey, what GPU did you guys use? I am using an A800, which has 80 GB of VRAM, but I still get OOM. I'm using the UBC Fashion dataset. Theoretically, the UNet and PoseNet have only about 1.5B parameters, which should consume about 14 GB of VRAM with the Adam 8-bit optimizer, and I use gradient checkpointing to eliminate the intermediate activations, so it's really abnormal that training goes OOM on an 80 GB GPU.

xiaohutongxue-sunny commented 1 month ago

@luxiaolili @xiaohutongxue-sunny In the paper, the term "Confidence-aware Pose Guidance" seems to involve multiplying the threshold by the corresponding body part colors. Is this also uniformly processed as a mask, or is it done by increasing the brightness of the corresponding body part colors through the threshold, for example, by performing such processing in the dataset?

I am also replicating this paper, and I have some doubts about this part.

First, get a mask of the high-confidence hand boxes from the DWPose result, then downsample this mask to the latent size. Use a different weight for the masked region, multiplied into the loss before taking the mean. (I don't know who wrote this; I got it from the official repo.)
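A minimal sketch of that idea (the weight value, box format, and VAE downsampling factor are assumptions, not the official implementation):

```python
import torch
import torch.nn.functional as F

def hand_weighted_loss(pred, target, hand_boxes, hand_weight=3.0, vae_scale=8):
    """Weight the per-element loss inside high-confidence hand boxes.

    pred/target: (B, C, H_lat, W_lat) latent-space tensors
    hand_boxes:  list of (x1, y1, x2, y2) pixel-space boxes whose keypoint
                 confidence exceeded the chosen threshold
    """
    b, _, h, w = pred.shape
    # Pixel-space mask: 1 everywhere, hand_weight inside the confident hand boxes.
    mask = torch.ones(1, 1, h * vae_scale, w * vae_scale, device=pred.device)
    for x1, y1, x2, y2 in hand_boxes:
        mask[:, :, y1:y2, x1:x2] = hand_weight
    # Downsample the mask to the latent resolution.
    mask = F.interpolate(mask, size=(h, w), mode="nearest")
    # Multiply the element-wise loss by the mask *before* taking the mean.
    return (((pred - target) ** 2) * mask).mean()
```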

xiaohutongxue-sunny commented 1 month ago

Hey, what GPU did you guys use? I am using an A800, which has 80 GB of VRAM, but I still get OOM. I'm using the UBC Fashion dataset. Theoretically, the UNet and PoseNet have only about 1.5B parameters, which should consume about 14 GB of VRAM with the Adam 8-bit optimizer, and I use gradient checkpointing to eliminate the intermediate activations, so it's really abnormal that training goes OOM on an 80 GB GPU.

Decrease the number of frames and the input size; try a size of 512 and 12 frames.

luxiaolili commented 1 month ago

Hey, what GPU did you guys use? I am using an A800, which has 80 GB of VRAM, but I still get OOM. I'm using the UBC Fashion dataset. Theoretically, the UNet and PoseNet have only about 1.5B parameters, which should consume about 14 GB of VRAM with the Adam 8-bit optimizer, and I use gradient checkpointing to eliminate the intermediate activations, so it's really abnormal that training goes OOM on an 80 GB GPU.

I am using an A800 and training with 16 frames at an image size of 1024×576; it needs 69 GB of VRAM. I have trained a first result with it, but the generated video flickers a lot, so I am doing some more experiments.

xiaohutongxue-sunny commented 1 month ago

I am using an A800 and training with 16 frames at an image size of 1024×576; it needs 69 GB of VRAM. I have trained a first result with it, but the generated video flickers a lot, so I am doing some more experiments.

There is a bug in your dataset code: img_path_lst = sorted([img for img in glob(img_path + "/*.png")]) returns a lexicographically sorted list, which is not the order you want. The result is ['1.png', '10.png', '100.png', '101.png', '102.png', '103.png', '104.png', '105.png', '106.png', '107.png', '108.png', '109.png', '11.png', '110.png'], not 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
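A minimal sketch of a fix, assuming the frame filenames are bare integers like 1.png, 2.png, ...:

```python
from glob import glob
from pathlib import Path

# Sort numerically by the integer stem instead of lexicographically,
# so '2.png' comes before '10.png'.
img_path_lst = sorted(glob(img_path + "/*.png"), key=lambda p: int(Path(p).stem))
```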

xiaohutongxue-sunny commented 1 month ago

I am using an A800 and training with 16 frames at an image size of 1024×576; it needs 69 GB of VRAM. I have trained a first result with it, but the generated video flickers a lot, so I am doing some more experiments.

How many videos did you train on?

luxiaolili commented 1 month ago

There is a bug in your dataset code: img_path_lst = sorted([img for img in glob(img_path + "/*.png")]) returns a lexicographically sorted list, which is not the order you want. The result is ['1.png', '10.png', '100.png', '101.png', '102.png', '103.png', '104.png', '105.png', '106.png', '107.png', '108.png', '109.png', '11.png', '110.png'], not 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...

I rewrote the dataset code; it is not this code. If the result is OK, I will upload it.

luxiaolili commented 1 month ago

There is a bug in your dataset code: img_path_lst = sorted([img for img in glob(img_path + "/*.png")]) returns a lexicographically sorted list, which is not the order you want. The result is ['1.png', '10.png', '100.png', '101.png', '102.png', '103.png', '104.png', '105.png', '106.png', '107.png', '108.png', '109.png', '11.png', '110.png'], not 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...

Split the video into zero-padded images like ['00000.png', '00001.png'].
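A minimal sketch of that, assuming OpenCV; zero-padded names keep plain sorted() in frame order:

```python
import os
import cv2

os.makedirs("frames", exist_ok=True)
cap = cv2.VideoCapture("input.mp4")  # hypothetical input path
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imwrite(f"frames/{idx:05d}.png", frame)  # 00000.png, 00001.png, ...
    idx += 1
cap.release()
```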

xiaohutongxue-sunny commented 1 month ago

Split the video into zero-padded images like ['00000.png', '00001.png'].

Good idea.

trouble-maker007 commented 1 month ago

First, get a mask of the high-confidence hand boxes from the DWPose result, then downsample this mask to the latent size. Use a different weight for the masked region, multiplied into the loss before taking the mean. (I don't know who wrote this; I got it from the official repo.)

This response explains the method for handling the hands: simply add a bounding box (bbox) around keypoints that exceed a certain threshold and turn the entire bbox into a mask, with the intensity of the mask based on the corresponding threshold.

Key features of confidence-aware pose guidance include: 1) the pose sequence condition is accompanied by keypoint confidence scores, enabling the model to adaptively adjust the influence of pose guidance based on the score; 2) regions with high confidence are given greater weight in the loss function, amplifying their impact in training.

The two treatments mentioned here should not be the same. I am puzzled about how to change the color of the limbs through the threshold to achieve confidence-aware pose guidance.

xiaohutongxue-sunny commented 1 month ago

Key features of confidence-aware pose guidance include: 1) the pose sequence condition is accompanied by keypoint confidence scores, enabling the model to adaptively adjust the influence of pose guidance based on the score; 2) regions with high confidence are given greater weight in the loss function, amplifying their impact in training.

The easy method is 255 * confidence; the smaller the confidence, the dimmer the color.

luxiaolili commented 1 month ago

The two treatments mentioned here should not be the same. I am puzzled about how to change the color of the limbs through the threshold to achieve confidence-aware pose guidance.

The author's DWPose code changes the color via alpha_blend_color in dwpose/util.py: the limb colors are multiplied by the pose confidence.
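In that spirit, a minimal sketch of confidence-scaled drawing colors (the exact body of alpha_blend_color in the repo may differ):

```python
def alpha_blend_color(color, alpha):
    # Scale a (B, G, R) color by the keypoint/limb confidence so that
    # low-confidence parts are drawn darker.
    return [int(c * alpha) for c in color]

# e.g. a limb with confidence 0.3 is drawn at 30% brightness:
dim_red = alpha_blend_color([0, 0, 255], 0.3)  # -> [0, 0, 76]
```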

trouble-maker007 commented 1 month ago

@xiaohutongxue-sunny Thank you for the reply; the hand mask can be processed by referring to this.

luxiaolili commented 1 month ago

@xiaohutongxue-sunny Thank you for the reply; the hand mask can be processed by referring to this.

Thanks for sharing this code. As in their experiments, it needs high-quality data. I'm using the TikTok dataset and the training result flickers a lot; looking at the TikTok data, it is quite blurry, so I have to collect some other data from the web.

xiaohutongxue-sunny commented 1 month ago

Thanks for sharing this code. As in their experiments, it needs high-quality data. I'm using the TikTok dataset and the training result flickers a lot; looking at the TikTok data, it is quite blurry, so I have to collect some other data from the web.

Use the UBC Fashion dataset.

luxiaolili commented 1 month ago

Use the UBC Fashion dataset.

I am trying an appearance net to control the face; I think the reference feature is too weak.

xiaohutongxue-sunny commented 1 month ago

I am trying an appearance net to control the face; I think the reference feature is too weak.

If I were you, I would try to collect more data; high-quality data is more important.

zhangvia commented 3 weeks ago

Maybe you should increase the frame length; the 72-frame model is much better than the 16-frame one.