Open KamiCalcium opened 3 years ago
It depends on what your dataset looks like. Could you give me some example images? Does it contain motion capture studio images? In-the-wild images? Synthetic images?
Thanks for replying! It is from the Cityscapes dataset: https://www.cityscapes-dataset.com/ Some researchers annotated all person bounding boxes and released it as the CityPersons dataset: https://github.com/cvgroup-njust/CityPersons
All images were shot in real cities.
I see. I think option 1 would be the best one. By the way, does this dataset contain 3D pose annotations? If it contains only 2D, fine-tuning a model on this dataset alone would not work, as no 3D supervision can be applied.
It does not, but I built a kind of self-supervised pipeline: first I use the pre-trained model you provided to predict all the 3D poses in that dataset, and then I use an annotation tool to correct the bad/outlier 3D poses. Now I use those corrected 3D poses as the ground truth. And thank you for your suggestion.
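A minimal sketch of the outlier-filtering step in such a pseudo-labeling pipeline: reject predicted poses whose bone lengths are implausible before promoting them to ground truth. The function name, the bone-pair list, and the millimeter thresholds are all illustrative assumptions, not from the repo; tune them for your skeleton.

```python
import numpy as np

def filter_pseudo_labels(poses_3d, bone_pairs, min_mm=50.0, max_mm=600.0):
    """Keep only predicted 3D poses whose bone lengths are plausible.

    poses_3d:   (N, J, 3) array of predicted joint coordinates in millimeters.
    bone_pairs: list of (parent, child) joint index pairs (skeleton-dependent).
    Returns the indices of poses that pass the sanity check; the rest are
    candidates for manual correction or removal.
    """
    keep = []
    for i, pose in enumerate(poses_3d):
        lengths = [np.linalg.norm(pose[a] - pose[b]) for a, b in bone_pairs]
        if all(min_mm <= l <= max_mm for l in lengths):
            keep.append(i)
    return keep
```

Automated checks like this only catch gross failures; manual inspection of the surviving pseudo-labels is still worthwhile before training on them.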
By the way, I have some issues related to the code but I don't want to open another issue so I'm asking here.
I did option 1, retraining everything together. However, I sometimes get a NaN loss for some of the samples in my dataset. When debugging, I found that for those NaN losses, the corresponding joint_vis is not np.ones. Now I'm confused about what joint_vis is:
In https://github.com/mks0601/3DMPPE_POSENET_RELEASE/blob/3f92ebaef214a0eb1574b7265e836456fbf3508a/data/Human36M/Human36M.py#L127, all the joint_vis entries are set to np.ones (this is from Human36M, but I did the same for my dataset). However, in https://github.com/mks0601/3DMPPE_POSENET_RELEASE/blob/3f92ebaef214a0eb1574b7265e836456fbf3508a/data/dataset.py#L74, it is changed when the data is loaded. What does joint_vis really do?
I actually take a look in each epoch and find the problem:
For the NaN case, I found that not all of the losses are NaN; only some of the 128 samples in the batch are NaN (index 127 in this case). This loss_coord is computed after this line: https://github.com/mks0601/3DMPPE_POSENET_RELEASE/blob/3f92ebaef214a0eb1574b7265e836456fbf3508a/main/train.py#L50
The shape of loss_coord is (128, 18), where 128 is the batch size and 18 is the number of joints. Do you have any idea about this bug? What could be wrong with those samples? I cannot think of a good way to debug this. (I will just ignore the NaN rows for now and let it train; I don't know if that makes sense.)
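One way to implement the "ignore the NaN rows" workaround while also collecting the offending sample indices for inspection. This is a sketch, not repo code; it assumes loss_coord has shape (batch, num_joints) as described above.

```python
import torch

def masked_mean_loss(loss_coord):
    """Average a per-joint loss while dropping rows that contain NaNs.

    loss_coord: tensor of shape (batch, num_joints), e.g. the per-joint
    error before reduction. Rows with any NaN are excluded from the mean;
    their batch indices are returned so the bad samples can be inspected.
    """
    nan_rows = torch.isnan(loss_coord).any(dim=1)          # (batch,) bool mask
    bad_idx = nan_rows.nonzero(as_tuple=True)[0].tolist()  # indices to inspect
    valid = loss_coord[~nan_rows]
    loss = valid.mean() if valid.numel() > 0 else loss_coord.new_zeros(())
    return loss, bad_idx
```

Note that skipping NaN rows only hides the symptom; tracing the returned indices back to the underlying annotations (e.g. a bad joint_vis or corrupted GT) is the real fix.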
joint_vis represents whether a joint is valid or not. If the GT coordinates of a joint are not provided (or the joint is truncated), you can set its joint_vis to zero.
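In other words, joint_vis acts as a mask on the per-joint loss: invalid joints contribute zero, so missing or truncated GT produces no gradient for those entries. A minimal sketch in the spirit of the repo's coordinate loss (not the exact repo code):

```python
import torch

def coord_l1_loss(pred, target, joint_vis):
    """Per-joint L1 loss masked by visibility.

    pred, target: (batch, num_joints, 3) joint coordinates.
    joint_vis:    (batch, num_joints, 1) with 1 for valid joints, 0 otherwise.
    Multiplying by joint_vis zeroes out invalid joints before averaging.
    """
    return (torch.abs(pred - target) * joint_vis).mean()
```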
Hi, it's great work! I'm studying your code but am confused about the unit of the loss. Is it pixels?
x,y: pixel z: discretized meter
Thanks for your reply! I have another question. I found that the L1 distance between joints is used as the loss during training, but at test time the L2 (Euclidean) distance is used. Why do you use different metrics to measure the error between the predicted joint coordinates and the ground truth?
We empirically found that L1 loss works better than L2.
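A quick illustration (not from the repo) of why L1 can train more stably than L2: an outlier joint error grows the L1 loss linearly but the L2 (squared) loss quadratically, so a few bad joints dominate the gradient far less under L1.

```python
import torch

# Per-joint errors for one sample, with a single outlier joint.
err = torch.tensor([1.0, 1.0, 10.0])

l1 = err.abs().mean()   # (1 + 1 + 10) / 3   = 4.0
l2 = (err ** 2).mean()  # (1 + 1 + 100) / 3  = 34.0
```

Under L2 the outlier contributes ~98% of the loss, versus ~83% under L1, which is consistent with the empirical observation that L1 works better here.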
Are the units of the loss in training and testing the same? In training, the joints' x and y coordinates are in the pixel coordinate system. For the z coordinate, the README describing the quantities in "Human36M_subject_joint_3d.json" says the joint coordinates in the world coordinate system are in millimeters, and the z coordinate from the .json file is used directly as the z coordinate of joint_img when the network is trained. May I take it that the units of the loss are x, y in pixels and z in millimeters? In testing, the joints' coordinates are in the camera coordinate system when the error is calculated. May I take it that the unit of the test error is millimeters in x, y, and z? Looking forward to your reply! Thanks!
May I take it that the units of the loss are x, y in pixels and z in millimeters? -> z: I discretize millimeters into the 0~63 heatmap space.
May I take it that the unit of the test error is millimeters in x, y, and z? -> Yes.
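A sketch of how the millimeter-to-heatmap-bin discretization for z could look. The constants mirror typical config values for this kind of setup (a 2000 mm root-relative depth range and depth_dim = 64), but they are assumptions here; check your own cfg (e.g. bbox_3d_shape and depth_dim) for the actual numbers.

```python
def mm_to_depth_bin(z_mm, depth_range_mm=2000.0, depth_dim=64):
    """Map a root-relative depth in millimeters into [0, depth_dim] heatmap space.

    Assumes z_mm is already relative to the root joint and lies within
    +/- depth_range_mm / 2; values outside that range are clamped.
    0 mm (the root's depth) maps to the center bin, depth_dim / 2.
    """
    half = depth_range_mm / 2.0
    z = max(-half, min(half, z_mm))
    return z / half * (depth_dim / 2.0) + depth_dim / 2.0
```

So the training-time z loss is measured in these discretized bins, while the test-time error is converted back to millimeters.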
Thanks very much!
Hi,
I am trying to use PoseNet on my own dataset. More specifically, I used PoseNet trained on Human3.6M and MPII to test on my own dataset and got some preliminary results. I want to improve them further, and the first thing that comes to mind is fine-tuning the network on my dataset. Do you have any suggestions or experience with fine-tuning PoseNet (for example, how many epochs are good, or should I freeze any layers' weights)?
I have three ideas now:
Which do you think makes more sense? Thanks in advance for your time!