princeton-vl / pose-ae-train

Training code for "Associative Embedding: End-to-End Learning for Joint Detection and Grouping"
BSD 3-Clause "New" or "Revised" License

Some predicted results contain annotations for more people than max_num_people #23

Open ahangchen opened 6 years ago

ahangchen commented 6 years ago

Problem

I found that some predicted results contain annotations for more people than the max_num_people defined in the task.

Cause

Although the function calc only extracts the top-k activations from each joint heatmap, these activations still have to be matched by their tags to produce the final number of predicted people.
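To make the per-joint bound concrete, here is a minimal NumPy sketch of top-k extraction from a stack of heatmaps. It is not the repository's calc: the helper name topk_per_joint, the (num_joints, H, W) layout, and the random input are assumptions for illustration, and it applies no non-maximum suppression. The only point is that each joint contributes at most k detections.

```python
import numpy as np


def topk_per_joint(heatmaps, k, threshold):
    """Hypothetical helper: for each joint heatmap of shape (H, W), return an
    array of (y, x, score) rows for its k highest-scoring pixels above threshold."""
    num_joints, h, w = heatmaps.shape
    flat = heatmaps.reshape(num_joints, -1)
    detections = []
    for j in range(num_joints):
        idx = np.argsort(flat[j])[::-1][:k]   # indices of the k largest scores
        scores = flat[j, idx]
        keep = scores > threshold              # drop weak activations
        ys, xs = np.unravel_index(idx[keep], (h, w))
        detections.append(np.stack([ys, xs, scores[keep]], axis=1))
    return detections


rng = np.random.default_rng(0)
heatmaps = rng.random((17, 64, 64))          # 17 joints, random scores in [0, 1)
dets = topk_per_joint(heatmaps, k=30, threshold=0.5)
print(max(len(d) for d in dets))             # bounded by k for every joint
```

The per-joint count is capped, but nothing in this step limits how many people the later tag grouping produces.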

However, the final number of people can exceed the number of activations in any single joint heatmap. For example, suppose the nose heatmap has 27 activations above the detection threshold and the eye heatmap has 28. If only 20 of them can be matched (activations only match when their tags are close enough), 7 activations remain unmatched in the nose heatmap and 8 in the eye heatmap. So after the eye iteration, dic and dic2 have grown to 20 + 7 + 8 = 35 people.
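The counting argument can be reproduced with a toy two-joint grouping. This is a sketch under simplifying assumptions: tags are 1-D floats, matching is greedy nearest-neighbour instead of the assignment step used in the repository, and the helper group_two_joints and the tag values are hypothetical, chosen so that exactly 20 eye tags fall within the threshold of a nose tag.

```python
import numpy as np


def group_two_joints(nose_tags, eye_tags, tag_threshold):
    """Hypothetical greedy grouping of two joints by 1-D tag distance.
    Returns a list of people, each a dict mapping joint name -> tag."""
    people = [{"nose": t} for t in nose_tags]  # first joint seeds one person per detection
    used = set()                               # people that already received an eye
    for t in eye_tags:
        best, best_dist = None, np.inf
        for i, p in enumerate(people):
            if i in used or "nose" not in p:
                continue
            d = abs(p["nose"] - t)
            if d < best_dist:
                best, best_dist = i, d
        if best is not None and best_dist < tag_threshold:
            people[best]["eye"] = t            # tags close enough: same person
            used.add(best)
        else:
            people.append({"eye": t})          # unmatched eye starts a new person
    return people


nose = np.arange(27) * 10.0                    # 27 nose detections, tags 10 apart
eye = np.concatenate([nose[:20] + 0.1,         # 20 eyes whose tags match a nose
                      1000 + np.arange(8) * 10.0])  # 8 eyes that match nothing
people = group_two_joints(nose, eye, tag_threshold=1.0)
print(len(people))  # 35 = 20 matched + 7 nose-only + 8 eye-only
```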

Solution

I noticed that this line tries to skip joint matching once the number of tags reaches max_num_people. But checking len(actualTags) == params.max_num_people is a mistake, because len(actualTags) can increase by more than 1 in a single joint iteration (27 -> 35 in the example above), so the count can jump past max_num_people without ever being equal to it.
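A quick way to see why the equality test is fragile: if the person count jumps from 27 to 35 in one joint iteration and the cap is, say, 30 (a value assumed here for illustration), count == cap is never true, and count >= cap only fires after the overshoot has already happened. The helper first_trigger and the counts after 35 are hypothetical.

```python
import operator


def first_trigger(counts, cap, op):
    """Hypothetical helper: index of the first joint iteration whose person
    count satisfies op(count, cap), or None if it never does."""
    for i, count in enumerate(counts):
        if op(count, cap):
            return i
    return None


counts = [27, 35, 41, 44]  # person count after each joint; 27 -> 35 as in the example
cap = 30                   # assumed max_num_people for illustration

print(first_trigger(counts, cap, operator.eq))  # None: the count never equals 30
print(first_trigger(counts, cap, operator.ge))  # 1: fires, but only after reaching 35
```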

Moreover, when an image contains many people, this check makes the grouping miss every lower-body keypoint for every person: once max_num_people is reached at, say, the eye joint, no further joints are appended to dic at all, not even joints that would match people already found.
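The effect on crowded images can be sketched with a toy grouping that, like the check described above, skips an entire joint iteration once the cap is reached. Assumptions: a simplified six-joint order, people linked by identical tags rather than a tag-distance threshold, and a hypothetical helper name group_with_joint_level_guard.

```python
JOINTS = ["nose", "eye", "shoulder", "hip", "knee", "ankle"]  # simplified order (assumption)


def group_with_joint_level_guard(detections, cap):
    """Hypothetical toy grouping. detections: dict joint -> list of tags;
    detections with equal tags belong to the same person. The guard skips the
    whole joint iteration once the person count reaches cap."""
    people = {}
    for i, joint in enumerate(JOINTS):
        if i > 0 and len(people) >= cap:
            continue  # every remaining joint is dropped, even for people already found
        for tag in detections.get(joint, []):
            people.setdefault(tag, {})[joint] = tag
    return people


# Crowded image: 30 people, each detected at every joint with its own tag.
detections = {j: list(range(30)) for j in JOINTS}
people = group_with_joint_level_guard(detections, cap=30)
print(len(people), sorted(people[0]))  # 30 ['nose']: only the first joint survives
```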

So I don't think it's a good idea to stop matching joints once the tag count reaches max_num_people, whether by simply changing len(actualTags) == params.max_num_people to len(actualTags) >= params.max_num_people or by some other joint-level check. Besides, that modification would still produce some results with slightly more people than max_num_people.

Instead, I suggest moving this check to the point right before dic and dic2 are extended with a new person:

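```python
# Runs inside the per-detection loop over (row, col) matches for the current joint.
if row < diff2.shape[0] and col < diff2.shape[1] and diff2[row][col] < params.tag_threshold:
    # Tag is close enough: attach this joint to the existing person actualTags_key[col].
    dic[actualTags_key[col]][ptIdx] = joints[row]
    dic2[actualTags_key[col]].append(tags[row])
else:
    # Unmatched detection would start a new person; skip it once the cap is reached.
    # (>= rather than == keeps the guard safe even if dic already exceeds the cap.)
    if params.ignore_too_much and len(dic) >= params.max_num_people:
        continue
    key = tags[row][0]
    dic.setdefault(key, np.copy(default_))[ptIdx] = joints[row]
    dic2[key] = [tags[row]]
```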
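A self-contained toy version of this placement may help. It is a sketch under simplifying assumptions: tags are 1-D floats, matching is greedy nearest-mean rather than the assignment step used in the repository, and the helper group_with_person_level_guard and its input format are hypothetical. The cap is checked only in the branch that would create a new person, so matched joints keep attaching to people already grouped.

```python
import numpy as np


def group_with_person_level_guard(detections, joint_order, cap, tag_threshold):
    """Hypothetical toy grouping with the cap checked per new person.

    detections: dict mapping joint name -> 1-D array of scalar tags.
    Returns dict person_key -> {joint name: tag}.
    """
    people = {}       # plays the role of dic
    person_tags = {}  # plays the role of dic2
    for joint in joint_order:
        taken = set()  # people already given a detection (or created) for this joint
        for tag in detections.get(joint, []):
            # Greedy nearest-mean matching against people not yet taken this joint.
            best, best_dist = None, np.inf
            for key, tags in person_tags.items():
                if key in taken:
                    continue
                dist = abs(np.mean(tags) - tag)
                if dist < best_dist:
                    best, best_dist = key, dist
            if best is not None and best_dist < tag_threshold:
                people[best][joint] = tag          # matched: always attach
                person_tags[best].append(tag)
                taken.add(best)
            else:
                if len(people) >= cap:
                    continue                       # cap reached: drop only this new person
                key = float(tag)
                people.setdefault(key, {})[joint] = tag
                person_tags[key] = [tag]
                taken.add(key)
    return people


nose = np.arange(27) * 10.0
eye = np.concatenate([nose[:20] + 0.1, 1000 + np.arange(8) * 10.0])
shoulder = nose[:20] + 0.2  # later joint of the 20 matched people
people = group_with_person_level_guard(
    {"nose": nose, "eye": eye, "shoulder": shoulder},
    ["nose", "eye", "shoulder"], cap=30, tag_threshold=1.0)
print(len(people))  # 30: three unmatched eyes become people, then the cap holds
```

Unlike a joint-level guard, the shoulder detections of the 20 matched people are still attached after the cap has been reached.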

I have opened a PR to fix this problem.

ahangchen commented 6 years ago

@anewell

lck1201 commented 5 years ago

@ahangchen Hi, thanks for your careful analysis! Could you answer a simple question: why does the system restrict the maximum number of people? Does that mean that if an image contains more than max_num_people people, the system can only predict at most max_num_people of them (or slightly more)?

And for other tasks: say I want to detect windows in an image of a building by estimating their four corners. Do I have to go through the dataset to find max_num_windows?