Masked Output

alecda573 commented 4 years ago

Could you please describe how you are generating the mask that you are multiplying the stage outputs by?

michalfaber commented 4 years ago

Sure. First of all, here is what gets masked in a single image:

areas annotated as crowd
persons with fewer keypoints than a certain threshold (there may be multiple persons in a single image)
persons that are too close to a previously processed one (the list of persons in the image is sorted by area and the biggest person is processed first; see base_dataflow.py, line 184)
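The three criteria above can be sketched as a single pass over one image's COCO annotations. This is a minimal sketch, not the code in base_dataflow.py: the keypoint threshold, the "too close" test (distance between bounding-box centers) and the helper names `split_annotations` and `bbox_center` are hypothetical. Only the biggest-first ordering comes from the description above; the COCO fields `iscrowd`, `num_keypoints`, `area` and `bbox` are standard annotation keys.

```python
import math

# Hypothetical thresholds; the values used in base_dataflow.py may differ.
MIN_KEYPOINTS = 5
MIN_CENTER_DIST = 50.0  # pixels


def bbox_center(bbox):
    """COCO bboxes are [x, y, width, height]; return the box center."""
    x, y, w, h = bbox
    return (x + w / 2.0, y + h / 2.0)


def split_annotations(anns, min_keypoints=MIN_KEYPOINTS, min_dist=MIN_CENTER_DIST):
    """Split one image's annotations into (kept persons, annotations to mask)."""
    kept, masked = [], []
    # Biggest person first, so the smaller of two overlapping persons gets masked.
    for ann in sorted(anns, key=lambda a: a["area"], reverse=True):
        if ann.get("iscrowd", 0):
            masked.append(ann)  # crowd regions are always masked
        elif ann["num_keypoints"] < min_keypoints:
            masked.append(ann)  # too few annotated keypoints
        elif any(math.dist(bbox_center(ann["bbox"]), bbox_center(k["bbox"])) < min_dist
                 for k in kept):
            masked.append(ann)  # too close to an already processed person
        else:
            kept.append(ann)
    return kept, masked


anns = [
    {"id": 1, "area": 5000, "bbox": [10, 10, 50, 100], "num_keypoints": 12, "iscrowd": 0},
    {"id": 2, "area": 3000, "bbox": [20, 15, 40, 80], "num_keypoints": 10, "iscrowd": 0},
    {"id": 3, "area": 2000, "bbox": [300, 50, 30, 60], "num_keypoints": 2, "iscrowd": 0},
    {"id": 4, "area": 9000, "bbox": [200, 200, 100, 90], "num_keypoints": 0, "iscrowd": 1},
]
kept, masked = split_annotations(anns)
# kept ids: [1]; masked ids: [4, 2, 3]
```

Person 2's box center lies about 7 pixels from person 1's, so it is masked, while person 3 is masked for having only 2 keypoints.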

The pipeline looks something like this:

a. Get the polygon representing an annotated item that we want to mask (person or crowd). There is a convenient function for this in cocotools: annToRLE, which converts an annotation's segmentation into RLE. Take a look at line 190 in base_dataflow.py.

b. Collect all such polygons and attach them to the metadata representing a sample (see base_dataflow.py, line 238).

c. Generate a binary mask. The function dataflow_steps.py->gen_mask contains the code that computes it, using maskUtils.decode(seg), which takes a polygon encoded as RLE and returns a binary mask. You probably noticed that gen_mask is not used anywhere in the code: I realized that I had accidentally deleted the line df = MapData(df, gen_mask) from the get_dataflow_vgg function, probably while running some experiments. Sorry if that confused you; I will fix that soon.

d. The mask is scaled down to the required output size in the function dataflows.py->build_sample_with_masks.
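Steps c and d, plus the multiplication the original question asks about, can be sketched in plain Python. This is a hedged sketch under assumptions, not the repository's gen_mask or build_sample_with_masks: the decoder below is a pure-Python stand-in for maskUtils.decode that only handles uncompressed COCO RLE (counts alternate runs of 0s and 1s, starting with 0s, in column-major order), and the block-average-then-threshold downscaling and all function names are hypothetical. The mask holds 1 where the loss counts and 0 inside masked regions, which is what multiplying the stage outputs by it implies.

```python
def decode_uncompressed_rle(rle):
    """Decode a COCO-style uncompressed RLE dict into rows of 0/1 ints.

    Counts alternate between runs of 0s and 1s, starting with 0s, and walk
    the image in column-major order (down each column, then to the next).
    """
    h, w = rle["size"]
    flat, value = [], 0
    for run in rle["counts"]:
        flat.extend([value] * run)
        value = 1 - value
    if len(flat) != h * w:
        raise ValueError("RLE counts do not cover the whole image")
    # Column-major: pixel (row, col) sits at index col * h + row.
    return [[flat[col * h + row] for col in range(w)] for row in range(h)]


def gen_loss_mask(rles, h, w):
    """Step c (sketch): 1 where the loss counts, 0 inside any masked region."""
    mask = [[1] * w for _ in range(h)]
    for rle in rles:
        region = decode_uncompressed_rle(rle)
        for r in range(h):
            for c in range(w):
                if region[r][c]:
                    mask[r][c] = 0
    return mask


def downscale_mask(mask, stride):
    """Step d (sketch): average each stride x stride block, threshold at 0.5."""
    h, w = len(mask), len(mask[0])
    if h % stride or w % stride:
        raise ValueError("mask size must be divisible by the stride")
    out = []
    for r0 in range(0, h, stride):
        row = []
        for c0 in range(0, w, stride):
            block = [mask[r][c] for r in range(r0, r0 + stride)
                     for c in range(c0, c0 + stride)]
            row.append(1 if sum(block) / len(block) >= 0.5 else 0)
        out.append(row)
    return out


def masked_l2_loss(pred, target, mask):
    """Squared error in which masked pixels (mask == 0) contribute nothing."""
    return sum(
        m * (p - t) ** 2
        for pr, tr, mr in zip(pred, target, mask)
        for p, t, m in zip(pr, tr, mr)
    )


# 4 x 4 image; the RLE marks the left two columns (8 pixels, column-major).
crowd = {"size": [4, 4], "counts": [0, 8, 8]}
full = gen_loss_mask([crowd], 4, 4)   # every row is [0, 0, 1, 1]
small = downscale_mask(full, 2)       # [[0, 1], [0, 1]]
loss = masked_l2_loss([[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]], small)  # 2.0
```

For a binary mask, multiplying both the stage output and the label by the mask before an L2 loss gives the same value as weighting the squared error directly, because (m*p - m*t)^2 = m*(p - t)^2 when m is 0 or 1.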

alecda573 commented 4 years ago

@michalfaber Hey, thanks for the clear explanation! I do not recall seeing this in the original implementation of OpenPose (maybe I am forgetting). Did you decide to add it because it improved accuracy? Did you try training the model without applying the mask? Were the results vastly different?

michalfaber commented 4 years ago

Hi @alecda573, the original implementation only masks areas marked as crowd, so yes, the rest is my addition. But honestly, I haven't noticed any significant improvement. According to the paper, masking increases accuracy by about 2-3%. I trained the model without masks and the final results were not much worse, while training was, of course, faster, which was my priority.

alecda573 commented 3 years ago

@michalfaber Thanks again for the clear explanation. I was experimenting with fine-tuning the weights you provide here on a different dataset, without masks since that information was not available in my dataset, but I was not seeing much improvement on validation, so I figured I would check how necessary the masks were.

michalfaber / tensorflow_Realtime_Multi-Person_Pose_Estimation

Masked Output #18