Ideas for data collection:
OUTPUT VARIABLES: TrackerAdjustments, TimeAdjustments, ObjectPurity, FalseNeg, FalsePos, MultipleTrackers, MultipleObjects, ConfigurationDistance, TrackerPurity, FalselyIdentifiedTracker, FalselyIdentifiedObject, Recall, Precision, fmeasure
VIDEO FEATURES: NumIndivInFrame, NumIndivLeavingEntering, OpticalFlow, NumOccludObj, VideoLength, VideoID, VideoLocation, Lighting
Definitions
We will want these for the DAG and when we write up the paper. E is for estimate, and GT is for ground truth.
Tracker Purity: frames where the E (prediction) correctly identifies the GT. This is accumulated over all the trackers, over all the frames: the percent correct per frame is summed and divided by the number of frames.
Object Purity: frames where the GT correctly identifies the estimate, also as a ratio as described above.
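To make the accumulation concrete, here is a minimal sketch of how both purities could be computed, assuming each frame yields a list of (estimate ID, ground-truth ID) associations; the function and variable names are illustrative, not the PeopleTracker API:

```python
# Illustrative sketch only: assumes per-frame (estimate_id, gt_id) associations
# and a precomputed identity map; names are hypothetical, not PeopleTracker code.

def purity(frames, identity_map, from_estimates=True):
    """Average per-frame fraction of associations that agree with the identity map.

    frames         : list of frames, each a list of (estimate_id, gt_id) pairs
    identity_map   : dict estimate_id -> gt_id (Tracker Purity)
                     or gt_id -> estimate_id (Object Purity)
    from_estimates : True for Tracker Purity (E identifies GT),
                     False for Object Purity (GT identifies E)
    """
    if not frames:
        return 0.0
    total = 0.0
    for frame in frames:
        if not frame:
            continue  # an empty frame contributes 0 but still counts in the denominator
        if from_estimates:
            correct = sum(identity_map.get(est) == gt for est, gt in frame)
        else:
            correct = sum(identity_map.get(gt) == est for est, gt in frame)
        total += correct / len(frame)
    return total / len(frames)
```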
Configuration Distance: the difference between the number of Es and GTs, normalized by the number of GTs in a given frame, i.e. (N(E) - N(GT)) / N(GT), where N is the count. This is a negative value when N(GT) > N(E) and a positive value when N(GT) < N(E).
Multiple Trackers: 2 or more E are associated with the same GT. MT error is assigned for each excess estimate.
Multiple Objects: 2 or more GT objects are associated with the same estimate. MO error is assigned for each excess GT.
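As a sketch of how the excess counts could be tallied for a single frame (again with illustrative names, assuming a list of (estimate ID, GT ID) associations):

```python
from collections import Counter

def multiple_tracker_errors(associations):
    """Multiple Trackers: for each GT associated with 2+ estimates,
    one error per excess estimate. associations = [(estimate_id, gt_id), ...]"""
    per_gt = Counter(gt for _, gt in associations)
    return sum(n - 1 for n in per_gt.values() if n > 1)

def multiple_object_errors(associations):
    """Multiple Objects: for each estimate associated with 2+ GTs,
    one error per excess GT."""
    per_est = Counter(est for est, _ in associations)
    return sum(n - 1 for n in per_est.values() if n > 1)
```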
Precision: measures how much of the GT is covered by the E. This can have a value between 0 (no overlap) and 1 (fully overlapped). It is the intersection area divided by the area of the GT.
Recall: measures how much of the E is covered by the GT, and can take values between 0 (no overlap) and 1 (fully overlapped). It is the intersection area divided by the area of the E.
fmeasure: looks at both precision and recall at the same time, F = 2rp / (r + p). This is high when both p and r are high.
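A small sketch of these three measures for a pair of axis-aligned boxes, following the formulas above; the (x1, y1, x2, y2) box format is an assumption for illustration, not necessarily how PeopleTracker stores regions:

```python
def box_area(box):
    # Area of an (x1, y1, x2, y2) box; zero if the coordinates are degenerate.
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def intersection_area(a, b):
    # Overlap of two (x1, y1, x2, y2) boxes; zero if they do not intersect.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return box_area((x1, y1, x2, y2))

def precision_recall_fmeasure(estimate, ground_truth):
    """Precision = intersection / area(GT), Recall = intersection / area(E),
    F = 2rp / (r + p), as defined in this issue."""
    inter = intersection_area(estimate, ground_truth)
    gt_area, e_area = box_area(ground_truth), box_area(estimate)
    p = inter / gt_area if gt_area else 0.0
    r = inter / e_area if e_area else 0.0
    f = 2 * r * p / (r + p) if (r + p) else 0.0
    return p, r, f
```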
False Negative: a GT exists that is not associated with an E. Both the GT and the E need to have over 50% intersection in order to count as "tracked"; otherwise it is counted as a FN.
False Positive: an E exists that is not associated with a GT. (50% rule described above also applies)
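A sketch of the 50% coverage test and the FN/FP counts, reusing the box helpers from the sketch above; reading "over 50% intersection" as the intersection exceeding half of both the GT and E areas is an assumption:

```python
def passes_coverage_test(estimate, ground_truth, threshold=0.5):
    """True if the E/GT pair counts as "tracked": the intersection exceeds
    `threshold` of both areas (interpretation assumed, see note above).
    Uses box_area/intersection_area from the previous sketch."""
    inter = intersection_area(estimate, ground_truth)
    return (inter > threshold * box_area(ground_truth)
            and inter > threshold * box_area(estimate))

def false_negatives(ground_truths, estimates):
    """GTs with no E passing the coverage test."""
    return [gt for gt in ground_truths
            if not any(passes_coverage_test(e, gt) for e in estimates)]

def false_positives(ground_truths, estimates):
    """Es with no GT passing the coverage test."""
    return [e for e in estimates
            if not any(passes_coverage_test(e, gt) for gt in ground_truths)]
```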
Falsely Identified Object: a GT segment which passes the coverage test (>50% intersection between E and GT) for an E but is not that E's identified GT, i.e., an identity switch.
Falsely Identified Tracker: an E segment which passes the coverage test for a GT but is not that GT's identified E.
Example identity maps (E <-> GT): A <-> 1, B <-> 2, C <-> 3. (This is how the identity maps are made.)
FIT: A -> 2 (E "A" covers GT 2, but GT 2's identified E is "B")
FIO: 2 -> A (GT 2 is covered by E "A", but E "A"'s identified GT is 1)
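Once the identity maps exist, the two counts could be derived per frame roughly as follows (sketch only; the associations are assumed to be the E/GT pairs that pass the coverage test):

```python
def falsely_identified(associations, est_to_gt, gt_to_est):
    """Count FIT and FIO in one frame.

    associations : [(estimate_id, gt_id), ...] pairs that pass the coverage test
    est_to_gt    : identity map E -> GT, e.g. {"A": 1, "B": 2, "C": 3}
    gt_to_est    : identity map GT -> E, e.g. {1: "A", 2: "B", 3: "C"}
    """
    fit = sum(1 for est, gt in associations if gt_to_est.get(gt) != est)  # wrong E on this GT
    fio = sum(1 for est, gt in associations if est_to_gt.get(est) != gt)  # wrong GT under this E
    return fit, fio
```

With the example maps above, the association ("A", 2) counts as both a FIT (GT 2's identified E is "B") and a FIO (E "A"'s identified GT is 1).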
Number of Occluding Objects: still in progress, but essentially this records whether one tracker occluded another one (Y/N).
Occlusion is when 2 trackers overlap by 80%, which raises an occlusion flag. If this happens, Multiple Trackers and Multiple Objects errors are negated.
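A sketch of the occlusion flag, again reusing the box helpers from above; interpreting "80% overlap" as the intersection covering at least 80% of the smaller tracker box is an assumption:

```python
def occlusion_flag(tracker_a, tracker_b, threshold=0.8):
    """Raise an occlusion flag when two tracker boxes overlap by 80% or more.
    Overlap is taken here as intersection over the smaller box (assumed).
    If the flag is raised, Multiple Trackers / Multiple Objects errors for
    the pair are negated."""
    inter = intersection_area(tracker_a, tracker_b)
    smaller = min(box_area(tracker_a), box_area(tracker_b))
    return smaller > 0 and inter >= threshold * smaller
```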
Note: When generating ground truths, we did not track people outside the "tracked area", for instance in a secondary room. We are not able to be certain of IDs outside what we have determined is the tracked area. Further, for occluding objects, we assume the person keeps their previous size. This is because we are able to use "keep previous annotation" to estimate size, and the PeopleTracker functions similarly. This is slightly different from what we do with occluding walls, where we only track as much of the person as we can see.
To evaluate the methods, we need to do multiple takes, in the following order.
1: Initialize tracker - allow the tracker to track the person indefinitely.
2: Human intervention - allow the user to intervene as freely as possible to ensure quality data.
---- Video prediction required for the next steps ----
3: Human intervention with the predicted assignments - enable the user to make changes with the default settings, and allow the model to assign changes.
4: Evaluate the model predictions with the Nearest Neighbour and Kalman filter. This will produce 6 results for each video. (100% hands-free) https://github.com/hobbitsyfeet/PeopleTracker/commit/839816c1543d45fd4a1658daf1d4c1416572b975
NOTE: This requires flags for every intervention and decision made. https://github.com/hobbitsyfeet/PeopleTracker/commit/f713602dd87137ec84ad5f9fe9f0f64d048314df
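As a purely illustrative sketch of what one such flag/record might capture per intervention (the real fields live in the linked commits; these names are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class InterventionRecord:
    """One record per intervention or decision, so rounds can be compared later."""
    video_id: str
    frame: int
    tracker_id: int
    source: str     # "human", or a model assignment (e.g. nearest-neighbour / Kalman)
    action: str     # e.g. "pause", "reassign", "adjust_box" (hypothetical names)
    accepted: bool  # whether the suggested change was kept
```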
Question: Does this account for human variability? I think recording when the person has made changes can explain this. This also shows whether the models make significant changes.
Question: How do we measure the effectiveness of the tool over the effectiveness of the human intervention?
Possible answer: Look at when the tool pauses, and compare it to other rounds where the human made the change only vs. when the tool made the change. E.g. a person fails to pause and the tool catches it, or the tool pauses but it makes an insignificant difference compared to when the person has no help. Does the tool negatively affect performance, e.g. frequent ineffective changes or bad auto-assignments from the model?