We plan to produce a report describing the main improvements of Y1Q3 with respect to Y1M5. Y1M5 is tagged as v1.0.0 in master, therefore we can compare the output of the two demos in response to the same experiment.
We plan to compare the response of the entire pipeline to the following three movements:
For each movement, we will test the following situations:
We will test the response offline, so that the exact same movement is fed to both pipelines. The comparison will be done in terms of:
Let's store this report within GitHub as a markdown or Jupyter notebook.
The improvements that Y1Q3 introduces in terms of acquisition are listed below:
The disparity map provided by the camera is inaccurate around the human contour and might have holes, causing keypoints close to the contour (e.g. the hands) or falling on a hole to be projected incorrectly. For example, the following video shows the depth map for an abduction movement, where the hand is projected to infinity:
In Y1M5, the disparity map is used as provided by the camera, without any additional processing, and thus it is affected by the effect described. In Y1Q3, the disparity map is eroded (and thus the depth map dilated), which has the double effect of:
The following video shows the effect of filtering the depth map and keypoints falling within the correct depth:
Unprocessed depth (Y1M5) | Filtered depth (Y1Q3) | Keypoints inside the depth |
---|---|---|
*(video)* | *(video)* | *(video)* |
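A minimal sketch of the morphological filtering described above, assuming the disparity arrives as a single-channel image; the kernel size and number of iterations are illustrative, not the values used in the pipeline:

```python
import cv2
import numpy as np

def filter_disparity(disparity: np.ndarray,
                     kernel_size: int = 5,
                     iterations: int = 1) -> np.ndarray:
    """Erode the disparity map, which is equivalent to dilating the depth map.

    Eroding shrinks the high-disparity (close) blob of the person, so noisy
    pixels around the contour no longer end up projected to wrong depths
    (e.g. to infinity).
    """
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    return cv2.erode(disparity, kernel, iterations=iterations)

# Hypothetical usage on a disparity frame grabbed from the camera:
# filtered = filter_disparity(raw_disparity)
```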
Exercises that require movements parallel to the optical axis make the depth of some keypoints ambiguous. For example, in the external and internal rotation, the elbow is not directly observable as the hand occludes it. Therefore both keypoints are projected to the same depth, as shown here:
Without optimization (Y1M5) |
---|
*(image)* |
In Y1Q3, we introduce an optimization of the skeleton, which adjusts the depth of the keypoints such that the length of the arms is equal to that observed during an initial phase. The following image shows the result of the optimization, in which elbow and hand are projected correctly.
With optimization (Y1Q3) |
---|
*(image)* |
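A minimal sketch of the idea behind this optimization, assuming the keypoints are already expressed in camera coordinates, that only the depth component is adjusted, and that the upper-arm and forearm lengths were measured during the initial phase (function and parameter names are illustrative, not the actual module API):

```python
import numpy as np
from scipy.optimize import least_squares

def optimize_arm_depth(shoulder, elbow, hand, l_upper, l_fore, weight=0.1):
    """Adjust the depth (z) of elbow and hand so that the segment lengths
    match those observed during the initial phase, while staying close to
    the measured depths."""
    def residuals(z):
        e = np.array([elbow[0], elbow[1], z[0]])
        h = np.array([hand[0], hand[1], z[1]])
        return [
            np.linalg.norm(e - shoulder) - l_upper,  # preserve upper-arm length
            np.linalg.norm(h - e) - l_fore,          # preserve forearm length
            weight * (z[0] - elbow[2]),              # stay near measured elbow depth
            weight * (z[1] - hand[2]),               # stay near measured hand depth
        ]
    sol = least_squares(residuals, x0=[elbow[2], hand[2]])
    return sol.x  # optimized depths for elbow and hand
```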
`yarpOpenPose` introduces a visible latency in the depth map, as shown here:
Without `yarpOpenPose` | With `yarpOpenPose` |
---|---|
*(video)* | *(video)* |
In Y1Q3, `yarpOpenPose` propagates the depth image in sync with the output of skeleton detection, in order to equalize the delay between the two streams.
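`yarpOpenPose` is a C++ YARP module; the following Python sketch only illustrates the buffering idea behind such a synchronization (class and method names are made up for illustration): depth frames are buffered with their timestamps, and each one is released together with the skeleton computed on the corresponding frame.

```python
from collections import deque

class DepthSynchronizer:
    """Buffer depth frames and release each one together with the skeleton
    detected on the corresponding RGB frame, so both streams carry the
    same latency."""

    def __init__(self, maxlen: int = 100):
        self.buffer = deque(maxlen=maxlen)  # (timestamp, depth_frame)

    def push_depth(self, timestamp: float, depth_frame) -> None:
        self.buffer.append((timestamp, depth_frame))

    def on_skeleton(self, skeleton_timestamp: float):
        """Return the buffered depth frame closest to the skeleton timestamp."""
        if not self.buffer:
            return None
        return min(self.buffer, key=lambda td: abs(td[0] - skeleton_timestamp))[1]
```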
In Y1M5, the feedback is based on the definition of dynamic joints (i.e. joints performing the exercise) and static joints (i.e. joints staying in the initial position):
Dynamic and static scores are combined to produce three levels:
- low: both dynamic and static scores are low, producing the feedback: "You are not moving very well!"
- medium: either dynamic or static score is medium, producing the feedback: "You are doing the exercise correctly, but you could do it better."
- high: both dynamic and static scores are high, producing the feedback: "You are moving very well!"
This kind of feedback is purely qualitative and has a series of drawbacks, which are:
To overcome the drawbacks of Y1M5, in Y1Q3 the concept of dynamic and static joints is removed and the exercise is treated in its entirety as an action. Therefore the feedback is produced based on the following two layers:
With this architecture, the feedback is extended and articulated in the following levels:
- positive feedback: "You are moving very well!"
- feedback for range of motion exercises:
  - feedback on the speed: "Move the arm faster/slower!"
  - feedback on the range of motion: "Move the arm further up/down!"
- feedback for reaching exercises: "You are not reaching the target!"
- negative feedback: "You are doing the wrong exercise. Please, repeat the movements I show you."
The action recognition is carried out using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) cells.
Input to the network:
2D joints of the upper body from the `skeletonRetriever`. The output of the `skeletonRetriever` is preferable to that of `yarpOpenPose`, as the `skeletonRetriever` identifies each skeleton by a unique tag, avoiding ambiguities when several skeletons are present.
Preprocessing: The following operations are applied to the skeleton:
Training set: 1 subject performs the following 6 exercises (i.e. 6 classes):
- `abduction-left`;
- `internal-rotation-left`;
- `external-rotation-left`;
- `reaching-left`;
- `static` (the subject remains steady);
- `random` (the subject moves randomly).

The first 3 movements are repeated 10 times and the full exercise repeated 5 times.
For the 4th movement, 4 targets to be reached are defined, distributed on the corners/center of a square centered around the `shoulderLeft` of the subject.
Each dataset was recorded from a frontal and a side view and can be found at this link.
Parameters used for training are the following:
- `n_hidden` = 22
- `n_steps` (temporal window) = 30
- `learning_rate` = 0.0005
- `batch_size` = 256
- `epochs` = 400

Validation set: Previously unseen data from the same subject were used for testing the network.
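A minimal sketch of a network with these hyper-parameters, written with `tf.keras` purely for illustration (the actual training code may use a different framework, and the assumed input of 8 upper-body joints with 2 coordinates each is a guess):

```python
import tensorflow as tf

N_STEPS = 30       # temporal window
N_HIDDEN = 22      # LSTM units
N_JOINTS = 8       # assumed number of upper-body 2D joints
N_CLASSES = 6      # abduction, internal/external rotation, reaching, static, random

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(N_STEPS, 2 * N_JOINTS)),  # (x, y) per joint
    tf.keras.layers.LSTM(N_HIDDEN),
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# Hypothetical training call on windows of preprocessed 2D joints:
# model.fit(x_train, y_train, batch_size=256, epochs=400,
#           validation_data=(x_val, y_val))
```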
Accuracy: We get an accuracy of 92.2% with the following confusion matrix:
This analysis is differentiated according to the exercises, which are classified as:
The joints under analysis for these movements are: `elbowLeft`, `handLeft`.
These movements can produce two types of feedback, i.e. on the speed and on the range of motion. The feedback is provided according to a predefined hierarchy which prioritizes the speed, followed by the range of motion. Therefore, a positive feedback is produced only when both checks pass.
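A minimal sketch of this hierarchy, assuming the speed and range-of-motion analyses described below have already produced a verdict and an optional hint string; names and wording are illustrative, and in the full Y1Q3 scheme a mismatch detected by the action-recognition layer would preempt these checks with the negative feedback:

```python
from typing import Optional

def rom_exercise_feedback(speed_ok: bool, speed_hint: Optional[str],
                          rom_ok: bool, rom_hint: Optional[str]) -> str:
    """Speed is checked first, then the range of motion; positive feedback
    is produced only when both checks pass."""
    if not speed_ok and speed_hint:
        return speed_hint      # e.g. "Move the arm faster/slower!"
    if not rom_ok and rom_hint:
        return rom_hint        # e.g. "Move the arm further up/down!"
    return "You are moving very well!"
```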
Speed: Fourier analysis
We perform the Fourier transform of each component of the joints under analysis in a predefined temporal window, for both the observed and the template skeleton.
The difference in frequency is computed as `df = f_skeleton - f_template`, and thus we can have two possible cases:

- `df > 0` => feedback: "Move the arm slower"
- `df < 0` => feedback: "Move the arm faster"
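A minimal sketch of this frequency comparison for a single joint component, assuming uniformly sampled positions inside the temporal window and picking the dominant non-DC bin of the spectrum (the sampling-rate handling and the absence of a tolerance band are simplifications):

```python
from typing import Optional
import numpy as np

def dominant_frequency(signal: np.ndarray, fs: float) -> float:
    """Return the dominant (non-DC) frequency of a 1D joint component."""
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin

def speed_feedback(observed: np.ndarray, template: np.ndarray,
                   fs: float) -> Optional[str]:
    """Compare the dominant frequencies of observed and template trajectories."""
    df = dominant_frequency(observed, fs) - dominant_frequency(template, fs)
    if df > 0:
        return "Move the arm slower"
    if df < 0:
        return "Move the arm faster"
    return None  # frequencies match: no speed feedback
```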
Range of motion: Dynamic Time Warping (DTW) plus error statistical analysis
The DTW is applied to each component of the joints under analysis, for both the observed and the template skeleton, allowing us to temporally align the signals being compared.
Once joints are aligned, the error between the observed and template joints under analysis is computed.
A statistical analysis is carried out, which looks for tails in the error distribution. Tails can be identified using the skewness of the distribution.
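A minimal sketch of this range-of-motion check for a single joint component, using a textbook dynamic-programming DTW and `scipy.stats.skew` for the tail analysis; the skewness threshold and the wording of the hint are illustrative assumptions:

```python
import numpy as np
from scipy.stats import skew

def dtw_path(x: np.ndarray, y: np.ndarray):
    """Classic dynamic-programming DTW; returns the optimal alignment path."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # backtrack the optimal alignment
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def rom_feedback(observed: np.ndarray, template: np.ndarray,
                 skew_thresh: float = 0.5):
    """Align the signals with DTW, then inspect the skewness of the error."""
    path = dtw_path(observed, template)
    errors = np.array([observed[i] - template[j] for i, j in path])
    s = skew(errors)
    if abs(s) <= skew_thresh:
        return None  # symmetric error distribution: range of motion is fine
    # a pronounced tail means the observed range systematically deviates from
    # the template (the exact wording of the hint is an assumption here)
    return "Move the arm further up/down!"
```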
Three cases can be identified:
The joint under analysis for this movement is `handLeft`.
This movement produces a feedback related to how well a predefined target is reached.
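A minimal sketch of such a check, assuming the target position is expressed in the same reference frame as the skeleton and using an illustrative tolerance radius:

```python
import numpy as np

def reaching_feedback(hand_left: np.ndarray, target: np.ndarray,
                      radius: float = 0.10) -> str:
    """Positive feedback if handLeft gets within `radius` meters of the target."""
    if np.linalg.norm(hand_left - target) <= radius:
        return "You are moving very well!"
    return "You are not reaching the target!"
```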
The following tables compare the feedback produced by the two pipelines developed in Y1M5 and in Y1Q3, respectively, in response to the same movement. Correct feedback is highlighted.
`abduction-left`

| | Y1M5 | Y1Q3 |
| --- | --- | --- |
| Correct | 1. You are moving very well! 2. You are moving very well! | 1. You are moving very well! 2. You are moving very well! |
| Fast | 1. You are moving very well! 2. You are moving very well! | 1. Move the left arm slower! 2. Move the left arm slower! |
| Slow | 1. You are moving very well! 2. You are moving very well! | 1. Move the left arm further up! 2. Move the left arm further up! |
| Low ROM | 1. You are moving very well! 2. You are moving very well! | 1. Move the left arm further up! 2. Move the left arm further up! |
| Wrong | 1. You are doing the exercise correctly, but you could do it better. 2. You are doing the exercise correctly, but you could do it better. | 1. You are doing the wrong exercise. 2. You are doing the wrong exercise. |
`external-rotation-left`

| | Y1M5 | Y1Q3 |
| --- | --- | --- |
| Correct | 1. You are doing the exercise correctly, but you could do it better. 2. You are doing the exercise correctly, but you could do it better. | 1. Move the left arm slower! 2. You are moving very well! |
| Fast | 1. You are moving very well! 2. You are moving very well! | 1. Move the left arm slower! 2. Move the left arm slower! |
| Slow | 1. You are doing the exercise correctly, but you could do it better. 2. You are moving very well! | 1. You are moving very well! 2. Move the left arm faster! |
| Low ROM | 1. You are moving very well! 2. You are doing the exercise correctly, but you could do it better. | 1. You are moving very well! 2. Move the left arm slower! |
| Wrong | 1. You are doing the exercise correctly, but you could do it better. 2. You are doing the exercise correctly, but you could do it better. | 1. You are doing the wrong exercise. 2. You are doing the wrong exercise. |
`internal-rotation-left`

| | Y1M5 | Y1Q3 |
| --- | --- | --- |
| Correct | 1. You are moving very well! 2. You are moving very well! | 1. You are moving very well! 2. You are moving very well! |
| Fast | 1. You are moving very well! 2. You are moving very well! | 1. Move the left arm slower! 2. Move the left arm slower! |
| Slow | 1. You are moving very well! 2. You are moving very well! | 1. You are moving very well! 2. Move the left arm faster! |
| Low ROM | 1. You are not moving very well! 2. You are not moving very well! | 1. Move the left arm backwards! 2. Move the left arm backwards! |
| Wrong | 1. You are not moving very well! 2. You are not moving very well! | 1. You are doing the wrong exercise. 2. You are doing the wrong exercise. |
It can be noticed that:
Outstanding report 🥇