petteriTeikari opened this issue 5 years ago
See, for example, the following plot, in which there are clear glitches in the signal when the confidence state is 1, especially for the x and z coordinates. You might want to flag these as outliers and impute the missing values, e.g. with missForest (cited by 785 articles) or with a more powerful deep-learning-based method such as BRITS for multivariate time series.
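As a minimal Python sketch of the masking step, assuming each coordinate arrives as a list of floats with a parallel list of confidenceState values (1 = estimated, 2 = tracked, as described for this data), the estimated samples can be set to NaN and, as the crudest possible baseline, filled by linear interpolation in time. The function names are hypothetical, and this is plain interpolation, not missForest or BRITS; it only shows the masked input that any such imputer starts from.

```python
import bisect
import math

ESTIMATED = 1  # confidenceState value for an estimated (inferred) joint, per the data description


def mask_estimated(values, confidence):
    """Return a copy of values with every estimated sample replaced by NaN."""
    if len(values) != len(confidence):
        raise ValueError("values and confidence must have the same length")
    return [math.nan if c == ESTIMATED else v for v, c in zip(values, confidence)]


def interpolate_linear(t, values):
    """Fill NaN samples by linear interpolation in time between observed neighbours.

    Gaps at the start or end are filled with the nearest observed value.
    """
    observed = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not observed:
        raise ValueError("no observed samples to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if not math.isnan(v):
            continue
        k = bisect.bisect_left(observed, i)  # position of the first observed index after i
        if k == 0:
            filled[i] = values[observed[0]]
        elif k == len(observed):
            filled[i] = values[observed[-1]]
        else:
            left, right = observed[k - 1], observed[k]
            w = (t[i] - t[left]) / (t[right] - t[left])
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled


# Usage: two glitchy samples flagged as estimated (confidenceState == 1); values are made up
t = [0.0, 0.1, 0.2, 0.3, 0.4]
x = [0.0, 5.0, 9.0, 0.3, 0.4]
conf = [2, 1, 1, 2, 2]
x_filled = interpolate_linear(t, mask_estimated(x, conf))
```

Linear interpolation ignores the other channels and joints entirely, which is exactly what multivariate imputers such as BRITS try to improve on.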
This signal-quality issue is, of course, also reflected in the orientation data.
In particular, your 4th absolute quaternion component, W, seems very noisy: "Kinect is reading the joint orientation values as a quaternion. A quaternion is a set of 4 values: X, Y, Z, and W. The Kinect SDK is encapsulating the quaternion into a structure called Vector4. We need to transform this quaternion (Vector4) into a set of 3 numeric values."
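A minimal Python sketch of turning each (X, Y, Z, W) quaternion into 3 numeric values. It assumes the common roll/pitch/yaw (Z-Y-X) Euler convention; whether that convention is meaningful for Kinect's camera-space axes and joint hierarchy is an assumption to check against the SDK documentation. Because q and -q encode the same rotation, the sketch also includes a hypothetical helper that flips signs so consecutive quaternions stay on the same hemisphere, which is worth ruling out before treating jumps in W as noise.

```python
import math


def quaternion_to_euler(x, y, z, w):
    """Convert a quaternion in Kinect's Vector4 order (X, Y, Z, W) to
    (roll, pitch, yaw) in radians, assuming the Z-Y-X (yaw-pitch-roll) convention."""
    norm = math.sqrt(x * x + y * y + z * z + w * w)
    if norm == 0.0:
        raise ValueError("zero quaternion has no orientation")
    x, y, z, w = x / norm, y / norm, z / norm, w / norm
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    # clamp to [-1, 1] so rounding errors cannot push asin out of its domain
    pitch = math.asin(max(-1.0, min(1.0, 2.0 * (w * y - z * x))))
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return roll, pitch, yaw


def make_sign_continuous(quats):
    """Flip the sign of each (x, y, z, w) quaternion whose dot product with the
    previous (already corrected) one is negative; q and -q are the same rotation."""
    out = []
    for q in quats:
        if out and sum(a * b for a, b in zip(out[-1], q)) < 0.0:
            q = tuple(-c for c in q)
        out.append(tuple(q))
    return out


# Usage: a 90 degree rotation about Z gives yaw of about pi/2
roll, pitch, yaw = quaternion_to_euler(0.0, 0.0, math.sin(math.pi / 4), math.cos(math.pi / 4))
```

Euler angles have singularities (gimbal lock at pitch = ±90°), so for filtering or imputation it may be preferable to keep the sign-corrected quaternions and convert only for display.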
Now for each joint you have 7 "channels" (3 for the xyz position, 4 for the quaternion). If you want to treat every joint as independent, you could, for example, feed this 7-dimensional time series to BRITS after removing all samples where estimation was ON. This is a reasonable first approximation, but later you of course want to consider all joints "jointly", as they move together (they are conditioned on each other).
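A minimal Python sketch of preparing one joint's 7-channel series (x, y, z plus the 4 quaternion values) for a BRITS-style imputer: estimated frames become missing, and each channel gets a binary mask plus a time gap ("delta") that accumulates across consecutive missing samples, following the masking and time-gap formulation described in the BRITS paper. The exact input format of any particular BRITS implementation is an assumption here, and the function names are hypothetical.

```python
import math

ESTIMATED = 1  # confidenceState value for an estimated joint


def joint_frames(positions, quaternions, confidence):
    """Stack (x, y, z) and (qx, qy, qz, qw) into 7-channel rows; estimated frames become all-NaN."""
    rows = []
    for p, q, c in zip(positions, quaternions, confidence):
        row = [float(v) for v in (*p, *q)]
        rows.append([math.nan] * len(row) if c == ESTIMATED else row)
    return rows


def brits_inputs(timestamps, rows):
    """Return (values, masks, deltas), one list per time step.

    values: NaN replaced by 0.0; masks: 1 if observed, 0 if missing;
    deltas: time since the channel was last observed (0.0 at the first step).
    """
    values, masks, deltas = [], [], []
    for t, row in enumerate(rows):
        mask = [0 if math.isnan(v) else 1 for v in row]
        if t == 0:
            delta = [0.0] * len(row)
        else:
            gap = timestamps[t] - timestamps[t - 1]
            # keep accumulating the gap while the previous sample of this channel was missing
            delta = [gap + d if m == 0 else gap for d, m in zip(deltas[-1], masks[-1])]
        values.append([0.0 if math.isnan(v) else v for v in row])
        masks.append(mask)
        deltas.append(delta)
    return values, masks, deltas


# Usage: 4 frames of one joint, the middle two estimated (made-up values)
ts = [0.0, 0.1, 0.2, 0.3]
rows = joint_frames([(0.0, 1.0, 2.0)] * 4, [(0.0, 0.0, 0.0, 1.0)] * 4, [2, 1, 1, 2])
values, masks, deltas = brits_inputs(ts, rows)
```

Stacking all joints side by side (7 channels per joint) would be the naive next step toward modelling the joints jointly.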
It is reasonable to assume that a more powerful post-processing algorithm would produce cleaner signals than those obtained from the Kinect SDK.
A somewhat reasonable assumption, in the EEG/MEG/fMRI sense, would be that some artifacts are coupled only to some spatial channels and could therefore be removed with ICA (independent component analysis). For example, could very poor image contrast on the left side give bad RGB signal quality and thus poor estimates of the left side of the skeleton?
And if you pool a lot of Kinect v2 data, you could probably build an unsupervised model of the "most probable" human dynamics (e.g. some of those glitches might simply be too fast for any kind of ninja movement).
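Long before any learned model, a crude version of the "too fast for any ninja" argument is a hard speed limit: a sample that would require the joint to move faster than some maximum plausible speed from the last accepted sample gets flagged. A minimal Python sketch; max_speed is a hypothetical parameter that would have to be chosen from data, and the rule breaks down if the very first sample is itself a glitch.

```python
import math


def flag_implausible_speed(t, xyz, max_speed):
    """Flag samples that would need a speed above max_speed (m/s) to be reached
    from the last accepted (unflagged) sample. t in seconds, xyz in meters."""
    flags = [False] * len(xyz)
    last = 0  # index of the last accepted sample; the first sample is trusted
    for i in range(1, len(xyz)):
        dt = t[i] - t[last]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        if math.dist(xyz[i], xyz[last]) / dt > max_speed:
            flags[i] = True
        else:
            last = i
    return flags


# Usage: a single 0.5 m jump within one frame at an assumed 30 Hz (made-up values)
t = [0.0, 1 / 30, 2 / 30, 3 / 30]
xyz = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (0.5, 0.0, 0.0), (0.03, 0.0, 0.0)]
flags = flag_implausible_speed(t, xyz, max_speed=5.0)
```

Comparing against the last accepted sample rather than the immediately preceding one keeps the return from a glitch from being flagged as a second glitch.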
You could, for example, explore the nonlinear ICA / variational autoencoder path:
Ilyes Khemakhem, Diederik P. Kingma, Aapo Hyvärinen (Gatsby Computational Neuroscience Unit, UCL / Google Brain / INRIA-Saclay, Dept of CS, University of Helsinki). Variational Autoencoders and Nonlinear ICA: A Unifying Framework. Submitted on 10 Jul 2019. https://arxiv.org/abs/1907.04809
Denoising is also desirable for the raw signal, as you can see from this ripple in, e.g., the orientation data, which the authors reduced with a low-pass filter (filtering.m). You can plot the filter's frequency response with freqz(b,a), which gives the following:
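For readers without Matlab, a minimal Python sketch of what freqz(b,a) evaluates: the transfer function H(e^{jω}) = B(e^{jω}) / A(e^{jω}) at frequencies between 0 and π (Nyquist). The coefficients from filtering.m are not reproduced here; the usage example uses a hypothetical first-order low-pass (b = [α], a = [1, -(1-α)]) and an assumed 30 Hz frame rate only to label the frequencies in Hz.

```python
import cmath
import math


def response_at(b, a, w):
    """Complex frequency response of the filter (b, a) at normalized angular frequency w (rad/sample)."""
    num = sum(bk * cmath.exp(-1j * w * k) for k, bk in enumerate(b))
    den = sum(ak * cmath.exp(-1j * w * k) for k, ak in enumerate(a))
    return num / den


def freq_response(b, a, n=512):
    """Evaluate the response at n equally spaced frequencies in [0, pi), similar in spirit to freqz(b, a, n)."""
    return [(math.pi * i / n, response_at(b, a, math.pi * i / n)) for i in range(n)]


# Usage: hypothetical first-order low-pass y[n] = alpha*x[n] + (1 - alpha)*y[n-1]
alpha = 0.5
b, a = [alpha], [1.0, -(1.0 - alpha)]
fs = 30.0  # assumed frame rate in Hz, only used to label frequencies
for w, h in freq_response(b, a, n=8):
    print(f"{w * fs / (2 * math.pi):5.2f} Hz  {20 * math.log10(abs(h)):7.2f} dB")
```

The gain is 1 (0 dB) at DC and falls toward Nyquist, which is the shape to look for when checking whether a filter's cutoff sits below the frequency of the ripple.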
For an example joint position signal, simple low-pass filtering gets rid of those glitches around the estimated joints.
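As a stand-in for the authors' filter (its coefficients are in filtering.m and are not reproduced here), a minimal Python sketch of zero-phase low-pass filtering: a first-order exponential smoother run forward and then backward, the same forward-backward idea as Matlab's filtfilt, which cancels the phase lag of a single pass. The 2 Hz cutoff and 30 Hz sampling rate in the usage lines are assumptions, and the edge handling is simpler than filtfilt's.

```python
import math


def smooth_first_order(x, alpha):
    """Causal first-order low-pass: y[n] = alpha*x[n] + (1 - alpha)*y[n-1], started at x[0]."""
    y, prev = [], x[0]
    for v in x:
        prev = alpha * v + (1.0 - alpha) * prev
        y.append(prev)
    return y


def zero_phase_lowpass(x, cutoff_hz, fs_hz):
    """Run the first-order filter forward and then backward so the result has no phase lag."""
    dt = 1.0 / fs_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # RC time constant for the chosen cutoff
    alpha = dt / (rc + dt)
    forward = smooth_first_order(x, alpha)
    return smooth_first_order(forward[::-1], alpha)[::-1]


# Usage: a one-sample glitch on an otherwise flat joint coordinate (made-up values)
signal = [0.0] * 15 + [0.4] + [0.0] * 15
smoothed = zero_phase_lowpass(signal, cutoff_hz=2.0, fs_hz=30.0)
```

Note that low-pass filtering attenuates a glitch by spreading it over neighbouring samples rather than removing it.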
However, the filter parameters chosen by the authors cannot be used for the joint orientation data, as illustrated below: the smooth sinusoidal joint orientation time series is distorted by the glitches, and after low-pass filtering each "half-wave" of the sine wave is "split" into two peaks, which intuitively does not seem very physiological.
You can see that the 'residual glitches' are quite abrupt jumps in the signal. As a simple solution, we can try to get rid of them with changepoint detection (see e.g. Introduction to optimal changepoint detection algorithms by Rebecca Killick (r.killick@lancs.ac.uk), http://members.cbio.mines-paristech.fr/~thocking/change-tutorial/RK-CptWorkshop.html).
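A minimal Python sketch of one classic changepoint method, binary segmentation on the mean: repeatedly split a segment where the split most reduces the sum of squared deviations from the segment means, and stop when that reduction is no larger than min_gain. This is a greedy approximation, not one of the optimal algorithms (such as PELT) covered in the tutorial; min_gain plays a role loosely similar to a penalty threshold, and all names are hypothetical.

```python
def binary_segmentation(x, min_gain, min_size=2):
    """Return sorted indices k where the mean of x changes (x[:k] | x[k:]).

    A segment is split at the point that most reduces the total sum of squared
    deviations from segment means, as long as the reduction exceeds min_gain.
    """
    # prefix sums give O(1) segment costs
    s, sq = [0.0], [0.0]
    for v in x:
        s.append(s[-1] + v)
        sq.append(sq[-1] + v * v)

    def cost(i, j):
        """Sum of squared deviations of x[i:j] from its mean."""
        total = s[j] - s[i]
        return (sq[j] - sq[i]) - total * total / (j - i)

    changepoints = []
    stack = [(0, len(x))]
    while stack:
        i, j = stack.pop()
        if j - i < 2 * min_size:
            continue
        whole = cost(i, j)
        best_k, best = None, whole
        for k in range(i + min_size, j - min_size + 1):
            split = cost(i, k) + cost(k, j)
            if split < best:
                best_k, best = k, split
        if best_k is not None and whole - best > min_gain:
            changepoints.append(best_k)
            stack.extend([(i, best_k), (best_k, j)])
    return sorted(changepoints)


# Usage: a flat signal with a short, abrupt jump (made-up values)
signal = [0.0] * 20 + [5.0] * 10 + [0.0] * 20
print(binary_segmentation(signal, min_gain=1.0))  # [20, 30]
```

The two returned indices bracket the jump, so the samples between them could then be set to NaN and imputed; as with findchangepts, the threshold still has to be tuned per signal.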
The same can be done in Matlab (this requires the Signal Processing Toolbox function findchangepts, https://uk.mathworks.com/help/signal/ref/findchangepts.html):
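% Called without output arguments, findchangepts plots the signal and the detected changepoints
findchangepts(samples_raw(:,1),'Statistic','linear','MinThreshold',0.5) % changes in mean and slope
findchangepts(samples_raw(:,1),'Statistic','std','MinThreshold',25)     % changes in standard deviation
findchangepts(samples_raw(:,1),'Statistic','rms','MinThreshold',12)     % changes in root-mean-square level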
This requires manually defined thresholds for each of the 4 joint signals, per exercise and per subject, which is not the most feasible approach.
The effect of marking the timestamps of estimated joints as NaNs is illustrated in a video (https://youtu.be/HVF_G9zguJc). The upper row contains the RAW values (left: position, right: orientation), and the bottom row contains the time series with the estimated (non-tracked) joint positions and orientations dropped.
Now we have an irregularly sampled time series and need to impute the missing values, or, optimally, have gold-standard measurements done at some point so that one can condition the imputation GAN (or whatever you plan to use) on the "real human motion dynamics" from the gold standard. As one can see, there are some clear non-physiological glitches there.
The data comes unfiltered (raw) for estimated positions (the purple confidenceState signal is 1 if the joint is estimated and 2 if the joint is tracked), which you can filter further if you want (for example, treat them as outliers and use a GAN to impute the "missing values"). The x axis is in seconds; cameraXYZ is in meters.
See more in the Kinect documentation: Kinect SDK 2.0