Closed rmitsch closed 6 years ago
See Preprocessor._filter_nan_values(...) in d44a6a5064c7cdb2c340517e9df4df8a42a80232 for removal of NaN values, or of dataframes with a large percentage of NaN values, if necessary.
The pandas.resample() function is introducing the NaN values. It only produces them for the magnetic and acceleration sensors in two of @rmitsch's trips. I couldn't figure out whether the problem lies with the trips (maybe some floating-point error?) or whether it is a bug in pandas.resample(). I would recommend dropping the NaN values after resampling, because only very few records are affected.
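A minimal sketch of the "drop after resampling" idea, using made-up readings rather than the actual trip data (the column name `x` and the timestamps are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical acceleration readings with a recording gap between :01 and :04.
readings = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2018-01-01 00:00:00", "2018-01-01 00:00:01", "2018-01-01 00:00:04"]
        ),
        "x": [0.1, 0.2, 0.4],
    }
)

# Resampling to one-second bins creates empty bins for :02 and :03,
# and the mean of an empty bin is NaN.
resampled = readings.set_index("time").resample("1s").mean()

# Drop every row that contains a NaN after resampling.
cleaned = resampled.dropna(how="any")

print("NaNs after resampling:", int(resampled.isnull().sum().sum()))
print("Rows kept after dropping:", len(cleaned))
```

Dropping is only reasonable while few records are affected, as is the case here; it leaves holes in the one-second grid.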
The code below reproduces the error when only pandas.resample() is used:
import os

# Preprocessor is the project's own class; import it from its module in the repository.
token = os.environ.get("KEY_RAPHAEL")
dfs = Preprocessor.preprocess([token])
dfs[token]["trips"] = Preprocessor.convert_timestamps(dfs[token]["trips"])

for i, cd in enumerate(dfs[token]["trips"]):
    print("trip: ", i)
    # Count NaNs in the raw sensor data.
    before_sampling = cd["sensor"].isnull().sum().sum()
    print("Total NaNs before resampling: ", before_sampling)
    # Keep only the acceleration readings and resample them to one-second bins.
    accel = cd["sensor"]
    accel = accel[accel["sensor"] == "acceleration"]
    accel_resampled = accel.set_index("time").resample("S").mean()
    after_sampling = accel_resampled.isnull().sum().sum()
    print("Total NaNs after resampling: ", after_sampling)
Update:
There is a lag in the recording of the data. pandas.resample is working as expected: it just fills the bins that have no recorded values with NaNs. See the recording below, where the left side shows the resampled table and the right side the original one:
How about we replace those NaNs with either the last valid value or an interpolation between the last valid value before the gap and the first one after it?
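The two options can be compared on a small made-up series (the values and timestamps are assumptions, not taken from the trips). This is only a sketch of how pandas would fill the gap in each case:

```python
import pandas as pd

# Hypothetical sensor values with no recordings at :02 and :03.
times = pd.to_datetime(
    ["2018-01-01 00:00:00", "2018-01-01 00:00:01", "2018-01-01 00:00:04"]
)
values = pd.Series([1.0, 2.0, 5.0], index=times)

# One-second bins; the two empty bins become NaN.
resampled = values.resample("1s").mean()

# Option 1: carry the last valid value forward into the gap.
filled_last = resampled.ffill()

# Option 2: interpolate linearly in time between the neighbours of the gap.
filled_linear = resampled.interpolate(method="time")

print(pd.DataFrame({"raw": resampled, "ffill": filled_last, "linear": filled_linear}))
```

Forward filling keeps the value flat across the gap, while interpolation produces a ramp between the last value before and the first value after it.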
Interpolation should be fine, but we should consider treating trips with large lags (maybe > 10 secs) as invalid. On second thought, if we really have to use only 30-sec cuts for the clustering, we could just skip those rows when cutting. On a side note, this issue is also related to issue #11.
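One possible way to flag trips with large lags before interpolating, sketched with a hypothetical helper `has_large_gap` (not part of Preprocessor) and the 10-second threshold suggested above as an assumed default:

```python
import pandas as pd

# Assumed threshold, taken from the "> 10 secs" suggestion in this thread.
MAX_GAP = pd.Timedelta(seconds=10)


def has_large_gap(timestamps: pd.Series, max_gap: pd.Timedelta = MAX_GAP) -> bool:
    """Return True if two consecutive timestamps are more than max_gap apart."""
    gaps = timestamps.sort_values().diff()  # first entry is NaT and compares as False
    return bool((gaps > max_gap).any())


# Hypothetical trips: the first has gaps of 5 s and 7 s, the second a gap of 11 s.
ok_trip = pd.Series(
    pd.to_datetime(
        ["2018-01-01 00:00:00", "2018-01-01 00:00:05", "2018-01-01 00:00:12"]
    )
)
bad_trip = pd.Series(pd.to_datetime(["2018-01-01 00:00:00", "2018-01-01 00:00:11"]))

print(has_large_gap(ok_trip))
print(has_large_gap(bad_trip))
```

Trips for which the check returns True could be discarded, or the affected rows skipped when cutting the 30-second segments.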
Closed due to Preprocessor.downsample_time_series_category() being deprecated in favour of PAA.
To reproduce: