rmitsch closed this issue 6 years ago
Just save it as csv files? Maybe we could choose which data we want to save, e.g. only accelerometer sensor data. Then we would no longer have the nested structure after preprocessing.
We could either dump it as .csv or be lazy and pickle it. Pro .csv: Human-readable. Pro pickle: We don't need to worry about the writing/reading process.
If you don't have a preference, I'll decide before I start implementing this issue.
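To make the trade-off concrete, here is a minimal sketch using only the standard library. The nested layout (token, then sensor name, then rows of readings) is an assumption for illustration and not the project's actual structure. It shows that pickle round-trips the nesting as-is, while CSV needs the nesting flattened into columns by hand.

```python
import csv
import io
import pickle

# Hypothetical nested structure: token -> sensor name -> rows of (t, x, y, z).
data = {
    "token_a": {"accelerometer": [(0.0, 0.1, 0.2, 9.8), (0.1, 0.0, 0.3, 9.7)]},
    "token_b": {"accelerometer": [(0.0, 0.2, 0.1, 9.9)]},
}

# Pickle: one call each way; nesting and types survive unchanged.
restored = pickle.loads(pickle.dumps(data))

# CSV: the nesting has to be flattened into columns manually.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["token", "sensor", "t", "x", "y", "z"])
for token, sensors in data.items():
    for sensor, rows in sensors.items():
        for row in rows:
            writer.writerow([token, sensor, *row])

csv_text = buffer.getvalue()
# Reading back yields strings only; types must be converted again by the reader.
csv_rows = list(csv.reader(io.StringIO(csv_text)))
```

The CSV output is readable in any editor, but every value comes back as a string, so the reading side has to rebuild both the types and the nesting.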
Done in 6cc4a9c4fbb5b43369dee598e4727f5575d44d29. Usage:
# Preprocess data. Store result in /data/preprocessed/preprocessed_data.dat.
dfs = Preprocessor.preprocess([os.environ.get("KEY_RAPHAEL"),
                               os.environ.get("KEY_MORITZ"),
                               os.environ.get("KEY_LUKAS")],
                              filename="preprocessed_data.dat")
# Load dataframes from disk.
dfs = Preprocessor.restore_preprocessed_data_from_disk(filename="preprocessed_data.dat")
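For anyone curious what storing and restoring via pickle could look like, here is a rough sketch. The helper names `store_to_disk` and `restore_from_disk` are hypothetical and are not the functions from the commit; the data is a small stand-in for the real dataframes.

```python
import pickle
import tempfile
from pathlib import Path


def store_to_disk(data, path):
    """Pickle `data` to `path` (hypothetical helper, not the project's code)."""
    with open(path, "wb") as file:
        pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)


def restore_from_disk(path):
    """Load whatever was pickled at `path`. Only use this on files you trust."""
    with open(path, "rb") as file:
        return pickle.load(file)


# Stand-in for the preprocessed result; the real one holds dataframes.
preprocessed = {"token_a": [[0.0, 0.1], [0.1, 0.2]], "token_b": [[0.0, 0.3]]}

with tempfile.TemporaryDirectory() as directory:
    target = Path(directory) / "preprocessed_data.dat"
    store_to_disk(preprocessed, target)
    restored = restore_from_disk(target)
```

Note that unpickling can execute arbitrary code, so this approach is only suitable for files we wrote ourselves.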
I just saw that the pickled result with paa is 900 MB. I would suggest changing this
preprocessed_data[token] = {}
preprocessed_data[token]["trips"] = dfs
preprocessed_data[token]["resampled_sensor_data"] = resampled_sensor_values
to this:
preprocessed_data[token] = resampled_sensor_values
otherwise we will save the same data twice. Another solution would be to use csv files instead.
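A quick way to check the size effect is to pickle both layouts and compare the byte counts. The sketch below uses stand-in lists instead of the real dataframes and assumes that the "trips" entry holds the same values as the resampled data in separate objects; the token key and data shape are made up for illustration.

```python
import copy
import pickle

# Stand-in data: 1000 rows of 3 float readings (assumed shape, not the real data).
resampled_sensor_values = [
    [float(i), float(i) + 0.5, float(i) + 1.0] for i in range(1000)
]
# Assumption: the "trips" entry carries the same values in separate objects.
dfs = copy.deepcopy(resampled_sensor_values)

token = "token_a"  # hypothetical key

# Layout before the suggested change: both entries are stored per token.
nested = {token: {"trips": dfs, "resampled_sensor_data": resampled_sensor_values}}
# Layout after the suggested change: only the resampled values are stored.
flat = {token: resampled_sensor_values}

nested_size = len(pickle.dumps(nested))
flat_size = len(pickle.dumps(flat))
print(f"nested: {nested_size} bytes, flat: {flat_size} bytes")
```

If both entries referred to the very same object, pickle would store it once and write a back-reference, so the duplication only costs space when the two entries are distinct copies.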
Agreed on preprocessed_data[token] = resampled_sensor_values. Implemented in a75e36a.
I consider this issue to be resolved. If your opinion differs, please reopen it.
Includes functionality to restore data from disk (pickle?).