univie-datamining-team3 / assignment2

Analysis of mobility data
MIT License

Dump preprocessing results to file #13

Closed rmitsch closed 6 years ago

rmitsch commented 6 years ago

Includes functionality to restore data from disk (pickle?).
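A minimal sketch of what such a dump/restore pair could look like with pickle, assuming the preprocessed result is a dict of pandas DataFrames; the helper names and the data/preprocessed/ path are illustrative, not the actual implementation:

import os
import pickle

def dump_preprocessed_data_to_disk(preprocessed_data, filename):
    # Serialize the dict of DataFrames in one go with pickle.
    path = os.path.join("data", "preprocessed", filename)
    with open(path, "wb") as f:
        pickle.dump(preprocessed_data, f)

def restore_preprocessed_data_from_disk(filename):
    # Load the previously pickled dict of DataFrames.
    path = os.path.join("data", "preprocessed", filename)
    with open(path, "rb") as f:
        return pickle.load(f)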

Lumik7 commented 6 years ago

Just save it as csv files? Maybe we could choose which data we want to save, e.g. only accelerometer sensor data; then we would not have the nested structure after preprocessing anymore.
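A rough sketch of that csv alternative, assuming the resampled sensor data is nested as {token: {sensor: DataFrame}}; the nesting keys and output directory are assumptions for illustration only:

import os

def dump_sensor_data_as_csv(resampled_sensor_values, out_dir="data/preprocessed"):
    # Write one human-readable csv per token and sensor instead of one nested pickle.
    for token, sensors in resampled_sensor_values.items():
        for sensor_name, df in sensors.items():
            df.to_csv(os.path.join(out_dir, f"{token}_{sensor_name}.csv"), index=False)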

rmitsch commented 6 years ago

We could either dump it as .csv or be lazy and pickle it. Pro .csv: Human-readable. Pro pickle: We don't need to worry about the writing/reading process.

If you don't have a preference, I'll decide before I start implementing this issue.

rmitsch commented 6 years ago

Done in 6cc4a9c4fbb5b43369dee598e4727f5575d44d29. Usage:

# Preprocess data. Store result in /data/preprocessed/preprocessed_data.dat.
dfs = Preprocessor.preprocess([os.environ.get("KEY_RAPHAEL"),
                               os.environ.get("KEY_MORITZ"),
                               os.environ.get("KEY_LUKAS")],
                              filename="preprocessed_data.dat")

# Load dataframes from disk.
dfs = Preprocessor.restore_preprocessed_data_from_disk(filename="preprocessed_data.dat")

Lumik7 commented 6 years ago

I just saw that the pickled result with PAA is 900 MB. I would suggest changing this

preprocessed_data[token] = {}
preprocessed_data[token]["trips"] = dfs
preprocessed_data[token]["resampled_sensor_data"] = resampled_sensor_values

to this:

preprocessed_data[token] = resampled_sensor_values

otherwise we will save the same data twice. Another solution would be to use csv files instead.
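For reference, a quick way to check the dump size before and after such a change; the path is the default from the usage example above:

import os

size_mb = os.path.getsize("data/preprocessed/preprocessed_data.dat") / 1024 ** 2
print(f"pickled result: {size_mb:.0f} MB")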

rmitsch commented 6 years ago

Agree with preprocessed_data[token] = resampled_sensor_values. Implemented in a75e36a. I consider this issue to be resolved. If your opinion differs, please reopen it.