univie-datamining-team3 / assignment2

Analysis of mobility data
MIT License

Preprocessing: Remove None type pandas.DataFrames during preprocessing #11

Lumik7 closed this issue 6 years ago

Lumik7 commented 6 years ago

It seems some dataframes are set to None when the recording did not work, e.g. for token "KEY_LUKAS", trip number 17, table "location". I encountered this error when calling Preprocessor.convert_timestamps(df). I already fixed this specific error in this commit, but I think it would be better to replace None dataframes with empty DataFrames during preprocessing().

rmitsch commented 6 years ago

@Lumik7 Are you on this already? Otherwise I'd do it tomorrow.

Lumik7 commented 6 years ago

@rmitsch No, I did nothing except for the fix mentioned above. It would be great if you could systematically check the code for None values, either by introducing None checks or by replacing them with empty pd.DataFrames. The goal should be that we do not have to worry about None values in place of pd.DataFrames after the preprocessing step.

rmitsch commented 6 years ago

@Lumik7 Alright, will do. I'll handle #12 as well if that's ok for you, since it's pretty much the same problem.

Lumik7 commented 6 years ago

@rmitsch Yes, that's fine, but I don't think that is an issue anymore, because we no longer use it in the preprocessing: it got replaced by paa.

rmitsch commented 6 years ago

@Lumik7 That's true. I suggest closing #12 as wont-fix then.

Lumik7 commented 6 years ago

yes, go ahead

rmitsch commented 6 years ago

Added replace_none_values_with_empty_dataframes(dataframe_dicts: list) in 326664c. Applied it after every preprocessing step, e.g.:

# 2. Remove trips less than 10 minutes long.
dfs = Preprocessor.replace_none_values_with_empty_dataframes(
    Preprocessor._remove_dataframes_by_duration_limit(dfs, 10 * 60)
)

Please check whether the result conforms to your expectations. If so, I'll merge to master and close the issue.

Lumik7 commented 6 years ago

I think this:

{
    key: pd.DataFrame() if df_dict[key] is None else df_dict[key]
    for key in df_dict
} for df_dict in dataframe_dicts

code snippet will introduce some problems, because empty DataFrames without column names will raise KeyErrors as soon as a column is accessed. To be safe, the correct column names should be set on the empty DataFrames.
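The concern can be illustrated with a minimal sketch. The variable names and the two column names used here are illustrative assumptions, not code from the repository:

```python
import pandas as pd

# An empty DataFrame without column names, as produced by pd.DataFrame().
empty_without_columns = pd.DataFrame()

# An empty DataFrame that still carries the expected column names
# (illustrative names only).
empty_with_columns = pd.DataFrame(columns=["time", "marker"])

# Accessing a column that does not exist raises a KeyError.
raised_key_error = False
try:
    empty_without_columns["time"]
except KeyError:
    raised_key_error = True

# With the column names set, the same access returns an empty Series.
time_column = empty_with_columns["time"]

print("KeyError without columns:", raised_key_error)
print("Rows in 'time' with columns:", len(time_column))
```

Any later step that selects columns by name would then work on the empty frame instead of failing.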

rmitsch commented 6 years ago

Might be the case. Will do.

rmitsch commented 6 years ago

Replaced with

{
    key: pd.DataFrame(columns=Preprocessor.DATAFRAME_COLUMN_NAMES[key])
    if df_dict[key] is None else df_dict[key]
    for key in df_dict
} for df_dict in dataframe_dicts

where

DATAFRAME_COLUMN_NAMES = {
    "cell": ['time', 'cid', 'lac', 'asu'],
    "annotation": ['time', 'mode', 'notes'],
    "location": ['time', 'gpstime', 'provider', 'longitude', 'latitude', 'altitude', 'speed', 'bearing',
                 'accuracy'],
    "sensor": ['sensor', 'time', 'x', 'y', 'z', 'total'],
    "mac": ['time', 'ssid', 'level'],
    "marker": ['time', 'marker'],
    "event": ['time', 'event', 'state']
}

I can't (re-)produce an error, so I'm not sure whether this will solve the problem, but it ought to. If you agree, I'll merge into master.
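For reference, a self-contained sketch of how such a replacement could look as a standalone function. The full body of replace_none_values_with_empty_dataframes in 326664c is not shown in this thread, so the function name `replace_none_with_empty`, its signature, the shortened column mapping, and the sample data below are all assumptions made for illustration:

```python
import pandas as pd

# Hypothetical subset of the column mapping quoted above.
COLUMN_NAMES = {
    "marker": ["time", "marker"],
    "event": ["time", "event", "state"],
}


def replace_none_with_empty(dataframe_dicts, column_names):
    """Return new dicts in which every None value is replaced by an
    empty DataFrame that has the expected columns for its key."""
    return [
        {
            key: pd.DataFrame(columns=column_names[key]) if df is None else df
            for key, df in df_dict.items()
        }
        for df_dict in dataframe_dicts
    ]


# Made-up example: one trip whose "event" table was not recorded.
trips = [
    {
        "marker": pd.DataFrame({"time": [1], "marker": ["start"]}),
        "event": None,
    }
]

cleaned = replace_none_with_empty(trips, COLUMN_NAMES)

print(cleaned[0]["event"].empty)
print(list(cleaned[0]["event"].columns))
```

Existing DataFrames are passed through unchanged, so only the missing tables are replaced.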

Lumik7 commented 6 years ago

Yes, I agree

(Email reply to the comment by Raphael Mitsch from 18.12.2017, 1:27 pm.)

rmitsch commented 6 years ago

Merged.