Issue opened by svdhoog 6 years ago
Proposed change could be:
# comment: d was replaced by direct references below to save memory
[*] # d = agent_dframes[param['agent']]
# check if table columns contain the given variables from the config file
for i, entry in enumerate(var_list):
    if entry not in list(agent_dframes[param['agent']]):
        erf("Table has columns {0} and var{1}='{2}' does not match.".format(
            list(agent_dframes[param['agent']]), i + 1, entry))
# stage-I filtering: all input vars are sliced with the desired set & run values
# (note: with d removed, the index lookups must also reference the dict entry,
# otherwise d.index would raise a NameError)
[**] filtered = agent_dframes[param['agent']].iloc[
    agent_dframes[param['agent']].index.get_level_values('set').isin(param['set'])
    & agent_dframes[param['agent']].index.get_level_values('run').isin(param['run'])
    & agent_dframes[param['agent']].index.get_level_values('major').isin(param['major'])
    & agent_dframes[param['agent']].index.get_level_values('minor').isin(param['minor'])
][var_list].dropna().astype(float)
df_main = pd.DataFrame()
index1 = 0
# stage-II filtering for selecting variables according to their values
for dkey, dval in var_dic.items():
    df = filter_by_value(dkey, dval, filtered)
    if df_main.empty:
        df_main = df
    else:
        df_main = pd.concat([df_main, df], axis=1)
    [***] del df
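For reference, the stage-I slicing pattern can be reproduced on a toy MultiIndex frame. The level names 'set'/'run'/'major'/'minor' come from the snippet above; the data, column names, and param values here are made up for illustration:

```python
import pandas as pd

# Toy frame with the four index levels assumed by the slicing code
idx = pd.MultiIndex.from_product(
    [[0, 1], [0, 1], [0, 1], [0, 1]],
    names=['set', 'run', 'major', 'minor'])
frame = pd.DataFrame({'x': range(16), 'y': range(16)}, index=idx)

param = {'set': [0], 'run': [0, 1], 'major': [1], 'minor': [0, 1]}
var_list = ['x']

# Boolean mask combining all four level filters, as in the snippet
mask = (frame.index.get_level_values('set').isin(param['set'])
        & frame.index.get_level_values('run').isin(param['run'])
        & frame.index.get_level_values('major').isin(param['major'])
        & frame.index.get_level_values('minor').isin(param['minor']))
filtered = frame.iloc[mask][var_list].dropna().astype(float)
print(len(filtered))  # 1 set * 2 runs * 1 major * 2 minors = 4 rows
```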
2nd case: visualization/main.py, lines 161-163:
d = pd.DataFrame() # Main dataframe to hold all the dataframes of each instance (one agenttype)
df_list = []
... [constructing df_list]
[*] d = pd.concat(df_list) # Add each dataframe from panel into a main dataframe containing all sets and runs
[**] del df_list
[***] agent_dframes[agentname] = d # this dict contains agent-type names as keys, and the corresponding dataframes as values
[*] Here df_list is concatenated into d.
[**] Then df_list is deleted.
[***] Now d gets copied into agent_dframes[agentname].
Can [***] not be made more efficient?
Proposed code change
[***] agent_dframes[agentname] = pd.concat(df_list) # like at [*] we concat df_list
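Sketched with toy frames (the names agent_dframes, agentname, and df_list are taken from the snippet; the data is made up):

```python
import pandas as pd

agent_dframes = {}
agentname = 'Firm'
df_list = [pd.DataFrame({'x': [1]}), pd.DataFrame({'x': [2]})]

# Concatenate straight into the dict entry: no intermediate name d,
# and no leftover reference for `del` to clean up.
agent_dframes[agentname] = pd.concat(df_list)
print(len(agent_dframes[agentname]))  # 2 rows, one from each frame
```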
Python does not create an entire copy of the data frame in memory here. Assignment only binds another name to the same object, so d is a reference to the existing DataFrame, not a copy:
d = agent_dframes[param['agent']]
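This is plain Python name-binding semantics and is easy to check on toy data (the dict layout mirrors the snippet; the values are made up):

```python
import pandas as pd

agent_dframes = {'Firm': pd.DataFrame({'x': [1, 2, 3]})}
param = {'agent': 'Firm'}

d = agent_dframes[param['agent']]  # binds a name; no data is copied
assert d is agent_dframes[param['agent']]  # same object in memory

# Mutating through one name is visible through the other
d.loc[0, 'x'] = 99
print(agent_dframes['Firm'].loc[0, 'x'])  # 99
```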
The only inefficiency here is that we create a new DataFrame df containing the filtered data, which then gets concatenated to df_main:
for dkey, dval in var_dic.items():
    df = filter_by_value(dkey, dval, filtered)
    if df_main.empty:
        df_main = df
    else:
        df_main = pd.concat([df_main, df], axis=1)
    [***] del df
More efficient implementation
By removing the intermediate DataFrame df:
for dkey, dval in var_dic.items():
    if df_main.empty:
        df_main = filter_by_value(dkey, dval, filtered)
    else:
        df_main = pd.concat([df_main, filter_by_value(dkey, dval, filtered)], axis=1)
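A minimal runnable sketch of the rewritten loop. The real filter_by_value lives in the project; the stub below is an assumption that simply selects rows where the given column equals the given value:

```python
import pandas as pd

def filter_by_value(key, value, frame):
    # Stub standing in for the project's filter_by_value:
    # keep column `key` restricted to rows where it equals `value`
    return frame.loc[frame[key] == value, [key]]

filtered = pd.DataFrame({'a': [1, 1, 2], 'b': [3, 4, 4]})
var_dic = {'a': 1, 'b': 4}

df_main = pd.DataFrame()
for dkey, dval in var_dic.items():
    if df_main.empty:
        df_main = filter_by_value(dkey, dval, filtered)
    else:
        df_main = pd.concat([df_main, filter_by_value(dkey, dval, filtered)], axis=1)

print(list(df_main.columns))  # ['a', 'b']
```

If peak memory is the real concern, a further option is to collect the filtered pieces in a list and call pd.concat once after the loop, which avoids rebuilding df_main on every iteration.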
visualization/main.py, lines 189-208:
[*] line 189: This appears to make a copy of the entire data frame in memory in the variable d. Can this simply be resolved by copying the RHS of d= and using that in the lines below?
[**] line 197: This appears to create another data frame, filtered, that is used in the lines below just once, in line 202.
[***] Here df is deleted, which was the filtered data frame that was copied into df_main. Isn't this inefficient copying of data?