You made the following modification to join_history to handle cases where there are more samples than the sum of initial_data and max_size:
    if run_condition['run_until_max_size']:
        df_out = df_out.iloc[-(initial_size + run_condition['max_size']):]
However, this slice is purely positional, so it removes inliers that we want to retain in the final CSV file. To address this, you could instead call dropna on the target columns so that only the inliers are kept. That way, the total size of the combined CSV correctly reflects (initial_data + max_size). Alternatively, we could keep all the samples and filter them later when computing metrics.
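
To illustrate the difference, here is a minimal sketch on toy data. It assumes that outlier rows are the ones with NaN in the target columns, which is what the dropna suggestion implies; the column names, the sizes, and the keep_inliers helper are all hypothetical and not taken from join_history.

```python
import numpy as np
import pandas as pd

# Toy history: 2 initial samples + 3 acquired inliers + 2 outliers (NaN target).
# Assumption: outliers are marked by NaN in the target column "y".
initial_size, max_size = 2, 3
df_out = pd.DataFrame({
    "x": [0, 1, 2, 3, 4, 5, 6],
    "y": [1.0, 2.0, np.nan, 3.0, np.nan, 4.0, 5.0],
})

# Positional slice, as in the current change: keeps the last 5 rows,
# including the outliers, and drops the two initial inliers.
tail_sliced = df_out.iloc[-(initial_size + max_size):]

def keep_inliers(df, target_cols):
    """Hypothetical helper: keep only rows where every target is present."""
    return df.dropna(subset=target_cols)

inliers = keep_inliers(df_out, ["y"])

print(len(tail_sliced), int(tail_sliced["y"].notna().sum()))
print(len(inliers))
```

On this toy frame the positional slice returns five rows of which only three are inliers, while the dropna version returns exactly the five inliers, that is, initial_size + max_size.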