A while back we started writing cooked logs to 250 MB files instead of one large file per day.
Before this change we were running evaluations with gzip_mode = 1, which uses the largest file per day per configuration folder.
After splitting the cooked logs into 250 MB files, gzip_mode = 1 caused us to skip a large portion of each day's data, so we switched to gzip_mode = 0. It turns out that mode only runs on data from the oldest config folder.
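To make the two old behaviors concrete, here is a minimal sketch of the file selection they imply. This is illustrative only: the function name, directory layout (one folder per config under a per-day directory), and structure are assumptions, not the actual evaluation code.

```python
# Hypothetical sketch of the old selection behaviors; names and layout are
# illustrative, not the real evaluation code.
import os

def select_files(day_dir: str, gzip_mode: int) -> list[str]:
    """Pick which cooked-log files to evaluate for one day."""
    config_folders = sorted(os.listdir(day_dir))  # assumed: one subfolder per config

    if gzip_mode == 0:
        # Only the oldest config folder is evaluated; everything newer is ignored.
        oldest = os.path.join(day_dir, config_folders[0])
        return [os.path.join(oldest, f) for f in sorted(os.listdir(oldest))]

    if gzip_mode == 1:
        # One file per config folder: the largest. With 250 MB splits, this
        # silently drops every other chunk of that day's data.
        picked = []
        for folder in config_folders:
            path = os.path.join(day_dir, folder)
            files = [os.path.join(path, f) for f in os.listdir(path)]
            if files:
                picked.append(max(files, key=os.path.getsize))
        return picked

    raise ValueError(f"unhandled gzip_mode: {gzip_mode}")
```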
With Auto-optimization, every run of the auto-optimization pipeline creates a new config file and carries the model over to the new config folder. This means we will not be retraining on two days' worth of data, but we will have files with the same name in different config folders.
I added gzip_mode = 3, which runs evaluations on non-duplicate data from all files across all configuration folders.
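A minimal sketch of the gzip_mode = 3 selection follows, assuming the same per-day/per-config layout as above and treating files with the same name in different config folders as duplicates (since carried-over files keep their names). The function name and layout are hypothetical.

```python
# Hypothetical sketch of gzip_mode = 3: evaluate every file from every config
# folder, but count each unique file name only once, since files carried into
# newer config folders share names with the originals.
import os

def select_files_mode3(day_dir: str) -> list[str]:
    seen_names: set[str] = set()
    picked: list[str] = []
    for folder in sorted(os.listdir(day_dir)):  # oldest config folder first
        path = os.path.join(day_dir, folder)
        for fname in sorted(os.listdir(path)):
            if fname not in seen_names:  # skip copies carried into newer folders
                seen_names.add(fname)
                picked.append(os.path.join(path, fname))
    return picked
```

Deduplicating by file name keeps exactly one copy of each 250 MB chunk, so no data is skipped and nothing is evaluated twice.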