jimthompson5802 / model-stacking-workbench

Framework for model stacking that can be applied to Kaggle competitions
MIT License

Saving k-fold models fails in L0XTC1 model with error OSError: [Errno 22] Invalid argument #1

Closed jimthompson5802 closed 6 years ago

jimthompson5802 commented 6 years ago

For this model specification:

#%%
#
# Set up model for training
#
this_model = ModelTrainer(
        ModelClass=ThisModel,  # model algorithm
        model_params=dict(n_estimators=200, n_jobs=-1),  # hyper-parameters
        model_id='L0XTC1',   # model identifier
        feature_set='KFS02'  # feature set to use
        )

I receive this error message when calling this_model.trainModel():

Model training starting for L0XTC1 with feature set KFS02 at 2018-06-09 07:00:51
test_prediction_method: k-fold_average_model
Starting model training: 2018-06-09 07:00:51
running fold: 1 at 2018-06-09 07:00:55
running fold: 2 at 2018-06-09 07:01:04
running fold: 3 at 2018-06-09 07:01:13
running fold: 4 at 2018-06-09 07:01:23
running fold: 5 at 2018-06-09 07:01:33
Traceback (most recent call last):

  File "<ipython-input-1-f7526bb80b35>", line 1, in <module>
    runfile('/Users/jim/Desktop/Kaggle/model-stacking-workbench/models/L0XTC1/train_model.py', wdir='/Users/jim/Desktop/Kaggle/model-stacking-workbench')

  File "/Users/jim/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)

  File "/Users/jim/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "/Users/jim/Desktop/Kaggle/model-stacking-workbench/models/L0XTC1/train_model.py", line 28, in <module>
    this_model.trainModel()

  File "/Users/jim/Desktop/Kaggle/model-stacking-workbench/framework/model_stacking.py", line 352, in trainModel
    pickle.dump(models_list,f)

OSError: [Errno 22] Invalid argument

The code fragment in question in model_stacking.py:

            self.training_rows = train_df.shape[0]
            self.training_columns = len(predictors)
            with open(os.path.join(self.CONFIG['ROOT_DIR'],'models',
                                   self.model_id,
                                   self.model_id+'_model.pkl'),'wb') as f:
                pickle.dump(models_list,f)            

        self.training_time = time.time() - start_training
jimthompson5802 commented 6 years ago

The current work-around is to specify test_prediction_method='all_data_model' when creating the ModelTrainer object:

this_model = ModelTrainer(
        ModelClass=ThisModel,  # model algorithm
        model_params=dict(n_estimators=200, n_jobs=-1),  # hyper-parameters
        test_prediction_method='all_data_model',
        model_id='L0XTC1',   # model identifier
        feature_set='KFS02'  # feature set to use
        )
jimthompson5802 commented 6 years ago

Looks like this might be related to this Python issue when running on macOS. This seems to be the case because I can save models whose resulting size on disk is just under 2GB; however, once the model size grows past 2GB, pickle.dump() fails with this error.
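A quick way to confirm whether a given object crosses that threshold is to measure the pickled byte stream before writing it. This is a minimal sketch (pickled_size_bytes is a hypothetical helper, not part of the workbench); on affected macOS Python builds a single write of 2**31 bytes or more is what triggers Errno 22:

```python
import pickle

def pickled_size_bytes(obj):
    """Return the size of the pickled byte stream without writing to disk.

    Useful for checking whether an object's serialized form crosses the
    2GB (2**31 byte) single-write limit observed on macOS.
    """
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))

# Small example object; a real models_list of k-fold models would be far larger.
size = pickled_size_bytes(list(range(1000)))
print(size, size >= 2 ** 31)
```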

jimthompson5802 commented 6 years ago

Fix implemented based on this discussion.
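For reference, the commonly cited workaround in that discussion is to wrap the file object and split each write into chunks smaller than 2GB. This is a sketch of that approach under the assumption that chunked writes are the fix that was implemented (MacOSFile and pickle_dump are illustrative names, not necessarily the committed code):

```python
import pickle

class MacOSFile:
    """File wrapper that splits large writes into 1GB chunks to work around
    the macOS limitation where a single write() of 2GB or more fails with
    OSError: [Errno 22] Invalid argument."""

    def __init__(self, f):
        self.f = f

    def write(self, buffer):
        chunk_size = 2 ** 30  # 1GB per underlying write() call
        idx = 0
        while idx < len(buffer):
            self.f.write(buffer[idx:idx + chunk_size])
            idx += chunk_size

def pickle_dump(obj, file_path):
    """Pickle obj to file_path, chunking writes to stay under the 2GB limit."""
    with open(file_path, 'wb') as f:
        pickle.dump(obj, MacOSFile(f), protocol=pickle.HIGHEST_PROTOCOL)
```

Because pickle.dump() only requires a write() method on its file argument, the wrapper can be dropped into the existing trainModel() call site without other changes; reading the file back with pickle.load() works unmodified.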