mljar / mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
https://mljar.com
MIT License

Problems loading models #622

Closed: Selphie14100 closed this issue 12 months ago

Selphie14100 commented 1 year ago

OK. I have trained models, and they are sitting in this directory:

C:\Users\User\GBAW23

When I try to run this code:

automl = AutoML(results_path=r"C:\Users\User\GBAW23")

I get the error below. What am I doing wrong?

2023-05-19 20:24:33,807 supervised.exceptions ERROR Cannot load AutoML directory. Expecting value: line 1 column 1 (char 0)


JSONDecodeError                           Traceback (most recent call last)
File ~\anaconda3\lib\site-packages\supervised\base_automl.py:211, in BaseAutoML.load(self, path)
    210 else:
--> 211     m = ModelFramework.load(path, model_subpath, lazy_load)
    212     self._models += [m]

File ~\anaconda3\lib\site-packages\supervised\model_framework.py:570, in ModelFramework.load(results_path, model_subpath, lazy_load)
    568 logger.info(f"Loading model framework from {model_path}")
--> 570 json_desc = json.load(open(os.path.join(model_path, "framework.json")))
    571 mf = ModelFramework(json_desc["params"])

File ~\anaconda3\lib\json\__init__.py:293, in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    276 """Deserialize ``fp`` (a ``.read()``-supporting file-like object containing
    277 a JSON document) to a Python object.
    278 
   (...)
    291 kwarg; otherwise ``JSONDecoder`` is used.
    292 """
--> 293 return loads(fp.read(),
    294     cls=cls, object_hook=object_hook,
    295     parse_float=parse_float, parse_int=parse_int,
    296     parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)

File ~\anaconda3\lib\json\__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:

File ~\anaconda3\lib\json\decoder.py:337, in JSONDecoder.decode(self, s, _w)
    333 """Return the Python representation of ``s`` (a ``str`` instance
    334 containing a JSON document).
    335 
    336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338 end = _w(s, end).end()

File ~\anaconda3\lib\json\decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
    354 except StopIteration as err:
--> 355     raise JSONDecodeError("Expecting value", s, err.value) from None
    356 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

AutoMLException                           Traceback (most recent call last)
Cell In[13], line 2
      1 automl = AutoML(results_path=r"C:\Users\User\GBAW23")
----> 2 automl.predict(X)

File ~\anaconda3\lib\site-packages\supervised\automl.py:387, in AutoML.predict(self, X)
    370 def predict(self, X: Union[List, numpy.ndarray, pandas.DataFrame]) -> numpy.ndarray:
    371     """
    372     Computes predictions from AutoML best model.
    373 
   (...)
    385         AutoMLException: Model has not yet been fitted.
    386     """
--> 387     return self._predict(X)

File ~\anaconda3\lib\site-packages\supervised\base_automl.py:1369, in BaseAutoML._predict(self, X)
   1367 def _predict(self, X):
-> 1369     predictions = self._base_predict(X)
   1370     # Return predictions
   1371     # If classification task the result is in column 'label'
   1372     # If regression task the result is in column 'prediction'
   1373     return (
   1374         predictions["label"].to_numpy()
   1375         if self._ml_task != REGRESSION
   1376         else predictions["prediction"].to_numpy()
   1377     )

File ~\anaconda3\lib\site-packages\supervised\base_automl.py:1301, in BaseAutoML._base_predict(self, X, model)
   1299 if model is None:
   1300     if self._best_model is None:
-> 1301         self.load(self.results_path)
   1302     model = self._best_model
   1304 if model is None:

File ~\anaconda3\lib\site-packages\supervised\base_automl.py:234, in BaseAutoML.load(self, path)
    231         self.n_classes = self._data_info["n_classes"]
    233 except Exception as e:
--> 234     raise AutoMLException(f"Cannot load AutoML directory. {str(e)}")

AutoMLException: Cannot load AutoML directory. Expecting value: line 1 column 1 (char 0)

pplonski commented 1 year ago

Thank you @Selphie14100 for reporting the issue. Please provide data and code to reproduce the problem. Thanks!

Selphie14100 commented 1 year ago

Good news and bad news. I have worked out what the problem is. If you, as I did, don't set results_path explicitly when you train, the models are stored in a folder named AutoML_XX. If you load them from that folder, or from the folder explicitly passed as results_path during training, the model loads.

However, if you rename the AutoML_XX folder after training, the model will not load.

Two points arise from this:

  1. Perhaps this should be flagged explicitly in the documentation, since the models are tied to the name of the folder they were originally created in.
  2. Is there a way to alter the files in a model directory so it runs from the new directory name? Otherwise I have a week's training to redo :-(
pplonski commented 1 year ago

Hi, which version are you using? I'm pretty sure I fixed a similar issue, so you should be able to rename the results directory.

Selphie14100 commented 1 year ago

(0.11.5)

Selphie14100 commented 1 year ago

OK, bad news: I just tried the same as before with slightly different parameters, and it fails again. Here are the Python code, the error, and the JSON parameters file.

My Python code:

automl = AutoML(results_path=r"C:\Users\User\IRETURFSP")

Error message

2023-05-20 12:02:36,511 supervised.exceptions ERROR Cannot load AutoML directory. Expecting value: line 1 column 1 (char 0)

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
File ~\anaconda3\lib\site-packages\supervised\base_automl.py:211, in BaseAutoML.load(self, path)
    210 else:
--> 211     m = ModelFramework.load(path, model_subpath, lazy_load)
    212     self._models += [m]

File ~\anaconda3\lib\site-packages\supervised\model_framework.py:570, in ModelFramework.load(results_path, model_subpath, lazy_load)
    568 logger.info(f"Loading model framework from {model_path}")
--> 570 json_desc = json.load(open(os.path.join(model_path, "framework.json")))
    571 mf = ModelFramework(json_desc["params"])

File ~\anaconda3\lib\json\__init__.py:293, in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    276 """Deserialize ``fp`` (a ``.read()``-supporting file-like object containing
    277 a JSON document) to a Python object.
    278 
   (...)
    291 kwarg; otherwise ``JSONDecoder`` is used.
    292 """
--> 293 return loads(fp.read(),
    294     cls=cls, object_hook=object_hook,
    295     parse_float=parse_float, parse_int=parse_int,
    296     parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)

File ~\anaconda3\lib\json\__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:

File ~\anaconda3\lib\json\decoder.py:337, in JSONDecoder.decode(self, s, _w)
    333 """Return the Python representation of ``s`` (a ``str`` instance
    334 containing a JSON document).
    335 
    336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338 end = _w(s, end).end()

File ~\anaconda3\lib\json\decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
    354 except StopIteration as err:
--> 355     raise JSONDecodeError("Expecting value", s, err.value) from None
    356 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

AutoMLException                           Traceback (most recent call last)
Cell In[11], line 2
      1 automl = AutoML(results_path="IRETURFSP")
----> 2 automl.predict(X)

File ~\anaconda3\lib\site-packages\supervised\automl.py:387, in AutoML.predict(self, X)
    370 def predict(self, X: Union[List, numpy.ndarray, pandas.DataFrame]) -> numpy.ndarray:
    371     """
    372     Computes predictions from AutoML best model.
    373 
   (...)
    385         AutoMLException: Model has not yet been fitted.
    386     """
--> 387     return self._predict(X)

File ~\anaconda3\lib\site-packages\supervised\base_automl.py:1369, in BaseAutoML._predict(self, X)
   1367 def _predict(self, X):
-> 1369     predictions = self._base_predict(X)
   1370     # Return predictions
   1371     # If classification task the result is in column 'label'
   1372     # If regression task the result is in column 'prediction'
   1373     return (
   1374         predictions["label"].to_numpy()
   1375         if self._ml_task != REGRESSION
   1376         else predictions["prediction"].to_numpy()
   1377     )

File ~\anaconda3\lib\site-packages\supervised\base_automl.py:1301, in BaseAutoML._base_predict(self, X, model)
   1299 if model is None:
   1300     if self._best_model is None:
-> 1301         self.load(self.results_path)
   1302     model = self._best_model
   1304 if model is None:

File ~\anaconda3\lib\site-packages\supervised\base_automl.py:234, in BaseAutoML.load(self, path)
    231         self.n_classes = self._data_info["n_classes"]
    233 except Exception as e:
--> 234     raise AutoMLException(f"Cannot load AutoML directory. {str(e)}")

AutoMLException: Cannot load AutoML directory. Expecting value: line 1 column 1 (char 0)


JSON parameters file from directory

{
    "mode": "Compete",
    "ml_task": "regression",
    "results_path": "IRETURFSP",
    "total_time_limit": 3600,
    "model_time_limit": null,
    "algorithms": [
        "Xgboost",
        "Random Forest",
        "Baseline",
        "Extra Trees",
        "CatBoost",
        "Linear",
        "Neural Network"
    ],
    "train_ensemble": true,
    "stack_models": true,
    "eval_metric": "rmse",
    "validation_strategy": {
        "validation_type": "kfold",
        "k_folds": 5,
        "shuffle": true,
        "random_seed": 123,
        "X_path": "IRETURFSP\\X.data",
        "y_path": "IRETURFSP\\y.data",
        "results_path": "IRETURFSP"
    },
    "verbose": 1,
    "explain_level": 2,
    "golden_features": true,
    "features_selection": true,
    "start_random_models": 20,
    "hill_climbing_steps": 2,
    "top_models_to_improve": 3,
    "boost_on_errors": true,
    "kmeans_features": true,
    "mix_encoding": true,
    "max_single_prediction_time": null,
    "n_jobs": -1,
    "random_state": 1234,
    "saved": [
        "1_Baseline",
        "2_Linear",
        "3_Default_Xgboost",
        "4_Default_CatBoost",
        "5_Default_NeuralNetwork",
        "6_Default_RandomForest",
        "7_Default_ExtraTrees",
        "8_Xgboost",
        "27_CatBoost",
        "46_RandomForest",
        "65_ExtraTrees",
        "84_NeuralNetwork",
        "9_Xgboost",
        "28_CatBoost",
        "47_RandomForest",
        "66_ExtraTrees",
        "85_NeuralNetwork",
        "10_Xgboost",
        "29_CatBoost",
        "48_RandomForest",
        "67_ExtraTrees",
        "86_NeuralNetwork",
        "11_Xgboost",
        "30_CatBoost",
        "49_RandomForest",
        "68_ExtraTrees",
        "87_NeuralNetwork",
        "12_Xgboost",
        "31_CatBoost",
        "50_RandomForest",
        "69_ExtraTrees",
        "88_NeuralNetwork",
        "13_Xgboost",
        "32_CatBoost",
        "51_RandomForest",
        "70_ExtraTrees",
        "89_NeuralNetwork",
        "14_Xgboost",
        "33_CatBoost",
        "52_RandomForest",
        "71_ExtraTrees",
        "90_NeuralNetwork",
        "15_Xgboost",
        "3_Default_Xgboost_categorical_mix",
        "3_Default_Xgboost_categorical_mix_GoldenFeatures",
        "3_Default_Xgboost_GoldenFeatures",
        "8_Xgboost_GoldenFeatures",
        "3_Default_Xgboost_categorical_mix_KMeansFeatures",
        "3_Default_Xgboost_KMeansFeatures",
        "3_Default_Xgboost_categorical_mix_RandomFeature",
        "91_Xgboost",
        "92_Xgboost",
        "93_Xgboost",
        "94_RandomForest",
        "95_RandomForest",
        "96_NeuralNetwork",
        "97_CatBoost",
        "98_ExtraTrees",
        "99_ExtraTrees",
        "100_Xgboost",
        "101_Xgboost",
        "102_Xgboost",
        "103_Xgboost",
        "91_Xgboost_BoostOnErrors",
        "Ensemble",
        "91_Xgboost_Stacked",
        "51_RandomForest_Stacked",
        "70_ExtraTrees_Stacked",
        "90_NeuralNetwork_Stacked",
        "29_CatBoost_Stacked",
        "101_Xgboost_Stacked",
        "50_RandomForest_Stacked",
        "Ensemble_Stacked"
    ],
    "fit_level": "finished",
    "best_model": "Ensemble_Stacked",
    "load_on_predict": [
        "100_Xgboost",
        "101_Xgboost",
        "103_Xgboost",
        "11_Xgboost",
        "1_Baseline",
        "27_CatBoost",
        "28_CatBoost",
        "29_CatBoost",
        "2_Linear",
        "30_CatBoost",
        "31_CatBoost",
        "32_CatBoost",
        "33_CatBoost",
        "3_Default_Xgboost",
        "3_Default_Xgboost_KMeansFeatures",
        "3_Default_Xgboost_categorical_mix",
        "3_Default_Xgboost_categorical_mix_KMeansFeatures",
        "46_RandomForest",
        "47_RandomForest",
        "48_RandomForest",
        "49_RandomForest",
        "4_Default_CatBoost",
        "50_RandomForest",
        "51_RandomForest",
        "51_RandomForest_Stacked",
        "52_RandomForest",
        "5_Default_NeuralNetwork",
        "65_ExtraTrees",
        "66_ExtraTrees",
        "67_ExtraTrees",
        "68_ExtraTrees",
        "69_ExtraTrees",
        "6_Default_RandomForest",
        "70_ExtraTrees",
        "70_ExtraTrees_Stacked",
        "71_ExtraTrees",
        "7_Default_ExtraTrees",
        "84_NeuralNetwork",
        "85_NeuralNetwork",
        "86_NeuralNetwork",
        "87_NeuralNetwork",
        "88_NeuralNetwork",
        "89_NeuralNetwork",
        "8_Xgboost",
        "8_Xgboost_GoldenFeatures",
        "90_NeuralNetwork",
        "91_Xgboost",
        "91_Xgboost_BoostOnErrors",
        "91_Xgboost_Stacked",
        "93_Xgboost",
        "94_RandomForest",
        "95_RandomForest",
        "96_NeuralNetwork",
        "97_CatBoost",
        "98_ExtraTrees",
        "99_ExtraTrees",
        "Ensemble",
        "Ensemble_Stacked"
    ],
    "stacked": [
        "Ensemble",
        "91_Xgboost_BoostOnErrors",
        "91_Xgboost",
        "101_Xgboost",
        "3_Default_Xgboost_categorical_mix",
        "3_Default_Xgboost_categorical_mix_KMeansFeatures",
        "93_Xgboost",
        "3_Default_Xgboost",
        "103_Xgboost",
        "100_Xgboost",
        "3_Default_Xgboost_KMeansFeatures",
        "51_RandomForest",
        "50_RandomForest",
        "70_ExtraTrees",
        "48_RandomForest",
        "2_Linear",
        "49_RandomForest",
        "95_RandomForest",
        "90_NeuralNetwork",
        "94_RandomForest",
        "29_CatBoost",
        "69_ExtraTrees",
        "67_ExtraTrees",
        "4_Default_CatBoost",
        "27_CatBoost",
        "6_Default_RandomForest",
        "98_ExtraTrees",
        "88_NeuralNetwork",
        "52_RandomForest",
        "85_NeuralNetwork",
        "99_ExtraTrees",
        "68_ExtraTrees",
        "96_NeuralNetwork",
        "46_RandomForest",
        "5_Default_NeuralNetwork",
        "47_RandomForest",
        "7_Default_ExtraTrees",
        "71_ExtraTrees",
        "86_NeuralNetwork",
        "87_NeuralNetwork",
        "84_NeuralNetwork",
        "66_ExtraTrees",
        "65_ExtraTrees",
        "30_CatBoost",
        "31_CatBoost",
        "97_CatBoost",
        "28_CatBoost",
        "33_CatBoost",
        "32_CatBoost",
        "89_NeuralNetwork"
    ]
}
Selphie14100 commented 1 year ago

If you want the training outputs from the folder, that is tricky: zipped, they are 148 MB, and the attachment limit here is 25 MB.

Selphie14100 commented 1 year ago

Here is a Compete-mode model trained for 5 minutes.

This one does load with the line below, but bigger ones don't!

automl = AutoML(results_path=r"C:\Users\User\TEST")

Trained as below:

from sklearn.model_selection import train_test_split
from supervised.automl import AutoML

# train2 (features) and y (target) are my own data
X_train, X_test, y_train, y_test = train_test_split(
    train2,
    y,
    test_size=0.15,
    random_state=123,
)

automl = AutoML(
    results_path="TEST",
    validation_strategy={
        "validation_type": "kfold",
        "k_folds": 5,
        "shuffle": True,
        "stratify": True,
        "random_seed": 123,
    },
    stack_models=True,
    mode="Compete",
    total_time_limit=5 * 60,  # 5 minutes
    explain_level=2,
    eval_metric="rmse",
    start_random_models=20,
    algorithms=["Xgboost", "Random Forest", "Baseline", "Extra Trees", "CatBoost", "Linear", "Neural Network"],
)
automl.fit(X_train, y_train)

TEST.zip

pplonski commented 1 year ago

Hi @Selphie14100,

I'm back in the office. I've checked your logs again, and it looks like one of your models might be missing its framework.json file. That might be the source of the problem. You can change the logging level to DEBUG and check which model is missing framework.json.

Here is example code to change logging level: https://github.com/mljar/mljar-supervised/blob/35584462ed0fc6e7345f4999b1019c0990598c07/supervised/automl.py#L15-L19
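For anyone unsure what to do with the linked lines, here is a minimal sketch that uses only the standard `logging` module. It assumes, as the `supervised.exceptions` logger name in the error above suggests, that mljar-supervised's loggers are named after its modules under the `supervised` package; `enable_debug_logging` is a hypothetical helper, not part of mljar-supervised. Call it after importing `supervised`, so the package's loggers already exist.

```python
import logging
from typing import List


def enable_debug_logging(prefix: str = "supervised") -> List[str]:
    """Set every already-created logger named `prefix` or `prefix.*` to DEBUG.

    Returns the names of the loggers that were changed.
    """
    # Install a console handler on the root logger; force=True (Python 3.8+)
    # replaces any handler installed by an earlier basicConfig call.
    # The root logger stays at WARNING so other libraries stay quiet; DEBUG
    # records from the loggers changed below still reach this handler, because
    # propagation checks handler levels, not the levels of ancestor loggers.
    logging.basicConfig(
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
        level=logging.WARNING,
        force=True,
    )
    changed = []
    for name, obj in list(logging.root.manager.loggerDict.items()):
        # loggerDict also holds PlaceHolder entries, which have no level to set
        if isinstance(obj, logging.Logger) and (name == prefix or name.startswith(prefix + ".")):
            obj.setLevel(logging.DEBUG)
            changed.append(name)
    return changed
```

After `from supervised.automl import AutoML` and `enable_debug_logging()`, rerunning `automl.predict(X)` should print the `Loading model framework from ...` message (logged at INFO level in model_framework.py, per the traceback) for each folder; the last one printed before the error is the likely culprit.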

Selphie14100 commented 1 year ago

Thank you for the code. I'm not sure what I'm supposed to do with it 😀, but I'll look for framework.json manually.


Selphie14100 commented 1 year ago

OK, I used Python to find subdirectories missing framework.json, and indeed it is missing in the Ensemble directory, which holds the best model.

Selphie14100 commented 1 year ago

Apologies, these are the directories missing framework.json:

C:\Users\User\IRETURF12\Ensemble
C:\Users\User\IRETURF12\Ensemble_Stacked

Ensemble_Stacked is the best model. Ensemble_Stacked.zip
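A sketch of the kind of check described above, assuming the results-directory layout visible in this thread: a params.json whose "saved" list names each model folder. It further assumes, as the `else:` branch in front of `ModelFramework.load` in the first traceback suggests, that ensemble folders store an ensemble.json instead of framework.json; if so, a missing framework.json in Ensemble or Ensemble_Stacked is expected, and only a missing, empty, or unreadable description file is a real problem. `check_model_dirs` is a hypothetical helper.

```python
import json
import os
from typing import Dict


def check_model_dirs(results_path: str) -> Dict[str, str]:
    """Return {model_name: problem} for saved models whose description file is missing or unreadable."""
    with open(os.path.join(results_path, "params.json")) as f:
        params = json.load(f)

    problems = {}
    for model_name in params.get("saved", []):
        model_dir = os.path.join(results_path, model_name)
        # Assumption: ensemble folders keep ensemble.json, other model folders keep framework.json
        if os.path.isfile(os.path.join(model_dir, "ensemble.json")):
            desc_file = "ensemble.json"
        else:
            desc_file = "framework.json"
        desc_path = os.path.join(model_dir, desc_file)
        if not os.path.isfile(desc_path):
            problems[model_name] = f"{desc_file} missing"
            continue
        try:
            with open(desc_path) as f:
                json.load(f)
        except json.JSONDecodeError as err:
            # An empty file fails with "Expecting value: line 1 column 1 (char 0)"
            problems[model_name] = f"{desc_file} is not valid JSON: {err}"
    return problems
```

For example, `check_model_dirs(r"C:\Users\User\IRETURF12")` returns a dict that can be printed model by model; an empty dict means every saved model has a parseable description file.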

Selphie14100 commented 1 year ago

Is there any update on this?

pplonski commented 1 year ago

Hi @Selphie14100,

Apologies for the slow response. I don't have time to work on this issue right now, and I can't estimate when I will be able to look into it.

We have commercial customers for our open-source tools and they have priority in support.

Selphie14100 commented 1 year ago

Ok. I appreciate that!

Selphie14100 commented 1 year ago

I'll keep this open. So, I've found what has been going on. Every now and then, MLJAR produces a framework.json for a model that is empty. This isn't a problem unless that model ends up in the final model. Here is an example error:

Error for 10_Xgboost_categorical_mix

Object of type int64 is not JSON serializable
Traceback (most recent call last):
  File "C:\Users\User\anaconda3\lib\site-packages\supervised\base_automl.py", line 1095, in _fit
    trained = self.train_model(params)
  File "C:\Users\User\anaconda3\lib\site-packages\supervised\base_automl.py", line 386, in train_model
    mf.save(results_path, model_subpath)
  File "C:\Users\User\anaconda3\lib\site-packages\supervised\model_framework.py", line 512, in save
    fout.write(json.dumps(desc, indent=4))
  File "C:\Users\User\anaconda3\lib\json\__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 201, in encode
    chunks = list(chunks)
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 438, in _iterencode
    o = _default(o)
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type int64 is not JSON serializable

pplonski commented 1 year ago

Thank you @Selphie14100, you may have found the reason why the files are empty. There must be some int64 object in the model description that crashes the model save.
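A small reproduction of that mechanism, using only `json` and NumPy. It assumes, as the `fout.write(json.dumps(desc, indent=4))` frame in the traceback suggests, that the save step opens framework.json for writing before serializing; `save_desc` and the int64 value below are hypothetical stand-ins, not mljar-supervised code. Because `open(..., "w")` creates or truncates the file first, the `TypeError` from `json.dumps` leaves an empty file behind, and loading it later fails with the same `Expecting value: line 1 column 1 (char 0)` seen at the top of this issue.

```python
import json
import os
import tempfile

import numpy as np


def save_desc(path, desc):
    # Hypothetical stand-in for the save step: the file is opened (created or
    # truncated) first, and only then is the description serialized.
    with open(path, "w") as fout:
        fout.write(json.dumps(desc, indent=4))


path = os.path.join(tempfile.mkdtemp(), "framework.json")
desc = {"params": {"n_categories": np.int64(3)}}  # hypothetical int64 value

save_error = None
try:
    save_desc(path, desc)
except TypeError as err:
    save_error = str(err)  # "Object of type int64 is not JSON serializable"

size = os.path.getsize(path)  # 0: the file exists, but nothing was written

load_error = None
try:
    with open(path) as f:
        json.load(f)
except json.JSONDecodeError as err:
    load_error = str(err)  # "Expecting value: line 1 column 1 (char 0)"

print(save_error, size, load_error, sep="\n")
```

This also fits the intermittency: only models whose description happens to contain a NumPy integer hit the failure.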

Selphie14100 commented 1 year ago

I will dig a bit deeper as I go through. It's the intermittency that's the tricky bit.


Selphie14100 commented 1 year ago

So the problem is in the XGBoost model, and it also applies to models derived from XGBoost. Workaround: find the affected model(s) in the errors section and delete them from params.json and ensemble.json. It then runs, though obviously accuracy may drop :-). A quick fix would be to exclude any model that appears in the errors file from the ensemble/stacked models. Stacking is hit worse because it uses more models and is therefore more likely to include a damaged one. NB: some XGBoost models are fine, so it's an intermittent little beast.
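Before deleting entries by hand, a read-only sketch like the one below can list every place in params.json and in any ensemble.json that still names a damaged model. It assumes only that both files are plain JSON and that ensemble.json sits inside an ensemble's folder; `find_references` is a hypothetical helper, and the damaged model names come from the errors section. It edits nothing, so back up the results folder before making the deletions it points to.

```python
import json
import os
from typing import Iterator, List, Set, Tuple


def _walk(node, broken: Set[str], prefix: str) -> Iterator[str]:
    """Yield the key path of every string in `node` that equals a broken model name."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from _walk(value, broken, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from _walk(value, broken, f"{prefix}[{i}]")
    elif isinstance(node, str) and node in broken:
        yield prefix


def find_references(results_path: str, broken: Set[str]) -> List[Tuple[str, str]]:
    """List (file, key path) pairs that still mention a broken model. Read-only."""
    files = [os.path.join(results_path, "params.json")]
    for entry in sorted(os.listdir(results_path)):
        ensemble_file = os.path.join(results_path, entry, "ensemble.json")
        if os.path.isfile(ensemble_file):
            files.append(ensemble_file)

    hits = []
    for path in files:
        rel = os.path.relpath(path, results_path)
        try:
            with open(path) as f:
                data = json.load(f)
        except json.JSONDecodeError:
            # An empty or corrupt file cannot be searched; report it instead
            hits.append((rel, "<file is not valid JSON>"))
            continue
        hits.extend((rel, key_path) for key_path in _walk(data, broken, ""))
    return hits
```

For example, `find_references(r"C:\Users\User\IRETURF12", {"10_Xgboost_categorical_mix"})` returns pairs such as `("params.json", "saved[3]")` for each mention of that model.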

pplonski commented 1 year ago

@Selphie14100 you are a true bug detective! Thank you for the investigation!

pplonski commented 1 year ago

Hi @Selphie14100,

I think I found the reason. Look at this line: https://github.com/mljar/mljar-supervised/blob/35584462ed0fc6e7345f4999b1019c0990598c07/supervised/preprocessing/scale.py#L29

It should be:

self.X_min_values = np.min(X[self.columns], axis=0)

You can try to change this code locally and check if it is working.

Selphie14100 commented 1 year ago

OK, I might be able to find bugs, but I can't find this file 😊

Can you tell me where it is?


Selphie14100 commented 1 year ago

Found it. Testing now.

Selphie14100 commented 1 year ago

sadly :-(

Please set a GitHub issue with above error message at: https://github.com/mljar/mljar-supervised/issues/new

Error for 13_Xgboost_categorical_mix

Object of type int64 is not JSON serializable
Traceback (most recent call last):
  File "C:\Users\User\anaconda3\lib\site-packages\supervised\base_automl.py", line 1095, in _fit
    trained = self.train_model(params)
  File "C:\Users\User\anaconda3\lib\site-packages\supervised\base_automl.py", line 386, in train_model
    mf.save(results_path, model_subpath)
  File "C:\Users\User\anaconda3\lib\site-packages\supervised\model_framework.py", line 512, in save
    fout.write(json.dumps(desc, indent=4))
  File "C:\Users\User\anaconda3\lib\json\__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 201, in encode
    chunks = list(chunks)
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 438, in _iterencode
    o = _default(o)
  File "C:\Users\User\anaconda3\lib\json\encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type int64 is not JSON serializable

pplonski commented 1 year ago

Oh, sorry. Is it possible to share data sample and code to reproduce the issue locally?

Selphie14100 commented 1 year ago

I’ll sort out a small data set and fire it over with errors


Selphie14100 commented 1 year ago

Good news! I cracked it. When I was doing feature selection before sending the data to MLJAR, I converted my categorical fields to the pandas category dtype. If I just leave them as ints, everything works fine!

The accuracy of the models seems unimpaired.
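If the columns have already been converted, a minimal pandas sketch of the workaround above is to turn them back into plain integers before calling `AutoML.fit`. It assumes the categorical fields were integer columns converted with `.astype("category")`; `categories_back_to_int` is a hypothetical helper, and columns with string categories or missing values are deliberately left untouched.

```python
import pandas as pd


def categories_back_to_int(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which integer-valued 'category' columns are plain integers again."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if not isinstance(s.dtype, pd.CategoricalDtype):
            continue
        # Convert only when the categories themselves are integers and nothing is
        # missing, since a plain integer column cannot hold NaN
        if pd.api.types.is_integer_dtype(s.cat.categories.dtype) and not s.isna().any():
            out[col] = s.astype(s.cat.categories.dtype)
    return out
```

Usage would be along the lines of `automl.fit(categories_back_to_int(X_train), y_train)`.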

pplonski commented 1 year ago

Woohoo! So there must be an issue with converting categoricals to numeric.

KarthikDutt commented 1 year ago

I faced the same error and got around it by implementing the solution described here: https://stackoverflow.com/questions/50916422/python-typeerror-object-of-type-int64-is-not-json-serializable/57915246#57915246
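A minimal sketch of the usual shape of that workaround (not necessarily the exact code in the linked answer): a `json.JSONEncoder` subclass whose `default` method converts NumPy scalars and arrays into native Python values. `NumpyEncoder` is a hypothetical name. Applying it inside mljar-supervised would mean passing `cls=NumpyEncoder` to the `json.dumps(desc, indent=4)` call shown in the traceback (model_framework.py, line 512), which is a local patch to the installed package.

```python
import json

import numpy as np


class NumpyEncoder(json.JSONEncoder):
    """JSON encoder that converts NumPy scalars and arrays to native Python types."""

    def default(self, o):
        if isinstance(o, np.integer):
            return int(o)
        if isinstance(o, np.floating):
            return float(o)
        if isinstance(o, np.bool_):
            return bool(o)
        if isinstance(o, np.ndarray):
            return o.tolist()
        # Anything else: let the base class raise the usual TypeError
        return super().default(o)


desc = {"params": {"n_categories": np.int64(3), "scale": np.float32(0.5), "mins": np.array([1, 2])}}
print(json.dumps(desc, cls=NumpyEncoder))
# {"params": {"n_categories": 3, "scale": 0.5, "mins": [1, 2]}}
```

Converting in the encoder keeps the model-description code unchanged; the alternative is to cast values with `int(...)` or `.tolist()` where they are produced.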

pplonski commented 12 months ago

It should be fixed by the #496 fix.