mljar / mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
https://mljar.com
MIT License

Can you move the model directory? #214

Closed tmontana closed 3 years ago

tmontana commented 3 years ago

Hi. It seems that moving a directory that contains a model breaks all the dependencies within it. Is there a way to 'save_as' the model to a new place without editing each of the .json files?

Thanks,

pplonski commented 3 years ago

I need to check what the best option for moving AutoML models is. This can be a problem during deployment.

It's a duplicate of #120

tmontana commented 3 years ago

Oh wow, I hadn't thought about production yet. I was just trying to clean up my directory structure. That will indeed become a huge problem. Is there a way to implement a 'save_as' method that would copy the object somewhere else with the file dependencies updated?

tmontana commented 3 years ago

FYI: I manually edited the .json files and changed the directories. This is clearly not a scalable hack, but it seems to work.

tmontana commented 3 years ago

Please see the temporary solution below, in case it helps someone. It recurses through the model directory, finds every instance of the old path in the .json files, and changes it to the new path, which should then allow the model to load properly from the new location.

Usage:

To be safe, make a copy of your model in a new location rather than moving it.

(1) Copy the model directory to the new location.
(2) Set model_dir_name, for example: model_dir_name='my_xgb_model'
(3) Set look_for_value, for example: look_for_value='./../models/1A_XGB/'
(4) Set replace_with_value, for example: replace_with_value='./../models/'

Then run the script:

import json
from pathlib import Path

model_dir_name='tmp_old_del_ensemble_XGBoost_SIMPLE_t0/'
look_for_value='./../models/1A_XGB/'
replace_with_value='./../models/'

def item_generator(json_input, lookup_val, parent_key=None, element_type='Dict'):
    # Yields (key_or_value, parent_key, element_type) for every string that starts with lookup_val
    if isinstance(json_input, dict):
        for k, v in json_input.items():
            if isinstance(v, str) and v.startswith(lookup_val):
                # Matching string stored directly under key k
                yield k, parent_key, element_type
            else:
                yield from item_generator(v, lookup_val, parent_key=k)
    elif isinstance(json_input, list):
        for item in json_input:
            yield from item_generator(item, lookup_val, parent_key=parent_key)
    elif isinstance(json_input, str) and json_input.startswith(lookup_val):
        # Matching string that is an element of a list stored under parent_key
        yield json_input, parent_key, 'List'

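def replace_item(dict_key, dict_parent_key, element_type, old, new):
    # Updates the module-level model_dict that is loaded in the loop below.
    # Returns -1 if the item was found under model_dict['params'], otherwise 1.
    if element_type == 'Dict':
        if dict_parent_key is None:
            model_dict[dict_key] = model_dict[dict_key].replace(old, new)
        else:
            try:
                model_dict[dict_parent_key][dict_key] = model_dict[dict_parent_key][dict_key].replace(old, new)
            except Exception:
                # Fall back to entries nested one level deeper, under 'params'
                model_dict['params'][dict_parent_key][dict_key] = model_dict['params'][dict_parent_key][dict_key].replace(old, new)
                return -1
    else:
        # List of paths: rewrite only the entries that start with the old prefix
        model_dict[dict_parent_key] = [item.replace(old, new) if isinstance(item, str) and item.startswith(old) else item for item in model_dict[dict_parent_key]]
    return 1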

# Collect every .json file under the model directory
all_json_files = list(Path(model_dir_name).rglob('*.json'))

for one_file in all_json_files:
    print('processing: ' + one_file.parent.name + '/' + one_file.name)
    with open(one_file, 'r') as fp:
        model_dict = json.load(fp)

    # Collect all matches first so model_dict is not modified while it is being traversed
    to_replace = list(item_generator(model_dict, look_for_value))

    for one_item in to_replace:
        parent_dict_is = one_item[1] if one_item[1] is not None else ''
        res = replace_item(one_item[0], one_item[1], one_item[2], look_for_value, replace_with_value)
        if res == -1:
            print('        replacing item: [params]' + one_item[0] + '   parent: ' + parent_dict_is + '  ' + 'in file: ' + one_file.name)
        else:
            print('        replacing item:' + one_item[0] + '   parent: ' + parent_dict_is + '  ' + 'in file: ' + one_file.name)

    # Overwrite the original file only if something was replaced
    if len(to_replace) > 0:
        with open(one_file, 'w') as fp:
            json.dump(model_dict, fp)
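
For anyone adapting the script, the same idea can be written without tracking parent keys. The following is only a minimal sketch, assuming every path in the .json files is stored as a plain string value that starts with the old prefix. The helper names `rewrite_prefix` and `rewrite_model_dir` are hypothetical and not part of mljar-supervised. Unlike the script above, it handles any nesting depth and swaps only the leading prefix rather than every occurrence in the string.

```python
import json
from pathlib import Path


def rewrite_prefix(node, old, new):
    """Return a copy of a parsed JSON value with a leading `old` prefix swapped for `new`."""
    if isinstance(node, dict):
        return {k: rewrite_prefix(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rewrite_prefix(v, old, new) for v in node]
    if isinstance(node, str) and node.startswith(old):
        return new + node[len(old):]
    # Numbers, booleans, None and non-matching strings are returned unchanged
    return node


def rewrite_model_dir(model_dir, old, new):
    """Rewrite every .json file under model_dir; return the files that changed."""
    changed = []
    for path in sorted(Path(model_dir).rglob('*.json')):
        with open(path, 'r') as fp:
            data = json.load(fp)
        updated = rewrite_prefix(data, old, new)
        if updated != data:
            # Only touch files that actually contained the old prefix
            with open(path, 'w') as fp:
                json.dump(updated, fp)
            changed.append(path)
    return changed
```

As with the original script, it is safer to run this on a copy of the model directory, for example `rewrite_model_dir('my_xgb_model', './../models/1A_XGB/', './../models/')`.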

pplonski commented 3 years ago

The fix is not backward compatible, so existing models need to be retrained. Users can now train in one directory, then move or rename the AutoML directory and call AutoML with the new name/path.

All updated code is in the dev branch. The fix will be available in the 0.9.0 release.
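
A rough sketch of that workflow, assuming mljar-supervised 0.9.0 or later and that `AutoML` takes the directory through its `results_path` argument. The helper names `move_automl_dir` and `load_moved_automl` and the paths in the example are placeholders for illustration, not part of the package.

```python
import shutil
from pathlib import Path


def move_automl_dir(src, dst):
    """Move a trained AutoML results directory and return the new path."""
    src, dst = Path(src), Path(dst)
    if dst.exists():
        # Refuse to move into an existing directory to avoid nesting by accident
        raise FileExistsError(f'{dst} already exists')
    dst.parent.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(src), str(dst)))


def load_moved_automl(results_path):
    """Point AutoML at the moved directory (assumes mljar-supervised >= 0.9.0 is installed)."""
    # Imported lazily so the move helper works without the package installed
    from supervised import AutoML
    return AutoML(results_path=str(results_path))


# Example (paths are placeholders):
# new_path = move_automl_dir('AutoML_1', 'models/production_automl')
# automl = load_moved_automl(new_path)
# predictions = automl.predict(X_new)  # X_new: your feature DataFrame
```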