qzhu2017 / PyXtal_ml

A Python 3 library for ML modeling of materials properties
MIT License

[Question] how to pass the user defined parameters to ml training #33

Closed: qzhu2017 closed this issue 5 years ago

qzhu2017 commented 5 years ago

@yanxon, as we discussed, we should allow users to provide their own parameter list for ML training. However, I don't know how to pass it from run.py or main.py. Could you please create an example? For instance, how would I set n_estimators = 20 for a single RF training run, or n_estimators = [10, 50] for a grid search in RF training?
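The two cases in the question differ only in how many candidate values are listed: a grid search fits one model per combination of values, so a list with a single value ([20]) reduces to one ordinary training run. The sketch below uses a hypothetical expand_grid helper (not part of pyxtal_ml) that enumerates combinations in the same spirit as scikit-learn's ParameterGrid; n_estimators is scikit-learn's name for the number of trees in a random forest.

```python
from itertools import product


def expand_grid(param_grid):
    """Return every parameter combination in a dict of candidate lists.

    Hypothetical helper for illustration only; not part of pyxtal_ml.
    """
    names = sorted(param_grid)
    return [dict(zip(names, values))
            for values in product(*(param_grid[name] for name in names))]


# A one-element list is effectively a single training run ...
single = expand_grid({"n_estimators": [20]})
# ... while several values give one candidate per value for the grid search.
search = expand_grid({"n_estimators": [10, 50]})

print(single)  # [{'n_estimators': 20}]
print(search)  # [{'n_estimators': 10}, {'n_estimators': 50}]
```

With more than one parameter, the number of candidates is the product of the list lengths, which is why large grids become expensive quickly.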

yanxon commented 5 years ago

@qzhu2017

I haven't reworked the user-defined parameter handling since you switched to run.py, but it should work like this:

from pyxtal_ml.run import run
from pkg_resources import resource_filename

jsonfile = resource_filename("pyxtal_ml", "datasets/nonmetal_MP_8049.json")
algos = ['RF']
N_sample = 100
feature = 'Chem'
level = {'cv': 5, 'params': {"n_estimators": [10, 50]}}  # for a single run: level = {'cv': 5, 'params': {"n_estimators": [20]}}
pipeline = 'VT'

runner = run(N_sample=N_sample, jsonfile=jsonfile, level=level, feature=feature)
runner.load_data()
runner.convert_data_1D(parallel=2)  # choose the number of CPUs if you want to activate parallel conversion
runner.choose_feature(keys='Chem')  # choose a feature combination if you want
for algo in algos:
    runner.ml_train(algo=algo, pipeline=pipeline)
runner.print_time()
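For reference, here is roughly what a level of {'cv': 5, 'params': {"n_estimators": [10, 50]}} asks for, written directly against scikit-learn. This is a sketch that assumes pyxtal_ml's RF training wraps scikit-learn's RandomForestRegressor and GridSearchCV; the traceback later in this thread points into ml_sklearn.py, but the thread does not show that code. Synthetic data from make_regression stands in for the Chem features.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the featurized dataset (100 samples, as with N_sample = 100).
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)

level = {"cv": 5, "params": {"n_estimators": [10, 50]}}

# Grid search: one RF per candidate value, each scored with 5-fold cross-validation.
grid = GridSearchCV(RandomForestRegressor(random_state=0),
                    param_grid=level["params"], cv=level["cv"])
grid.fit(X, y)
print(grid.best_params_)

# Single run: fix the value directly instead of searching over it.
single = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)
print(single.n_estimators)  # 20
```

Passing a one-element list such as [20] to GridSearchCV also works; it just performs cross-validation for that one value before refitting.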
yanxon commented 5 years ago

@qzhu2017

I updated the Readme.md in the ml folder again.

qzhu2017 commented 5 years ago

@yanxon

from pyxtal_ml.run import run
from pkg_resources import resource_filename

jsonfile = resource_filename("pyxtal_ml", "datasets/nonmetal_MP_8049.json")
algos = ['RF']
N_sample = 100
feature = 'Chem'
level = {'cv': 5, 'params': {"n_estimators": [10, 50]}}  # for a single run: level = {'cv': 5, 'params': {"n_estimators": [20]}}
pipeline = 'VT'

runner = run(N_sample=N_sample, jsonfile=jsonfile, level=level, feature=feature)
runner.load_data()
runner.convert_data_1D(parallel=2)  # choose the number of CPUs if you want to activate parallel conversion
runner.choose_feature()  # choose feature combinations if you want
for algo in algos:
    runner.ml_train(algo=algo, pipeline=pipeline)
runner.print_time()

It returns:

Traceback (most recent call last):
  File "0.py", line 16, in <module>
    runner.ml_train(algo=algo, pipeline=pipeline)
  File "/scratch/qzhu/github/PyXtal_ml/pyxtal_ml/run.py", line 126, in ml_train
    ml = method(feature=self.X, prop=self.Y, algo=self.algo, tag=tag, pipeline = self.pipeline, params=self.level)
  File "/scratch/qzhu/github/PyXtal_ml/pyxtal_ml/ml/ml_sklearn.py", line 69, in __init__
    self.ml()
  File "/scratch/qzhu/github/PyXtal_ml/pyxtal_ml/ml/ml_sklearn.py", line 138, in ml
    self.grid, self.CV = self.get_params_for_gridsearch(self.level, self.params)
  File "/scratch/qzhu/github/PyXtal_ml/pyxtal_ml/ml/ml_sklearn.py", line 99, in get_params_for_gridsearch
    p_grid = params_[keys[1]]
IndexError: list index out of range

Please make sure it works.

yanxon commented 5 years ago

@qzhu2017

I changed the format of the user-defined parameters a little again. It now looks like this: level = {'my_params': {"n_estimators": [10]}, 'CV': 4}

  1. You can now give the entries in any order, i.e., CV before params or vice versa.
  2. The key names don't matter: you can call them 'my_parameters' or 'cross-validation', for example.
  3. You can also give the params alone without cv, in which case cv defaults to 10, e.g. {'my_params': {"n_estimators": [10]}}.
  4. The only catch is that if you want to set cv alone, you still have to include an empty dict as a placeholder for the params, e.g. level = {'my_params': {}, 'cv': 4}.
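The thread doesn't show how the new parsing is implemented. One way the rules above could arise is to ignore key names entirely and classify each value by its type: a dict is the parameter grid and an int is the number of CV folds. The parse_level function below is a hypothetical sketch of that idea, not pyxtal_ml's code; it also reproduces the catch in rule 4 by rejecting a level that has no params dict.

```python
def parse_level(level, default_cv=10):
    """Split a user 'level' dict into (param_grid, cv).

    Hypothetical sketch: keys are ignored and each value is classified by
    type, so entries may appear in any order and under any name.
    """
    grid, cv = None, default_cv
    for value in level.values():
        if isinstance(value, dict):
            grid = value  # the parameter grid, e.g. {"n_estimators": [10]}
        elif isinstance(value, int):
            cv = value    # number of cross-validation folds
    if grid is None:
        # Mirrors rule 4: a params dict (possibly empty) must always be present.
        raise ValueError("level must contain a params dict, e.g. {'my_params': {}}")
    return grid, cv


print(parse_level({'my_params': {"n_estimators": [10]}, 'CV': 4}))  # ({'n_estimators': [10]}, 4)
print(parse_level({'my_params': {"n_estimators": [10]}}))           # ({'n_estimators': [10]}, 10)
print(parse_level({'my_params': {}, 'cv': 4}))                      # ({}, 4)
```

Dispatching on type is what makes the names irrelevant; the trade-off is that if a level contains two dicts or two ints, the later one silently overwrites the earlier one.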

Please let me know if you have any question.