gptune / GPTune


Surrogate model empty in json file #12

Closed jaehoonkoo closed 2 years ago

jaehoonkoo commented 2 years ago

Hi @younghyunc,

I am writing to ask about an error when loading a surrogate model. Can you please take a look?

I am using options['model_class'] = 'Model_GPy_LCM'. When I tried to run transfer learning (TL) after gathering data with SLA, I got the following error.

machine: mymachine processor: haswell num_nodes: 1 num_cores: 128
[HistoryDB] use filelock for synchronization
[HistoryDB] Found a history database file
.....target problem size is sm [[500000], [100000], [1000000], [5000000]]
====================================== 100000
task_parameters_given:  [[100000]]
GPTune History Database Init
[HistoryDB] use filelock for synchronization
[HistoryDB] Found a history database file
Unable to find a surrogate model
MODEL DATA:  None
Traceback (most recent call last):
 File "demo_dtla.py", line 426, in <module>
   main()
 File "demo_dtla.py", line 391, in main
   model_functions[tvalue_] = LoadSurrogateModelFunction(meta_path=None, meta_dict=meta_dict)               
 File "/lcrc/project/EE-ECP/jkoo/code/gptune/GPTune/gptune.py", line 1923, in LoadSurrogateModelFunction
   gt = CreateGPTuneFromModelData(model_data)
 File "/lcrc/project/EE-ECP/jkoo/code/gptune/GPTune/gptune.py", line 1838, in CreateGPTuneFromModelData
   input_space_info = model_data["input_space"]
TypeError: 'NoneType' object is not subscriptable
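For context, this is the standard Python TypeError raised when subscripting None. A minimal, self-contained illustration of the failure mode (the JSON content and variable names here are illustrative only, not GPTune's actual internals):

```python
import json

# An empty "surrogate_model" list, as in the DB file shown below.
db = json.loads('{"surrogate_model": []}')
models = db["surrogate_model"]

# If no stored model matches, the loader ends up with None...
model_data = models[0] if models else None

# ...and subscripting None raises the TypeError from the traceback.
try:
    input_space_info = model_data["input_space"]
except TypeError as err:
    print(err)  # 'NoneType' object is not subscriptable
```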

In the JSON file, I found that "surrogate_model" is empty.

{
 "tuning_problem_name": "xsbench",
 "tuning_problem_category": "Unknown",
 "surrogate_model": [],
 "func_eval": [
   {
     "task_parameter": {
       "t": 10000000
     },
     "tuning_parameter": {
       "p0": "96",
       "p1": "cores",
       "p2": "master",
       "p3": "#pragma omp parallel for",
       "p4": "simd",
       "p5": "schedule(static,4)",
       "p6": "dynamic"
     },
     "constants": {},
     "machine_configuration": {
       "machine_name": "mymachine",
       "intel": {
         "nodes": 1,
         "cores": 128
       }
     },

Why is the surrogate model not saved? Did I miss anything?

Best, Jaehoon

younghyunc commented 2 years ago

Hi @jke513,

for now, our DB records surrogate model data only when "Model_LCM" is used. Model_LCM refers to our own LCM modeling (presented in the GPTune PPoPP '21 paper by Yang et al.). If you use "Model_GPy_LCM", GPTune will use GPy's LCM, but our DB does not record surrogate models for it.

This means that the transfer learning interface presented in our MCSoC 2021 paper works only with "Model_LCM". We have been building new interfaces for better usability and to support other types of surrogate modeling. For example, instead of directly loading a surrogate model from the database, we can read function evaluation results and build a surrogate model from them, which can then be used as the surrogate model for transfer learning.

In short, there are two options you can take.

  1. Use Model_LCM rather than Model_GPy_LCM. You can still use "LoadSurrogateModelFunction"; this is also the best way to reproduce the transfer learning method discussed in the MCSoC 2021 paper.
  2. Use the other API, "BuildSurrogateModel", defined in gptune.py, which takes function evaluation results and builds a surrogate-model black-box function using Model_GPy_LCM. Using this new interface is not difficult; however, we don't have a good manual yet, and the interface might change further because we are still actively working on it. So, right now I'd suggest option 1 (using Model_LCM), unless you are having trouble with Model_LCM.
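As a sketch of option 1, the only user-side change is the options entry mentioned above (assuming the same options dict used in the demo script):

```python
# Switch the surrogate model class from GPy's LCM to GPTune's own LCM,
# which is the only class whose models the history DB currently records.
options = {}
options['model_class'] = 'Model_LCM'  # was 'Model_GPy_LCM'
print(options['model_class'])
```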

@liuyangzhuan, do you have other comments or concerns?

Best, Younghyun

liuyangzhuan commented 2 years ago

Why don't we support Model_GPy_LCM in "LoadSurrogateModelFunction"? Doesn't it have the same hyperparameter list as Model_LCM?

Anyway, if you didn't build GPTune with OpenMPI, you cannot use Model_LCM. So in the long run, option 2 would be more desirable.

younghyunc commented 2 years ago

Hi @liuyangzhuan, at the time of development I implemented and tested the feature only for Model_LCM. In principle, I also think it's possible to support Model_GPy_LCM in LoadSurrogateModelFunction. However, for the DB, I think we would need to record different hyperparameter lists for Model_LCM and Model_GPy_LCM (this needs further checking).

To provide a quick solution, I'd like to ask @jke513 which approach you prefer or which approach would be feasible on your system.

Thanks, Younghyun

jaehoonkoo commented 2 years ago

Hi Younghyun,

BuildSurrogateModel() works without crashing.

I tried a run as a sanity check, set up as follows.

I ran merge.py, but to collect the function evaluations for each source task I used that task's own JSON file, for example:

f = open('./TLA_experiments/SLA-GPTune-'+str(input_s[tvalue_])+'-200/xsbench.json')
func_evals = json.load(f)

I am not sure merge.py is necessary.

I observe that the configurations for the target task are evaluated, while the configurations for the source tasks are loaded from the model: ret = model_functions[point['t']](point). At the end, the logs show output values similar to those coded above.

Does it look like it works correctly?

I attached the output log file for your reference. Thank you for your time and help.

Best, Jaehoon test_run.txt

younghyunc commented 2 years ago

Hi Jaehoon,

thank you for sending me the log, and sorry for the slow reply.

Yes, I read your log file, and I think your example works correctly! As you observed, the method (MCSoC '21) evaluates the target task and uses a surrogate model for the source tasks. Regarding merge.py: this script just merges multiple JSON DB files into one; otherwise, you don't need to run it. Please let me know if you have further questions.
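Conceptually, the merge just concatenates the "func_eval" lists of the input DB files into a single DB. A minimal sketch (the DB contents here are made up for illustration; only the "func_eval" and "tuning_problem_name" keys are taken from the real file format):

```python
import json

# Two per-task DB excerpts (contents made up for illustration).
db_a = json.loads('{"tuning_problem_name": "xsbench",'
                  ' "func_eval": [{"task_parameter": {"t": 100000}}]}')
db_b = json.loads('{"tuning_problem_name": "xsbench",'
                  ' "func_eval": [{"task_parameter": {"t": 500000}}]}')

# Merging keeps one copy of the metadata and concatenates the
# function-evaluation lists into a single DB dict.
merged = dict(db_a)
merged["func_eval"] = db_a["func_eval"] + db_b["func_eval"]
print(len(merged["func_eval"]))  # 2
```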

Best, Younghyun

younghyunc commented 2 years ago

I think this issue has been solved. I will close this issue now. If you have further questions or other issues, please open another issue or send us an email.