intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.

bigdl-nano auto-ml model select method #6760

Open Zjq9409 opened 1 year ago

Zjq9409 commented 1 year ago

BigDL-Nano AutoML currently selects the model based on the last epoch's loss; it should instead select the model from the best epoch (e.g., the one with the highest validation accuracy). For example, in the output below the reported best trial value is 0.4935064911842346, which comes from the last epoch, but the model should be chosen according to the best epoch.

Epoch 4:  69%|██████▉   | 9/13 [00:34<00:15,  3.86s/it, loss=1.5, v_num=7, val_loss=1.380, val_acc=0.506, train_loss=1.350, train_acc=0.510]
Epoch 4:  77%|███████▋  | 10/13 [00:36<00:10,  3.66s/it, loss=1.29, v_num=7, val_loss=1.380, val_acc=0.506, train_loss=1.350, train_acc=0.510]

Validation DataLoader 0: 100%|██████████| 3/3 [00:02<00:00,  1.54it/s]

Epoch 4:  77%|███████▋  | 10/13 [00:39<00:11,  3.90s/it, loss=1.29, v_num=7, val_loss=5.030, val_acc=0.494, train_loss=1.300, train_acc=0.540]
Model: resnet50    Image size: (456, 456)
Number of finished trials: 1
Best trial:
  Value: 0.4935064911842346
  Params: 
    model_name▁choice: 0
<optuna.study.study.Study object at 0x7f2790fae130>
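
For reference, a minimal sketch of the requested selection behavior using plain Optuna (not BigDL-Nano internals); the per-epoch accuracies and the learning-rate parameter are made up purely for illustration:

import optuna


def objective(trial: optuna.Trial) -> float:
    # lr would parameterize a real training loop; it is unused in this toy example.
    lr = trial.suggest_float("learning_rate", 1e-3, 1e-2, log=True)
    # Stand-in for real training: fake per-epoch validation accuracies.
    val_acc_per_epoch = [0.48, 0.51, 0.506, 0.494]
    best_acc = float("-inf")
    for epoch, acc in enumerate(val_acc_per_epoch):
        trial.report(acc, step=epoch)   # record the intermediate value for each epoch
        best_acc = max(best_acc, acc)
    return best_acc                     # return the best epoch's value, not the last epoch's


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=4)
print("Best trial value:", study.best_value)  # reflects the best epoch of the best trial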
rnwang04 commented 1 year ago

Hi, we have added support for the behavior you propose in https://github.com/intel-analytics/BigDL/pull/6826. We now provide a new mode parameter for trainer.search:

import bigdl.nano.automl.hpo.space as space
from bigdl.nano.pytorch import Trainer

# CustomModel is your own HPO-enabled LightningModule (definition omitted here);
# its hyperparameters are declared as search spaces.
model = CustomModel(
    out_dim=space.Categorical(16, 32),
    dropout=space.Categorical(0.1, 0.2),
    learning_rate=space.Real(0.001, 0.01, log=True),
    batch_size=space.Categorical(32, 64),
)

trainer = Trainer(
    logger=True,
    checkpoint_callback=False,
    max_epochs=2,
    use_hpo=True,
)

best_model = trainer.search(
    model,
    target_metric='val_loss',
    direction='minimize',
    n_trials=4,
    max_epochs=3,
    mode='best',  # default is 'best' (select by the best epoch); set 'last' for the old behavior
)

If you don't specify this parameter, the model will be selected based on the best result, which I think is exactly what you need.
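
A possible follow-up once the search finishes (a hedged sketch: search_summary() and the dataloader names are assumptions based on the BigDL-Nano HPO documentation, not verified against your version): fit the returned model and inspect the underlying Optuna study.

trainer.fit(best_model, train_dataloaders=train_loader, val_dataloaders=val_loader)

study = trainer.search_summary()   # assumed to return the underlying optuna Study
print(study.best_value)            # should now correspond to the best epoch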

You can give it a try on your side : )

One note: only tomorrow's nightly build or a later version will include this new feature.