intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

[Orca]Support pytorch models fine tuning #7433

Open songhappy opened 1 year ago

songhappy commented 1 year ago

Some popular vision models, such as Mask R-CNN for object detection, compute the loss in a slightly different way:

loss_dict = model(images, targets)
losses = sum(loss for loss in loss_dict.values())

rather than:

out = model(images)
loss = criteria(out, targets)

The Orca Estimator should support models of this kind.

reference: https://github.com/pytorch/vision/blob/main/references/detection/engine.py#L31
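As a minimal, self-contained illustration of the detection-style convention, here is a mock (the DetectionModel class and its fixed loss values are hypothetical stand-ins for a real torchvision detection model in training mode, which computes these terms from the inputs):

```python
# Mock of a detection model whose forward returns a dict of named loss
# terms instead of predictions; values are fixed here for illustration.
class DetectionModel:
    def __call__(self, images, targets):
        return {"loss_classifier": 0.7, "loss_box_reg": 0.2, "loss_objectness": 0.1}

model = DetectionModel()
# In this convention the training loop sums the loss terms itself,
# with no external criterion(out, targets) call.
loss_dict = model(images=None, targets=None)
losses = sum(loss for loss in loss_dict.values())
print(round(losses, 6))
```

The key difference for the Estimator is that no separate loss function exists: the loss comes out of the model's own forward pass.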

hkvision commented 1 year ago

@leonardozcm I think we can support this now?

leonardozcm commented 1 year ago

> @leonardozcm I think we can support this now?

Yes, but I haven't tried it yet. You could write something like the following in MainCallBack.on_iter_forward to achieve this:

class CustomMainCB(MainCallBack):
    def on_iter_forward(self, runner):
        """
        If `on_train_forward` and `on_val_forward` are not overridden,
        this is called during the forward pass when training and validating.
        Any behavior that differs from the default forward behavior
        should be overridden here.
        """
        # Forward pass: the model computes the losses itself
        image, target = runner.batch
        runner.output = runner.model(image, target)
        # Total loss is the sum of the individual loss terms
        runner.loss = sum(loss for loss in runner.output.values())
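To see how such a callback could plug in, here is a runnable pure-Python mock of the runner/callback dispatch (the Runner class and the default on_iter_forward body below are illustrative assumptions, not the actual Orca implementation):

```python
class MainCallBack:
    def on_iter_forward(self, runner):
        # Assumed default forward: criterion-style loss.
        feature, target = runner.batch
        runner.output = runner.model(feature)
        runner.loss = runner.criterion(runner.output, target)

class CustomMainCB(MainCallBack):
    def on_iter_forward(self, runner):
        # Detection-style forward: the model itself returns a loss dict.
        image, target = runner.batch
        runner.output = runner.model(image, target)
        runner.loss = sum(loss for loss in runner.output.values())

class Runner:
    # Hypothetical stand-in for the Orca torch runner.
    def __init__(self, model, batch, callback, criterion=None):
        self.model, self.batch = model, batch
        self.callback, self.criterion = callback, criterion
        self.output = self.loss = None

    def forward(self):
        # The callback decides how the forward pass computes the loss.
        self.callback.on_iter_forward(self)

# Toy detection-style model returning named loss terms.
def toy_model(image, target):
    return {"loss_cls": 0.6, "loss_box": 0.4}

runner = Runner(toy_model, batch=(None, None), callback=CustomMainCB())
runner.forward()
print(runner.loss)
```

With this dispatch, models that return a loss dict need no criterion at all; the override replaces the criterion call entirely.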

songhappy commented 1 year ago

In this case, the loss is defined inside the model, so there is no separate loss function to pass to the Estimator, which then fails with an error like this:

  File "/Users/guoqiong/intelWork/git/BigDL/python/orca/src/bigdl/orca/learn/pytorch/torch_runner.py", line 283, in train_epoch
    "You must provide a loss for train and evaluate.")
  File "/Users/guoqiong/intelWork/git/BigDL/python/dllib/src/bigdl/dllib/utils/log4Error.py", line 33, in invalidInputError
    raise RuntimeError(errMsg)
RuntimeError: You must provide a loss for train and evaluate.

leonardozcm commented 1 year ago

> In this case, there will be a specific definition for loss function. Estimator gives error message like this: [...] RuntimeError: You must provide a loss for train and evaluate.

@hkvision Could we relax this restriction? @songhappy how do the hooks work?

hkvision commented 1 year ago

I think we already took this case into consideration previously: https://github.com/intel-analytics/BigDL/blob/main/python/orca/src/bigdl/orca/learn/pytorch/estimator.py#L38 Loss can be None; isn't that the behavior?

leonardozcm commented 1 year ago

> I think we already took this case into consideration previously: https://github.com/intel-analytics/BigDL/blob/main/python/orca/src/bigdl/orca/learn/pytorch/estimator.py#L38 Loss can be None; isn't that the behavior?

Previously we disabled the loss check in predict only: https://github.com/intel-analytics/BigDL/pull/7168
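One hedged sketch of how the check could be relaxed for training as well (check_train_loss and the model_returns_loss_dict flag are hypothetical names, not the actual torch_runner.py code): allow loss to be None when the model's forward already returns a loss dict.

```python
def check_train_loss(loss_fn, model_returns_loss_dict):
    # Hypothetical relaxation of the guard in train_epoch: a missing loss
    # function is acceptable when the model computes its own losses
    # (detection-style forward); otherwise keep the original error.
    if loss_fn is None and not model_returns_loss_dict:
        raise RuntimeError("You must provide a loss for train and evaluate.")

# Detection-style model with no external criterion: passes.
check_train_loss(None, model_returns_loss_dict=True)

# Classification-style model with no criterion: still an error.
try:
    check_train_loss(None, model_returns_loss_dict=False)
except RuntimeError as e:
    print(e)
```

This keeps the existing safety net for the common criterion-based case while unblocking models like Mask R-CNN.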

hkvision commented 1 year ago

https://github.com/intel-analytics/BigDL/issues/4412 It seems this issue was noticed before. Let's plan to support this, then.