intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

Nano: Hard to apply inference optimizations with accuracy control on YOLOX with little code modification #5995

Open y199387 opened 2 years ago

y199387 commented 2 years ago

Description

YOLOX evaluates the model with the COCO API. The evaluation-related code looks like this:

for cur_iter, (imgs, _, info_imgs, ids) in enumerate(
    progress_bar(self.dataloader)
):
    with torch.no_grad():
        imgs = imgs.type(tensor_type)

        # skip the last iters since the batch size might not be enough for batch inference
        is_time_record = cur_iter < len(self.dataloader) - 1
        if is_time_record:
            start = time.time()

        outputs = model(imgs)
        if decoder is not None:
            outputs = decoder(outputs, dtype=outputs.type())

        if is_time_record:
            infer_end = time_synchronized()
            inference_time += infer_end - start

        outputs = postprocess(
            outputs, self.num_classes, self.confthre, self.nmsthre
        )
        if is_time_record:
            nms_end = time_synchronized()
            nms_time += nms_end - infer_end

    data_list_elem, image_wise_data = self.convert_to_coco_format(
        outputs, info_imgs, ids, return_outputs=True)
    data_list.extend(data_list_elem)
    output_data.update(image_wise_data)

It is hard to convert this code segment into a metric function of the form:

def metric(preds, target):
    ...
    return accuracy
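For contrast, here is a minimal sketch of the batch-wise metric contract that Nano's stock `optimize()` is built around (the `validation_data` and `direction` keywords follow the bigdl-nano docs and may differ across versions; the toy model and loader are placeholders). A dataset-level score like COCO AP has no such per-batch form:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from bigdl.nano.pytorch import InferenceOptimizer

# Toy classification stand-ins so the sketch is self-contained.
model = nn.Sequential(nn.Flatten(), nn.Linear(32, 4))
dataset = TensorDataset(torch.randn(64, 32), torch.randint(0, 4, (64,)))
loader = DataLoader(dataset, batch_size=8)

def batch_metric(preds, target):
    # Per-batch top-1 accuracy; COCO AP has no per-batch equivalent,
    # which is exactly the mismatch described above.
    return (preds.argmax(dim=-1) == target).float().mean()

opt = InferenceOptimizer()
opt.optimize(
    model,
    training_data=loader,
    validation_data=loader,  # each candidate is scored with the metric on this
    metric=batch_metric,
    direction="max",         # keyword names per the Nano docs; may vary by version
)
opt.summary()
```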

For now, I have to subclass InferenceOptimizer to apply inference acceleration with accuracy control to YOLOX:

import torch
from bigdl.nano.pytorch import InferenceOptimizer
# _format_optimize_result is a Nano-internal helper; this import path follows
# the Nano source tree and may change between versions.
from bigdl.nano.pytorch.inference.optimizer import _format_optimize_result


class YoloxInferenceOptimizer(InferenceOptimizer):
    def optimize(self, model, training_data, metric=None, **metric_kwargs):
        # Run the stock search first (latency only), then re-score every
        # accelerated variant with the dataset-level COCO metric.
        super().optimize(model, training_data=training_data)
        if metric:
            for method, acce_result in self.optimized_model_dict.items():
                with torch.no_grad():
                    result = metric(acce_result["model"], **metric_kwargs)
                self.optimized_model_dict[method]["accuracy"] = result

        self._optimize_result = _format_optimize_result(self.optimized_model_dict,
                                                        self._calculate_accuracy)


def accuracy(model):
    # exp and args come from the YOLOX training context.
    coco_evaluator = exp.get_evaluator(args.batch_size, False)
    coco_evaluator.per_class_AP = True
    coco_evaluator.per_class_AR = True

    ap50_95, ap50, _ = coco_evaluator.evaluate(model, False)
    return ap50


inference_optimizer = YoloxInferenceOptimizer()
inference_optimizer.optimize(model, metric=accuracy, training_data=train_loader)
inference_optimizer.summary()

Is there a simple way to obtain inference acceleration with accuracy (AP/AR) control?

rnwang04 commented 2 years ago

Will work on this.