quic / aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
https://quic.github.io/aimet-pages/index.html

Training using 8-bit weights and 16-bit activations and running on SNPE - DSP #2197


AlmogDavid commented 1 year ago

Hi, I trained a model using a 16/8 configuration (the configuration JSON I used is attached below) and everything was fine during AIMET optimization. When I try to deploy the model to the DSP I use the commands mentioned in this post (which referred me to you): https://developer.qualcomm.com/forum/qdn-forums/software/qualcomm-neural-processing-sdk/70529

The run on the DSP gives bad results, not at all similar to the ones from AIMET training.

What am I doing wrong? Please don't tell me to talk to the SNPE team, as they told me to talk to you.

This is the configuration I'm using:

```json
{
    "defaults": {
        "ops": {
            "is_output_quantized": "True"
        },
        "params": {
            "is_quantized": "True",
            "is_symmetric": "True"
        },
        "strict_symmetric": "False",
        "per_channel_quantization": "True"
    },
    "params": {
        "bias": {
            "is_quantized": "False"
        }
    },
    "op_type": {
        "Squeeze": {
            "is_output_quantized": "False"
        },
        "Pad": {
            "is_output_quantized": "False"
        },
        "Mean": {
            "is_output_quantized": "False"
        }
    },
    "supergroups": [
        { "op_list": ["Conv", "Relu"] },
        { "op_list": ["ConvTranspose", "Relu"] },
        { "op_list": ["Conv", "Clip"] },
        { "op_list": ["Add", "Relu"] },
        { "op_list": ["Gemm", "Relu"] }
    ],
    "model_input": {
        "is_input_quantized": "True"
    },
    "model_output": {}
}
```
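Note that the 16-bit activation / 8-bit weight split is not expressed anywhere in this JSON; in AIMET it comes from the bitwidth arguments passed to QuantizationSimModel (or AutoQuant). A minimal sketch, with a placeholder model and input purely for illustration:

```python
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

# Placeholder model and input, just to make the sketch self-contained
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
dummy_input = torch.randn(1, 3, 224, 224)

# The W8A16 split is chosen here, not in the JSON config file
sim = QuantizationSimModel(model=model,
                           dummy_input=dummy_input,
                           quant_scheme=QuantScheme.training_range_learning_with_tf_init,
                           default_output_bw=16,  # activation (output) bitwidth
                           default_param_bw=8,    # weight (parameter) bitwidth
                           config_file="aimet_quantization_config.json")
```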

And this is how I create the AIMET model:

```python
def _quant_model_aimet(self, image_dl: DataLoader):
    from aimet_common.defs import QuantScheme
    from aimet_torch.batch_norm_fold import fold_all_batch_norms as aimet_fold_all_batch_norms
    from aimet_torch.model_preparer import prepare_model as aimet_prepare_model
    from aimet_torch.auto_quant_v2 import AutoQuant
    from aimet_torch.quantsim import QuantizationSimModel

    config_file = os.path.join(os.path.dirname(__file__), "aimet_quantization_config.json")
    num_iters_quant = self.cfg.quantization.num_iter_for_quant_params

    self.fld_model.eval()
    self.fld_model.cpu()  # Move back to cpu
    fld_model = copy.deepcopy(self.fld_model).eval().cpu()

    # Run dummy iterations just to get shapes initialized
    dummy_input = next(iter(image_dl))[data_features.FACE_IMAGE]
    fld_model(dummy_input)

    # Prepare model + fold BN
    fld_model = aimet_prepare_model(fld_model)
    input_shape = tuple([1, self.cfg.model.input_shape_hwc[2], self.cfg.model.input_shape_hwc[0],
                         self.cfg.model.input_shape_hwc[1]])
    aimet_fold_all_batch_norms(fld_model, input_shapes=input_shape)  # Done inplace

    # Post Quant using AutoQuant
    class Datagen(torch.utils.data.IterableDataset):

        def __iter__(self):
            total = 0
            while True:
                for r in image_dl:
                    for image in r[data_features.FACE_IMAGE]:
                        if total >= num_iters_quant:
                            return
                        total += 1
                        yield image

        def __len__(self):
            return num_iters_quant

    image_gen_dl = DataLoader(Datagen())
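    # Note: DataLoader's default batch_size=1 re-adds the batch dimension to
    # each image yielded by Datagen before it reaches AutoQuant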

    @torch.no_grad()
    def eval_callback(model: torch.nn.Module, max_num_iters: Optional[int]) -> float:
        model.eval()
        model = model.cpu()
        samples_saw = 0
        total_error = 0
        max_num_iters = num_iters_quant if max_num_iters is None else max_num_iters

        for r in image_dl:
            if samples_saw > max_num_iters:
                break

            valid_samples = (r[data_features.IS_FACE_PRESENT] > 0).flatten()
            face_images = r[data_features.FACE_IMAGE][valid_samples]
            landmarks = r[data_features.LANDMARKS_2D][valid_samples][:, self.fld_model.ld_for_deployment, :]
            landmarks_valid = r[data_features.LANDMARK_VALID][valid_samples][:, self.fld_model.ld_for_deployment]
            pred = model(face_images)
            _, ld_pred = self.fld_model.postprocess_pred(pred)

            error = torch.linalg.norm(
                landmarks.reshape(-1, 2)[landmarks_valid.flatten()]
                - ld_pred.reshape(-1, 2)[landmarks_valid.flatten()].cpu(),
                dim=1).mean().item()
            num_valid = valid_samples.sum().item()
            # Weight the batch mean by its sample count so the final division
            # by samples_saw yields a sample-weighted average error
            total_error += error * num_valid
            samples_saw += num_valid

        error = total_error / samples_saw
        print(f"Auto Quant current error: {error:.4f}")
        return error

    auto_quant = AutoQuant(fld_model,
                           quant_scheme=QuantScheme.training_range_learning_with_tf_init,
                           dummy_input=dummy_input,
                           config_file=config_file,
                           data_loader=image_gen_dl,
                           output_bw=self.cfg.quantization.activation_bw,
                           param_bw=self.cfg.quantization.weights_bw,
                           eval_callback=eval_callback)
    _, initial_accuracy = auto_quant.run_inference()
    print(f"NME before AutoQuant: {initial_accuracy:.4f}")

    fld_model, optimized_nme, encoding_path = auto_quant.optimize(allowed_accuracy_drop=0.01)
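    # Note: encoding_path (the encodings exported by AutoQuant) is not
    # consumed below; the fresh QuantizationSimModel recomputes encodings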
    print(f"Optimized NME - AutoQuant (before -> after): {initial_accuracy:.4f} -> {optimized_nme:.4f}")

    # Create QAT model
    fld_quantsim = QuantizationSimModel(model=fld_model.to(get_device()),
                                        quant_scheme=QuantScheme.training_range_learning_with_tf_init,
                                        dummy_input=next(iter(image_dl))[data_features.FACE_IMAGE].to(get_device()),
                                        default_output_bw=self.cfg.quantization.activation_bw,
                                        default_param_bw=self.cfg.quantization.weights_bw,
                                        config_file=config_file)

    fld_quantsim.compute_encodings(forward_pass_callback=partial(
        model_runner_callback,
        max_iterations=num_iters_quant,
        device=get_device(),
        features=[data_features.FACE_IMAGE]),
        forward_pass_callback_args=image_dl)

    return fld_quantsim

```

quic-mangal commented 1 year ago

@quic-akhobare & @quic-klhsieh could you help reply to this.

quic-akhobare commented 1 year ago

Hi @AlmogDavid - the following details would help: what model you are training and on what data, how you trained the QuantSim model, and how exactly you are exporting it.

AlmogDavid commented 1 year ago

Hi, thanks for replying. I am training a facial landmark detection model (a regression model) on a proprietary dataset; the closest public architecture is MobileNet v2. I trained the QuantSim model generated by the script I shared with you, and I see it converging.

For exporting, I just use the export API, which produces an ONNX model and the encodings. In my post there is a link to the SNPE forum where I shared how I used the SNPE toolkit to get a DLC.
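(For reference, a minimal sketch of what that export step typically looks like with AIMET's export API; the output path, filename prefix, and input shape here are assumed placeholders, not taken from the thread:)

```python
import torch

# fld_quantsim is the QuantizationSimModel built by the script above;
# path, prefix, and input shape are hypothetical placeholders
dummy_input = torch.randn(1, 3, 224, 224).cpu()  # AIMET export expects a CPU input
fld_quantsim.export(path="./export",
                    filename_prefix="fld_model",
                    dummy_input=dummy_input)
# Writes fld_model.onnx plus an fld_model.encodings file; the encodings are
# what the SNPE converter needs to reproduce AIMET's quantization parameters
```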

quic-akhobare commented 1 year ago

I don't see anything wrong with your approach per se. I followed your thread on the SNPE forum and responded there. I am guessing that some command-line arguments are missing when invoking SNPE.

Are you providing exactly the same calibration dataset to SNPE that was used for AIMET?
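(For anyone checking this, a minimal sketch of dumping the same calibration images to the raw-file-plus-input-list format that SNPE's offline quantizer consumes; the directory name, sample count, and helper name are assumptions, not from the thread:)

```python
import os
import numpy as np

def dump_calibration_set(image_dl, out_dir="calib", max_samples=100):
    """Write calibration images as flat float32 .raw files plus the
    input list file consumed by SNPE's offline quantizer."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, batch in enumerate(image_dl):
        if i >= max_samples:
            break
        image = batch[data_features.FACE_IMAGE][0]  # same feature key as the script above
        path = os.path.join(out_dir, f"input_{i}.raw")
        # SNPE raw inputs are flat float32 buffers in the model's input layout
        image.numpy().astype(np.float32).tofile(path)
        paths.append(path)
    with open(os.path.join(out_dir, "input_list.txt"), "w") as f:
        f.write("\n".join(paths) + "\n")
```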