nscotto opened this issue 3 years ago
:octocat: cibot: Thank you for posting issue #909. The person in charge will reply soon.
@nscotto Thanks for the query.
> I think what we'd like to know is whether nntrainer has to work with float32 (at the cost of some performance) or if it can train a quantized model directly, or perhaps a mixture of both (e.g. keeping the non-trainable part quantized and the trainable part in float32).
Of course, supporting quantization is important from a performance perspective, and we do have plans for it. We are currently working on (1) and it will be released soon.
> What also isn't clear is whether the format can differ between training and inference. Does nntrainer even have an API for inference, or is it only used for training?
We do have C and C++ APIs, and both will be updated to include an inference API.
> Our goal is to get the best possible performance, and we'd like to minimize the use of float32.
As @jijoongmoon clarified, support for quantized models will be available soon. However, please note that, for now, the weights being trained use float computations during training (not inference); non-trainable computations can happen in float.
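To make the mixed scheme discussed above concrete (a quantized non-trainable part feeding a float32 trainable part), here is a minimal, framework-agnostic sketch in NumPy. This is not nntrainer code; the symmetric per-tensor int8 scheme and all names here are illustrative assumptions only.

```python
# Hypothetical sketch, NOT nntrainer API: a frozen backbone stored as int8
# (dequantized on the fly) feeding a float32 trainable head, trained with
# plain gradient descent on an MSE loss.
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor int8 quantization: returns (int8 weights, scale).
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)

# Non-trainable part: stored as int8, never updated.
w_frozen_q, s = quantize_int8(rng.standard_normal((4, 4)).astype(np.float32))

# Trainable part: kept in float32 so gradients stay in float.
w_head = rng.standard_normal((4, 1)).astype(np.float32)

x = rng.standard_normal((8, 4)).astype(np.float32)
y = rng.standard_normal((8, 1)).astype(np.float32)

h = x @ dequantize(w_frozen_q, s)           # frozen features, no gradient
loss_start = mse(h @ w_head, y)

lr = 0.01
for _ in range(200):
    grad = h.T @ (h @ w_head - y) / len(x)  # dL/dw_head for MSE
    w_head -= lr * grad

loss_end = mse(h @ w_head, y)
# The int8 weights never left integer storage; only w_head was trained.
```

The arithmetic is straightforward when the quantized part is frozen: gradients only ever flow through the float32 tensors, which matches the "non-trainable part quantized, trainable part float32" split asked about above.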
Following this discussion about quantized model support
Sorry for the late reply; I needed to discuss with the team to understand our needs a bit better. I think what we'd like to know is whether nntrainer has to work with float32 (at the cost of some performance) or if it can train a quantized model directly, or perhaps a mixture of both (e.g. keeping the non-trainable part quantized and the trainable part in float32).
You may contribute such features directly to nntrainer as well. :)
> What also isn't clear is whether the format can differ between training and inference. Does nntrainer even have an API for inference, or is it only used for training?
For inference, you may use nntrainer directly, or use ML-API/nnstreamer with the nntrainer subplugin.
The code in nntrainer.git /nnstreamer/tensor_filter/* is the nntrainer subplugin for nnstreamer, which also serves as a usage example of running inference with nntrainer.