nscotto opened this issue 3 years ago
:octocat: cibot: Thank you for posting issue #909. The person in charge will reply soon.
@nscotto Thanks for the query.
> I think what we'd like to know is whether nntrainer has to work with float32 (at the cost of some performance) or if it can train a quantized model directly, or perhaps a mixture of both (e.g. keeping the non-trainable part quantized and the trainable part in float32).
Of course, supporting quantization is important from a performance perspective, and we do have plans for it. We are currently working on (1) and it will be released soon.
> What also isn't clear is whether the format can differ between training and inference. Does nntrainer even have an API for inference, or is it only used for training?
We do have C and C++ APIs, and both will be updated to include an inference API.
> Our goal is to get the best possible performance, and we'd like to minimize the use of float32.
As @jijoongmoon clarified, support for quantized models will be available soon. However, please note that, for now, the weights being trained use float computations during training (not inference); non-trainable computations can happen in float.
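To make the mixed scheme discussed above concrete (a quantized non-trainable part feeding a float32 trainable part), here is a minimal, framework-agnostic sketch in NumPy. This is not nntrainer code; the symmetric per-tensor int8 scheme and all names here are illustrative assumptions only.

```python
# Hypothetical sketch, NOT nntrainer API: a frozen backbone stored as int8
# (dequantized on the fly) feeding a float32 trainable head, trained with
# plain gradient descent on an MSE loss.
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor int8 quantization: returns (int8 weights, scale).
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)

# Non-trainable part: stored as int8, never updated.
w_frozen_q, s = quantize_int8(rng.standard_normal((4, 4)).astype(np.float32))

# Trainable part: kept in float32 so gradients stay in float.
w_head = rng.standard_normal((4, 1)).astype(np.float32)

x = rng.standard_normal((8, 4)).astype(np.float32)
y = rng.standard_normal((8, 1)).astype(np.float32)

h = x @ dequantize(w_frozen_q, s)           # frozen features, no gradient
loss_start = mse(h @ w_head, y)

lr = 0.01
for _ in range(200):
    grad = h.T @ (h @ w_head - y) / len(x)  # dL/dw_head for MSE
    w_head -= lr * grad

loss_end = mse(h @ w_head, y)
# The int8 weights never left integer storage; only w_head was trained.
```

The arithmetic is straightforward when the quantized part is frozen: gradients only ever flow through the float32 tensors, which matches the "non-trainable part quantized, trainable part float32" split asked about above.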
Following this discussion about quantized model support
Sorry for the late reply; I needed to discuss with the team to understand our needs a bit better. I think what we'd like to know is whether nntrainer has to work with float32 (at the cost of some performance) or if it can train a quantized model directly, or perhaps a mixture of both (e.g. keeping the non-trainable part quantized and the trainable part in float32).
You may contribute such features directly to nntrainer as well. :)
> What also isn't clear is whether the format can differ between training and inference. Does nntrainer even have an API for inference, or is it only used for training?
For inference, you may use nntrainer directly, or use ML-API/nnstreamer with the nntrainer subplugin.
The code in nntrainer.git /nnstreamer/tensor_filter/* is the nntrainer subplugin for nnstreamer, which also serves as a usage example of running inference with nntrainer.