Could you describe the parameter settings for fine-tuning on downstream tasks in detail?
I am unable to reproduce the training results on Cityscapes with the EfficientViT-B series models (the accuracy is much lower than reported in the paper).
By the way, during TensorRT deployment, using FP16 can produce NaN outputs or a large accuracy drop. Even training with a relatively large weight decay (0.1) does not avoid NaNs under TensorRT FP16. Is this model not sufficiently TRT-FP16-friendly, or is there a problem with the training? Is it related to https://github.com/mit-han-lab/efficientvit/issues/15 ? Is there any way to avoid it in TRT FP16 inference?
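As a diagnostic sketch (not part of this repo), one common way to localize FP16 trouble before blaming TensorRT is to check whether any intermediate tensor exceeds the FP16 representable range (max ≈ 65504); values past that become `inf` when cast, which then propagates as NaN downstream. The function and tensor name below are hypothetical, for illustration only:

```python
import numpy as np

# Largest finite value representable in IEEE FP16.
FP16_MAX = float(np.finfo(np.float16).max)  # 65504.0

def fp16_overflow_report(name, arr):
    """Flag values in `arr` that would overflow when cast to FP16.

    Hypothetical helper: run it on activations captured (e.g. via forward
    hooks) from the layers you suspect, before exporting to TensorRT.
    """
    arr = np.asarray(arr, dtype=np.float32)
    max_abs = float(np.max(np.abs(arr))) if arr.size else 0.0
    return {
        "tensor": name,
        "max_abs": max_abs,
        "overflows_fp16": max_abs > FP16_MAX,
        # Casting to FP16 and checking for inf reveals actual overflow.
        "has_inf_after_cast": bool(np.isinf(arr.astype(np.float16)).any()),
    }

# Example: an activation with one large outlier value.
act = np.array([1.0, 2.5, 7.0e4], dtype=np.float32)
report = fp16_overflow_report("stage4_attn_out", act)  # hypothetical name
```

If such a check flags specific layers, a common workaround (supported by the TensorRT API) is to keep only those layers in FP32 via per-layer precision constraints, rather than disabling FP16 for the whole network.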