mit-han-lab / efficientvit

EfficientViT is a new family of vision models for efficient high-resolution vision.
Apache License 2.0

Hyper-parameter to train seg model #42

Open seefun opened 8 months ago

seefun commented 8 months ago

Could you describe the hyper-parameter settings for fine-tuning on downstream tasks in detail?

I am unable to reproduce the training results on Cityscapes when using the efficientvit-b series models (the accuracy is much lower than reported in the paper).

By the way, during TensorRT deployment, using FP16 may produce NaNs or a large accuracy drop. Even training with a relatively large weight decay (0.1) does not avoid NaNs under TensorRT FP16. Is this model not sufficiently TRT-FP16-friendly, or is there a problem with the model training? Is it related to https://github.com/mit-han-lab/efficientvit/issues/15 ? Is there any way to avoid NaNs during TRT FP16 inference?
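For context on why FP16 inference can produce NaNs: FP16 can only represent magnitudes up to 65504, so any intermediate activation beyond that overflows to `inf`, and a subsequent `inf - inf` (e.g. inside a softmax or normalization) yields NaN. A minimal NumPy illustration of this failure mode (not tied to this repo's code, just the arithmetic):

```python
import numpy as np

# FP16's largest finite value is 65504.
print(np.finfo(np.float16).max)  # 65504.0

x = np.float32(300.0)
# The same product is harmless in FP32...
acc32 = x * x                              # 90000.0, finite
# ...but overflows to inf when values are kept in FP16 end to end.
acc16 = np.float16(x) * np.float16(x)      # 90000 > 65504 -> inf
# inf - inf (as happens when subtracting a max or a mean) gives NaN.
bad = acc16 - acc16
print(np.isinf(acc16), np.isnan(bad))
```

This is why a common mitigation is to keep overflow-prone layers (e.g. normalization or attention-score computation) in FP32 while the rest of the network runs in FP16; whether that resolves the issue for this model is a question for the maintainers.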