mit-han-lab / efficientvit

EfficientViT is a new family of vision models for efficient high-resolution vision.
Apache License 2.0

Hyper-parameter to train seg model #42

Open seefun opened 8 months ago

seefun commented 8 months ago

Could you describe the hyper-parameter settings for fine-tuning on downstream tasks in detail?

I am unable to reproduce the training results on Cityscapes when using the efficientvit-b series models (the accuracy is much lower than reported in the paper).

By the way, during TensorRT deployment, using FP16 may produce NaNs or a large accuracy drop. Even training with a relatively large weight decay (0.1) does not avoid NaNs under TensorRT FP16. Is this model not sufficiently TRT-FP16-friendly, or is there a problem with the model training? Is it related to https://github.com/mit-han-lab/efficientvit/issues/15 ? Is there any way to avoid NaNs during TRT FP16 inference?
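For context on why FP16 inference can produce NaNs: FP16 can only represent magnitudes up to 65504, so any intermediate activation beyond that overflows to `inf`, and a subsequent `inf - inf` (e.g. inside a softmax or normalization) yields NaN. A minimal NumPy illustration of this failure mode (not tied to this repo's code, just the arithmetic):

```python
import numpy as np

# FP16's largest finite value is 65504.
print(np.finfo(np.float16).max)  # 65504.0

x = np.float32(300.0)
# The same product is harmless in FP32...
acc32 = x * x                              # 90000.0, finite
# ...but overflows to inf when values are kept in FP16 end to end.
acc16 = np.float16(x) * np.float16(x)      # 90000 > 65504 -> inf
# inf - inf (as happens when subtracting a max or a mean) gives NaN.
bad = acc16 - acc16
print(np.isinf(acc16), np.isnan(bad))
```

This is why a common mitigation is to keep overflow-prone layers (e.g. normalization or attention-score computation) in FP32 while the rest of the network runs in FP16; whether that resolves the issue for this model is a question for the maintainers.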