intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc.
Apache License 2.0

Feature Request: Add Mish activation function #2968

Open digantamisra98 opened 5 years ago

digantamisra98 commented 5 years ago

Mish is a novel activation function proposed in this paper. It has shown promising results so far and has already been adopted in several packages.

All benchmarks, analyses, and links to the official package implementations can be found in this repository.

It would be nice to have Mish as an option within the activation function group.

This is a comparison of Mish with other conventional activation functions on a SE-ResNet-50 for CIFAR-10 (better accuracy, and faster than GELU):

[figure se50_1: accuracy comparison of activation functions, SE-ResNet-50 on CIFAR-10]
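For reference, Mish is defined as mish(x) = x · tanh(softplus(x)) = x · tanh(ln(1 + eˣ)). A minimal PyTorch sketch of that definition (illustrative only, not an existing BigDL operator):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    """Mish activation: mish(x) = x * tanh(softplus(x)) = x * tanh(ln(1 + e^x))."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # softplus is a smooth approximation of ReLU; the tanh gate keeps the
        # function smooth, non-monotonic, bounded below and unbounded above.
        return x * torch.tanh(F.softplus(x))

# Quick sanity check: mish(0) == 0, mish(1) ≈ 0.8651
x = torch.tensor([-1.0, 0.0, 1.0])
print(Mish()(x))
```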

Menooker commented 5 years ago

Thanks for the feature request. We will look into the activation function and consider adding it to our framework.

digantamisra98 commented 3 years ago

@Menooker are you still open to including Mish in the framework? It was recently added to PyTorch - https://github.com/pytorch/pytorch/pull/58940
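For context, the PR above added Mish to PyTorch in both module and functional forms. A quick usage sketch, assuming PyTorch ≥ 1.9:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(4)

y_module = nn.Mish()(x)  # module form
y_func = F.mish(x)       # functional form

# Both match the paper's definition x * tanh(softplus(x))
assert torch.allclose(y_module, x * torch.tanh(F.softplus(x)))
assert torch.allclose(y_func, y_module)
```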

jason-dai commented 3 years ago

@digantamisra98 I wonder if you are working on specific use cases for this on BigDL? FYI, we already support running distributed PyTorch on Spark in Analytics Zoo (e.g., see https://analytics-zoo.readthedocs.io/en/latest/doc/Orca/QuickStart/orca-pytorch-quickstart.html)
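For anyone following that link, here is a rough sketch of the Orca PyTorch flow described in the quickstart. `init_orca_context` and `Estimator.from_torch` are Analytics Zoo Orca APIs, but the exact signatures and defaults may differ across versions, so treat this as an outline rather than a definitive example:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from zoo.orca import init_orca_context, stop_orca_context
from zoo.orca.learn.pytorch import Estimator

# Run Spark locally; on a real cluster this would be e.g. cluster_mode="yarn-client".
init_orca_context(cluster_mode="local", cores=4)

# An ordinary PyTorch model (nn.Mish requires PyTorch >= 1.9).
model = nn.Sequential(nn.Linear(10, 32), nn.Mish(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss = nn.CrossEntropyLoss()

# Dummy data standing in for a real dataset.
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))),
    batch_size=8,
)

# Wrap the PyTorch objects in an Orca Estimator and train on Spark.
est = Estimator.from_torch(model=model, optimizer=optimizer, loss=loss)
est.fit(data=train_loader, epochs=2)

stop_orca_context()
```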