LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale https://arxiv.org/abs/2208.07339 https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/
A paper from Facebook (Meta) and HuggingFace showing that large language models can still achieve strong performance when quantized to the int8 data type.
The bitsandbytes paper on dynamic int8 optimizer quantization was accepted at this year's ICLR, and this work applies the adaptive quantization methodology first proposed there to LLM inference (a rough sketch of the idea is included below).
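For reference, a minimal NumPy sketch of the two core ideas as I understand them: vector-wise absmax int8 quantization plus the mixed-precision decomposition that keeps outlier feature dimensions in higher precision. This is not the authors' or bitsandbytes' implementation; the 6.0 threshold follows the paper, and all function names, shapes, and the toy data here are illustrative.

```python
# Sketch of the LLM.int8() idea (illustrative, not the official implementation):
# vector-wise absmax int8 quantization + mixed-precision decomposition,
# where outlier feature dimensions stay in full precision.
import numpy as np

def int8_matmul_with_outliers(X, W, threshold=6.0):
    # X: (tokens, features) activations, W: (features, out) weights, both float.
    # 1) Find outlier feature dimensions (columns of X with large magnitudes).
    outlier_cols = np.where(np.abs(X).max(axis=0) >= threshold)[0]
    regular_cols = np.setdiff1d(np.arange(X.shape[1]), outlier_cols)

    # 2) Regular dimensions: vector-wise absmax quantization to int8.
    Xr, Wr = X[:, regular_cols], W[regular_cols, :]
    sx = np.abs(Xr).max(axis=1, keepdims=True) / 127.0 + 1e-8   # per-row scale
    sw = np.abs(Wr).max(axis=0, keepdims=True) / 127.0 + 1e-8   # per-column scale
    Xq = np.clip(np.round(Xr / sx), -127, 127).astype(np.int8)
    Wq = np.clip(np.round(Wr / sw), -127, 127).astype(np.int8)
    # int8 matmul accumulated in int32, then dequantized with the outer product of scales.
    Yq = (Xq.astype(np.int32) @ Wq.astype(np.int32)).astype(np.float32) * (sx * sw)

    # 3) Outlier dimensions: computed in higher precision (fp16 in the paper).
    Yo = X[:, outlier_cols] @ W[outlier_cols, :] if len(outlier_cols) else 0.0
    return Yq + Yo

# Tiny usage example with one artificial outlier dimension.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8)).astype(np.float32)
X[:, 3] *= 20.0                      # simulate an emergent outlier feature
W = rng.normal(size=(8, 5)).astype(np.float32)
print(np.abs(int8_matmul_with_outliers(X, W) - X @ W).max())  # small error vs. float matmul
```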
Calibrated Selective Classification https://arxiv.org/abs/2208.12084
No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects https://arxiv.org/abs/2208.03641 https://github.com/LabSAINT/SPD-Conv
Interesting papers:
PEER: A Collaborative Language Model https://arxiv.org/abs/2208.11663
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers https://arxiv.org/abs/2208.06366
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks https://arxiv.org/abs/2208.10442
Understanding Diffusion Models: A Unified Perspective https://arxiv.org/abs/2208.11970
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? https://arxiv.org/abs/2204.05832