Open scene-the-ella opened 1 year ago
Sharing the latest news
https://github.com/gyunggyung/KoChatLLaMA.cpp https://www.facebook.com/groups/1272877526915876/permalink/1277329939803968/
llama.cpp
Inference of Facebook's LLaMA model in pure C/C++
Hot topics
The main goal is to run the model using 4-bit quantization on a MacBook.
This was hacked in an evening - I have no idea if it works correctly. Please do not make conclusions about the models based on the results from this implementation. For all I know, it can be completely wrong. This project is for educational purposes and is not going to be maintained properly. New features will probably be added mostly through community contributions, if any.
Here is a typical run using LLaMA-7B:
make -j && ./main -m ./models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512
I llama.cpp build info:
I UNAME_S: Darwin
I UNAME_P: arm
I UNAME_M: arm64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread
I LDFLAGS: -framework Accelerate
I CC: Apple clang version 14.0.0 (clang-1400.0.29.202)
I CXX: Apple clang version 14.0.0 (clang-1400.0.29.202)
make: Nothing to be done for `default'.
main: seed = 1678486056
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291
main: prompt: 'Building a website can be done in 10 simple steps:'
main: number of tokens in prompt = 15
1 -> ''
8893 -> 'Build'
292 -> 'ing'
263 -> ' a'
4700 -> ' website'
508 -> ' can'
367 -> ' be'
2309 -> ' done'
297 -> ' in'
29871 -> ' '
29896 -> '1'
29900 -> '0'
2560 -> ' simple'
6576 -> ' steps'
29901 -> ':'
sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000
Building a website can be done in 10 simple steps:
1) Select a domain name and web hosting plan
2) Complete a sitemap
3) List your products
4) Write product descriptions
5) Create a user account
6) Build the template
7) Start building the website
8) Advertise the website
9) Provide email support
10) Submit the website to search engines
A website is a collection of web pages that are formatted with HTML. HTML is the code that defines what the website looks like and how it behaves.
The HTML code is formatted into a template or a format. Once this is done, it is displayed on the user's browser.
The web pages are stored in a web server. The web server is also called a host. When the website is accessed, it is retrieved from the server and displayed on the user's computer.
A website is known as a website when it is hosted. This means that it is displayed on a host. The host is usually a web server.
A website can be displayed on different browsers. The browsers are basically the software that renders the website on the user's screen.
A website can also be viewed on different devices such as desktops, tablets and smartphones.
Hence, to have a website displayed on a browser, the website must be hosted.
A domain name is an address of a website. It is the name of the website.
The website is known as a website when it is hosted. This means that it is displayed on a host. The host is usually a web server.
A website can be displayed on different browsers. The browsers are basically the software that renders the website on the user’s screen.
A website can also be viewed on different devices such as desktops, tablets and smartphones. Hence, to have a website displayed on a browser, the website must be hosted.
A domain name is an address of a website. It is the name of the website.
A website is an address of a website. It is a collection of web pages that are formatted with HTML. HTML is the code that defines what the website looks like and how it behaves.
The HTML code is formatted into a template or a format. Once this is done, it is displayed on the user’s browser.
A website is known as a website when it is hosted
main: mem per token = 14434244 bytes
main: load time = 1332.48 ms
main: sample time = 1081.40 ms
main: predict time = 31378.77 ms / 61.41 ms per token
main: total time = 34036.74 ms
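The sampling parameters shown in the log above (temp = 0.8, top_k = 40, top_p = 0.95) control how the next token is drawn from the model's output logits. Here is a minimal sketch of temperature/top-k/top-p (nucleus) sampling in Python; this is an illustration of the general technique, not llama.cpp's actual C++ sampler:

```python
import numpy as np

def sample_token(logits, temp=0.8, top_k=40, top_p=0.95, seed=None):
    """Draw one token id from raw logits using temperature, top-k, then top-p filtering."""
    rng = np.random.default_rng(seed)
    logits = np.asarray(logits, dtype=np.float64) / temp   # temperature scaling
    order = np.argsort(logits)[::-1][:top_k]               # keep the top_k candidates
    probs = np.exp(logits[order] - logits[order].max())
    probs /= probs.sum()                                   # softmax over the candidates
    # Nucleus cut: smallest prefix of candidates whose mass reaches top_p.
    cutoff = int(np.searchsorted(np.cumsum(probs), top_p)) + 1
    probs = probs[:cutoff] / probs[:cutoff].sum()
    return int(rng.choice(order[:cutoff], p=probs))

# Toy 5-token vocabulary: the sampler only ever picks among the high-logit tokens.
print(sample_token([2.0, 1.0, 0.5, -1.0, -3.0], top_k=3, top_p=0.9))
```

With a tight nucleus (for example top_p = 0.5 on the toy logits above) the highest-probability token is the only survivor, so sampling becomes deterministic; larger top_k/top_p trade determinism for diversity.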
And here is another demo of running both LLaMA-7B and whisper.cpp on a single M1 Pro MacBook:
Here are the steps for the LLaMA-7B model:
# build this repo
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
# obtain the original LLaMA model weights and place them in ./models
ls ./models
65B 30B 13B 7B tokenizer_checklist.chk tokenizer.model
# install Python dependencies
python3 -m pip install torch numpy sentencepiece
# convert the 7B model to ggml FP16 format
python3 convert-pth-to-ggml.py models/7B/ 1
# quantize the model to 4-bits
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
# run the inference
./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128
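The quantize step above packs the FP16 weights into 4-bit blocks, each stored with a single scale factor. A rough sketch of the idea behind block-wise 4-bit quantization, in Python for clarity (llama.cpp's actual q4_0 bit packing and scale convention differ; this only illustrates the principle):

```python
import numpy as np

QK = 32  # weights are quantized in blocks of 32 values

def quantize_q4_block(w):
    """Map one block of floats to 4-bit signed ints plus a single per-block scale."""
    d = float(np.abs(w).max()) / 7.0  # scale so values fit the signed 4-bit range [-7, 7]
    if d == 0.0:
        d = 1.0  # all-zero block: any scale works
    q = np.clip(np.round(w / d), -7, 7).astype(np.int8)
    return d, q

def dequantize_q4_block(d, q):
    """Recover approximate floats from the quantized block."""
    return q.astype(np.float32) * d

rng = np.random.default_rng(0)
w = rng.standard_normal(QK).astype(np.float32)
d, q = quantize_q4_block(w)
w_hat = dequantize_q4_block(d, q)
print("max reconstruction error:", np.abs(w - w_hat).max())  # bounded by d/2
```

Each block thus costs 32 x 4 bits plus one scale instead of 32 x 16 bits, which is why the 7B model shrinks to roughly 4 GB and fits in MacBook memory.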
Coming soon.
Our encoder incorporates more than 300 languages through pre-training. We demonstrate the effectiveness of the pre-trained encoder by fine-tuning on multilingual speech data from YouTube Captions. The supervised YouTube data covers 73 languages, with an average of less than 3,000 hours of data per language. Despite the limited supervised data, the model achieves an average word error rate (WER; lower is better) of under 30% across the 73 languages, a milestone we had never previously reached. For en-US, USM has a 6% relatively lower WER than our current internal state-of-the-art model. Finally, we compare against the recently released large model Whisper (large-v2), which was trained with more than 400k hours of labeled data. For the comparison, we use only the 18 languages that Whisper can successfully decode with a WER below 40%. Our model has, on average, a 32.7% lower WER than Whisper on these 18 languages.
-- USM supports all 73 languages in the YouTube Captions' Test Set and outperforms Whisper on the languages it can support with lower than 40% WER. Lower WER is better.
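WER, the metric used throughout these comparisons, is the word-level edit distance between the recognized transcript and the reference, divided by the number of reference words. A small self-contained sketch of the standard Levenshtein-based computation (not Google's or Whisper's actual evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion out of six words -> 1/6
```

Real ASR evaluation pipelines also normalize text (casing, punctuation, number formats) before scoring, which can move WER by several points.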
Yu Zhang, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, Zhehuai Chen, Nanxin Chen, Bo Li, Vera Axelrod, Gary Wang, Zhong Meng, Ke Hu, Andrew Rosenberg, Rohit Prabhavalkar, Daniel S. Park, Parisa Haghani, Jason Riesa, Ginger Perng, Hagen Soltau, Trevor Strohman, Bhuvana Ramabhadran, Tara Sainath, Pedro Moreno, Chung-Cheng Chiu, Johan Schalkwyk, Françoise Beaufays, Yonghui Wu
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. This is achieved by pre-training the encoder of the model on a large unlabeled multilingual dataset of 12 million (M) hours spanning over 300 languages, and fine-tuning on a smaller labeled dataset. We use multilingual pre-training with random-projection quantization and speech-text modality matching to achieve state-of-the-art performance on downstream multilingual ASR and speech-to-text translation tasks. We also demonstrate that despite using a labeled training set 1/7-th the size of that used for the Whisper model, our model exhibits comparable or better performance on both in-domain and out-of-domain speech recognition tasks across many languages.
arXivGPT "default" prompt is used
The paper "Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages" describes a single large model, the Universal Speech Model (USM), that performs automatic speech recognition (ASR) across 100+ languages by pre-training the encoder of the model on a large unlabeled multilingual dataset of 12 million hours spanning over 300 languages and fine-tuning on a smaller labeled dataset.
Key insights and lessons learned from the paper include:
Multilingual pre-training with random-projection quantization and speech-text modality matching can achieve state-of-the-art performance on downstream multilingual ASR and speech-to-text translation tasks.
USM exhibits comparable or better performance on both in-domain and out-of-domain speech recognition tasks across many languages compared to the Whisper model, despite using a labeled training set 1/7th the size of Whisper's training set.
USM significantly reduces model complexity and inference latency compared to traditional approaches that require multiple language-specific models.
The paper highlights the importance of a large, diverse multilingual dataset for pre-training and fine-tuning the model, as well as the effectiveness of random-projection quantization and speech-text modality matching.
Three questions to ask the authors:
How does USM compare to other large-scale multilingual speech recognition models, such as Facebook's wav2vec and wav2vec 2.0 models?
Have you explored using USM for other speech-related tasks, such as speaker identification or emotion recognition?
Can USM be extended to handle low-resource languages with limited labeled training data, and if so, what techniques might be effective?
Three suggestions for related topics or future research directions:
Investigate the transfer learning capabilities of USM for other natural language processing tasks, such as text classification or named entity recognition.
Explore the impact of additional pre-training tasks on USM's performance, such as masked language modeling or sequence-to-sequence translation.
Investigate the effectiveness of USM for speech recognition in noisy or adverse acoustic environments.
Relevant references:
Baevski, A., & Auli, M. (2020). wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. arXiv preprint arXiv:2006.11477.
Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., ... & Joulin, A. (2021). Unsupervised Cross-lingual Representation Learning at Scale. arXiv preprint arXiv:2012.15761.
Ghoshal, A., & Swietojanski, P. (2017). Multi-lingual training of convolutional neural networks for low-resource speech recognition. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5220-5224). IEEE.
Hu, B., Chen, Y., Zhang, W., Han, W., & Wu, Y. (2020). Exploring large-scale pretraining for speech recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (pp. 2386-2392).
Khurana, U., Mahajan, M., Dhingra, B., Carlini, N., & Liu, Y. (2021). Multilingual speech recognition: A survey of recent advances. arXiv preprint arXiv:2103.03247.
Need to check which one is better. API application is complete. For most cases plus multilingual, Google; for the outliers, Meta?
(I'm back after a long time..)
Communication-Efficient Collaborative Heterogeneous Bandits in Networks
Diffusion Models are Minimax Optimal Distribution Estimators
Understanding the Diffusion Objective as a Weighted Integral of ELBOs
(ref) ((masked autoencoder))
Hyena Hierarchy: Towards Larger Convolutional Language Models
Blog: https://hazyresearch.stanford.edu/blog/2023-03-07-hyena ArXiv: https://arxiv.org/abs/2302.10866 GitHub: https://github.com/HazyResearch/safari
News
ArXiv
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
Scaling up GANs for Text-to-Image Synthesis
PaLM-E: An Embodied Multimodal Language Model