jzhang38 / TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Apache License 2.0 · 7.71k stars · 453 forks
Issues
#48 · How to finetune on custom dataset · hrsmanian · closed 11 months ago · 11 comments
#47 · Why train three epochs? not one epoch? · PeiqinSun · closed 1 year ago · 1 comment
#46 · Any plans for the ONNX runtime? · VatsaDev · closed 1 year ago · 3 comments
#45 · Update README_zh-CN.md · ChaosCodes · closed 1 year ago · 0 comments
#44 · Downloading the dataset is inconvenient · scorpjr1 · closed 1 year ago · 3 comments
#43 · Update prepare_slimpajama.py · michael-c-max · closed 8 months ago · 0 comments
#42 · How to compute token numbers for a dataset? · Arcmoon-Hu · closed 11 months ago · 4 comments
#41 · Why is the vocab size of `TinyLlama-1.1B-Chat-V0.1` 32001? · Chillee · closed 1 year ago · 5 comments
#40 · Question Regarding the Absence of BOS and EOS Tokens in Tokenizer Encoding · dtxwhzw · closed 1 year ago · 3 comments
#39 · Release format + artefact · PierreColombo · closed 1 year ago · 3 comments
#38 · Update EVAL.md · jzhang38 · closed 1 year ago · 0 comments
#37 · Why is Swiglu packed_weights = False? · larrylawl · closed 1 year ago · 1 comment
#36 · Resuming training · artnoage · opened 1 year ago · 8 comments
#35 · TinyLlama-1.1B-orca-gpt4 · acalatrava · closed 1 year ago · 1 comment
#34 · info when load model · shyoulala · closed 1 year ago · 3 comments
#33 · How did you determine the size of the TinyLlama model? · dtxwhzw · closed 1 year ago · 2 comments
#32 · eval loss become nan after a single batch · ThibaultCastells · closed 1 year ago · 4 comments
#31 · Request: Finetune the Model on more Data? · VatsaDev · closed 1 year ago · 1 comment
#30 · Working Chat Demo · VatsaDev · closed 8 months ago · 8 comments
#29 · TinyLlama-chat outputs truncated/small? · VatsaDev · closed 1 year ago · 1 comment
#28 · Fix the finetune directory link in README.md · VatsaDev · closed 1 year ago · 1 comment
#27 · Minimum learning rate · artnoage · closed 1 year ago · 1 comment
#26 · Model mirror within China · Ma-Yongqiang · closed 1 year ago · 9 comments
#25 · Why does a dimension mismatch occur when I use AutoModelForCausalLM to load a model? · BaenRH · closed 1 year ago · 2 comments
#24 · Getting gibberish output when running on llama.cpp · luungoc2005 · closed 1 year ago · 33 comments
#23 · Why is tokenizer.model_max_length set to 1000000000000000019884624838656? · kevinhu · closed 1 year ago · 2 comments
#22 · A guide to adding more datasets · VatsaDev · closed 1 year ago · 3 comments
#21 · Has anyone used this code base for incremental pretraining of llama-2-7b? · s1ghhh · closed 1 year ago · 4 comments
#20 · I want to use this model · ChuXNobody · closed 11 months ago · 17 comments
#19 · where is "rotary_emb"? · ScottishFold007 · closed 1 year ago · 1 comment
#18 · Credit to the FlashAttention repo · tridao · closed 1 year ago · 1 comment
#17 · Are there any provided 4bit quant weights, or like a colab detailing quantization? · VatsaDev · closed 1 year ago · 2 comments
#16 · Simple WebUI for the project · VatsaDev · closed 1 year ago · 5 comments
#15 · Cleaned Notebook · VatsaDev · closed 1 year ago · 2 comments
#14 · Can it run on CPU? · abdul-jabbar-ms · closed 1 year ago · 10 comments
#13 · How to train model with databricks-dolly-15k.jsonl dataset format. · TapendraBaduwal · closed 1 year ago · 4 comments
#12 · Have you considered code llama? · IgorTodorovskiIBM · closed 1 year ago · 7 comments
#11 · Would this be possible to finetune on a weaker gpu like a t4? · VatsaDev · closed 1 year ago · 6 comments
#10 · How do you plan on dealing with hallucinations due to knowledge compression? · VatsaDev · opened 1 year ago · 16 comments
#9 · Add support for VEDV (https://github.com/yunielrc/vedv) · yunielrc · closed 1 year ago · 0 comments
#8 · More eval results · GeneZC · closed 1 year ago · 2 comments
#7 · Where is RLHF? · MimeZoe0628 · closed 1 year ago · 1 comment
#6 · Colab · rarhs · closed 1 year ago · 1 comment
#5 · Where does the rotary_emb import come from? · nmharmon8 · closed 1 year ago · 2 comments
#4 · where is flash attention 2 · eyuansu62 · closed 1 year ago · 1 comment
#3 · Hardware requirements · ajinkya123-robo · closed 1 year ago · 2 comments
#2 · Support Chinese Language? · Johnson-yue · closed 1 year ago · 3 comments
#1 · fix live tracking link · Green-Sky · closed 1 year ago · 0 comments