speed1313 / jax-llm
JAX implementation of Large Language Models. You can train a GPT-2-like model with 青空文庫 (the aozora bunko-clean dataset) or any other text dataset.
https://speed1313.github.io/posts/llm-from-scratch/
MIT License · 10 stars · 2 forks
Issues
#24 · Classical Japanese-only dataset · speed1313 · opened 6 months ago · 1 comment
#23 · Use os.path.join · speed1313 · closed 6 months ago · 0 comments
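For #23, the standard-library fix is a one-liner; a minimal sketch with hypothetical file names:

```python
# build paths portably instead of concatenating strings with "/";
# the file names here are hypothetical placeholders, not the repo's actual layout
import os

data_path = os.path.join("data", "aozora", "train.txt")
config_path = os.path.join("config", "gpt2.yaml")
```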
#22 · Clean up config file and file_path handling · speed1313 · closed 6 months ago · 0 comments
#21 · add decoding algo · speed1313 · closed 6 months ago · 0 comments
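#21 does not say which decoding algorithm was added. As one common candidate, here is a minimal temperature-plus-top-k sampling step in JAX, assuming `logits` is the model's last-position output of shape (vocab,):

```python
import jax
import jax.numpy as jnp

def sample_top_k(key, logits, k=40, temperature=1.0):
    # scale logits, keep only the k largest, sample among them,
    # then map the sampled position back to a vocabulary id
    top_vals, top_idx = jax.lax.top_k(logits / temperature, k)
    choice = jax.random.categorical(key, top_vals)
    return top_idx[choice]
```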
#20 · efficient generate · speed1313 · closed 6 months ago · 0 comments
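One likely angle for #20: with a naive loop the sequence grows each step, so jax.jit recompiles at every new length. A minimal sketch that keeps shapes fixed, assuming a causal model behind a hypothetical `model_apply(params, tokens) -> (seq, vocab)` forward pass:

```python
import jax
import jax.numpy as jnp
from functools import partial

@partial(jax.jit, static_argnums=2)
def generate(params, prompt_ids, max_len):
    # preallocate a fixed-length buffer so every model call has the same shape
    buf = jnp.zeros((max_len,), dtype=jnp.int32)
    buf = jax.lax.dynamic_update_slice(buf, prompt_ids, (0,))

    def step(i, buf):
        # model_apply is a stand-in for the repo's forward pass (assumption)
        logits = model_apply(params, buf)     # compiled once, reused every step
        next_id = jnp.argmax(logits[i - 1])   # greedy pick; causal mask ignores the padding
        return buf.at[i].set(next_id)

    return jax.lax.fori_loop(prompt_ids.shape[0], max_len, step, buf)
```

A KV cache would be the next step beyond this fixed-shape trick.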
#19 · Check whether softmax is applied in the loss · speed1313 · closed 6 months ago · 0 comments
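The thing #19 is checking: the loss helper should receive raw logits, because it applies log-softmax internally; applying softmax in the model first silently double-applies it and stalls training. A minimal check with optax:

```python
import optax

def loss_fn(logits, targets):
    # expects raw logits of shape (batch, seq, vocab) and integer targets
    # of shape (batch, seq); the log-softmax happens inside the helper
    return optax.softmax_cross_entropy_with_integer_labels(logits, targets).mean()
```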
#18 · Make the dataloader JAX-like · speed1313 · closed 6 months ago · 0 comments
#17 · Try setting dropout to 0 · speed1313 · closed 6 months ago · 0 comments
#16 · Check how the loss decreases on Shakespeare · speed1313 · closed 6 months ago · 0 comments
#15 · Try removing MHA · speed1313 · closed 6 months ago · 1 comment
#14 · wikitext-ja · speed1313 · closed 6 months ago · 1 comment
#13 · Publish models on HuggingFace · speed1313 · opened 6 months ago · 0 comments
#12 · Training on 青空文庫 (Aozora Bunko) · speed1313 · closed 6 months ago · 1 comment
#11 · training on my blog posts · speed1313 · opened 6 months ago · 0 comments
#10 · get batches with dynamic slice · speed1313 · closed 6 months ago · 1 comment
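For #10 (and the JAX-like dataloader of #18), a jit- and vmap-friendly way to cut batches out of a flat token stream with jax.lax.dynamic_slice; a minimal sketch assuming `data` is a 1-D int32 token array:

```python
import jax
import jax.numpy as jnp

def get_batch(key, data, batch_size, block_size):
    # sample random window starts, then slice each window via vmap'd dynamic_slice
    starts = jax.random.randint(key, (batch_size,), 0, data.shape[0] - block_size - 1)
    take = lambda s: jax.lax.dynamic_slice(data, (s,), (block_size + 1,))
    windows = jax.vmap(take)(starts)         # (batch, block_size + 1)
    return windows[:, :-1], windows[:, 1:]   # inputs and next-token targets
```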
#9 · Dataclass GPTConfig · speed1313 · closed 6 months ago · 0 comments
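A frozen dataclass is a natural shape for #9; frozen=True also makes the config hashable, so it can be passed to jax.jit as a static argument. Field names and defaults below are assumptions loosely modeled on GPT-2 small, not the repo's actual fields:

```python
from dataclasses import dataclass

# field names and defaults are assumptions (GPT-2-small-like)
@dataclass(frozen=True)
class GPTConfig:
    vocab_size: int = 50257
    block_size: int = 1024     # maximum context length
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768
    dropout_rate: float = 0.1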
#8 · Check whether the token embedding shows word-embedding-like structure · speed1313 · opened 6 months ago · 1 comment
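One quick probe for #8: nearest neighbors under cosine similarity in the learned embedding matrix should surface related tokens if word-embedding-like structure emerged. A sketch assuming `wte` is the token-embedding matrix of shape (vocab, n_embd):

```python
import jax.numpy as jnp

def nearest_tokens(wte, token_id, k=5):
    # wte: learned token-embedding matrix, assumed shape (vocab, n_embd)
    v = wte[token_id]
    sims = (wte @ v) / (jnp.linalg.norm(wte, axis=1) * jnp.linalg.norm(v) + 1e-8)
    return jnp.argsort(-sims)[:k]   # ids of the k most similar tokens (incl. itself)
```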
#7 · Code reading about Tiktoken · speed1313 · opened 6 months ago · 0 comments
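For reference while reading the tiktoken code in #7, its public API round-trips like this; the GPT-2 encoding is also where the 50257 vocabulary size in #4 comes from:

```python
import tiktoken

enc = tiktoken.get_encoding("gpt2")
ids = enc.encode("Hello, world!")
print(ids)               # BPE token ids
print(enc.decode(ids))   # round-trips to the original string
print(enc.n_vocab)       # 50257
```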
#6 · RLHF · speed1313 · opened 6 months ago · 0 comments
#5 · fine tuning · speed1313 · opened 6 months ago · 0 comments
#4 · vocab size from 50257 to 50304 · speed1313 · closed 6 months ago · 0 comments
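The motivation for #4, as popularized by nanoGPT: pad the vocabulary up to a multiple of 64 so the embedding and output-projection matmuls align with GPU tile sizes; the padding rows are never emitted by the tokenizer. The arithmetic:

```python
def padded_vocab(vocab_size: int, multiple: int = 64) -> int:
    # round up to the next multiple (50257 -> 50304)
    return ((vocab_size + multiple - 1) // multiple) * multiple

assert padded_vocab(50257) == 50304
```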
#3 · distributed training · speed1313 · opened 6 months ago · 2 comments
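A minimal data-parallel starting point for #3 with jax.pmap, assuming a `loss_fn(params, batch)` and params already replicated across devices; gradients are averaged with a pmean collective before a plain SGD update:

```python
import jax
from functools import partial

@partial(jax.pmap, axis_name="devices")
def train_step(params, batch):
    # loss_fn is a stand-in for the repo's loss (assumption)
    grads = jax.grad(loss_fn)(params, batch)
    grads = jax.lax.pmean(grads, axis_name="devices")   # average across devices
    return jax.tree_util.tree_map(lambda p, g: p - 1e-3 * g, params, grads)
```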
#2 · evaluate my llm · speed1313 · opened 6 months ago · 0 comments
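The standard intrinsic metric for #2 is perplexity, the exponential of the mean per-token cross-entropy on held-out text; a minimal sketch reusing the optax loss:

```python
import jax.numpy as jnp
import optax

def perplexity(logits, targets):
    nll = optax.softmax_cross_entropy_with_integer_labels(logits, targets)
    return jnp.exp(nll.mean())   # ~vocab_size for an untrained model; lower is better
```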
#1 · explore why my gpt-2 model's total params is about 80M · speed1313 · closed 6 months ago · 1 comment
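A back-of-the-envelope count for #1, assuming GPT-2-small hyperparameters: the transformer blocks alone come to about 85M, so a figure near 80M is plausible if embeddings are tied or excluded, or if the hyperparameters are slightly smaller:

```python
# assumed GPT-2-small shapes; the repo's model may differ
n_layer, d, vocab, block = 12, 768, 50257, 1024

embed = vocab * d + block * d        # token + positional embeddings: ~39.4M
per_block = 4 * d * d + 8 * d * d    # attention (q, k, v, o) + 4x-expansion MLP
blocks = n_layer * per_block         # ~84.9M
total = embed + blocks               # ~124.3M, ignoring biases and LayerNorms
print(f"{embed/1e6:.1f}M + {blocks/1e6:.1f}M = {total/1e6:.1f}M")
```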