speed1313 / jax-llm
JAX implementation of Large Language Models. You can train a GPT-2-like model with 青空文庫 (the aozora bunko-clean dataset) or any other text dataset.
https://speed1313.github.io/posts/llm-from-scratch/
MIT License · 10 stars · 2 forks
Issues
#24 · Classical Japanese-only dataset · speed1313 · opened 6 months ago · 1 comment
#23 · Use os.path.join · speed1313 · closed 6 months ago · 0 comments
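For #23, the standard-library fix is a one-liner; a minimal sketch with hypothetical file names:

```python
# build paths portably instead of concatenating strings with "/";
# the file names here are hypothetical placeholders, not the repo's actual layout
import os

data_path = os.path.join("data", "aozora", "train.txt")
config_path = os.path.join("config", "gpt2.yaml")
```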
#22 · Clean up config file and file_path handling · speed1313 · closed 6 months ago · 0 comments
#21 · add decoding algo · speed1313 · closed 6 months ago · 0 comments
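#21 does not say which decoding algorithm was added. As one common candidate, here is a minimal temperature-plus-top-k sampling step in JAX, assuming `logits` is the model's last-position output of shape (vocab,):

```python
import jax
import jax.numpy as jnp

def sample_top_k(key, logits, k=40, temperature=1.0):
    # scale logits, keep only the k largest, sample among them,
    # then map the sampled position back to a vocabulary id
    top_vals, top_idx = jax.lax.top_k(logits / temperature, k)
    choice = jax.random.categorical(key, top_vals)
    return top_idx[choice]
```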
#20 · efficient generate · speed1313 · closed 6 months ago · 0 comments
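One likely angle for #20: with a naive loop the sequence grows each step, so jax.jit recompiles at every new length. A minimal sketch that keeps shapes fixed, assuming a causal model behind a hypothetical `model_apply(params, tokens) -> (seq, vocab)` forward pass:

```python
import jax
import jax.numpy as jnp
from functools import partial

@partial(jax.jit, static_argnums=2)
def generate(params, prompt_ids, max_len):
    # preallocate a fixed-length buffer so every model call has the same shape
    buf = jnp.zeros((max_len,), dtype=jnp.int32)
    buf = jax.lax.dynamic_update_slice(buf, prompt_ids, (0,))

    def step(i, buf):
        # model_apply is a stand-in for the repo's forward pass (assumption)
        logits = model_apply(params, buf)     # compiled once, reused every step
        next_id = jnp.argmax(logits[i - 1])   # greedy pick; causal mask ignores the padding
        return buf.at[i].set(next_id)

    return jax.lax.fori_loop(prompt_ids.shape[0], max_len, step, buf)
```

A KV cache would be the next step beyond this fixed-shape trick.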
#19 · Check whether softmax is applied in the loss · speed1313 · closed 6 months ago · 0 comments
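The thing #19 is checking: the loss helper should receive raw logits, because it applies log-softmax internally; applying softmax in the model first silently double-applies it and stalls training. A minimal check with optax:

```python
import optax

def loss_fn(logits, targets):
    # expects raw logits of shape (batch, seq, vocab) and integer targets
    # of shape (batch, seq); the log-softmax happens inside the helper
    return optax.softmax_cross_entropy_with_integer_labels(logits, targets).mean()
```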
#18 · Make the dataloader JAX-like · speed1313 · closed 6 months ago · 0 comments
#17 · Try setting dropout to 0 · speed1313 · closed 6 months ago · 0 comments
#16 · Check how the loss decreases on Shakespeare · speed1313 · closed 6 months ago · 0 comments
#15 · Try removing MHA · speed1313 · closed 6 months ago · 1 comment
#14 · wikitext-ja · speed1313 · closed 6 months ago · 1 comment
#13 · Publish models on HuggingFace · speed1313 · opened 6 months ago · 0 comments
#12 · Training on 青空文庫 (Aozora Bunko) · speed1313 · closed 6 months ago · 1 comment
#11 · training on my blog posts · speed1313 · opened 6 months ago · 0 comments
#10 · get batches with dynamic slice · speed1313 · closed 6 months ago · 1 comment
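For #10 (and the JAX-like dataloader of #18), a jit- and vmap-friendly way to cut batches out of a flat token stream with jax.lax.dynamic_slice; a minimal sketch assuming `data` is a 1-D int32 token array:

```python
import jax
import jax.numpy as jnp

def get_batch(key, data, batch_size, block_size):
    # sample random window starts, then slice each window via vmap'd dynamic_slice
    starts = jax.random.randint(key, (batch_size,), 0, data.shape[0] - block_size - 1)
    take = lambda s: jax.lax.dynamic_slice(data, (s,), (block_size + 1,))
    windows = jax.vmap(take)(starts)         # (batch, block_size + 1)
    return windows[:, :-1], windows[:, 1:]   # inputs and next-token targets
```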
#9 · Dataclass GPTConfig · speed1313 · closed 6 months ago · 0 comments
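A frozen dataclass is a natural shape for #9; frozen=True also makes the config hashable, so it can be passed to jax.jit as a static argument. Field names and defaults below are assumptions loosely modeled on GPT-2 small, not the repo's actual fields:

```python
from dataclasses import dataclass

# field names and defaults are assumptions (GPT-2-small-like)
@dataclass(frozen=True)
class GPTConfig:
    vocab_size: int = 50257
    block_size: int = 1024     # maximum context length
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768
    dropout_rate: float = 0.1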
#8 · Check whether the token embedding shows word-embedding-like structure · speed1313 · opened 6 months ago · 1 comment
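One quick probe for #8: nearest neighbors under cosine similarity in the learned embedding matrix should surface related tokens if word-embedding-like structure emerged. A sketch assuming `wte` is the token-embedding matrix of shape (vocab, n_embd):

```python
import jax.numpy as jnp

def nearest_tokens(wte, token_id, k=5):
    # wte: learned token-embedding matrix, assumed shape (vocab, n_embd)
    v = wte[token_id]
    sims = (wte @ v) / (jnp.linalg.norm(wte, axis=1) * jnp.linalg.norm(v) + 1e-8)
    return jnp.argsort(-sims)[:k]   # ids of the k most similar tokens (incl. itself)
```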
#7 · Code reading about Tiktoken · speed1313 · opened 6 months ago · 0 comments
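For reference while reading the tiktoken code in #7, its public API round-trips like this; the GPT-2 encoding is also where the 50257 vocabulary size in #4 comes from:

```python
import tiktoken

enc = tiktoken.get_encoding("gpt2")
ids = enc.encode("Hello, world!")
print(ids)               # BPE token ids
print(enc.decode(ids))   # round-trips to the original string
print(enc.n_vocab)       # 50257
```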
#6 · RLHF · speed1313 · opened 6 months ago · 0 comments
#5 · fine tuning · speed1313 · opened 6 months ago · 0 comments
#4 · vocab size from 50257 to 50304 · speed1313 · closed 6 months ago · 0 comments
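The motivation for #4, as popularized by nanoGPT: pad the vocabulary up to a multiple of 64 so the embedding and output-projection matmuls align with GPU tile sizes; the padding rows are never emitted by the tokenizer. The arithmetic:

```python
def padded_vocab(vocab_size: int, multiple: int = 64) -> int:
    # round up to the next multiple (50257 -> 50304)
    return ((vocab_size + multiple - 1) // multiple) * multiple

assert padded_vocab(50257) == 50304
```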
#3 · distributed training · speed1313 · opened 6 months ago · 2 comments
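A minimal data-parallel starting point for #3 with jax.pmap, assuming a `loss_fn(params, batch)` and params already replicated across devices; gradients are averaged with a pmean collective before a plain SGD update:

```python
import jax
from functools import partial

@partial(jax.pmap, axis_name="devices")
def train_step(params, batch):
    # loss_fn is a stand-in for the repo's loss (assumption)
    grads = jax.grad(loss_fn)(params, batch)
    grads = jax.lax.pmean(grads, axis_name="devices")   # average across devices
    return jax.tree_util.tree_map(lambda p, g: p - 1e-3 * g, params, grads)
```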
#2 · evaluate my llm · speed1313 · opened 6 months ago · 0 comments
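The standard intrinsic metric for #2 is perplexity, the exponential of the mean per-token cross-entropy on held-out text; a minimal sketch reusing the optax loss:

```python
import jax.numpy as jnp
import optax

def perplexity(logits, targets):
    nll = optax.softmax_cross_entropy_with_integer_labels(logits, targets)
    return jnp.exp(nll.mean())   # ~vocab_size for an untrained model; lower is better
```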
#1 · explore why my gpt-2 model's total params is about 80M · speed1313 · closed 6 months ago · 1 comment
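A back-of-the-envelope count for #1, assuming GPT-2-small hyperparameters: the transformer blocks alone come to about 85M, so a figure near 80M is plausible if embeddings are tied or excluded, or if the hyperparameters are slightly smaller:

```python
# assumed GPT-2-small shapes; the repo's model may differ
n_layer, d, vocab, block = 12, 768, 50257, 1024

embed = vocab * d + block * d        # token + positional embeddings: ~39.4M
per_block = 4 * d * d + 8 * d * d    # attention (q, k, v, o) + 4x-expansion MLP
blocks = n_layer * per_block         # ~84.9M
total = embed + blocks               # ~124.3M, ignoring biases and LayerNorms
print(f"{embed/1e6:.1f}M + {blocks/1e6:.1f}M = {total/1e6:.1f}M")
```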