young-geng / EasyLM
Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, fine-tuning, evaluating, and serving LLMs in JAX/Flax.
Apache License 2.0 · 2.38k stars · 254 forks
Issues
#63 · Need help regarding data format for fine-tuning · RahulSundkar · closed 1 year ago · 2 comments
#62 · transformers version doesn't support LLaMA conversion to Hugging Face format · ruiqi-zhong · closed 1 year ago · 1 comment
#61 · Advice/expectations on throughput · float-trip · closed 1 year ago · 2 comments
#60 · When I train a 30-billion-parameter LLaMA model using v3-256, what configuration would be appropriate? I've tried '1, 64, 4', '1, 128, 2', and '1, 32, 8', but none of them worked. · joytianya · closed 1 year ago · 4 comments
#59 · [Doc error]: Outdated doc for LLaMA · llCurious · opened 1 year ago · 1 comment
#58 · Add LoRA support? · Kimiko-AI · closed 1 year ago · 1 comment
#57 · Is there a plan to support Falcon? · joytianya · closed 1 year ago · 1 comment
#56 · Supporting Falcon-40b-instruct · innerop · closed 1 year ago · 0 comments
#55 · Recommended setup given a v4-512? · OhadRubin · closed 1 year ago · 1 comment
#53 · Fix a typo in HuggingfaceDataset · syzymon · closed 1 year ago · 0 comments
#52 · Where is the vocab file? · lucasjinreal · closed 1 year ago · 1 comment
#51 · May I ask about the pre-training configs? For example, did you use dropout? · joytianya · closed 1 year ago · 2 comments
#50 · Fix unstable RMSNorm · ZYHowell · closed 1 year ago · 2 comments
#49 · Model serving example · congyingxia · closed 1 year ago · 1 comment
#48 · HF tokenizer taking too long to load · basujindal · closed 1 year ago · 1 comment
#47 · How do you handle the attention mask for dataset chunks? · waterhorse1 · closed 1 year ago · 2 comments
#46 · Do you provide an API? · prog-amateur2 · closed 1 year ago · 1 comment
#45 · Install is slow; is there a Discord channel or group for communication? · joostshao · closed 1 year ago · 1 comment
#44 · Is it hard to support BLOOM? · tiendung · closed 1 year ago · 1 comment
#43 · A detailed question on the LLaMA training script · zhangzx-uiuc · closed 1 year ago · 1 comment
#42 · FSDP vs Model Parallelism · jeromeku · closed 1 year ago · 7 comments
#41 · During training, the code adds a start token like <s> in the first position; should we also add this token during inference to maintain consistency with training? · joytianya · closed 1 year ago · 4 comments
#40 · If it is pre-training, can we just omit the [] directly? · joytianya · closed 1 year ago · 1 comment
#39 · Avoid OOM in Llama train example and align with serve example · juliensalinas · closed 1 year ago · 2 comments
#38 · Fix mesh_dim param in doc · juliensalinas · closed 1 year ago · 0 comments
#37 · Does wandb_dir support GCP paths? · joytianya · closed 1 year ago · 1 comment
#36 · Will changes to the optimizer.accumulate_gradient_steps configuration increase graphics memory usage? · joytianya · closed 1 year ago · 1 comment
#35 · For the 30B LLaMA model, can serving be supported by configuring mesh_dims on TPU v3-8 (128 GB)? I tried 8,1 and 4,1 but they don't seem to work. · joytianya · opened 1 year ago · 2 comments
#34 · Training OPT with the Koala dataset · Linohong · closed 1 year ago · 3 comments
#33 · When using fsdp=true, the error message is the same. Does this not have any effect? · joytianya · closed 1 year ago · 13 comments
#32 · Koala hyperparameters + running on a TPU pod? · hamishivi · closed 1 year ago · 2 comments
#31 · Code for fine-tuning? · milsun · closed 1 year ago · 2 comments
#30 · When I increase accumulate_gradient_steps, can the batch_size also be increased accordingly? · joytianya · closed 1 year ago · 1 comment
#29 · Truncated text? · shikida · closed 1 year ago · 0 comments
#28 · Support for multi-host GPU training · szxiangjn · closed 1 year ago · 1 comment
#27 · Support for Cerebras-GPT models? · MasterScrat · opened 1 year ago · 0 comments
#26 · When I configured batch size 4 on v3-8, it was normal, but batch size 128 on v3-256 reported OOM. What is the reason? · joytianya · closed 1 year ago · 3 comments
#25 · Can save_checkpoint support writing to a GCS path? · joytianya · closed 1 year ago · 2 comments
#24 · Corrected a mistake in a shell command; improved readability · wthoutanymmries · closed 1 year ago · 0 comments
#23 · Error converting to HF · JMHAVANCE · closed 1 year ago · 2 comments
#22 · Is it normal for the learning rate to reach the peak_value and not decrease, but instead slowly rise? · joytianya · closed 1 year ago · 8 comments
#21 · Checksum for recovered models? · koreyou · closed 1 year ago · 1 comment
#20 · Conda install slow · abdulfatir · closed 1 year ago · 1 comment
#19 · RAM requirements · MichaelMartinez · opened 1 year ago · 6 comments
#18 · Error when recovering the Koala model weights · zhangsanfeng86 · closed 1 year ago · 4 comments
#17 · Difference between 13B v1 and v2 weight diffs + GPU requirements · jpollard-cs · closed 1 year ago · 1 comment
#16 · Uploaded the weights to Hugging Face Hub · Logophoman · closed 1 year ago · 1 comment
#15 · Can the load_checkpoint parameter support reading files from GCP? · joytianya · closed 1 year ago · 2 comments
#14 · When using 13B with the following configuration, a memory error occurs on v3-8; may I ask what the reason is? · joytianya · closed 1 year ago · 13 comments
#13 · Fix typos in load_pickle · Yulv-git · closed 1 year ago · 0 comments