google maxtext issues - Githubissues

google / maxtext

A simple, performant and scalable Jax LLM!

Apache License 2.0

1.44k stars 263 forks

issues

Enable expert parallelism for dropping strategy

#869 RissyRan opened 9 hours ago
0
Unable to recover after checkpoint saving

#868 peregilk opened 14 hours ago
0
Make running preflight optional in model scripts

#867 raymondzouu closed 1 day ago
0
add logging statement

#866 bernardhan33 opened 2 days ago
0
Cannot see multiple GPUs when using Slurm (with proposed fix)

#865 gabeweisz opened 2 days ago
0
Converting LLama3.1 405B checkpoint - Requesting multipass checkpoint conversion

#864 shivajid opened 3 days ago
1
Add MaxText run name to TensorBoard file directory

#863 bvandermoon closed 3 days ago
0
Improve tfds perf in multihost env

#862 aireenmei opened 4 days ago
0
Fix circ storage check for delayed case

#861 gobbleturk closed 1 day ago
0
Add load balance loss

#860 RissyRan closed 3 days ago
0
RA update works for all axes orders

#859 patemotter closed 1 week ago
0
Add simple MLP decoder block

#858 gobbleturk closed 1 week ago
0
Delay Activation Forwarding

#857 gobbleturk closed 1 week ago
1
added run_name_prefix to tensorboard

#856 kyle-google opened 1 week ago
1
Temporarily pin google-cloud-aiplatform to 1.61.0

#855 bvandermoon closed 1 week ago
0
[DRAFT] Add In Memory Changes for Pathways

#854 SujeethJinesh opened 1 week ago
0
Fix kernel imports

#853 gobbleturk closed 1 week ago
0
Add node attributes to the training benchmark

#852 bernardhan33 closed 1 week ago
0
Fix kernel imports

#851 gobbleturk closed 1 day ago
1
Add node attributes; Fix GCS upload; Add checkpointID to checkpointing workload

#850 bernardhan33 closed 1 week ago
1
aqtp release 0.8.0 breaking dependencies

#849 bernardhan33 closed 1 week ago
1
documenting XLA flags used by MaxText

#848 nhira closed 1 day ago
1
mlperf gpt3 ckpt permission issues

#847 gramesh-amd opened 1 week ago
7
Add Llama2 config for v5p

#846 raymondzouu closed 3 days ago
0
Adding Mixtral-8x22b

#845 rdyro closed 1 day ago
1
How to load tfrecords from local file system for Mlperf training?

#844 gramesh-amd closed 1 week ago
3
Add Gemma2-27b

#843 ZhaoyueCheng closed 1 week ago
0
Optimize overhead right before the first train_step

#842 ZhiyuLi-goog closed 1 week ago
0
Add dispatch and combine masks for dropping

#841 RissyRan closed 1 week ago
1
Mlperf/4.1 grain

#840 aireenmei opened 2 weeks ago
1
[mlperf/4.1] enable shard_in_read for large scaling training

#839 ZhiyuLi-goog closed 2 weeks ago
1
Llama3.1 (8B,70B) 🦙

#838 khatwanimohit opened 2 weeks ago
3
script to convert llama, mistral, mixtral checkpoints to huggingface format

#837 jwyang-google opened 2 weeks ago
0
[gcs-team] GCS Checkpointing benchmark feature updates

#836 MattIrv closed 2 weeks ago
1
Adds ragged attention.

#835 patemotter closed 1 week ago
0
Integrate Badput monitoring with MaxText

#834 dipannita08 opened 2 weeks ago
0
Add dropping strategy

#833 RissyRan closed 2 weeks ago
3
add kl divergence for forward_pass_logit_checker

#832 ZhaoyueCheng closed 2 weeks ago
1
Standalone checkpoint write seems to have memory leak

#831 bernardhan33 opened 2 weeks ago
0
Add support for local sliding window attention in TPU splash_attention

#830 gagika closed 2 weeks ago
0
converting Gemma maxtext compatible checkpoint to Hugging Face format

#829 salrowili opened 3 weeks ago
1
Report hyperparameters from the distributed training benchmark workload

#828 bernardhan33 closed 3 weeks ago
0
<Do not merge> Update and rename 1024b.sh to v5p-12288.sh

#827 Obliviour opened 3 weeks ago
0
Support AoT in 16-vm GPU Llama2 train script

#826 jonb377 closed 3 weeks ago
0
Removing the registration of the proxy backend used by Pathways.

#825 lukebaumann closed 3 weeks ago
0
Update NCCL flags for A3 Mega with the network release of 6/27.

#824 yangyuwei opened 3 weeks ago
0
[MLPerf][GPT3] Bypass setting eval_interval in using synthetic dataset

#823 ZhiyuLi-goog closed 3 weeks ago
0
Add instruction for Mixtral

#822 RissyRan closed 3 weeks ago
0
new features with distributed training framework

#821 bernardhan33 closed 3 weeks ago
0
chore: format the README table

#820 DemoYeti opened 3 weeks ago
0