young-geng / EasyLM

Large language models (LLMs) made easy. EasyLM is a one-stop solution for pre-training, finetuning, evaluating, and serving LLMs in JAX/Flax.
Apache License 2.0

ERROR: Accessing retired flag 'jax_enable_async_collective_offload' #109

Open LeoXinhaoLee opened 4 months ago

LeoXinhaoLee commented 4 months ago

Hi, thank you so much for releasing this wonderful codebase. When I try to run pretrain_llama_7b on a TPU v3 pod, I get this error:

ERROR: Accessing retired flag 'jax_enable_async_collective_offload'

It seems related to one of the flags I set before launching the job:

export LIBTPU_INIT_ARGS='--xla_jf_spmd_threshold_for_windowed_einsum_mib=0 \
--xla_tpu_spmd_threshold_for_allgather_cse=10000 \
--xla_tpu_spmd_rewrite_einsum_with_reshape=true \
--xla_enable_async_all_gather=true \
--jax_enable_async_collective_offload=true \
--xla_tpu_enable_latency_hiding_scheduler=true TPU_MEGACORE=MEGACORE_DENSE'

Are all of these flags necessary, and could one of them be causing the error? Thank you very much for your time and help!
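
Since the error message names `jax_enable_async_collective_offload`, one way to test that suspicion is to launch with that single flag removed and the rest unchanged. Below is a minimal sketch of that experiment in Python. `drop_flag` is a hypothetical helper, not part of EasyLM or JAX. Whether removing the flag resolves the error, or affects performance, is an assumption that has not been verified here.

```python
import os


def drop_flag(init_args: str, flag_name: str) -> str:
    """Return init_args with every token that sets flag_name removed.

    Hypothetical helper for illustration; not an EasyLM or JAX API.
    """
    kept = []
    for token in init_args.split():
        # Tokens look like --name=value; compare only the name part.
        name = token.lstrip("-").split("=", 1)[0]
        if name != flag_name:
            kept.append(token)
    return " ".join(kept)


# The flag string from the issue, written on one logical line.
ORIGINAL_ARGS = (
    "--xla_jf_spmd_threshold_for_windowed_einsum_mib=0 "
    "--xla_tpu_spmd_threshold_for_allgather_cse=10000 "
    "--xla_tpu_spmd_rewrite_einsum_with_reshape=true "
    "--xla_enable_async_all_gather=true "
    "--jax_enable_async_collective_offload=true "
    "--xla_tpu_enable_latency_hiding_scheduler=true "
    "TPU_MEGACORE=MEGACORE_DENSE"
)

# Assumption: the flag named in the error message is the only one at fault.
cleaned = drop_flag(ORIGINAL_ARGS, "jax_enable_async_collective_offload")

# Assumption: this runs before JAX initializes the TPU backend, so the
# environment variable is in place when it is read.
os.environ["LIBTPU_INIT_ARGS"] = cleaned
print(cleaned)
```

The same experiment can be done by hand by deleting the `--jax_enable_async_collective_offload=true` line from the `export` command above. The helper only makes it easy to try the flags one at a time if removing this one is not enough.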

s-smits commented 1 week ago

Same problem here.