google / jetstream-pytorch

PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
Apache License 2.0
33 stars 14 forks

Fixed exhausted bug between head and workers #163

Closed FanhaiLu1 closed 1 month ago

FanhaiLu1 commented 1 month ago

A recent xla2 change calls jax.devices() during state initialization, so the head process claims all of the TPUs. As a result, every worker throws the following error:

RuntimeError: Unable to initialize backend 'tpu': ABORTED: The TPU is already in use by process with pid ..

I submitted https://github.com/pytorch/xla/pull/7769 to fix the xla2 initialization issue. This PR applies the xla2 fix and updates the README.
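For illustration only, here is a minimal sketch of the general idea of deferring a device query so that merely constructing state on the head does not claim the TPUs. It is not the code from the linked pytorch/xla PR. `LazyDevices` and `fake_query` are hypothetical names, and `fake_query` is a stand-in for `jax.devices()` so the sketch runs without a TPU.

```python
from typing import Callable, List, Optional


class LazyDevices:
    """Hypothetical helper: defers a backend-claiming device query until first use."""

    def __init__(self, query: Callable[[], List[str]]):
        # Store the query only; do not call it here, so construction
        # does not initialize (and claim) the accelerator backend.
        self._query = query
        self._devices: Optional[List[str]] = None

    @property
    def initialized(self) -> bool:
        return self._devices is not None

    def get(self) -> List[str]:
        # The first call runs the query; later calls reuse the cached result.
        if self._devices is None:
            self._devices = self._query()
        return self._devices


calls: List[int] = []


def fake_query() -> List[str]:
    # Stand-in for jax.devices(); records that the backend was touched.
    calls.append(1)
    return ["tpu:0", "tpu:1"]


# Constructing the wrapper (as the head would at init) does not run the query.
head_devices = LazyDevices(fake_query)
```

In a real setup one would presumably pass `jax.devices` as the query and call `get()` only in the processes that should own the TPUs, which here would be the workers.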