Change foundry docker images to use fork of TE that has prepare_te_modules_for_fsdp
Issues Fixed:
chuck-7b-starcoder2x-fp8-run1-ENdcdf fails with:
[rank0]: ImportError: cannot import name 'prepare_te_modules_for_fsdp' from 'transformer_engine.pytorch.distributed' (/usr/lib/python3/dist-packages/transformer_engine/pytorch/distributed.py)
when te_shard_weight: true
To fix, we need to use/pin a branch of TE that has this function, in order to get the 700 TFLOPS numbers.
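A quick way to confirm that a rebuilt image picked up the right TE branch is to probe for the symbol before launching a run. The following is a minimal sketch, not part of this change: the helper name `has_symbol` is hypothetical, and the module path and function name are taken from the ImportError above.

```python
import importlib


def has_symbol(module_name: str, symbol: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `symbol`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module is missing entirely (e.g. TE is not installed in this image).
        return False
    return hasattr(module, symbol)


# Names copied from the ImportError in this PR description.
TE_MODULE = "transformer_engine.pytorch.distributed"
TE_SYMBOL = "prepare_te_modules_for_fsdp"

if __name__ == "__main__":
    if has_symbol(TE_MODULE, TE_SYMBOL):
        print(f"{TE_SYMBOL} is available; te_shard_weight: true should import cleanly")
    else:
        print(f"{TE_SYMBOL} is missing; this image likely has a TE build without it")
```

Running this inside the container before submitting a job with te_shard_weight: true would surface the mismatch without waiting for rank0 to fail at import time.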