Closed. dakinggg closed this pull request 1 week ago.
This PR includes a few changes for increased extensibility of the code:
slice_attention_mask
MPTBlock
configuration_mpt.py (just for HF checkpointing)
MPTModel
TrainConfig
Loss before and after:
@milocress The failing GPU test is unrelated. It will be fixed by the next Composer release (which is why that test isn't marked as required yet).