microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

Update xpu-max1100.yml with new config and add some tests #5668

Closed: Liangliang-Ma closed this 1 month ago

Liangliang-Ma commented 3 months ago

This PR:

1. Changes the container.
2. Updates the software versions (aligned with the Docker compiler).
3. Adds some tests.
loadams commented 2 months ago

Hi @Liangliang-Ma - could you resolve the merge conflicts so we can re-run and work on getting this merged?

Liangliang-Ma commented 2 months ago

@loadams Thanks for the reminder. I hadn't noticed that conflict.

Liangliang-Ma commented 2 months ago

@loadams Hi, the nv CI seems to be crashing unexpectedly. Could you help check it? Thanks.

loadams commented 2 months ago

@Liangliang-Ma Yes, we are aware of it and are working on a fix. Since it isn't affecting this PR, we can still merge this. Just give me some time and I'll get to it.

loadams commented 1 month ago

@Liangliang-Ma The CI should be stable now, so we will merge this PR. Sorry this took so long.