intel / llm-on-ray

Pretrain, finetune and serve LLMs on Intel platforms with Ray
Apache License 2.0

[Gaudi] Improve Gaudi workflow for self-hosted runner #268

Open · Deegue opened 4 months ago

Deegue commented 4 months ago

Gentle ping @carsonwang for review. These changes are for our new self-hosted workflow, which is driven mainly by Python instead of shell, so we wrap some shell code in Python for better interaction.
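For context on wrapping shell code in Python: the usual pattern is to route shell steps through `subprocess` so the workflow script can check exit codes and capture output instead of relying on raw shell scripting. Below is a minimal sketch under that assumption; the `run_shell` helper and the `hl-smi` call are illustrative, not taken from this PR.

```python
# Illustrative sketch only: shows the general subprocess-wrapping pattern
# the PR describes. Names and commands here are hypothetical.
import subprocess
import sys


def run_shell(cmd: str, check: bool = True) -> str:
    """Run a shell command and return its stdout, raising on failure."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    if check and result.returncode != 0:
        # Surface the failing command and its stderr so CI logs stay readable.
        print(f"command failed: {cmd}\n{result.stderr}", file=sys.stderr)
        raise RuntimeError(f"'{cmd}' exited with code {result.returncode}")
    return result.stdout


if __name__ == "__main__":
    # e.g. query Gaudi devices on the self-hosted runner (hl-smi is the
    # Habana device tool; this exact usage is an assumption, not from the PR).
    print(run_shell("hl-smi || echo 'hl-smi not available'", check=False))
```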

carsonwang commented 4 months ago

Can you please run the CI tests on the Gaudi node for this PR? Let's make sure the tests can pass.

Deegue commented 4 months ago

> Can you please run the CI tests on the Gaudi node for this PR? Let's make sure the tests can pass.

I think we should merge https://github.com/intel/llm-on-ray/pull/225 first.

carsonwang commented 4 months ago

> > Can you please run the CI tests on the Gaudi node for this PR? Let's make sure the tests can pass.
>
> I think we should merge #225 first.

You can add the changes in the PR to that one so tests can run.

Deegue commented 4 months ago

> > > Can you please run the CI tests on the Gaudi node for this PR? Let's make sure the tests can pass.
> >
> > I think we should merge #225 first.
>
> You can add the changes in the PR to that one so tests can run.

That PR passed CI and was validated successfully. I prefer to keep the two changes in separate PRs.

Deegue commented 4 months ago

[screenshot: CI test results]

The result was posted to https://github.com/intel/llm-on-ray/pull/225. This proves that Gaudi runs successfully with these two PRs.