intel / ai-containers

This repository contains Dockerfiles, scripts, yaml files, Helm charts, etc. used to scale out AI containers with versions of TensorFlow and PyTorch that have been optimized for Intel platforms. Scaling is done with python, Docker, kubernetes, kubeflow, cnvrg.io, Helm, and other container orchestration frameworks for use in the cloud and on-premise
https://intel.github.io/ai-containers/
Apache License 2.0
19 stars 15 forks source link

Update HF LLM fine tuning workflow to remove SSH setup and update PyTorchJob to not overwrite the container entrypoint #203

Closed dmsuehir closed 5 days ago

dmsuehir commented 2 weeks ago

Description

SSH setup has been moved to the multinode base, so that can be removed from the workflow Dockerfile. Also, in order for the torch ccl setvars.sh to apply, I've switched the k8s PyTorchJob to not overwrite the entrypoint command.

Changes Made

Validation

I tested this with Llama 2 and the the medical meadow flashcard dataset to verify training/eval using the CCL backend with a base container from Sharvil and a rebuilt workflow container.

dmsuehir commented 2 weeks ago

@tylertitsworth Reposted from a branch

github-actions[bot] commented 2 weeks ago

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

OpenSSF Scorecard

PackageVersionScoreDetails

Scanned Manifest Files

github-advanced-security[bot] commented 2 weeks ago

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

dmsuehir commented 5 days ago

Opened PR 238