microsoft / triton-shared

Shared Middle-Layer for Triton Compilation
MIT License

ARM github workflow #66

Closed danikhan632 closed 8 months ago

danikhan632 commented 10 months ago

I have been working on triton-shared on ubuntu-arm64, so I wanted to add a workflow to the repo that runs the ARM variant, since ARM-specific optimizations can be made.

A few issues with the current PR would have to be addressed:

1. GitHub doesn't provide a hosted arm64 workflow runner. In the integration test I'm self-hosting, but it would be better if this were hosted. This is going to be an issue for deciding where the job runs:

```yaml
build_and_test_triton_shared_arm:
  runs-on: [self-hosted]  # going to need a self-hosted ubuntu-arm64 action runner
  steps:
```

2. I'm currently hosting the llvm tar bundle in my own GCP bucket. It's probably not a good idea for me to self-host the llvm tarball; it should be hosted by someone else:

```python
rev = "49af6502"
name = f"llvm-{rev}-{system_suffix}"
if system_suffix == 'ubuntu-arm64':  # this is probably not a good idea to merge, but it does work
    url = "https://storage.googleapis.com/compiled-blob/llvm-49af6502-ubuntu-arm64.tar.gz"
else:
    url = f"https://tritonlang.blob.core.windows.net/llvm-builds/{name}.tar.gz"
return Package("llvm", name, url, "LLVM_INCLUDE_DIRS", "LLVM_LIBRARY_DIR", "LLVM_SYSPATH")
```
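A table-driven variant could keep the arm64 special case out of the control flow. This is only a sketch: the `Package` namedtuple fields and the `get_llvm_package` wrapper are guessed from the call above, not taken from triton's actual setup.py.

```python
# Sketch: table-driven override of the LLVM tarball URL per platform suffix.
# The Package fields are inferred from the call above (assumptions, not
# triton's real definition); the GCP URL is the temporary bucket from this PR.
from collections import namedtuple

Package = namedtuple(
    "Package", ["package", "name", "url", "include_flag", "lib_flag", "syspath_var"]
)

# Suffixes with no official build fall back to an override table.
LLVM_URL_OVERRIDES = {
    "ubuntu-arm64": "https://storage.googleapis.com/compiled-blob/llvm-49af6502-ubuntu-arm64.tar.gz",
}

def get_llvm_package(system_suffix):
    rev = "49af6502"
    name = f"llvm-{rev}-{system_suffix}"
    default = f"https://tritonlang.blob.core.windows.net/llvm-builds/{name}.tar.gz"
    url = LLVM_URL_OVERRIDES.get(system_suffix, default)
    return Package("llvm", name, url, "LLVM_INCLUDE_DIRS", "LLVM_LIBRARY_DIR", "LLVM_SYSPATH")
```

Once official arm64 tarballs exist, the override table simply goes away and every suffix takes the default path.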
3. I had to do an odd workaround that changes triton's setup.py. This is really hacky on my part, but it seemed like a temporary solution for the runner until llvm arm64 binaries are built and integrated into setup.py:

```yaml
- name: Build/Install Triton
  run: |
    export TRITON_CODEGEN_TRITON_SHARED=1
    cd triton/python
    python3 -m pip install --upgrade pip
    python3 -m pip install cmake==3.24
    python3 -m pip install ninja
    python3 -m pip uninstall -y triton
    rm setup.py
    mv ../third_party/triton_shared/.github/arm-setup.py ./setup.py
    python3 setup.py build
```
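A less invasive alternative to deleting setup.py wholesale could be to patch just the URL template in place. This is a hypothetical sketch: `patch_llvm_url` is not part of triton, and it assumes the default Azure URL template shown earlier appears verbatim in the checked-out setup.py.

```python
# Sketch (hypothetical helper, not part of triton): rewrite the default LLVM
# download URL inside setup.py to the arm64 tarball, instead of replacing the
# whole file with arm-setup.py.
from pathlib import Path

DEFAULT_URL = "https://tritonlang.blob.core.windows.net/llvm-builds/{name}.tar.gz"
ARM64_URL = "https://storage.googleapis.com/compiled-blob/llvm-49af6502-ubuntu-arm64.tar.gz"

def patch_llvm_url(setup_py: Path) -> bool:
    """Return True if the file was patched, False if the template wasn't found."""
    text = setup_py.read_text()
    if DEFAULT_URL not in text:
        return False  # template missing, or already patched
    setup_py.write_text(text.replace(DEFAULT_URL, ARM64_URL))
    return True
```

The workflow step would then invoke this helper from a small script before `python3 setup.py build`; the membership check makes the patch idempotent across re-runs.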

I'm working on utilizing Arm MLIR dialects for faster GEMMs, so it would be appreciated if this could be integrated in some form.

manbearian commented 10 months ago

@danikhan632 this is really amazing to see. Nice work! I'll work with my folks to figure out how we can support ARM in our workflows.

danikhan632 commented 10 months ago

> @danikhan632 this is really amazing to see. Nice work! I'll work with my folks to figure out how we can support ARM in our workflows.

Thank you so much. Also wanted to ask: is pursuing Arm Neon in matmul and other arithmetic operations a good idea? I was hoping to add this to the optimization passes for triton cpu.

manbearian commented 10 months ago

> Also wanted to ask: is pursuing Arm Neon in matmul and other arithmetic operations a good idea? I was hoping to add this to the optimization passes for triton cpu.

On this, I believe one of my colleagues has actually been looking into this with ARM. He offered to sync with you on this, so stand by.

manbearian commented 10 months ago


@danikhan632 ,

First, ARM-hosted LLVM:

Please take a look at triton/.github/workflows/llvm-build.yml at main · openai/triton. If you add ARM64 for Ubuntu to the builds there, then a small change to the default setup.py script should get things working. Do you mind giving that a try? I talked to Phil earlier today, and he was on board with adding ARM support if we can make it work.

Second, regarding ARM64 runners:

I'm working on getting some ARM64 VMs set up under my team's Azure account and will make them available to this triton-shared GitHub project when they're ready. Not sure how long this will take, but hopefully not more than a few days.

danikhan632 commented 10 months ago


Will give it a shot and submit a PR; changing the llvm build workflow to add ARM64 should be pretty easy, and the setup.py change should be even easier. Will have this PR done pretty quickly. Thanks for setting up the workflow runners; looking forward to them.

danikhan632 commented 10 months ago

> On this, I believe one of my colleagues has actually been looking into this with ARM. He offered to sync with you on this, so stand by.

Would love to sync up on this and hear more from your colleague.

danikhan632 commented 10 months ago


Also wanted to ask: is cross-compiling arm64 for the LLVM workflow a good idea, or should the workflow runner just run on ubuntu arm64 to avoid any cross-compiling issues?

```yaml
matrix:
  config:
  - {runner: 'Ubuntu 20.04', runs_on: 'ubuntu-20.04', target-os: 'ubuntu', arch: 'x64'}
  - {runner: 'Ubuntu 20.04', runs_on: 'ubuntu-20.04', target-os: 'ubuntu', arch: 'arm64'} # should work
  - {runner: 'CentOS 7', runs_on: ['self-hosted', 'CPU'], target-os: 'centos', arch: 'x64'}
  - {runner: 'MacOS X64', runs_on: 'macos-12', target-os: 'macos', arch: 'x64'}
  - {runner: 'MacOS ARM64', runs_on: 'macos-12', target-os: 'macos', arch: 'arm64'}
```
manbearian commented 10 months ago

> Also wanted to ask if cross-compiling arm64 for the LLVM workflow is a good idea or if the workflow runner should just run on ubuntu arm64 to avoid any cross-compiling issues?

I believe cross-compiling is the way to go, as that's what we're doing on macOS from what I could figure out.
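Whether the tarball is cross-compiled or built natively, the install step on the runner has to compute the same `<os>-<arch>` suffix the build matrix produced. A rough host-side detection sketch (`system_suffix` is a hypothetical helper; triton's actual setup.py derives its suffix differently):

```python
# Sketch: map the host platform onto the <os>-<arch> suffixes used by the
# build matrix above (ubuntu-x64, ubuntu-arm64, macos-x64, macos-arm64).
import platform

# Normalize the various machine names reported across OSes.
ARCH_ALIASES = {"x86_64": "x64", "amd64": "x64", "aarch64": "arm64", "arm64": "arm64"}

def system_suffix():
    os_name = "macos" if platform.system() == "Darwin" else "ubuntu"
    machine = platform.machine().lower()
    return f"{os_name}-{ARCH_ALIASES.get(machine, machine)}"
```

Note that Linux reports `aarch64` while macOS reports `arm64` for the same architecture, which is exactly the kind of mismatch that makes the tarball name and the runner disagree.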

manbearian commented 10 months ago


Also, I think you'll need changes below, as the following steps are conditional and I believe ubuntu-arm64 won't match any of them as-is.

danikhan632 commented 10 months ago


https://github.com/openai/triton/pull/2726

NathanielMcVicar commented 10 months ago

Hi @danikhan632, this is great to see! Please take a look at #71 and see if you can get your pipelines working on that pool. Once you do, feel free to close that PR (I believe @manbearian will have to manually delete the workflow it created, but for now it's worth keeping it up for testing).

danikhan632 commented 10 months ago


Just committed again with the 1ES arm workflow runner. (EDIT: also just fixed that mistake in the yaml, so it might need a re-run.)

danikhan632 commented 10 months ago


Hey, hope you had a good weekend. Just wanted to ask if there is a way to trigger the workflows manually; re-running them is a bit tedious. Also, I have some very rudimentary code for lowering linalg.matmul to ArmSVE/arm dialects and wanted to get in contact with that colleague you mentioned.
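On manual triggering: GitHub Actions workflows can be run on demand from the Actions tab when the `on:` block includes `workflow_dispatch`. A minimal sketch (the `pull_request` trigger shown here is illustrative, not copied from this PR's workflow):

```yaml
on:
  workflow_dispatch:   # adds a "Run workflow" button in the Actions tab
  pull_request:
    branches: [main]
```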

manbearian commented 10 months ago


Hi @danikhan632,

First, there's a convergence of two things impacting the MS team's bandwidth right now: December vacations plus a big internal presentation this week. So please bear with us.

Second, I'm sorry about the annoyance around running the workflow. Since the PR updates the workflow itself, it requires extra permissions to run after each change. How much are you changing the workflow with each submission? Could we check in a version that you can use to test?

danikhan632 commented 10 months ago


No worries, everyone is out for the holidays, so that's understandable. Right now the workflow kept failing because the self-hosted runner that was set up doesn't have pip installed, and I created a bit of an issue. Not sure about the check-in part; it's up to date with the current build.

aaronsm commented 8 months ago

Great to see some Arm64 builds :) Here's the workflow I'm using right now.

On macOS, I follow the instructions on the Triton GitHub for "Install from source" and comment out the X86, NVPTX and AMDGPU libs in triton/CMakeLists.txt (https://github.com/openai/triton/issues/2922).

For Ubuntu 20.04 and 22.04, I do the same as on macOS and also change triton/python/setup.py to use a native Arm64 build of LLVM (https://github.com/openai/triton/issues/2921).

@danikhan632, shoot me an email at aasmith@microsoft to talk more about Arm.