Closed · danikhan632 closed this 8 months ago
@danikhan632 this is really amazing to see. Nice work! I'll work with my folks to figure out how we can support ARM in our workflows.
Thank you so much! Also wanted to ask whether pursuing Arm Neon in matmul and other arithmetic operations is a good idea; I was hoping to add this to the optimization passes for Triton CPU.
On this, I believe one of my colleagues has actually been looking into this with ARM. He offered to sync with you on it, so stand by.
@danikhan632,
First, ARM-hosted LLVM:
Please take a look at triton/.github/workflows/llvm-build.yml at main · openai/triton. If you add ARM64 for Ubuntu to the builds there, then a small change to the default setup.py script should get things working. Do you mind giving that a try? I talked to Phil earlier today and he was on board with adding ARM support if we can make it work.
Second, regarding ARM64 runners:
I'm working on getting some ARM64 VMs set up under my team's Azure account and will make them available to this triton-shared GitHub project when they're ready. Not sure how long this will take, but hopefully not more than a few days.
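To make the "small change to the default setup.py script" concrete, here is a hedged sketch (not Triton's actual code; the suffix strings and function name are illustrative and would have to match whatever asset names llvm-build.yml actually publishes) of mapping the host platform to an LLVM release-asset name:

```python
import platform


def llvm_asset_suffix(system: str, machine: str) -> str:
    """Hypothetical mapping from (OS, CPU) to an LLVM release-asset suffix."""
    arch = {"x86_64": "x64", "AMD64": "x64",
            "aarch64": "arm64", "arm64": "arm64"}.get(machine)
    os_name = {"Linux": "ubuntu", "Darwin": "macos"}.get(system)
    if arch is None or os_name is None:
        raise RuntimeError(f"unsupported platform: {system}/{machine}")
    return f"{os_name}-{arch}"


# e.g. llvm_asset_suffix(platform.system(), platform.machine())
# would yield "ubuntu-arm64" on an Ubuntu aarch64 host.
```

The point is only that adding arm64 is a lookup-table change, not new build logic.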
Will give it a shot and submit a PR. Changing the LLVM build workflow to add ARM64 should be pretty easy, and the setup.py change should be even easier, so I'll have the PR done pretty quickly. Thanks for setting up those ARM workflow runners; looking forward to them.
Would love to sync up on this and hear more from your colleague.
Also wanted to ask whether cross-compiling arm64 in the LLVM workflow is a good idea, or whether the workflow should just run on an Ubuntu arm64 runner to avoid any cross-compiling issues?
matrix:
  config:
    - {runner: 'Ubuntu 20.04', runs_on: 'ubuntu-20.04', target-os: 'ubuntu', arch: 'x64'}
    - {runner: 'Ubuntu 20.04', runs_on: 'ubuntu-20.04', target-os: 'ubuntu', arch: 'arm64'} # should work
    - {runner: 'CentOS 7', runs_on: ['self-hosted', 'CPU'], target-os: 'centos', arch: 'x64'}
    - {runner: 'MacOS X64', runs_on: 'macos-12', target-os: 'macos', arch: 'x64'}
    - {runner: 'MacOS ARM64', runs_on: 'macos-12', target-os: 'macos', arch: 'arm64'}
I believe cross-compiling is the way to go, as that's what we're doing on macOS, from what I could figure out.
Also, I think you'll need the changes below, as the following steps are conditional and I believe ubuntu-arm64 won't match any of them as is.
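To sketch the conditional-steps point (step names and conditions below are illustrative, based on the matrix quoted above, not the exact contents of llvm-build.yml):

```yaml
steps:
  # existing steps gate on conditions like this, which a new
  # ubuntu/arm64 matrix entry would fall through:
  - name: Build LLVM (x64)
    if: matrix.config.runs_on == 'ubuntu-20.04' && matrix.config.arch == 'x64'
    run: ...
  # so an explicit arm64 counterpart (or a condition that no longer
  # keys on arch) would be needed:
  - name: Build LLVM (ubuntu arm64, cross-compiled)
    if: matrix.config.runs_on == 'ubuntu-20.04' && matrix.config.arch == 'arm64'
    run: ...
```

The `run: ...` bodies are deliberately elided; only the `if:` gating is the point here.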
Hi @danikhan632, this is great to see! Please take a look at #71 and see if you can get your pipelines working on that pool. Once you do, feel free to close that PR (@manbearian will have to manually delete the workflow it created, I believe, but for now it's worth keeping it up for testing).
Just committed again with the 1ES arm workflow runner. (EDIT) Also just fixed that mistake in the YAML, so it might need a re-run.
Hey, hope you had a good weekend. Just wanted to ask if there was a way to trigger the workflows manually; it's a bit tedious. Also, I have some very rudimentary code for lowering linalg.matmul to the ArmSVE/Arm dialects and wanted to get in contact with that colleague you mentioned.
Hi @danikhan632,
First, there's a convergence of two things impacting the MS team's bandwidth right now: December vacations plus a big internal presentation this week. So please bear with us.
Second, I'm sorry about the annoyance around running the workflow. Since the PR is updating the workflow itself, it requires extra permissions to run each time you do this. How much are you changing the workflow each submission? Can we get a version checked in that you can use to test?
No worries, everyone is out for the holidays, so that's understandable. Right now the workflow kept failing because the self-hosted runner that was set up doesn't have pip installed, and I created a bit of an issue. Not sure about the version check-in part; it's up to date with the current build.
Great to see some Arm64 builds :) Here's the workflow I'm using right now.
On macOS, I follow the instructions on the Triton GitHub for "Install from source" and comment out the X86, NVPTX, and AMDGPU libs in triton/CMakeLists.txt (https://github.com/openai/triton/issues/2922).
For Ubuntu 20.04 and 22.04, I do the same as on macOS and also change triton/python/setup.py to use a native Arm64 build of LLVM (https://github.com/openai/triton/issues/2921).
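As an alternative to editing setup.py directly, Triton's build can, if I recall its "Install from source" instructions correctly (treat the variable names below as an assumption to verify against the current docs), be pointed at a custom LLVM via environment variables, so a native arm64 build slots in like this (the build path is a placeholder):

```shell
# Placeholder: wherever your native arm64 LLVM build lives
LLVM_BUILD_DIR=$HOME/llvm-project/build
# Env vars Triton's setup.py is said to honor for a custom LLVM
# (verify against the current "Install from source" instructions):
export LLVM_INCLUDE_DIRS=$LLVM_BUILD_DIR/include
export LLVM_LIBRARY_DIR=$LLVM_BUILD_DIR/lib
export LLVM_SYSPATH=$LLVM_BUILD_DIR
# then: cd triton/python && pip install -e .
```

This avoids carrying a setup.py patch across rebases while testing.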
@danikhan632 shoot me an email at aasmith@microsoft to talk more about Arm.
I have been working on triton-shared on ubuntu-arm64, so I wanted to add a workflow to the repo that runs the ARM variant, since potential optimizations can be made there.
A few issues with this current PR that would have to be addressed:
steps:
Working on utilizing the Arm MLIR dialects for faster GEMMs, so it would be appreciated if this could be integrated in some form.