lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda
Other
287 stars 94 forks source link

Sniff-testing ROCm build gh action workflow #1383

Closed dmcdougall closed 1 year ago

dmcdougall commented 1 year ago

Don't merge this yet, I'm still ironing out the mechanics.

mathiaswagner commented 1 year ago

Jenkins: Can one of the admins verify this patch?

mathiaswagner commented 1 year ago

Jenkins: Can one of the admins verify this patch?

dmcdougall commented 1 year ago

Can someone approve this so the workflow runs? I think I have a recipe sorted out now, and the action triggers on my fork. Just want to make sure it runs correctly here too.

dmcdougall commented 1 year ago

Nice, the build passed! Feedback welcome, but I'm happy for this to be merged now.

mathiaswagner commented 1 year ago

Question as I have not looked it up yet: Could we do target specific subdirectories in the workflows directory ? Like '.github/workflows/hip/...yaml' ?

dmcdougall commented 1 year ago

It seems Technically Doable but quite awkward.

A sensible compromise might be to simply come up with a naming convention for the filenames:

<target>-<id1>-<id2>-...-<idN>.yml

I'm open to other ideas.

dmcdougall commented 1 year ago

maybe change the name of your yml file, maybe to rocm-build-ci.yml.

Done.

I guess in the future we may have different targets or versions; we can decide that point whether to have different files for each (with a more descriptive filename) or to have all rocm builds done in this one file, etc.

Right now there's a limitation of one github workflow per yaml file, but changing the filename doesn't change the name of the workflow, so I realised I had to update both the name of the file and the name of the workflow in the file.

I agree with what you're saying. This is something we'll just have to iterate on as github actions features evolve.

maddyscientist commented 1 year ago

Maybe a DEVELOP build would be best. All warnings will be enabled so they can be seen in the CI logs m. A RELEASE build will hide all warnings which is not what we want for CI.

What are the warnings cause badness with STRICT build?

dmcdougall commented 1 year ago

So I'm little confused on the build types. I'm experimenting locally, and I'm expecting I'll probably need to open another issue to track some of these:

I'm trying a DEVELOP build now.

maddyscientist commented 1 year ago

Debug: no idea, I guess that’s for you to work out 🤷‍♀️

Develop, release and strict should be mostly equivalent in terms of code generation I think: strict has -Werror and it also explicitly disabled one warning on clang about lack of loop unrolling which is unavoidable it seems; release hides all warnings. With a strict build is there a compiler flag that can be passed to disable this header related warning?

dmcdougall commented 1 year ago

Maybe a DEVELOP build would be best

I tried this and realised it is a typo. The build type is DEVEL.

Debug: no idea, I guess that’s for you to work out 🤷‍♀️

Yes.

With a strict build is there a compiler flag that can be passed to disable this header related warning?

No. And it's not the only warning, it's just the first one I hit which caused the build to fail which means I didn't see any ones that may have come after the first one. A DEVEL build shows there's a litany of other compiler warnings related to failed loop unrolls, and various other things, but it does compile at least. This will give us something to base further changes off of. I'd like to be able to address all of those warnings.

Are you expecting a warning-free build for all of the red, blue, and green team architectures?

maddyscientist commented 1 year ago

For the clang loop unrolling warning, you can grab the incantation to disable it from the cuda cmake target file.

Are you expecting a warning-free build for all of the red, blue, and green team architectures?

Yes, that’s the expectation and what we have for green team. I accept that compiler warning disablement maybe needed for a given compiler if the issue can’t be dealt with in the code (eg clang loop unrolling).

maddyscientist commented 1 year ago

And sorry, I always mix up DEVEL build with the develop branch name 😖

dmcdougall commented 1 year ago

I accept that compiler warning disablement maybe needed for a given compiler if the issue can’t be dealt with in the code (eg clang loop unrolling).

Thanks. You beat me to the punch :)

Ok, I think I've addressed all your feedback here. Let me know if you have more comments.

maddyscientist commented 1 year ago

@Jenkins ok to test

jxy commented 1 year ago

Are you getting spams of my omp branch failing to bulid? How do I turn it off for the omp branch only?

dmcdougall commented 1 year ago

@maddyscientist Thanks for your help!

@jxy I haven't looked at your branch. Your build log is here: https://github.com/lattice/quda/actions/runs/5248198720/jobs/9479356861

jxy commented 1 year ago

@dmcdougall It's not the issue with HIP/ROCM. For now, I can't guarantee my branch won't break other backends. I would like to turn the github workflow off for the omp branch, so that it doesn't unnecessary do the build test. How do I turn it off for the specific branch only?