Jorgen's makefiles with separate builds for CUDA, HIP and C++

valassi commented 8 months ago

Hi @hageboeck @roiser (and @oliviermattelaer) this is a WIP PR which takes Jorgen's PR #775 and updates it to upstream/master.

As discussed with Stephan and Stefan via email and partly in PR #797 and PR #753, I would suggest to split the refactoring of makefiles in two parts:

First we simply define the new targets (splitting cuda from cpp builds, and paving the way for HIP), and here we could add a few of Stephan's changes like using native as default (and maybe forcing the use of builddirs? maybe later): this is needed soon, it blocks HIP, it blocks adding SYCL like HIP etc. This is is this WIP PR.
Second, later, camly, we do the full refactoring in PR #753, meaning we move stuff over to make_opts and improve the passing of external flags. The problem I see now with PR #753 is that it has no support for .sa and it is very difficult to put it back: I would probably suggest to rethink it a bit. Essentially, cudacpp.mk and cudacpp_src.mk should continue IMO to be fully functional (in teh .sa dirs) when there is no fortran makefile. Note that these .sa are neded by @zeniheisser for reweighting IIUC.

What I have done so far is only to complete the merge of upstream/master into Jorgen's branch, for CODEGEN and for gg_tt.mad. There are still a few issues to solve (eg 'make' does nothing, 'make all' builds fortran but no cudacpp, 'make cuda' builds cudacpp but no fortran?).

But I already put it here as a basis for concrete discussion.

valassi commented 8 months ago

I have completed the integration of Jorgen's makefile changes, with separate builds for CUDA and C++.

As discussed, I think this is the first logical step in refactoring makefiles and can/should be done separately from other makefile changes. I slightly modified Jorgen's logic: rather than having separate makefile targets, the same default targets as before are used, the only difference being that the AVX variable is renamed BACKEND (as discussed with Stephan) and that this is the only driver of what happens in the build.

En passant I also included a make_opts codegen patch we have been discussing with Olivier to make it more reproducible. And the disabling of ccache for fortran discussed with Stephan.

All CI tests (for the old and new CI, for .mad and .sa) are now passing - which was the main problem in #753 and #797 why I suggested to split the refactoing in two parts. This is essentially complete, I will just rerun my usual tput and tmad tests before considering it merge-ready.

A few additional goodies discussed with Stephan may also go in, like using native march builds (for the moment I have an 'auto' which defaults back to the same 512y logic as before, but native may be a better alternative). Or using mixed as default instead of double. And there are probably a few more. But the bigger changes to makefiles treatment of variables I would push to a second step.

Let's discuss next year... Andrea

valassi commented 7 months ago

From https://github.com/madgraph5/madgraph4gpu/pull/801#issuecomment-1919383760 :

I have check the plugin/interface part and this is good to merge for me.

However, this would need to add a new backend within launch_plugin.py (for the moment, this only support Fortran, CPP, CUDA). Which means that a "normal" user can not run the code (even if all the script that we use for testing are working).

I would say that our target here would be to be able to run via the following script:
generate g g > t t~ g
output madevent_gpu PROC_TEST
launch
set cudacpp_backend HIP

Hi @oliviermattelaer I am about to merge #801 (thanks for reviewing!). I suggest to move here (or in another related PR your requested changes from the comment above. This is in any case one thing I had on my todo list here (though it seems I had not written it down in December). I think it nicely fits here because in this PR #798 I suggest to simply rename AVX as BACKEND, hence the discussion about one or two lines in the runcard that we had at our meeting https://indico.cern.ch/event/1355144/ (or actually at the earlier meeting on that same day, I do not remember exactly). I will make a concrete proposal here.

valassi commented 5 months ago

I have completed the merge of the latest upstream/master (including HIP support) into this WIP MR (with separate builds for CUDA and C++, and now also HIP).

I have backported to codegen and regenerated all code.

I must still port my tput/tmad scripts to the new infrastructure were cuda and c++ are completely separate (this will take some time), and relaunch all tests.

The CI tests have passed hoevwer, which is already good. I will make a slight CI improvement now

valassi commented 5 months ago

Hi @oliviermattelaer this is essentially complete, can you please review?

It closes #602. It closes #680. The main point here is having cleanly separate builds for c++ and cuda.

This starts from Jorgen's work on makefiles, plus various additions. It is merged with the latest master, where there was also Jorgen's work on adding HIP (these were two separate PRs). Now we have targets separately for cuda, hip, and the various cppnone, cppsse4, cppavx2, cpp512y, cpp512z. There are also cpp and cppauto which are meant to be the "best" (eventually one could move instead to "native" for this as @hageboeck was suggesting, which is a good idea).

There is a bit of cleanup of makefiles, in particular I removed my horrible hack to determine CUDACPP_BUILDDIR from the fortran makefile (see #829). Now this is determined in a common cudacpp_builddir.mk makefile that is included both from cudacpp.mk and the fortran makefile. Note instead that several other variables that cudacpp.mk must pass to cudacpp_src.mk are simply passed with export, since cudacpp_sr.mk is always called by cudacpp.mk (and this export could go a bit further, there is more stuff that can be removed from cudacpp_src.mk this way). I have not touched make_opts: it was, and still is, included by the fortran makefile, and I have not added or removed anything from it.

Concerning makefiles, as in the past, the same cudacpp set of makefiles is used for the .sa and .mad builds. In practice in .mad mode the fortran makefile delegates the build of the cudacpp library to the cudacpp makefiles (it simply calls those makefiles; the only common thing that is included are the cudacpp_builddir, some checks on the vality of backend and similar).

About #680, one practical advantage of this is that you only build what is needed and you build it once. In the past, building all 5 SIMD modes was also building 5 times the cuda libraries, wich was a waste.

(Note also, we no longer have builds which link both cuda and c++, which simplifies the handling of namespaces etc. But it's better to keep in mind that this may always come back, e.g. through fat binaries #177).

The CI tests all pass for .mad and .sa.

As usual, I am running my last tests, before being ready to merge.

While I was writing this, I realised that indeed the cudacpp_src.mk still needs some cleanup, also for separting the targets. I will put this in WIP again for the moment, but if you want you can start reviewing already. Thanks

cc @roiser

valassi commented 5 months ago

Hi @oliviermattelaer I have finally done also the latest changes to makefiles I had in mind (indeed there were some small issues here and there). Looks better and somewhat cleaner/shorter now. Now ready for review when you want. Thanks!

(The CI tests are good again. Will run my longer tests during the break).

valassi commented 5 months ago

Hi @oliviermattelaer again, I confirm that this is now ready to go, I finished all other tests and added some last tweaks. Thanks

valassi commented 5 months ago

I have now updated the PR after merging the latest master with SUSY PR #824 and SMEFT PR #632

valassi commented 3 months ago

Thanks OLivier. I will have a look at your comments.

In the meantime I have updated this after merging upstream/master which include PR #832 for HEFT/FPE

valassi commented 3 months ago

The previous CI was failing because of an issue in linking main() in runTest.exe. This should now be fixed in all processes.

I will look at Olivier's suggestions. And also try to make some names more consistent (eg runTest.exe is now confuding as it is only for cpp or for cuda now, this can be marked explicitly)

valassi commented 3 months ago

I have rerun all tests and all looks ok.

The only pending issues are Olivier's original two comments on madevent_interface.py and launch_plugin.py.

valassi commented 3 months ago

Just one small issue that modifying madevent_interface.py is (really) a bad idea since user might not always use the modified file depending on how they run the code (and/or environment variable), I'm actually not even sure that such change is needed. Can you check without? If you have trouble I can have a look obviously

Hi @oliviermattelaer thanks. As mentioned in the two comments, I have opened a new issue #844 (which is really a fragment of your own issue #756) about madevent_interface.py.

In my opinion, this is a problem but it should not be fixed here in #798, where I am addressing other problems. Rephrase: I would not block the merge of #798 because of that.

Anyway, I requested a new review from you - can you please have a look and approve or suggest something else to do?

Thanks! Andrea

valassi commented 3 months ago

(Note: I have added additional cleanup of makefiles in PR #841 - but let's first get this #798 out of the way)

valassi commented 3 months ago

I have updated github actions cache to v4 (fixing #848)

valassi commented 3 months ago

Hi @oliviermattelaer (I upgraded to the latest gpucpp and removed the special handling of G^2), do you have any further feedback on this?

Are you satisfied with my suggestion to move the issue with madevent_interface.py to another ticket, and go ahead merging this one? Thanks

oliviermattelaer commented 3 months ago

Perfect, thanks Andrea! You can go ahead and merge this,

Thanks a lot,

Olivier

valassi commented 3 months ago

Thanks Olivier! I am merging this then.

madgraph5 / madgraph4gpu

Jorgen's makefiles with separate builds for CUDA, HIP and C++ #798