madgraph5 / madgraph4gpu

GPU development for the Madgraph5_aMC@NLO event generator software package
30 stars 33 forks source link

Improvements to complete PRs 979 and 980 #984

Closed valassi closed 1 month ago

valassi commented 1 month ago

Hi @oliviermattelaer these are my improvements to complete PRs #979 and #980.

If you agree with this, I can merge the whole lot.

This depends on https://github.com/mg5amcnlo/mg5amcnlo/pull/135. Please review that one first, or together with this.

Can you review please? THanks Andrea

oliviermattelaer commented 1 month ago

Hi Andrea,

I'm completely confused here. I know that I do not yet fully understand github and this is likely the issue here. How am I suppose to review this? What I need is to have a diff that compare your change to my latest change. If I keep the branch as master this is obviously not what I need. and If I put my branch as bases, it does not work either (why?)

Is their an issue in the way I created those branches (or in the way you handle those? --I guess the issue is on my side--)

I will try to create such diff manually but if you have a trick or an understanding why the diff in such PR are so bad? it could be usefull

oliviermattelaer commented 1 month ago

Ok even with the "diff" that I want, looks like you are merging something else at the same time. It seems that you have the CI fix for mac at the same time but also some change with the helicity, the timer,... Is this intended? Are you sure that those are required for #979 and #980? (I hope not) Could you try to create a new branch with only the required changed? And maybe create another PR for the additional stuff? (or ping me on some PR that I have to approve that does contain this)

Otherwise in launch_plugin.py, you replace cleanavx by cleanall Which means you force a recompilation of the full Source directory in top. Is this on purpose? You never mention such need. We can do it by security, but this sounds an overkill...

So I would propose to have

Now i would say that cleanall can be

cleanall: cleansource cleanavx

but I might miss a point here, so maybe they are an advantage to keep your current definition (even if adding cleanavx target)

valassi commented 1 month ago

Hi @oliviermattelaer I am sorry this is confusing

(This is exactly why I was saying that using submodules is nice but can be confusing).

There are two parts here, what I added on top of mg5amcnlo and what I added on top of madgraph4gpu.

(1) For mg5amcnlo, this is now into your https://github.com/mg5amcnlo/mg5amcnlo/pull/132

(There are still issues to solve there, but hopefully not in my part)

(2) For madgraph4gpu, let me explain.

I think the problem is that your testsuite_only_fixed did not include the latest master. Now I have modified that by merging master into it (well I included it into testsuite_only, and then the latter into testsuite_only_fixed).

So now that you modified this PR to target testsuite_only_fixed you should be able to see the differences.

The commits are

[avalassi@itscrd90 gcc11/usr] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp> git log --oneline upstream/testsuite_only_fixed..HEAD
93ef4ba17 (HEAD -> gpucpp, origin/gpucpp) [gpucpp] upgrade mg5amcnlo from 185cc4c9c to fa706c696 (latest gpucpp_fixci): fix my bug>
57c72a475 Merge remote-tracking branch 'upstream/testsuite_only_fixed' into gpucpp
807f48173 [gpucpp] regenerate all processes - this completes the add-ons to PR #979 and #980
7e69c82dc [gpucpp] regenerate CODEGEN patch from gg_tt.mad (but actually nothing changes: I just want to mark that Source/makefile>
c79843a22 [gpucpp] in CODEGEN dummy change to patch.common to allow a meaningful commit message later (regenerating the patch chan>
5aa2b4400 [gpucpp] regenerate gg_tt.mad - so that I can regenerate patches from it
326ce7e7d [gpucpp] regenerate gq_ttq.mad - heck that all is ok
6ff187893 [gpucpp] in CODEGEN backport Source/makefile from gq_ttq.mad: add back cleanSource to speed up P* cleanup
44e4f7dd9 [gpucpp] upgrade mg5amcnlo from a3ccfd2bf to 185cc4c9c: in Source makefile, split clean into cleanSource+clean and move >
aad68d552 [gpucpp] in gq_ttq.mad Source/makefile, add back cleanSource to speed up P* cleanup; remove cleanavx which is identical >
6da711506 [gpucpp] in gq_ttq.mad and CODEGEN launch_plugin.py, use 'make cleanall' instead of 'make cleanavx' to clean the build (>
d28f85e69 [gpucpp] regenerate gq_ttq.mad using the current proposal for PR #980
a8ff34265 Merge remote-tracking branch 'upstream/testsuite_only_fixed' into gpucpp

And the diffs are here, excluding regenerated code

git diff upstream/testsuite_only_fixed..HEAD --name-only | egrep -v '(.mad/|.sa/)'
MG5aMC/mg5amcnlo
epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/launch_plugin.py
epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/madevent_makefile_source_addon
epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/output.py

In practice:

Otherwise this is essentially identical to your 980... Does this clarify?

If you agree, I would

NB: in https://github.com/mg5amcnlo/mg5amcnlo/pull/132 I think there are still some acceptance tests failing, but these are in your code/tests and I have no idea how to look at them...

valassi commented 1 month ago

Otherwise in launch_plugin.py, you replace cleanavx by cleanall Which means you force a recompilation of the full Source directory in top.

Sorry, my mistake again... I did not understand what you meant, now I see it

Before I changed it, the two targets were (this is current master) https://github.com/madgraph5/madgraph4gpu/blob/3f69b2669959d66d28509258853388ee8ac2bdc1/epochX/cudacpp/gg_tt.mad/Source/makefile#L139


cleanavx:
    for i in `ls -d ../SubProcesses/P*`; do cd $$i; make cleanavxs; cd -; done;
cleanall: cleanSource # THIS IS THE ONE
    for i in `ls -d ../SubProcesses/P*`; do cd $$i; make cleanavxs; cd -; done;

I thought they are identical, they are not.

The difference is that calling 'cleanall' ALSO cleans Source, while cleanavx does not. I do not remember why 'cleanavx' got there, I thought I had added it and forgotten it by mistake, but maybe you added it for this reason.

Anyway: the part that is really slow, both in the cleanup and in the build, is the P* subdirectories. Deleting and recreating the objects in Source is very fast compared to that. For extra safety I would actually remove those and recreate them. Or you would prefer to avoid that?

Is this on purpose? You never mention such need. We can do it by security, but this sounds an overkill...

Anyway, to be clear. It was not on purpose in the sense that I made a mistake and I had not noticed.

That said, now that I did notice, I actually think that it is a small overkill and I would keep that.

Rephrasing: are you sure yourself that in Source there are things that do not depend on vector.inc and need no recompilation when vector.inc changes? I see for instance that you added a dependency of libmodel on vector.inc, so I guess it must be recompiled?

(By the way, I noticed a strange error that I will report with 'vector.inc missing for lepton pdf', which is exactly related to vector.inc and libmodel).

valassi commented 1 month ago

Hi @oliviermattelaer as discussed offline, I merge this into your branch for #980.

Then I will merge #979 and #980 into master.

Thanks Andrea