madgraph5 / madgraph4gpu

GPU development for the Madgraph5_aMC@NLO event generator software package

extend testsuite CI (split codegen from build/test, execute tests for many fptypes, add tmad tests) #794

Closed valassi closed 2 months ago

valassi commented 10 months ago

This is a WIP PR for extending the CI testsuite.

I am keeping this in a PR so that the CI can run (I have disabled the on:push triggers).

valassi commented 10 months ago

One thing TODO?

valassi commented 9 months ago

Another thing TODO

valassi commented 3 months ago

I have just merged upstream/master into this WIP branch.

TODO:

valassi commented 2 months ago

I have just merged upstream/master into this WIP branch.

TODO:

I realise that some of the stuff in this WIP branch must be removed: there was some testing of FPE tests as a separate option, but by now FPE handling is completely default with no environment variables, so all of this must be removed.

I have again merged upstream/master, and I have now also removed the FPE-specific stuff.

valassi commented 2 months ago

I made a first attempt at separating caches for different PRs #799

valassi commented 2 months ago

I have implemented a first attempt at adding tmad tests in the CI #871

valassi commented 2 months ago

This is almost ready for review. The tmad tests (#871) are working and are providing very useful results (e.g. they show rotxxx crashes).

A couple of things remain to be completed before considering this ready for review.

The latest CI run gave these errors: https://github.com/madgraph5/madgraph4gpu/actions/runs/9686490186

Most of these are rotxxx crashes. Example: https://github.com/madgraph5/madgraph4gpu/actions/runs/9686490186/job/26729084480#step:12:182

valassi commented 2 months ago

Hi @oliviermattelaer this is now ready, can you please review?

I have extended my new CI tests and in particular I added 'tmad' tests that compare xsec and lhe files in madevent.
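
For illustration, here is a minimal sketch of the kind of xsec check such a test could perform, assuming the two cross sections have already been extracted as floats. The function name `xsecs_agree` and the tolerance value are hypothetical and are not taken from the actual tmad scripts:

```python
import math


def xsecs_agree(xsec_ref, xsec_new, rel_tol=1e-5):
    """Return True if two cross sections agree within a relative tolerance.

    Hypothetical helper: the name and the default tolerance are assumptions.
    """
    # A non-finite cross section is treated as a failure outright
    if not (math.isfinite(xsec_ref) and math.isfinite(xsec_new)):
        return False
    # A zero cross section is also treated as a failure
    if xsec_ref == 0.0 or xsec_new == 0.0:
        return False
    # Compare the ratio of the two values to 1 within the tolerance
    return abs(xsec_new / xsec_ref - 1.0) <= rel_tol
```

Treating a zero cross section as a failure is a design choice in this sketch, so that a case like the zero cross sections mentioned in this thread would not pass silently.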

Note: as of this commit, all tests pass:
https://github.com/madgraph5/madgraph4gpu/pull/794/commits/b89e09352a3fd6b611dca8ee54d4a66027e50605
https://github.com/madgraph5/madgraph4gpu/actions/runs/9694056395

But this is only because I have explicitly bypassed a few known issues: 9 rotxxx crashes (#855) and 3 zero cross sections (#826).

I will now re-enable those tests, which means that the CI will explicitly fail on them. I think this is very useful as it allows us to see whether any of the new changes we are developing (like your 'fix_826' branch PR #852 or my volatile patches PR #857) fix some of these issues.
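
As a rough sketch of how a list of known failures could be compared against the failures of a given CI run: the helper `classify_failures` and the test names below are hypothetical, and the CI in this PR simply fails on these tests rather than using such a script.

```python
def classify_failures(failed, known_issues):
    """Split the failed tests of a run against a list of known issues.

    Hypothetical helper, not part of the CI in this PR.
    """
    failed = set(failed)
    known = set(known_issues)
    return {
        # Failures that are not in the known list: new regressions
        "unexpected": sorted(failed - known),
        # Failures that are in the known list: still open
        "known": sorted(failed & known),
        # Known issues that no longer fail: possibly fixed by a new change
        "fixed": sorted(known - failed),
    }


# Example with made-up test names
result = classify_failures(
    failed=["tmad_procA", "tmad_procC"],
    known_issues=["tmad_procA", "tmad_procB"],
)
print(result)
```

In this made-up example, `tmad_procB` would be reported under "fixed", which is the kind of signal that re-enabling the failing tests is meant to provide.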

I would merge this with high priority. Thanks! Andrea

PS: here is a snapshot of the completed tests. Note that, thanks to the ccache build caches, the tests complete in 6 minutes, which is reasonable. Note also that I fixed the number of events, so the vecsize used is now 32 and I only use 32 events in madevent.

[image: snapshot of completed tests]

oliviermattelaer commented 2 months ago

Sure, this can be merged then (but if we allow tests that do not pass, we should also add my new CI test, though that is likely waiting for your review).

valassi commented 2 months ago

OK, as mentioned, I have re-enabled the 12 failing tests (rotxxx and zero cross section). It is expected that there are 12 failing tests (until we fix them!).

This is now ready to be merged; I would do this ASAP.

valassi commented 2 months ago

> Sure, this can be merged then (but if we allow tests that do not pass, we should also add my new CI test, though that is likely waiting for your review).

Thanks Olivier! Merging NOW.

Can you remind me which PR I should review about your CI please?