madgraph5 / madgraph4gpu

GPU development for the Madgraph5_aMC@NLO event generator software package
29 stars 33 forks source link

add support for ROCRAND (via HIPRAND) #809

Closed valassi closed 5 months ago

valassi commented 6 months ago

As suggested by @roiser in PR #801, this is a WIP PR for adding rocrand (I think in the end it will be rocrand actually) support.

Very much WIP, more than the implementation itself there was a lot of logic in env variables and switches to sort out, I focused on that first

valassi commented 6 months ago

NB this is based on PR #801, it is just a few additions on top of that, but I would keep them separate

valassi commented 6 months ago

The status at this point is that

valassi commented 5 months ago

The status at this point is that

* the code builds

* gcheck.exe runs in rocranddevice, but produces nan MEs

This is fixed

* gcheck.exe and check.exe fail in rocrandhost

This I have now understood as being simply not supported yet. I checked and I always get a status 1000, not implemented. This is not surprising, see https://github.com/ROCm/hipRAND/blob/be4f5a97d55b6a38733a7a66970d13aef6c8944f/library/src/amd_detail/hiprand.cpp#L103

hiprandStatus_t HIPRANDAPI hiprandCreateGeneratorHost(hiprandGenerator_t* generator,
                                                      hiprandRngType_t    rng_type)
{
    (void)generator;
    (void)rng_type;
    return HIPRAND_STATUS_NOT_IMPLEMENTED;
}

Also see issue 76 in https://github.com/ROCm/hipRAND/issues. This is actively being worked on in Dec 2023.

I would do the following:

In any case none of this is relevant yet to Madgraph production (we are still at Fortran ranmar). We can wait for the hiprand host api...

valassi commented 5 months ago

This is almost ready for review

I am now rerunning the final battery of tests, this will be the next one on the list

valassi commented 5 months ago

This was now complete and fully tested, before the merge of PR #368.

However, the merge of PR #368 introduces many not completely obvious conflicts, so I leave this in WIP. I willl fix those conflicts and make this ready for review.

valassi commented 5 months ago

Hi @oliviermattelaer this is a PR to add support for Hiprand on AMD GPUs as suggested by @roiser (thanks for the suggestion).

This is now ready for review - can you please review and approve @oliviermattelaer ?

I am still running a full battery of tests, but in principle on some small scale tests everything looks good.

Note, as mentioned above:

I would go ahead and merge this rather than waiting for the full support, which can come later. The biggest part of the work is done (and involves quite a few code changes to make curand-specific stuff a bit more general). Enabling hiprand host and adding hiprand ordering when available will be a matter of 10 lines of code I think.

Thanks Andrea

valassi commented 5 months ago

I have merged the latest master (including merged PR #796) into this, and checked that all is ok.

@oliviermattelaer has already approved (thanks!)

@roiser do you have any feedback and/or can this be merged? thanks

oliviermattelaer commented 5 months ago

@roiser just to ping you such that you can approve this such that we can move forward on it too

roiser commented 5 months ago

quick code inspection, this also looks fine, again didn't look at the makefile changes but sure ok

Really minor comment and I propose not to revert this, but there is no need to bump up the copyright year numbers, see e.g. blog post https://matija.suklje.name/how-and-why-to-properly-write-copyright-statements-in-your-code (got the link from the CERN open source policy office OSPO)

valassi commented 5 months ago

quick code inspection, this also looks fine, again didn't look at the makefile changes but sure ok

thanks @roiser for the approval, I will merge now

Really minor comment and I propose not to revert this, but there is no need to bump up the copyright year numbers, see e.g. blog post https://matija.suklje.name/how-and-why-to-properly-write-copyright-statements-in-your-code (got the link from the CERN open source policy office OSPO)

interesting point... i was actually going to propose that we update all copyright statements to be 2020-2024 in the code, now I am a bit less sure after reading your blog post

i must say i still have a preference for having all copyright statements updated to 2020-2024: i mean that post suggests you do not need to update the years, but it is not saying that you should absolutely not do that... I find it visually more clear on the one hand, and on the other hand it IS true that we are continually updating the code... and in any case yes 50-70 years are a long time, so it makes a little difference...

I think I would especialy like a self consistent practice throughout the code: for instance, we put 2020-2024 everywhere, or we put 2020 (or year of first publication if >2020) everywhere... thoughts?

anyway, I am merging this

valassi commented 5 months ago

PS I moved the copyright discussion to issue #817