ufs-community / ccpp-physics

UFS fork for CCPP

Explore adding additional compilation flags to physics for speedup of execution with Intel #86

Open grantfirl opened 1 year ago

grantfirl commented 1 year ago

Description

The following compiler options (for Intel) may speed up execution of physics: `-fp-model=fast -fprotect-parens -fimf-precision=high`

@dkokron in https://github.com/ufs-community/ccpp-physics/pull/76 found a significant speedup for the UGWP v0 scheme. Code managers decided it was best to not single out one scheme for such treatment, however, and would like to see the impact of using these flags on all physics, or at least an entire suite.

Solution

Add the given flags "higher up" in the build system hierarchy, perhaps controlled by an environment variable.

Alternatives (optional)

Add the flags in the ccpp/physics CMakeLists.txt
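A minimal sketch of this alternative, assuming the flags are gated behind an opt-in CMake option (the option name is illustrative, not an actual ccpp/physics build variable):

```cmake
# Hypothetical opt-in switch for the extra Intel flags; hosts leave it OFF
# by default so standard builds are unaffected.
option(PHYSICS_FAST_FLAGS "Enable extra Intel optimization flags for physics" OFF)

if(PHYSICS_FAST_FLAGS AND CMAKE_Fortran_COMPILER_ID MATCHES "Intel")
  add_compile_options(-fp-model=fast -fprotect-parens -fimf-precision=high)
endif()
```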

Related to (optional)

https://github.com/ufs-community/ccpp-physics/pull/76

dkokron commented 1 year ago

Let me know if there is anything I can do to help with this effort.

dkokron commented 1 year ago

I am working on optimizations to the spcvrtm() routine. The code changes I made in spcvrtm() benefit from using the compiler flags that are the subject of this issue.

I've put a lot of effort into coding and validating these optimizations. I am willing to put more effort into getting my work into mainline development and, more importantly, into production.

Is there a CCPP regression suite I should be using to test my changes? Please tell me how I can help ease the transition from my sandbox into mainline development.

Dan

aerorahul commented 11 months ago

@yangfanglin @Qingfu-Liu Can someone please provide assistance in testing and implementing the optimizations identified by @dkokron @StevenEarle-NCO?

dkokron commented 9 months ago

I'll ask the question again. Is there a CCPP regression suite I should be using to test my changes?

yangfanglin commented 9 months ago

@dkokron you can use https://github.com/NOAA-EMC/fv3atm/blob/a82381c0b751a15e5343de5078ef836b2c444c89/ccpp/suites/suite_FV3_GFS_v17_p8.xml to test.

@grantfirl should this compiler option optimization be pursued? Otherwise, NCEP NCO would like to divert GDIT resources to other tasks. Thanks.

grantfirl commented 9 months ago

> @dkokron you can use https://github.com/NOAA-EMC/fv3atm/blob/a82381c0b751a15e5343de5078ef836b2c444c89/ccpp/suites/suite_FV3_GFS_v17_p8.xml to test.
>
> @grantfirl should this compiler option optimization be pursued? Otherwise, NCEP NCO would like to divert GDIT resources to other tasks. Thanks.

According to the original PR (#76), the speedup was ~5% just from changing compilation flags for that one file. There certainly seems to be some benefit, so I'd personally like to see the same test repeated with the flags applied more broadly. I am not an expert in compiler optimization, though. @SamuelTrahanNOAA @mkavulich @dustinswales Do you have any objections to @dkokron trying to apply the listed compilation flags to all physics? If the resulting answer changes are akin to those from a normal compiler change, I really don't see a problem if there is a 5%+ speedup in model execution.

dkokron commented 9 months ago

I want to make clear that I limited the compiler changes to the spcvrtm() routine because that routine popped up near the top of the list when profiling a HAFS case. Other physics routines showed up as well, but those performance tall poles are being handled with other approaches and PRs.

It took several weeks' effort to extract a unit test for spcvrtm(), then figure out a set of compiler options that provided a speedup AND didn't change the numerics of that routine "too" much. Do you have unit tests for the various physics packages?

How are we going to determine we are getting an acceptable answer when these compiler changes are applied more broadly?

SamuelTrahanNOAA commented 9 months ago

When making a change that could reduce the precision or accuracy of the results, you must have an objective means to quantify the impact on forecast skill. A 5% gain isn't enough to justify an unknown level of impact on forecast skill.

A better solution would be to allow targeted optimizations for specific files. That way we can isolate risky optimizations to parts of the code where we know they're safe. We could even use more risky optimization options to get higher gains while limiting it to a region of code that won't be negatively affected by it.
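The per-file approach described here can be expressed directly in CMake with `set_source_files_properties` (the file name and flag list below are illustrative, not a confirmed mapping to spcvrtm()'s source file):

```cmake
# Hypothetical sketch: apply the riskier flags only to one known-safe file,
# leaving the rest of the physics at the host-supplied optimization level.
if(CMAKE_Fortran_COMPILER_ID MATCHES "Intel")
  set_source_files_properties(physics/radsw_main.F90
    PROPERTIES COMPILE_OPTIONS "-fp-model=fast;-fprotect-parens;-fimf-precision=high")
endif()
```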

In short, you're using the opposite approach of what you should be doing.

dkokron commented 9 months ago

@SamuelTrahanNOAA I don't want to apply the proposed compiler options more broadly. My original PR applied the change only to one specific file. That PR was rejected for being too specific! I'm trying to work with the code owners as best I can.

SamuelTrahanNOAA commented 9 months ago

> @SamuelTrahanNOAA I don't want to apply the proposed compiler options more broadly. My original PR applied the change only to one specific file. That PR was rejected for being too specific! I'm trying to work with the code owners as best I can.

I know. I'm saying the code managers are taking the wrong approach and you took the right approach.

In other words, I'm disagreeing with this:

> Code managers decided it was best to not single out one scheme for such treatment

But if they're willing to test a large number of models, this would be okay:

> however, and would like to see the impact of using these flags on all physics, or at least an entire suite.

So long as it is not the default compilation option.

grantfirl commented 9 months ago

@SamuelTrahanNOAA @dkokron Perhaps you already have this context, but at one time, the CCPP physics had tighter control over compilation flags and a somewhat confusing mess of flags for individual files. For simplicity, UFS code managers decided to streamline the build systems of all UFS components so that most compilation flags are controlled by the host. The remaining deviations in the physics from the host-supplied flags reduce optimization for a couple of files in Release mode.

Although I understand the argument of increasing optimization for specific files, it would be going back in the opposite direction, application- and maintenance-wise. I think that we're trying to understand the risks/benefits for applying the flags more broadly versus applying them precisely, as you're suggesting. Could you also help us to understand why the compilation flags proposed here are any more risky than other optimization flags that are routinely used? What metrics go into the decision of whether "code numerics change too much"?

grantfirl commented 9 months ago

@dkokron There are no unit tests for individual parameterizations or suites of them as far as I know. One could use the SCM, I suppose, to at least isolate changes to 1 column.

I'd ask the NOAA EMC folks what criteria they've used to accept optimization changes and/or compiler version changes in the past. I know that there was a butterfly test that has been used, but I don't know how it was implemented or whether it is still valid and being used.

SamuelTrahanNOAA commented 9 months ago

> @dkokron There are no unit tests for individual parameterizations or suites of them as far as I know. One could use the SCM, I suppose, to at least isolate changes to 1 column.
>
> I'd ask the NOAA EMC folks what criteria they've used to accept optimization changes and/or compiler version changes in the past. I know that there was a butterfly test that has been used, but I don't know how it was implemented or whether it is still valid and being used.

Last time we had a big change to the compiler options, it came from the same project (HAFS optimization). To test that, we added a -DFASTER=ON option which models tested for a few months before adopting.

If you want to change the optimization options again, that's what I suggest we do.

dkokron commented 9 months ago

@grantfirl The original PR describes the testing I did and the numerical results (contained in ugwp_results.txt). I felt that differences on the order of 10^-14 in the output fields were reasonable. Other sets of flags gave differences of 10^-2 to 10^0, which I think are not acceptable. Those are personal choices. Does this answer your question?
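The acceptance criterion described above amounts to a simple tolerance check between baseline and optimized output fields. A minimal sketch (the tolerance value is illustrative, not the actual ugwp_results.txt criterion):

```python
def max_abs_diff(baseline, optimized):
    """Largest absolute difference between two flattened output fields."""
    return max(abs(a - b) for a, b in zip(baseline, optimized))

def acceptable(baseline, optimized, tol=1e-12):
    """Accept round-off-level differences (~1e-14); reject ~1e-2 and larger."""
    return max_abs_diff(baseline, optimized) <= tol
```

For example, `acceptable([1.0, 2.0], [1.0, 2.0 + 1e-14])` is True, while `acceptable([1.0, 2.0], [1.0, 2.01])` is False.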

yangfanglin commented 8 months ago

We have some side discussions ongoing between EMC and NCO. I'd like to add here one comment Jun Wang made:

Compiler options: as Steven mentioned, bit4bit reproducibility is critical for operational runs. We are very careful about the compiler options used in UFS because of this. Currently we use "fp-model=source/precise" to maintain the bit4bit reproducibility across UFS components. You can see the fp-model option impact (page 2) on reproducibility here:

https://www.intel.com/content/dam/develop/external/us/en/documents/pdf/fp-consistency-121918.pdf
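The underlying issue is that floating-point arithmetic is not associative, so a value-unsafe fp-model that reassociates expressions can change low-order bits and break bit-for-bit reproducibility. A minimal illustration (in Python, since the effect is a property of IEEE 754 arithmetic rather than of any one compiler):

```python
# IEEE 754 double-precision addition is not associative: reordering a sum,
# as a value-unsafe fp-model may do, changes the low-order bits.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))  # False: the two orderings differ
```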