su2code / SU2

SU2: An Open-Source Suite for Multiphysics Simulation and Design
https://su2code.github.io
Other
1.36k stars 843 forks source link

[WIP] Inconsistencies and improvements to SST model #2329

Open rois1995 opened 4 months ago

rois1995 commented 4 months ago

Proposed Changes

Hi everyone,

I have found some inconsistencies with respect to the literature on the implementation of the Menter's SST model. I would like to use this branch as test bench for any corrections/improvements made to the SST model.

Implementation errors found:

Changes to SST model proposed:

I've seen the proposed changes to the lower limits of k and w in #2323 and I tried implementing it in my branch. However, if the implementation proposed in the respective PR is preferred then I will change mine.

Related Work

2323 #1851

PR Checklist

Put an X by all that apply. You can fill this out after submitting the PR. If you have any questions, don't hesitate to ask! We want to help. These are a guide for you to know what the reviewers will be looking for in your contribution.

bigfooted commented 4 months ago

Great contribution, Thanks @rois1995 !

pcarruscag commented 4 months ago

If you are looking into robustness aspects too you should get in touch with @emaberman and @YairMo, seems like they have some good ideas and between the free time of 3 people a lot more can get done :)

YairMO commented 4 months ago

Hi,

Regarding the cross-diffusion term (CD) that appears in Omega source term (residual). The SST model (1994/2003) is a high-Reynolds-number model. Namely, It can not predict correctly the sub-layer region (especially the correct profile of the TKE). Therefore, only a positive contribution is required. Moreover, since the SST model was design as a k-w and k-epsilon blending, the CD term "belongs" only to the k-epsilon "branch", that is why the CD term include the factor "1-F1". However, it may happen, that the factor "1-F1" is not a 100% safe guarantee. It may happen that "1-F1" is not zero in region where the CD term is negative (this happen due numerical errors). To avoid such a situation, it is a good idea to clip the CD term with zero. Otherwise, severe numerical robustness issues may rise. Yes, it is different from Menter publications, but I think that clipping the CD term with zero is completely inline with Menter original idea (that is why, I think, he was including the factor "1-F1". But again the 1-F1 factor is not 100% percent "safe").

YairMO commented 3 months ago

Hi,

The use of an Omega production limiter (about the cross-diffusion term) is correct for low-Reynolds-number (LRN) models (the approach described by Peng et al. is very naive; there are other more rigorous treatments). For high-Reynolds-number (HRN) models, the clipping should be zero, keeping the cross-diffusion term positive; thus, the current implementation is correct.

Indeed, it is not exactly as it appears in Menter's original publication. The factor (1-F1) aimed to promise that the cross-diffusion term will be activated only outside the boundary layer, where it is positive (the cross-diffusion term switches its sign deep inside the boundary layer). This was also recognized by Peng et al. (first paragraph above Eq. 17). However, it may happen that the factor (1-F1)=1 where the cross-diffusion term is negative. Usually, it may happen at the wake, very near the airfoil trailing edge, where the upper and lower boundary layers merge. It is due to the imperfection of the F1 function.

To summarize, the current implementation is correct, and it is perfect for HRN models.

YairMO commented 3 months ago

For the sake of clarity, "current implementation" refers to the current treatment of the production code

emaberman commented 3 months ago

What YairMO is saying, is that allowing negative cross diffusion values is incorrect for high Reynolds models and should not be an option, this is a fix used for low Reynolds models only

rois1995 commented 3 months ago

Hi @YairMO, Hi @emaberman ,

thank you very much for your comments. I haven't found any suggestion in literature to clip to only positive values the cross-diffusion term in the w-equation. I understand that it might be more robust, but it is not the standard implementation of the SST model, which is the first thing that we need to achieve. Only then we can build on top of that to improve the robustness of SU2.

Nevertheless, I tried the SWBLI test case and I compared the results across 6 different combinations:

1- develop branch, no changes 2- develop branch, changes to Supersonic_inlet profile as suggested in #1851 3- my branch, with original CDkw implementation (should give exactly the same result as develop+modified BC) 3- my branch, with original CDkw implementation and using boundary conditions from TMR 4- my branch, with original CDkw implementation and using your suggestions for lower limits for k and w. 5- my branch, allowing negative values of CDkw 6- my branch, allowing negative values of CDkw and using boundary conditions from TMR 7- my branch, allowing negative values of CDkw and using your suggestions for lower limits for k and w. 8- my branch, allowing negative values of CDkw, using boundary conditions from TMR and using your suggestions for lower limits for k and w.

When my branch is used, then the changes to the supersonic inlet BC are already in place.

I haven't achieved convergence with 1, 2 and 3. More precisely, 1 diverged right away (after 30 iterations), while 2 and 3 gave "FGMRES - Orthogonalization Failed" after 900ish iterations.

Here you can see the residuals for the different combinations.

OrigCDkw

NegCDkw

Unfortunately I will be busy with the AIAA Conference next week, thus I don't know how much I will be able to work on this. The next test case will be the 2D airfoil near-wake from TMR.

YairMO commented 3 months ago

Hi rois1995,

First of all, enjoy your time in Las Vegas. Any paper that you are presenting?

As for our discussion about the cross-diffusion term, I've emailed the "source" (Menter). I believe he will make it clear. It may be that he will be able to answer only in a while ...

pcarruscag commented 3 months ago

In my experience, by-the-book implementation of SST is very unstable in real-world meshes and applications. For example there is a production limit, but it's proportional to k and w, and thus if those start growing for some reason there is nothing preventing the model from exploding. CD is also tricky due to the 1/w part...

YairMO commented 3 months ago

I totally agree. On real meshes, Omega usually drops to extremely low values. In cases where the cross-diffusion term is negative (allowed to be negative), it behaves as a sink term, further amplifying the drop of Omega. A simple addition of this term to the implicit diagonal is insufficient (I tried this). Other more rigorous methods are required (some available in the open literature).

My main argument is that the factor (1-F1) guarantees only the k-epsilon branch outside the boundary layer, in which the cross-diffusion term is already positive. It may not be so because of numerical errors.

YairMO commented 3 months ago

Regarding "Fixed Supersonic inlet BC inclusion of TKE," does the energy equation include the contribution of turbulent diffusion?

YairMO commented 3 months ago

Hi All,

We are lucky, Florian Menter just replied to me.

He agreed that the factor (1-F1) should activate the CD term only at the "k-epsilon" branch, where the CD term is already positive. Therefore, clipping with zero is in terms of the model and is not incorrect.

Me: A simple boundary layer simulation will reveal that the CD term switch sign is inside the boundary layer and that the F1 function switches from 1 to zero outside the boundary layer. Namely, the CD term should be positive. Florian is unaware of the occasions when it may be negative (even though the model was designed so that this term will be activated when it is positive).

Best wishes, Yair

rois1995 commented 3 months ago

Regarding "Fixed Supersonic inlet BC inclusion of TKE," does the energy equation include the contribution of turbulent diffusion?

Yeah, I do think so.

rois1995 commented 3 months ago

I have tested the tripping of the cross-diffusion term also in the residuals on a more complicated test case and indeed it improved convergence. Should I remove this change and use the value of cross-diffusion coming from the computation of F! blending function as was before?

YairMO commented 3 months ago

Hi,

May I understand "tripping" as "clipping"?

The cross-diffusion term (that appears in the residual of Omega) is better treated as follows:

2 (1-F1) sigma_w2 rho / Omega max(grad K * grad Omega, epsilon)

Where "epsilon" could be = 1.e-20 (personally, I'm using epsilon=0)

The term CDkw (that is used to calculate F1) should be implemented in a similar manner:

CDkw = 2 sigma_w2 rho / Omega max(grad K grad Omega, epsilon)

where epsilon = 1e-20 for the SST1994 and epsilon=1e-10 for the SST2003.

You mentioned that the SU2 implementation of CDkw is not as above?

rois1995 commented 3 months ago

Yeah, sorry, bad wording from my side. The standard version of SU2 used the value of CDkw as value of the cross-diffusion term in the residual computations. Thus, I think I should revert to that formulation.

Regarding the CDkw, it's implementation in SU2 was made such that the exponent of the epsilon term was equal to minus the production limiter constant of the model. Thus 20 for SST1994 and 10 for SST2003. Is it just a coincidence in the model formulation? In the original paper there is no mention of it being the same variable. This can lead to different results if someone changes the production limiter constant.

YairMO commented 3 months ago

Thank you. In the original paper of Menter (Ten Years of Industrial Experience with the SST Turbulence Model) about the SST2003), it is written that the power of epsilon should be minus 10. I think it is a coincidence. You are right; changing the production constant limiter may be problematic. It is better to distinguish between these two constants.

pcarruscag commented 3 months ago

I expect that epsilon to be a simple measure to avoid division by 0, if that lower bound had physical meaning it would have to be multiplied by some reference factors to make its dimensions appropriate, otherwise SST would not give the same results for the same Reynolds obtained with different rho and mu.

YairMO commented 3 months ago

You are correct! I have asked Menter about this many, many years back. I don't remember his response, but the formulation given in the TMR is correct (with this fault!)

rois1995 commented 3 months ago

I have two considerations on the new boundary conditions:

1- should the "domain length" in the Omega farfield BC be adimensionalized by the Reynolds length? Otherwise w gets adimensionalized only by the reference velocity. 2- the farfield value of TKE does not take into account for the freestream turbulence intensity. If we want to match the tke computed via $k{\infty}=w{\infty} (\mu_{\infty} ViscRatio)/\rho{\infty}$ and the one computed previously as $k{\infty}=(3/2)(U_{\infty}TI{\infty})^2$, we can either compute the viscosity ratio combining the two equations or just use the one which explicitly contains the $TI{\infty}$ disregarding the dependency on $w_{\infty}$. But can we do it? The model as written on the NASA TMR does not explicitly contains the turbulence intensity in the boundary conditions. Maybe it only matters when transition is taken into account.

YairMO commented 3 months ago

Hi,

I'm not sure what "new boundary conditions" means. Is someone replacing the present far-field BC of the TKE and Omega? I recognize these "new" BCs as the ones given on the TMR (proposed in the original paper).

I find the present setting in SU2 very comfortable and more "intuitive" than that given on the TMR.

rois1995 commented 2 months ago

So, the two boundary conditions currently implemented in this branch are:

Standard SU2 BCs $k{\infty} = (3.0/2.0) * (U{\infty} TI{\infty})^2$ $w{\infty} = \rho_{\infty} k{\infty} / (\mu{lam,\infty} * ViscRatio)$ $ViscRatio{default} = 10.0$ $TI{\infty, default} = 5.0$

NASA TMR BCs for SST1994 and SST 2003 model $U{\infty}/L \leq w{\infty} \leq 10 U{\infty}/L$ $10^{-5}U{\infty}^2/Re{L} \leq k{\infty} \leq 10^{-1}U{\infty}^2/Re{L}$

where "L is the approximate length of the computational domain," and the combination of the two farfield values should yield a freestream turbulent viscosity between $10^{-5}$ and $10^{-2}$ times freestream laminar viscosity, as cited on the NASA TMR page.

They give very different values of k and w at the farfield. For example, for a the standard test case of the SWBLI with $ViscRatio = 0.01$ and $TI_{\infty} = 5.0$:

Standard SU2 BCs $k{\infty} = 2574.55$ $m^2/s^2$ and $w{\infty} = 3e10$ $s^{-1}$

NASA TMR BCs $k{\infty} = 0.0013$ $m^2/s^2$ and $w{\infty} = 1563$ $s^{-1}$

I assume that in the end it doesn't matter since the farfield values drop significantly in the standard version of SST.

rois1995 commented 2 months ago

I am trying to run again the SWBLI V&V case on the SU2 page. However, as it is provided, it does not work with the SU2 standard boundary conditions. In particular, it always diverges due to NaN residuals in the X-Momentum equation. However, when I use the BCs from the NASA TMR page it works just fine (although only the v2003m model).

Here are the results of the computations performed.

Three configs are used: 1 - With dimensionless limits $k{lim} = 10^{-20}$ and $w{lim} = 10^{-10}$. 2 - Without dimensionless limiters. 3 - With dimensionless limits $k{lim} = 10^{-30}$ and $w{lim} = 10^{-20}$.

I am reporting the Skin Friction Coefficient on the lower wall against the Experimental results and the trend of the residual for $\rho$ during the simulations. Five different meshes have been considered.

ComparisonNewBCsResiduals

ComparisonNewBCsSFC

The solutions are pretty much identical for the same mesh level, but the cases with the config 3 achieved convergence across all of the meshes. It is interesting how, on finer meshes, the use of dimensionless limits (config 1) actually had worst convergence than the one with std limits (config 2) (up until the finest mesh, where config 2 diverges).

I am going to try to perform the simulations also with the non m variants of the SST model and the full production term. It may require some work since all of the occurrences of the "ComputeStressTensor" have to be changed. Ideally I think that it should be computed only once and then used that value for all of the other instances. I will focus only on the V2003 model for now as it is the only one that does not diverge.

rois1995 commented 2 months ago

@YairMO are these results something that you would expect to obtain when changing the lower limits of k and w? With the new updates to the Reynolds Stress computations I obtain mixed results:

ComparisonNewBCsResiduals

On the finest mesh the dimensionless limits diverge regardless of their value (I also tried decreasing them but it didn't work).

YairMO commented 2 months ago

Hi,

To begin with, thank you very much for your efforts and the tremendous work you put into improving the SST model!

It is very difficult for me to judge because I'm using a different time integration than the SU2, which may be critical in this regard. Moreover, my code is non-dimensional, so estimating the influence of the "newBC" and the "lower limits" is difficult for me. Can you tell the values of the "lower limits" relative to the freestream values? Please provide many details about the inputs in your config file (Linear solver, linear solver convergence, CFL, inviscid flux, etc')

YairMO commented 2 months ago

I found the SWBLI config file. I noticed that it is the NK solver. I would test this case without the NK solver. Moreover, I've noticed that the entropy fix is very low; how about raising it to 0.2 or using a different flux (JST, AUSMPLUSUP2)? It may sound irrelevant, but it is worth the try.

rois1995 commented 2 months ago

Changing the entropy fix coefficient (EFC) to 0.2 had the largest effects across the board. It actually degraded the convergence rate, but it removed divergence of the solvers for finer meshes.

ComparisonNewBCsResiduals_EF0 2

The converged solutions do not change when changing the EFC, which is desirable.

ComparisonNewBCsSol_EF0 2

YairMO commented 2 months ago

Great! This is a non-linear example: Can we say that the convergence pattern is not necessarily directly affected by the turbulence model's lower limits? Did you try avoiding the NK solver?

rois1995 commented 2 months ago

I don't think we can actually say that, or, at least, we can say that it is not the only factor affecting the convergence. The problem is that it is not consistent across the meshes: for sure, using dimensionless limits help, but changing their values has different effects for finer meshes.

ComparisonNewBCsResiduals_EF0 2_Other

The effect on the residual of the turbulent variables is the same. The most problematic one is always the Omega equation where the residuals remain always large.

ComparisonNewBCsResiduals_EF0 2_Other_TKE

ComparisonNewBCsResiduals_EF0 2_Other_Omega

The problem is that Omega is too large with respect to all the other variables, especially near the wall (where the residuals remain high).

YairMO commented 2 months ago

I agree. The NK solver is not well suited to solving the turbulence model. Maybe the linear solver alone will be more robust.?

pcarruscag commented 2 months ago

The NK solver is only applied to the flow equations.

YairMO commented 2 months ago

Thank you. How about our lower limits (committed):

K_lowLimit = 1E-15 K_FarField Omega_lowLimit = 1E-5 Omega_FarField

rois1995 commented 2 months ago

It changes somehow the convergence, but again, it doesn't always have the same effect.

ComparisonNewBCsRho_EF0 2

ComparisonNewBCsTKE_EF0 2

My problem right now is that if I run the v2003m config on the finest mesh with 40 cores then it diverges after 125 iterations. If I run with 20 cores then it does not diverge. The number of cells/core is more then 10k thus I don't think that the domain is overly partitioned.

pcarruscag commented 2 months ago

If you are using multigrid, it is known to be sensitive to partitioning. You could always use the hybrid parallel mode, that lets you use more cores with less partitioning.

rois1995 commented 2 months ago

Currently I am not using multigrid. I tried with with 20 cores and 2 threads and it almost results in the same rate of convergence of the simulation with only 20 cores. I also tried going back to a couple of commits, right after the merge of the develop branch and it does not diverge on 40 cores. Starting from that commit I only slightly changed the computation of the stress tensor, which does not involve new mpi communications, thus I don't understand why it should lead to divergence. I will look further into this.

EDIT.

I encounter non-deterministic convergence running the same testcase on the same machine consecutive times. Sometimes it diverges and other times it goes through all of the iterations. It may be related to Issue #2259. I tried with a combination of 4C10T and 20C2T (C = cores, T = threads).

rois1995 commented 2 months ago

I have tried running 7 consecutive and identical simulations with 4C10T and the residuals are not consistent. Moreover, Run 2 and 6 diverged respectively after 251 and 430 simulations. Here I am showing the residuals of the TKE (I know that NK does not apply to the turbulence solver but it is cleaner to visualize than the residual for the density)

NonDeterministicRMSk

I also tried going back a few commits right before changing the Reynolds stress computation and the situation is even worse.

NonDeterministicRMSk_PreviousCommit

Also tried with the develop branch and same non-consistency, even though every run here has diverged.

Here i summarize the convergence of these runs, 1e4 iterations means that it has "converged":

Run Iterations Iterations (old commit) Iterations (develop)
1 1e4 1e4 41
2 251 1193 34
3 1e4 1286 51
4 1e4 95 47
5 1e4 458 53
6 430 1e4 54
7 1e4 254 84
pcarruscag commented 2 months ago

Are you trying to reproduce these results https://su2code.github.io/vandv/swbli/? or using your own meshes?

rois1995 commented 2 months ago

I am trying with the .geo file used to create the meshes in the V&V folder. However, I am trying with a finer mesh, with a mesh sizing factor of f = 3.9762. I also tried the develop branch and, even though it always diverges, also there the iterations done are different. I have updated the previous comment.

With the coarser meshes I might have the same problems, but convergence is easier thus I think that I might need more trials. I'll start running some simulation with the 3rd level grid.

pcarruscag commented 2 months ago

Are you initializing from coarser results? I just checked that the L3 mesh still works fine in develop initialized from L2 results. The NK solver has some non determinism when using threads especially if the convergence is rough. Using a lot of cores also makes the ILU preconditioner less effective, so you may need to tweak linear solver iterations or CFL to compensate. For reference when I ran the case originally I had a 4 core laptop xD

rois1995 commented 2 months ago

Every simulation starts from zero. I also tried the third grid level and these are the results after seven runs

NonDeterministicRMSk_Level3

Runs 2, 6 and 7 diverge at different iterations while Run 4 converges to the desired residual value (log(RMS(Res_rho)) < -11.5). Running on 4 cores and 10 threads is a problem in this case? Or has the same problems of using 40 cores?

pcarruscag commented 2 months ago

The ILU issue depends on number of cores regardless of MPI/OpenMP split. The non-determinism of the NK solver should be worse with more threads. I recommend initializing from the coarse results.

rois1995 commented 2 months ago

I tried a couple of things:

No Newton-Krylov NonDeterministicRMSk_NoNK

No Newton-Krylov Zoomed NonDeterministicRMSk_NoNKZoom

The peak in the residuals that you can see around 3k iterations is different for each run, and leads to slightly different residuals after 10k iterations.

I will try having a look into the atomic operations.

pcarruscag commented 2 months ago

There are two sources for those, atomicAdd which is used in dot products with CSysVectors (so it affects the linear solvers). And in the NK solver to compute the norm of the solution. The other less obvious source is SU2_OMP_CRITICAL, critical sections do not have a guaranteed ordering. We use these to compute the RMS residuals, and those influence the CFL adaptation. AFAIK the only way around this is to store the result of each thread into a vector visible by all threads, and then have a single thread compute the total.

rois1995 commented 2 months ago

I can try having a look into this, but I am not sure of how much help I can be. Just from a quick look at the code I can see that the CSYSVEC_PARFOR has a NOWAIT instuction, whereas the SU2_OMP_FOR_STAT doesn't, and these are used right before the atomicAdd call respectively in the CSysVector and in the CNewtonIntegration. But I doubt that's the main cause of large discrepancies when using NK.

pcarruscag commented 2 months ago

I think NK is just more sensitive because the linear system uses a nastier Jacobian matrix.

rois1995 commented 2 months ago

Then it's kind of difficult to evaluate the improvements of this branch. When comparing with the develop branch, the simulations with NK at least do not straight up diverge. When not using the NK solver, the residuals kind of improve, mostly when OMP is not used. These are the results on the mesh level 3.

MyBranchVsDevelop_ResRho

MyBranchVsDevelop_ResTKE

MyBranchVsDevelop_ResW

MyBranchVsDevelop_LogAVGCFL

There are some spikes in the residuals, which strangely enough coincide for rho and w but have opposite direction.

When OMP is not used, the develop branch stalls the turbulence residuals, whereas this branch keep on decreasing them.

However, changing the number of cores used, the develop branch improves, while this branch does not.

MyBranchVsDevelop_ResRho_Altri

MyBranchVsDevelop_ResTKE_Altri

The main differences between the develop and this branch are:

rois1995 commented 2 months ago

Having a better look into the comparison between this branch and the develop one for the simulation with 10C and 1T.

When looking at the solution, the residuals actually improve with this branch everywhere except at the bottom symmetry BC. Don't really know why, I am checking all of the changes to check.

ComparisonAbsResRho_MyBranchVsDevelop

And, even more, the Mach number seems to be wrong on the bottom symmetry, but not on the upper one.

UPDATE:

The correction to the Supersonic_Inlet_BC works just fine, but some how it screws up the Mach on the symmetry (and I do not why bu only at the bottom one). It might be related to #1851, thus I am looking into including the tke in the computation of the sound speed.

rois1995 commented 2 months ago

I have started implementing the computation of the speed of sound with the tke. Just to be sure that I am following the right path, I have pushed some changes. With these changes the linear solver diverges after 39 inner iterations, don't really know why. Any support is welcome.

rois1995 commented 2 months ago

As of now the problem is that the supersonic inlet has the correct values of Mach number (5) but on the symmetry BC this does not occur. It seems that on the symmetry BC the problem is the same as it was for the supersonic inlet before, but I still have to find a way to fix this. Moreover, this only happens to the lower symmetry BC, whereas at the top one the Mach is correct.

YairMO commented 2 months ago

Maybe I'm mixing it up, but I remember there was an issue with the wrong treatment (I'm not sure exactly what it was, perhaps gradient treatment) at Symmetry BC.