Open brittonsmith opened 6 years ago
Original comment by Britton Smith (Bitbucket: brittonsmith, GitHub: brittonsmith)
This is happening because at least one particle is reaching the iteration limit for subcycling. For some reason, the line that prints the message about the iteration limit is being written to stdout instead of stderr, so it's probably getting lost.
In any case, this is most certainly happening for a high density particle where the timestep is being limited by a short n_e / (d(n_e) / dt) or n_HI / (d(n_HI) /dt). The solver will compute H values under equilibrium for number densities above 1e8 cm^-3, but probably the densities aren't quite that high yet.
The easiest way to get around this would be to increase the value of itmax
in solve_rate_cool_g.F, which would slow things down a bit, but would probably work as long as the densities don't get much higher.
Would falling back to computing equilibrium H whenever itmax
is exceeded work here, rather than only when the number densities become large enough? I've been seeing these same errors in a few of my simulations, and my big worry is that the cells are being left in a weird intermediate thermal state (or, worse yet, having temperatures set to a floor or something similar).
[ This description is from issues happening in March 2020 together with Ben Keller]
In cosmological zoom-in simulations that allow gas to become cold (~30K) and dense (nH > 1e4H/cc), we found that some cells were triggering the MULTI_COOL error in GRACKLE. It used to happen that the MULTI_COOL warnings would get triggered when it used more than 10000 subcycles. We found that when gas was close to equilibrium, and at high density, it could get into an unstable/oscillatory mode (see the attached figure for the internal energy over the last 500 subcycles). Because the subcycles were set to a timestep that it was no more than 10% of the cooling time, what it ended up doing was taking super short steps that kept oscillating around the equilibrium. So in this case, using more subcycles wouldn't help, since it would just keep oscillating forever.
We are facing the same issue, also for runs with a high resolution (nH>1e4 acc and low temperature). @bwkeller @brittonsmith @mreinacampos How did you managed to solve the problem ?
I don’t know if this will help, but here is one quick possible fix.
Assuming you are solving the chemistry and energy with Grackle, there is some code in solve_rate_cool_g.F to detect this equilibrium condition:
! If the net rate is almost perfectly balanced then set ! it to zero (since it is zero to available precision)
if (min(abs(k1(i)* de(i,j,k)*HI(i,j,k)),
& abs(k2(i)*HII(i,j,k)*de(i,j,k)))/
& max(abs(dedot(i)),abs(HIdot(i))) .gt.
& 1.0e6_DKIND) then
dedot(i) = tiny8
HIdot(i) = tiny8
endif
Currently, this just sets the appropriate chemical rates to zero, but it could also be used to set the energy rate of change to zero as well:
! If the net rate is almost perfectly balanced then set ! it to zero (since it is zero to available precision)
if (min(abs(k1(i)* de(i,j,k)*HI(i,j,k)),
& abs(k2(i)*HII(i,j,k)*de(i,j,k)))/
& max(abs(dedot(i)),abs(HIdot(i))) .gt.
& 1.0e6_DKIND) then
dedot(i) = tiny8
HIdot(i) = tiny8
edot(i) = tiny8
endif
I have never tried this so I don’t know if it works.
Cheers, Greg
On May 27, 2024, at 8:29 AM, yrevaz @.***> wrote:
We are facing the same issue, also for runs with a high resolution (nH>1e4 acc and low temperature). @bwkeller https://github.com/bwkeller @brittonsmith https://github.com/brittonsmith @mreinacampos https://github.com/mreinacampos How did you managed to solve the problem ?
— Reply to this email directly, view it on GitHub https://github.com/grackle-project/grackle/issues/19#issuecomment-2133381026, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABQBD6BHUCH5AE5NFN7HNCLZEMRJRAVCNFSM4FPOMIP2U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TEMJTGMZTQMJQGI3A. You are receiving this because you are subscribed to this thread.
Greg Bryan Professor of Astronomy, Columbia University
@gregbryan Thanks for the suggestion ! We tried the fix but it didn't solve the issue. We are working now on better understanding when precisely the problem is triggered. Surprisingly its not easy for us to reproduce it outside our N-body code with a simple setup.
Sorry to hear it didn’t work. There are some ifdefs in that routine which output lots of debugging information when the iteration count is high — this can be a helpful way to understand what the issue is. Happy to look at and help parse that output if you can generate it.
Greg
On May 30, 2024, at 3:15 AM, yrevaz @.***> wrote:
@gregbryan https://github.com/gregbryan Thanks for the suggestion ! We tried the fix but it didn't solve the issue. We are working now on better understanding when precisely the problem is triggered. Surprisingly its not easy for us to reproduce it outside our N-body code with a simple setup.
— Reply to this email directly, view it on GitHub https://github.com/grackle-project/grackle/issues/19#issuecomment-2138838823, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABQBD6EZ2ECROSF4YLO23CLZE3GXTAVCNFSM4FPOMIP2U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TEMJTHA4DGOBYGIZQ. You are receiving this because you were mentioned.
@gregbryan Thanks for offering your help! In the meantime, we managed to understand the origin of the problem. The default cloudy tables (eg. CloudyData_UVB=HM2012.h5) that provides the metals cooling/heating are limited to 1e4acc. When reaching higher densities (which is our case), the extrapolation of those tables leads to extremely large values of edot, causing the oscillation. So, nothing wrong in the code.
Great to hear you tracked that down. So I guess the solutions are to either extend the tables, or prevent the extrapolation. If you do work out a solution, please do submit it back to the project as a pull-request (to help others).
Thanks, Greg
On May 31, 2024, at 3:58 AM, yrevaz @.***> wrote:
@gregbryan https://github.com/gregbryan Thanks for offering your help! In the meantime, we managed to understand the origin of the problem. The default cloudy tables (eg. CloudyData_UVB=HM2012.h5) that provides the metals cooling/heating are limited to 1e4acc. When reaching higher densities (which is our case), the extrapolation of those tables leads to extremely large values of edot, causing the oscillation. So, nothing wrong in the code.
— Reply to this email directly, view it on GitHub https://github.com/grackle-project/grackle/issues/19#issuecomment-2141426631, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABQBD6GBMKRE6JQZOWVNOGLZFAURFAVCNFSM4FPOMIP2U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TEMJUGE2DENRWGMYQ. You are receiving this because you were mentioned.
You may want to try using the table "CloudyData_UVB=HM2012_high_density.h5". This was created by another contributor a couple years ago and has rates that go up to much higher densities.
Originally reported by Romeel Davé (Bitbucket: romeeld, GitHub: romeeld)
When I try to run cosmological sims at high resolution or dynamic range, I inevitably see lots of errors like the ones below from Grackle. Unfortunately it doesn't happen in small test runs so it's hard to debug quickly.
This obviously comes from solve_rate_cool(), but I am not sure how to interpret the values, or tell why grackle is having a hard time with these particular particles. It may be that cooling in those particles could be ignored (eg if they are dense enough to be artificially pressurized), but I can't tell which ones they are.
Can anyone offer thoughts on how one might debug this, or why this might be happening? Thanks.
inside if statement solve rate cool: 0 0 FATAL error (2) in MULTI_COOL dt = 1.391E-05 ttmin = 7.688E-06 6.6E-11 7.7E-06 7.9E+09 T inside if statement solve rate cool: 0 0 FATAL error (2) in MULTI_COOL dt = 1.391E-05 ttmin = 7.431E-06 4.3E-10 7.4E-06 2.1E+09 T