Open wenegrat opened 6 months ago
Ok, probably not the sin/cos functions since Victoria and I both use those also.
Take a look at this link and note that you want to use const whenever possible (doesn't appear you are doing this currently). https://github.com/CliMA/Oceananigans.jl/blob/main/docs/src/simulation_tips.md#global-variables-that-need-to-be-used-in-gpu-computations-need-to-be-defined-as-constants-or-passed-as-parameters
> I would look at the output and make sure that after the first handful of timesteps the model is using the max Δt. My expectation is that it should be (since I think everything should be laminar in this setup).
Just a quick question here: where does the 10 s max_Δt come from? I think the max timestep should be connected to some physical time scale of the flow, no? Like 1/N (N being the stratification). In which case it shouldn't be hardcoded.
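To make the 1/N suggestion concrete, here is a minimal arithmetic sketch in Python (not Oceananigans code). The function names, the `fraction` safety factor, and the example value of N² are all assumptions for illustration, not values from the simulation script.

```python
import math

def inverse_buoyancy_frequency(N2):
    """Return the time scale 1/N in seconds, given N^2 in s^-2."""
    if N2 <= 0:
        raise ValueError("N^2 must be positive for a stable stratification")
    return 1.0 / math.sqrt(N2)

def max_dt_from_stratification(N2, fraction=0.2):
    """Cap the time step at a fraction of 1/N.

    `fraction` is an assumed safety factor, not a recommended value.
    """
    return fraction * inverse_buoyancy_frequency(N2)

# Placeholder stratification (assumption), in s^-2.
N2_example = 1e-4
print("1/N    =", inverse_buoyancy_frequency(N2_example), "s")
print("max dt =", max_dt_from_stratification(N2_example), "s")
```

Computed this way, the cap follows the stratification automatically if N² is changed, instead of staying at a hardcoded 10 s.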
It probably comes from a suggestion of mine that my simulations (in the laminar phase) take timesteps of O(10 seconds). I believe @loganpknudsen has tracked this down to the use of very large interior velocities and the fact that the CFL condition now includes those in the calculation. Actually perhaps they shouldn't do so in the case of a background flow in a 'flat' dimension, but not a big deal.
> Actually perhaps they shouldn't do so in the case of a background flow in a 'flat' dimension, but not a big deal.
Yeah I agree they shouldn't. I remember seeing some discussion about the CFL calculation a few months ago which is probably where they changed it. It might be worth raising an issue at some point since it's an easy fix.
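For anyone following along, here is a rough sketch of why a large background velocity shrinks the time step, and what excluding a direction would change. It uses a generic textbook advective CFL estimate, dt = cfl / Σ(|u_i| / Δx_i), which is an assumption and not necessarily how Oceananigans computes it. The function name, the `active` mask, and all numbers are hypothetical.

```python
def advective_time_step(cfl, velocities, spacings, active=None):
    """Generic advective CFL estimate: dt = cfl / sum(|u_i| / dx_i).

    `active` marks which directions enter the sum; a direction set to
    False (e.g. a 'flat' one) is ignored. Returns inf if nothing advects.
    """
    if active is None:
        active = [True] * len(velocities)
    rate = sum(abs(u) / dx
               for u, dx, on in zip(velocities, spacings, active) if on)
    return float("inf") if rate == 0 else cfl / rate

# Hypothetical values: small perturbations in x and z, a large
# background flow in y, and 1 m grid spacing in every direction.
velocities = [0.01, 0.5, 0.01]   # m/s
spacings = [1.0, 1.0, 1.0]       # m

dt_with_background = advective_time_step(0.5, velocities, spacings)
dt_without_background = advective_time_step(
    0.5, velocities, spacings, active=[True, False, True])
print(dt_with_background, dt_without_background)
```

With these placeholder numbers the background flow dominates the sum, so dropping that direction raises the allowed time step by more than an order of magnitude.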
Also, just reinforcing what I had already mentioned to @loganpknudsen before: I think the strategy of progressively turning off things from the code to see where the slowdown is coming from is probably the way to go if you haven't yet managed to speed up the code.
Also², @loganpknudsen last we talked you were gonna time how long each time-step was taking on average (which you can do either using Oceanostics or just letting the whole thing run and dividing by the number of timesteps). Did you manage to do it?
For reference, my very complex headland simulations running on 100 million grid points take about 1.7 seconds per time step on an A100 GPU and about 2.5 seconds on V100s.
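The "divide by the number of timesteps" estimate can be sketched as below. It also normalizes by grid size so that runs of different resolution can be compared. The function names and the example run (wall time, step count) are made up for illustration; only the 1.7 s per step on 100 million points figure comes from the comment above.

```python
def seconds_per_step(wall_time_seconds, n_steps):
    """Average wall-clock cost of one time step."""
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    return wall_time_seconds / n_steps

def seconds_per_step_per_point(step_cost_seconds, n_grid_points):
    """Cost of one time step normalized by the number of grid points."""
    if n_grid_points <= 0:
        raise ValueError("n_grid_points must be positive")
    return step_cost_seconds / n_grid_points

# Reference figure quoted above: 1.7 s per step on 100 million points (A100).
reference = seconds_per_step_per_point(1.7, 100_000_000)

# Hypothetical run to compare against: 1 hour of wall time for 2000 steps
# on a grid of 10 million points (all three numbers are assumptions).
mine = seconds_per_step_per_point(seconds_per_step(3600.0, 2000), 10_000_000)

print("reference cost per point per step:", reference)
print("ratio to reference:", mine / reference)
```

A ratio far above 1 would suggest the slowdown is in the code itself rather than in the size of the problem.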
Related to that: like @wenegrat said, your velocity was pretty large, likely contributing to a small Δt, in which case the slowdown is "physical". What's the average Δt in your simulations? (Or I guess a better question: how does it evolve?)
Sorry, just saw this. I will get some answers on this once I get things running again. I can try switching to the A100 GPU.
https://github.com/loganpknudsen/BottomBoundaryLayer/blob/bc0f5945d47c88e71a5ee7d95afd7557c79e332e/BBL_with_oscillations_code_GPU_check.jl#L57
This is a large velocity. You could very easily reduce this by a factor of 5 (i.e., make it 10 cm/s) and have a realistic DWBC-type flow.
https://github.com/loganpknudsen/BottomBoundaryLayer/blob/bc0f5945d47c88e71a5ee7d95afd7557c79e332e/BBL_with_oscillations_code_GPU_check.jl#L59
Likewise this is a very strong stratification (I assume you chose this because your velocity was so large). Even an order of magnitude smaller would be representative of a strong pycnocline, whereas a deep ocean value would be 3 orders of magnitude smaller (10^{-7}).
https://github.com/loganpknudsen/BottomBoundaryLayer/blob/bc0f5945d47c88e71a5ee7d95afd7557c79e332e/BBL_with_oscillations_code_GPU_check.jl#L121
I would look at the output and make sure that after the first handful of timesteps the model is using the max $\Delta t$. My expectation is that it should be (since I think everything should be laminar in this setup).
https://github.com/loganpknudsen/BottomBoundaryLayer/blob/bc0f5945d47c88e71a5ee7d95afd7557c79e332e/BBL_with_oscillations_code_GPU_check.jl#L73
This line (along with the 2 that follow, involving sin and cos) might be the culprit for the slowdown, as I recall Tomas mentioning that the GPU doesn't like trig functions.
The way I would probably debug this is (making sure at each step it runs fast)
Let me know what you find!