Closed cbegeman closed 5 years ago
@cbegeman thanks for noting this. Do you have an example run path (case and run directory)? I'd be happy to take a look.
@vanroekel Thanks! Check out /lustre/scratch3/turquoise/cbegeman/palm/jobs/test_grizzly. I've put the sbatch log file there as well. Let me know if that's enough information for you to go on.
Looks like I don't have access to that space, all the way back to your user level (/lustre/scratch3/turquoise/cbegeman).
Where should I put the files?
You just need to open read permission (and execute permission on folders) for either the climate group or the world. For example:

```shell
chown -R cbegeman:climate .
chmod -R g+rX .
```

or

```shell
chmod -R go+rX .
```
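As a side note on the capital `X` used above: it grants execute only on directories (and on files that are already executable), which is exactly what you want for letting others traverse a tree without making data files executable. A quick demonstration in a throwaway temp directory:

```shell
# Demonstrate chmod's capital X: after stripping group/other bits and
# re-adding r and X recursively, the directory becomes traversable
# (r-x) while the plain file only gains read (r--), not execute.
tmp=$(mktemp -d)
mkdir "$tmp/sub"
touch "$tmp/file"
chmod -R go-rwx "$tmp"   # remove all group/other permissions
chmod -R go+rX "$tmp"    # add read everywhere, execute only where it belongs
ls -ld "$tmp/sub" "$tmp/file"
# directory -> drwxr-xr-x ; file -> -rw-r--r--
```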
Done. Thanks!
@cbegeman an update. I can get restarts to work if I don't do horizontal sections (remove shf*_xy from data_output in the namelist). I'm not sure why the slice is not working yet.
Thanks @vanroekel. I'll see whether my new dirichlet bc case restarts with that option removed.
I think I see what is happening now. When you have a variable like shf*_xy in data_output, it implies a surface variable, but the model still requires a section to be defined; see https://github.com/xylar/palm_les_lanl/blob/master/trunk/SOURCE/netcdf_interface_mod.f90#L1970-L1978. When you don't define a section, the code returns and the header becomes ill-defined. I think there are two solutions:
1. Add something like

   ```
   section_xy = 1,
   ```

   to your namelist file in the runtime parameters section. For `*` variables the value doesn't matter, since the surface will be output either way; for other variables, it chooses the vertical position of the slice.

2. We could loop through the requested variables and, if they are only surface-based, skip the vertical coordinate definitions in the referenced section.
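For what option 2 might look like, here is a rough sketch only; the variable names below (`do2d`, `av`, `surface_only`) are illustrative and not necessarily PALM's actual internals:

```fortran
!-- Hypothetical sketch of option 2: before defining the vertical
!-- coordinates for xy cross sections, check whether every requested
!-- xy output variable is a surface ('*') quantity and, if so, skip
!-- the vertical coordinate definitions entirely.
surface_only = .TRUE.
i = 1
DO WHILE ( do2d(av,i) /= ' ' )
   IF ( INDEX( do2d(av,i), '*' ) == 0 )  surface_only = .FALSE.
   i = i + 1
ENDDO

IF ( .NOT. surface_only )  THEN
!--    ... existing section / vertical coordinate definitions ...
ENDIF
```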
I would pretty strongly suggest sticking with option 1 and closing this issue. But please do let me know what you think @cbegeman and @xylar. Also pinging @qingli411 and @lconlon for their thoughts.
Note: I ran a case with your namelist file using option 1 above and the restart worked fine.
Thanks, @vanroekel. Option 1 sounds like the most straightforward solution to me.
@vanroekel, I've set section_xy = 1 in that namelist file, leaving shf*_xy, and I get a new error in combine_plot_fields. Have you encountered this? Can you share the namelist file that you used to get a successful run?
```
NetCDF output enabled
XY-section:  64 file(s) found
forrtl: severe (67): input statement requires too much data, unit 110, file /lustre/scratch3/turquoise/cbegeman/palm/jobs/test_restart_1/RUN_ifort.grizzly_hdf5_mpirun_test_oceanml/PLOT2D_XY_000000
Image              PC                Routine            Line        Source
combine_plot_fiel  000000000041D14E  for__io_return     Unknown     Unknown
combine_plot_fiel  000000000043F571  for_read_seq_xmit  Unknown     Unknown
combine_plot_fiel  000000000040A918  Unknown            Unknown     Unknown
combine_plot_fiel  0000000000408FAE  Unknown            Unknown     Unknown
libc-2.17.so       00002AE5586A73D5  __libc_start_main  Unknown     Unknown
combine_plot_fiel  0000000000408EA9  Unknown            Unknown     Unknown
```
I haven't seen that. But this looks like an error in combine_plot_fields, not in the model itself. Either way, here is my file. Nothing jumps out at me as different. My only suggestion is perhaps trying fewer processors: your domain is 32x32x32 and you are using 64 processors. I've had some issues when using a lot of processors for a small domain.
```fortran
&initialization_parameters
    nx = 63, ny = 63, nz = 64,
    dx = 2.5, dy = 2.5, dz = 2.5,
    fft_method = 'temperton-algorithm',
    ocean = .T.,
    idealized_diurnal = .T.,
    linear_eqnOfState = .FALSE.,
    rho_ref = 1000.0,
    fixed_alpha = .TRUE.,
    alpha_const = 2.0E-4,
    beta_const = 8.0E-4,
    pt_ref = 15.0,
    sa_ref = 35.0,
    loop_optimization = 'vector',
    initializing_actions = 'read_restart_data',
    latitude = 55.6,
    momentum_advec = 'pw-scheme',
    scalar_advec = 'pw-scheme',
    ug_surface = 0.0, vg_surface = 0.0,
    pt_surface = 276.74,
    pt_vertical_gradient = -54., -0.5,
    pt_vertical_gradient_level = -44., -52.,
    sa_surface = 7.65,
    sa_vertical_gradient = -70.0, -18.0,
    sa_vertical_gradient_level = -44., -53.,
    use_top_fluxes = .T.,
    use_surface_fluxes = .F.,
    constant_flux_layer = .F.,
    top_momentumflux_u = 0.0,
    top_momentumflux_v = 0.0,
    top_heatflux = 0.,
    top_salinityflux = 0.0,
    bc_uv_b = 'neumann', bc_uv_t = 'neumann',
    bc_pt_b = 'neumann', bc_pt_t = 'neumann',
    bc_p_b = 'neumann', bc_p_t = 'neumann',
    bc_s_b = 'initial_gradient', bc_s_t = 'neumann',
    bc_sa_t = 'neumann', /

&runtime_parameters
    end_time = 120000.0,
    create_disturbances = .T.,
    disturbance_energy_limit = 1.0e-2,
!   disturbance_level_b = -4.,
    dt_disturb = 150.,
    dt_run_control = 0.0,
    dt_data_output = 600.0,
    dt_dopr = 600.0,
    dt_data_output_av = 600.,
    section_xy = 1,
    netcdf_data_format = 3,
    data_output = 'shf*_xy', 'e', 'pt', 'sa', 'u', 'v', 'w', 'rho_ocean', 'alpha_T', 'solar3d',
    data_output_pr = 'e', 'e*', '#pt', '#sa', 'p', 'hyp', 'km', 'kh', 'l',
                     '#u', '#v', 'w', 'prho', 'w"u"', 'w*u*', 'w"v"', 'w*v*', 'w"pt"', 'w*pt*',
```
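As a back-of-envelope check of the processor-count suggestion above (assuming a square 8x8 processor layout, which is one plausible decomposition of 64 ranks):

```shell
# With a 32x32 horizontal domain decomposed over 64 MPI ranks in an
# assumed 8x8 layout, each rank is left with only a 4x4 patch of
# columns -- small enough that per-rank I/O and decomposition
# problems become plausible.
nx=32; ny=32          # horizontal grid points
px=8;  py=8           # assumed processor grid: 8 * 8 = 64 ranks
echo "$(( nx / px )) x $(( ny / py )) columns per rank"
```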
@cbegeman, typically we would leave the issue open until the PR to fix it has been merged. It's also customary to say in the issue that it was fixed by a PR, in this case #11.
@xylar got it, thanks.
No problem! I know how satisfying it can be to close an issue as fixed so I'm sorry to take that away from you ;-)
addressed by #11
I noticed that somewhere along the line we broke the restart capability. The error is generated at https://github.com/xylar/palm_les_lanl/blob/74b332fd5bd95b45efbca99b17b35ee1b8230805/trunk/SOURCE/netcdf_interface_mod.f90#L2462
I went back to @vanroekel's old "palm_les_updates" version and verified that restart worked there. It did run successfully, but with a different error message:

```
errors in local file ENVPAR
some variables for steering may not be properly set
```
Do any of you know of a version of the code where you had a successful restart? Or do you have ideas about what the source of the issue is? Thanks!