Closed: NoraLoose closed this issue 1 year ago
Okay, maybe 1. (the PBS submission script) is not really necessary. If your HPC cluster does not support PBS, it would be hard for you to test it out. The user can hopefully figure out how to translate from SLURM to PBS if necessary.
The smaller tutorial on the sphere has been added that runs in just a couple minutes on one processor, with a bash script to run all of the steps directly (without submitting to a scheduler).
The scalar, small Helmholtz, and large Helmholtz tutorials have had Jupyter notebooks added to show how to use the output files.
I also integrated some of the newer user-friendliness tools that make reading the outputs easier during post-processing.
@NoraLoose just checking in to see if these changes / updates are sufficient. Thanks!
Thanks for making these changes. I will go through the updated tutorials today.
Two quick thoughts:
Would it be possible to change the tutorial structure in the documentation to reflect different sections for the different tutorials? Right now, it looks like they are all under the section "Basic Tutorial":
I wonder if it is possible to integrate the notebooks (statically) into the documentation, so users can see the figures when cruising through the documentation. No worries if this is too hard to do! But could potentially be a nice addition.
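For reference, one way this is commonly done (assuming the documentation is built with Sphinx, which is my assumption here, not something stated in the thread) is the nbsphinx extension, which renders .ipynb files statically into the HTML docs:

```python
# Hypothetical additions to the Sphinx conf.py (assumes a Sphinx-built site
# and that nbsphinx is installed, e.g. via "pip install nbsphinx").
extensions = [
    "nbsphinx",  # render Jupyter notebooks as documentation pages
]

# Include the outputs already saved in the notebooks instead of
# re-executing them at build time.
nbsphinx_execute = "never"
```

The notebooks can then be listed in a toctree like any other page.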
When I try to compile coarse_grain_helmholtz.x in the small Helmholtz tutorial, I get the following error:
Functions/Helmholtz/filtering_helmholtz.cpp(1191): error: namespace "constants" has no member "COMP_WIND_FORCE"
if ( constants::COMP_WIND_FORCE ){
^
Oh yikes, sorry, I missed a file in one of the git commits. I'll fix that right away!
COMP_WIND_FORCE should be included in the constants file on the main branch (as of commit d951de57f9dcd7ebcbf69977223367858a12076c on Sept 21), so that shouldn't be coming up as an issue, but if you're still getting it, please let me know!
Ok, great. Compilation works fine now. But when running ./run_all_steps.sh, I get stuck after
Start-up time = 0.02928596735001
Filtering time = 270.4323036373
(clock resolution = 1e-09)
Is this normal?
The "Filtering time = ..." line is printed when the coarse-grain routine finishes. It should then be followed by a bash for-loop that merges the files together, which should print something like "Attempting to merge all files matching pattern outputs/postprocess_full_*nc into output file RESULTS_full.nc".
If after running you have the files RESULTS_full.nc, RESULTS_toroidal.nc, and RESULTS_potential.nc, then the process ran successfully. [I've added an echo statement to the end of the "run all steps" routines to announce successful completion.]
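For reference, the merge step at the end of run_all_steps.sh looks roughly like the sketch below (echo is used here so the loop can be inspected without the project installed; the real script runs the python command directly, and exact relative paths may differ by tutorial):

```shell
# Sketch of the post-filtering merge loop in run_all_steps.sh.
KINDS=("full" "toroidal" "potential")
for KIND in "${KINDS[@]}"
do
    # Merge the per-scale postprocess files for this decomposition
    # into a single RESULTS_<kind>.nc file.
    echo python ../../../PythonTools/OutputHandling/merge_postprocess_results.py \
        --file_pattern "outputs/postprocess_${KIND}_*nc" \
        --output_filename "RESULTS_${KIND}.nc"
done
```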
When I run ./run_all_steps.sh
, the full output that I get is:
++ python generate_data_sphere.py
++ python define_geographic_regions.py
++ ./Helmholtz_projection.x --input_file ./velocity_sample.nc --seed_file zero --output_file projection_ui.nc --max_iterations 1500 --tolerance 1e-18 --Tikhov_Laplace 1
Commandline flag "--input_file" got value "./velocity_sample.nc"
Commandline flag "--output_file" got value "projection_ui.nc"
Commandline flag "--seed_file" got value "zero"
Commandline flag "--time" received no value - will use default "time"
Commandline flag "--depth" received no value - will use default "depth"
Commandline flag "--latitude" received no value - will use default "latitude"
Commandline flag "--longitude" received no value - will use default "longitude"
Commandline flag "--is_degrees" received no value - will use default "true"
Commandline flag "--Nprocs_in_time" received no value - will use default "1"
Commandline flag "--Nprocs_in_depth" received no value - will use default "1"
Commandline flag "--zonal_vel" received no value - will use default "uo"
Commandline flag "--merid_vel" received no value - will use default "vo"
Commandline flag "--tor_seed" received no value - will use default "Psi_seed"
Commandline flag "--pot_seed" received no value - will use default "Phi_seed"
Commandline flag "--tolerance" got value "1e-18"
Commandline flag "--max_iterations" got value "1500"
Commandline flag "--Tikhov_Laplace" got value "1"
Commandline flag "--use_mask" received no value - will use default "false"
Commandline flag "--use_area_weight" received no value - will use default "true"
Compiled at 17:04:51 on Dec 8 2022.
Version 3.1.1
Using spherical coordinates.
Nproc(time, depth) = (1, 1)
deriv_scale_factor = 8.59437
Termination counts: 0 from absolute tolerance
0 from relative tolerance
1 from iteration maximum
0 from rounding errors
0 from other causes
Process completed.
++ FILTER_SCALES='
1.e5 1.29e5 1.67e5 2.15e5 2.78e5 3.59e5 4.64e5 5.99e5 7.74e5
1.e6 1.29e6 1.67e6 2.15e6 2.78e6 3.59e6 4.64e6 5.99e6 7.74e6
1.e7 1.29e7 1.67e7 2.15e7 2.78e7 3.59e7 4.64e7 5.99e7 7.74e7
'
++ mkdir -p outputs
++ cd outputs
++ ../coarse_grain_helmholtz.x --Helmholtz_input_file ../projection_ui.nc --velocity_input_file ../velocity_sample.nc --tor_field Psi --pot_field Phi --vel_field uo --region_definitions_file ../region_definitions.nc --filter_scales '
1.e5 1.29e5 1.67e5 2.15e5 2.78e5 3.59e5 4.64e5 5.99e5 7.74e5
1.e6 1.29e6 1.67e6 2.15e6 2.78e6 3.59e6 4.64e6 5.99e6 7.74e6
1.e7 1.29e7 1.67e7 2.15e7 2.78e7 3.59e7 4.64e7 5.99e7 7.74e7
'
Commandline flag "--Helmholtz_input_file" got value "../projection_ui.nc"
Commandline flag "--velocity_input_file" got value "../velocity_sample.nc"
Commandline flag "--radial_velocity_input_file" received no value - will use default "NONE"
Commandline flag "--time" received no value - will use default "time"
Commandline flag "--depth" received no value - will use default "depth"
Commandline flag "--latitude" received no value - will use default "latitude"
Commandline flag "--longitude" received no value - will use default "longitude"
Commandline flag "--is_degrees" received no value - will use default "true"
Commandline flag "--use_depth_derivs" received no value - will use default "false"
Commandline flag "--depth_is_elevation" received no value - will use default "false"
Commandline flag "--Nprocs_in_time" received no value - will use default "1"
Commandline flag "--Nprocs_in_depth" received no value - will use default "1"
Commandline flag "--tor_field" got value "Psi"
Commandline flag "--pot_field" got value "Phi"
Commandline flag "--vel_field" got value "uo"
Commandline flag "--u_r_field" received no value - will use default "u_r"
Commandline flag "--region_definitions_file" got value "../region_definitions.nc"
Commandline flag "--region_definitions_dim" received no value - will use default "region"
Commandline flag "--region_definitions_var" received no value - will use default "region_definition"
Commandline flag "--filter_scales" got value "
1.e5 1.29e5 1.67e5 2.15e5 2.78e5 3.59e5 4.64e5 5.99e5 7.74e5
1.e6 1.29e6 1.67e6 2.15e6 2.78e6 3.59e6 4.64e6 5.99e6 7.74e6
1.e7 1.29e7 1.67e7 2.15e7 2.78e7 3.59e7 4.64e7 5.99e7 7.74e7
"
Filter scales (27) are: 100km, 129km, 167km, 215km, 278km, 359km, 464km, 599km, 774km, 1000km, 1290km, 1670km, 2150km, 2780km, 3590km, 4640km, 5990km, 7740km, 10000km, 12900km, 16700km, 21500km, 27800km, 35900km, 46400km, 59900km, 77400km,
Compiled at 17:04:51 on Dec 8 2022.
Version 3.1.1
Using spherical coordinates.
Nproc(time, depth) = (1, 1)
Scale 1 of 27 (100 km)
....|....|....|....|
Scale 2 of 27 (129 km)
....|....|....|....|
Scale 3 of 27 (167 km)
....|....|....|....|
Scale 4 of 27 (215 km)
....|....|....|....|
Scale 5 of 27 (278 km)
....|....|....|....|
Scale 6 of 27 (359 km)
....|....|....|....|
Scale 7 of 27 (464 km)
....|....|....|....|
Scale 8 of 27 (599 km)
....|....|....|....|
Scale 9 of 27 (774 km)
....|....|....|....|
Scale 10 of 27 (1000 km)
....|....|....|....|
Scale 11 of 27 (1290 km)
....|....|....|....|
Scale 12 of 27 (1670 km)
....|....|....|....|
Scale 13 of 27 (2150 km)
....|....|....|....|
Scale 14 of 27 (2780 km)
....|....|....|....|
Scale 15 of 27 (3590 km)
....|....|....|....|
Scale 16 of 27 (4640 km)
....|....|....|....|
Scale 17 of 27 (5990 km)
....|....|....|....|
Scale 18 of 27 (7740 km)
....|....|....|....|
Scale 19 of 27 (10000 km)
....|....|....|....|
Scale 20 of 27 (12900 km)
....|....|....|....|
Scale 21 of 27 (16700 km)
....|....|....|....|
Scale 22 of 27 (21500 km)
....|....|....|....|
Scale 23 of 27 (27800 km)
....|....|....|....|
Scale 24 of 27 (35900 km)
....|....|....|....|
Scale 25 of 27 (46400 km)
....|....|....|....|
Scale 26 of 27 (59900 km)
....|....|....|....|
Scale 27 of 27 (77400 km)
....|....|....|....|
Process completed.
Start-up time = 0.08102444186807
Filtering time = 272.5161228143
(clock resolution = 3.84612007634e-10)
++ cd ..
++ KINDS=("full" "toroidal" "potential")
++ for KIND in '"${KINDS[@]}"'
++ python ../../../PythonTools/OutputHandling/merge_postprocess_results.py --file_pattern 'outputs/postprocess_full_*nc' --output_filename RESULTS_full.nc
Attempting to merge all files matching pattern outputs/postprocess_full_*nc into output file RESULTS_full.nc.
Identified 27 files for merging.
Identified 5 dimensions and 29 variables to copy into merged file.
.. copying dimension time
.. copying dimension depth
.. copying dimension latitude
.. copying dimension longitude
.. copying dimension region
Preparing to copy variables...
.... | .... | .... | ....
Done.
++ for KIND in '"${KINDS[@]}"'
++ python ../../../PythonTools/OutputHandling/merge_postprocess_results.py --file_pattern 'outputs/postprocess_toroidal_*nc' --output_filename RESULTS_toroidal.nc
Attempting to merge all files matching pattern outputs/postprocess_toroidal_*nc into output file RESULTS_toroidal.nc.
Identified 27 files for merging.
Identified 5 dimensions and 29 variables to copy into merged file.
.. copying dimension time
.. copying dimension depth
.. copying dimension latitude
.. copying dimension longitude
.. copying dimension region
Preparing to copy variables...
.... | .... | .... | ....
Done.
++ for KIND in '"${KINDS[@]}"'
++ python ../../../PythonTools/OutputHandling/merge_postprocess_results.py --file_pattern 'outputs/postprocess_potential_*nc' --output_filename RESULTS_potential.nc
Attempting to merge all files matching pattern outputs/postprocess_potential_*nc into output file RESULTS_potential.nc.
Identified 27 files for merging.
Identified 5 dimensions and 29 variables to copy into merged file.
.. copying dimension time
.. copying dimension depth
.. copying dimension latitude
.. copying dimension longitude
.. copying dimension region
Preparing to copy variables...
.... | .... | .... | ....
Done.
Hi Ben,
When I run the small tutorial on the sphere, I always get stuck at:
Start-up time = 0.03308053314686
Filtering time = 270.9834141545
(clock resolution = 1e-09)
I (of course) can't find the files RESULTS_full.nc etc. Do you have any insights on this? It seems Nora had the same issue; I am not sure if she got it resolved. Thank you very much!
Thanks, Yiming
That's very peculiar! It sounds like the bash for loop might not be running.
Could you:
1) confirm that you have an outputs directory that was created with a bunch of files that look like postprocess_..._##km.nc, and
2) try manually running one of the python merging routines to make the RESULTS_full.nc file?
The command for 2) should be:
python ../../../PythonTools/OutputHandling/merge_postprocess_results.py --file_pattern "outputs/postprocess_full_*nc" --output_filename "RESULTS_full.nc"
Traceback (most recent call last):
File "../../../PythonTools/OutputHandling/merge_postprocess_results.py", line 144, in
Okay, it sounds like the coarse-graining part itself runs, which is good; it's just the output handling that fails.
Could you tell me what versions of netcdf and netCDF4 (the python library) you have? I would have hoped it wasn't that sensitive to those, but maybe it is.
Hi Ben. For your reference, here are the versions I have: netcdf (4.8.1) and netCDF4 (1.5.3).
Thanks! I'm at AGU this week but will try digging into this tonight.
Okay, it looks like somewhere in the change to v1.5, netCDF4 started requiring fill_value to be set at the time of variable creation.
I've updated the merging routines and tested them on v1.5.8 (what I had installed on my laptop), and they run as expected. When you get a chance, can you try pulling the latest commits and running the merge step again?
Hi Ben,
Thanks for updating the package. Unfortunately, when running ./run_all_steps.sh, it still gets stuck at the same place. But when I manually run the python merging routine, I can get the merged data file this time.
Yiming
@yguo20ncsu okay, great! I'll probably just replace the bash for loop with the three explicit python merge commands.
If you're able to get the merged data files though then it sounds like everything is running :-) !
@yguo20ncsu to test if it is just the loop not working, could you try running a bash script with the following contents:
KINDS=("full" "toroidal" "potential")
for KIND in "${KINDS[@]}"
do
echo ${KIND}
done
Have you run into any other problems running the code / tutorials?
Thanks!
@NoraLoose I've removed the bash for loops and just replaced them with manual repetitions of the command, since the loops don't seem to be as portable as I'd hoped!
@bastorer Hi Ben, actually the loop works fine in this test bash script. I don't have any other problems running the code so far (I have also replaced the loop with all of the commands manually). Thank you so much for your help!
@bastorer I tried running ./Tutorial/Spherical_Demo/Low_Resolution/run_all_steps.sh again, but still have the same problem. Does this file still need to be updated?
The bash loop has been replaced by just directly calling the three python commands. Are you getting any specific errors about the python crashing?
No crashing, I simply get stuck after:
Scale 27 of 27 (77400 km)
....|....|....|....|
Process completed.
Start-up time = 0.02932177111506
Filtering time = 272.1048071757
(clock resolution = 1e-09)
@yguo20ncsu, does this still happen to you too, when running ./Tutorial/Spherical_Demo/Low_Resolution/run_all_steps.sh?
Yes, I also still get stuck at the same place when running run_all_steps.sh. Ben mentioned that it is probably something wrong with the bash loop where all nc files in the "outputs" directory are merged into RESULTS_*.nc. Below is how to solve it manually, as Ben suggested.
I commented out the loop in run_all_steps.sh. Instead of using the bash loop, manually run the three python merging routines:
python ../../../PythonTools/OutputHandling/merge_postprocess_results.py --file_pattern "outputs/postprocess_full_*nc" --output_filename "RESULTS_full.nc"
python ../../../PythonTools/OutputHandling/merge_postprocess_results.py --file_pattern "outputs/postprocess_toroidal_*nc" --output_filename "RESULTS_toroidal.nc"
python ../../../PythonTools/OutputHandling/merge_postprocess_results.py --file_pattern "outputs/postprocess_potential_*nc" --output_filename "RESULTS_potential.nc"
Then you will see the merged nc files (RESULTS_*.nc).
The new version of run_all_steps.sh should have those three python commands written out without the bash loop, which I thought would have solved the problem.
It's odd that it doesn't print the cd .. that happens after running coarse-graining but before python (the set -ex at the top should make it print out what's being executed). For some reason it just doesn't execute anything after coarse-graining.
After a bit of reading, I suspect it might be set -e that's the problem. Ideally that would halt the script whenever an error occurs. However, it sounds like it can be a little overzealous in what it classifies as an 'error', and those classifications can change between versions of bash.
If you agree (and if replacing set -ex with set -x fixes the problem for you), I'll replace set -ex with set -x in the bash scripts in the tutorial.
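To illustrate the kind of over-eagerness I mean (a contrived sketch using bash arithmetic as the trigger; I'm not certain this is the exact failure mode in the tutorial script):

```shell
# Under "set -e", a bash arithmetic command that evaluates to zero returns a
# nonzero exit status and silently kills the script, even though nothing
# actually went wrong.

# With set -e: "(( count++ ))" evaluates to 0 (status 1), so the inner
# script aborts before reaching the echo.
bash -c 'set -e; count=0; (( count++ )); echo "after increment"' \
    || echo "script aborted under set -e"

# Without set -e (set -x only): the same commands run to completion.
bash -c 'set -x; count=0; (( count++ )); echo "after increment"' 2>/dev/null
```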
Hmm, I replaced set -ex with set -x in line 5 of ./Tutorial/Spherical_Demo/Low_Resolution/run_all_steps.sh, but it did not fix the problem. :/
Hmm, darn, I was hoping that would do it haha. I have to admit, I'm really not sure what could be causing the issue at this point.
Hi Nora,
As far as I can tell, this is an issue with bash scripting that I'm honestly not sure how to solve. All of the parts themselves run correctly, the final python scripts just need to be called separately.
If I add a note in the tutorial documentation highlighting this (essentially saying that if the RESULTS_*.nc files are missing, run <insert python code here>), would that be sufficient?
It's not ideal, but I'm not sure what else to do for the bash script.
Thanks!
Sure, I'm okay with any solution that works as long as it's documented. :)
Documentation updated :-)
Okay great, I'm closing this issue then.
https://github.com/openjournals/joss-reviews/issues/4277