husseinaluie / FlowSieve

FlowSieve coarse-graining code base
https://flowsieve.readthedocs.io/en/latest/

Running the tutorial on the sphere #17

Closed NoraLoose closed 1 year ago

NoraLoose commented 1 year ago

https://github.com/openjournals/joss-reviews/issues/4277

bastorer commented 1 year ago
  1. I'd be happy to include PBS submit scripts, but I'm afraid I'm not very familiar with it. Could you point me to a documentation page or sample script?
  2. That's a very fair point: three hours is a bit long to wait just to find out if something works. I propose keeping that one as it is, with the different scales of eddies, and adding a second spherical tutorial that runs on a very low resolution grid (e.g. 2 degrees) in order to give results very quickly.
  3. Agreed, notebooks would be nice. I originally just used .py scripts so that everything could be wrapped up in the submission script, but having a notebook that shows users how to interact with the outputs would be very beneficial (a rough sketch of what I have in mind is below).
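
Something along these lines, roughly, with placeholder file and variable names (and using xarray purely for illustration; the actual notebooks may use a different library):

import xarray as xr
import matplotlib.pyplot as plt

# Open one of the coarse-grained output files (placeholder filename).
ds = xr.open_dataset("outputs/postprocess_full_100km.nc")

# See which fields are available, then plot one as a quick example
# ("coarse_KE" is a placeholder variable name).
print(ds.data_vars)
ds["coarse_KE"].isel(time=0, depth=0).plot()
plt.show()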
NoraLoose commented 1 year ago

Okay, maybe 1. (the PBS submission script) is not really necessary. If your HPC cluster does not support PBS, it would be hard for you to test it out. The user can hopefully figure out how to translate from Slurm to PBS if necessary.

  2. and 3. sound good!
bastorer commented 1 year ago

A smaller tutorial on the sphere has been added; it runs in just a couple of minutes on one processor and includes a bash script that runs all of the steps directly (without submitting to a scheduler).

The Scalar, small Helmholtz, and large Helmholtz tutorials now have Jupyter notebooks added to show how to work with the output files.

I also integrated some of the newer user-friendliness tools that make the post-processing outputs easier to read.

bastorer commented 1 year ago

@NoraLoose just checking in to see if these changes / updates are sufficient. Thanks!

NoraLoose commented 1 year ago

Thanks for making these changes. I will go through the updated tutorials today.

Two quick thoughts:

  1. Would it be possible to change the tutorial structure in the documentation to reflect different sections for the different tutorials? Right now, it looks like they are all under the section "Basic Tutorial" (see the attached screenshot of the documentation sidebar).

  2. I wonder if it is possible to integrate the notebooks (statically) into the documentation, so users can see the figures when cruising through the documentation. No worries if this is too hard to do! But it could potentially be a nice addition.

bastorer commented 1 year ago
  1. Oh, hmm, I'll try to fix that. Doxygen tables of contents are always more finicky than I expect, haha, but that would be a good thing to have fixed! I'll investigate :-)
  2. Integrating the notebooks statically probably won't work, but saving the figures as PNGs and embedding them in the documentation shouldn't be an issue. I'll try to see how to do that (see the quick sketch below).
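
Roughly along these lines (the data and the output path are just placeholders):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([100, 300, 1000], [1.0, 0.6, 0.2])   # placeholder data
ax.set_xscale("log")
ax.set_xlabel("Filter scale [km]")
ax.set_ylabel("Example diagnostic")

# Save a static copy that the documentation pages can then embed.
fig.savefig("tutorial_example.png", dpi=150, bbox_inches="tight")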
NoraLoose commented 1 year ago

When I try to compile coarse_grain_helmholtz.x in the small Helmholtz tutorial, I get the following error:

Functions/Helmholtz/filtering_helmholtz.cpp(1191): error: namespace "constants" has no member "COMP_WIND_FORCE"
          if ( constants::COMP_WIND_FORCE ){ 
                          ^
bastorer commented 1 year ago

Oh yikes, sorry, I missed a file in one of the git commits. I'll fix that right away!

bastorer commented 1 year ago

COMP_WIND_FORCE should be included in the constants file on the main branch (as of commit d951de57f9dcd7ebcbf69977223367858a12076c on Sept 21), so that shouldn't be coming up as an issue, but if you're still getting it, please let me know!

NoraLoose commented 1 year ago

Ok, great. Compilation works fine now. But when running ./run_all_steps.sh, I get stuck after

Start-up time  = 0.02928596735001
Filtering time = 270.4323036373
   (clock resolution = 1e-09)

Is this normal?

bastorer commented 1 year ago

The Filtering time = ... is printed when the coarse-grain routine finishes. This should then be followed by a bash for-loop that merges the files together, which should print something like Attempting to merge all files matching pattern outputs/postprocess_full_*nc into output file RESULTS_full.nc.

If after running you have files RESULTS_full.nc, RESULTS_toroidal.nc, and RESULTS_potential.nc, then the process ran successfully. [I've added an echo statement to the end of the "run all steps" routines to announce successful completion.]
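
As a quick sanity check (just a sketch on my part, not part of the tutorial scripts), something like this confirms that the merged files exist and can be opened:

import os
import netCDF4 as nc

for kind in ("full", "toroidal", "potential"):
    fname = f"RESULTS_{kind}.nc"
    if os.path.exists(fname):
        with nc.Dataset(fname, "r") as ds:
            print(f"{fname}: {len(ds.variables)} variables, dimensions {list(ds.dimensions)}")
    else:
        print(f"{fname} is missing - the merge step did not run")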

When I run ./run_all_steps.sh, the full output that I get is:

++ python generate_data_sphere.py
++ python define_geographic_regions.py
++ ./Helmholtz_projection.x --input_file ./velocity_sample.nc --seed_file zero --output_file projection_ui.nc --max_iterations 1500 --tolerance 1e-18 --Tikhov_Laplace 1
 Commandline flag "--input_file" got value "./velocity_sample.nc"
 Commandline flag "--output_file" got value "projection_ui.nc"
 Commandline flag "--seed_file" got value "zero"
 Commandline flag "--time" received no value - will use default "time"
 Commandline flag "--depth" received no value - will use default "depth"
 Commandline flag "--latitude" received no value - will use default "latitude"
 Commandline flag "--longitude" received no value - will use default "longitude"
 Commandline flag "--is_degrees" received no value - will use default "true"
 Commandline flag "--Nprocs_in_time" received no value - will use default "1"
 Commandline flag "--Nprocs_in_depth" received no value - will use default "1"
 Commandline flag "--zonal_vel" received no value - will use default "uo"
 Commandline flag "--merid_vel" received no value - will use default "vo"
 Commandline flag "--tor_seed" received no value - will use default "Psi_seed"
 Commandline flag "--pot_seed" received no value - will use default "Phi_seed"
 Commandline flag "--tolerance" got value "1e-18"
 Commandline flag "--max_iterations" got value "1500"
 Commandline flag "--Tikhov_Laplace" got value "1"
 Commandline flag "--use_mask" received no value - will use default "false"
 Commandline flag "--use_area_weight" received no value - will use default "true"

Compiled at 17:04:51 on Dec  8 2022.
  Version 3.1.1

Using spherical coordinates.
 Nproc(time, depth) = (1, 1)

deriv_scale_factor = 8.59437

Termination counts: 0 from absolute tolerance
                    0 from relative tolerance
                    1 from iteration maximum
                    0 from rounding errors
                    0 from other causes

Process completed.

++ FILTER_SCALES='
1.e5 1.29e5 1.67e5 2.15e5 2.78e5 3.59e5 4.64e5 5.99e5 7.74e5
1.e6 1.29e6 1.67e6 2.15e6 2.78e6 3.59e6 4.64e6 5.99e6 7.74e6
1.e7 1.29e7 1.67e7 2.15e7 2.78e7 3.59e7 4.64e7 5.99e7 7.74e7
'
++ mkdir -p outputs
++ cd outputs
++ ../coarse_grain_helmholtz.x --Helmholtz_input_file ../projection_ui.nc --velocity_input_file ../velocity_sample.nc --tor_field Psi --pot_field Phi --vel_field uo --region_definitions_file ../region_definitions.nc --filter_scales '
1.e5 1.29e5 1.67e5 2.15e5 2.78e5 3.59e5 4.64e5 5.99e5 7.74e5
1.e6 1.29e6 1.67e6 2.15e6 2.78e6 3.59e6 4.64e6 5.99e6 7.74e6
1.e7 1.29e7 1.67e7 2.15e7 2.78e7 3.59e7 4.64e7 5.99e7 7.74e7
'
 Commandline flag "--Helmholtz_input_file" got value "../projection_ui.nc"
 Commandline flag "--velocity_input_file" got value "../velocity_sample.nc"
 Commandline flag "--radial_velocity_input_file" received no value - will use default "NONE"
 Commandline flag "--time" received no value - will use default "time"
 Commandline flag "--depth" received no value - will use default "depth"
 Commandline flag "--latitude" received no value - will use default "latitude"
 Commandline flag "--longitude" received no value - will use default "longitude"
 Commandline flag "--is_degrees" received no value - will use default "true"
 Commandline flag "--use_depth_derivs" received no value - will use default "false"
 Commandline flag "--depth_is_elevation" received no value - will use default "false"
 Commandline flag "--Nprocs_in_time" received no value - will use default "1"
 Commandline flag "--Nprocs_in_depth" received no value - will use default "1"
 Commandline flag "--tor_field" got value "Psi"
 Commandline flag "--pot_field" got value "Phi"
 Commandline flag "--vel_field" got value "uo"
 Commandline flag "--u_r_field" received no value - will use default "u_r"
 Commandline flag "--region_definitions_file" got value "../region_definitions.nc"
 Commandline flag "--region_definitions_dim" received no value - will use default "region"
 Commandline flag "--region_definitions_var" received no value - will use default "region_definition"
 Commandline flag "--filter_scales" got value "
1.e5 1.29e5 1.67e5 2.15e5 2.78e5 3.59e5 4.64e5 5.99e5 7.74e5
1.e6 1.29e6 1.67e6 2.15e6 2.78e6 3.59e6 4.64e6 5.99e6 7.74e6
1.e7 1.29e7 1.67e7 2.15e7 2.78e7 3.59e7 4.64e7 5.99e7 7.74e7
"
Filter scales (27) are:  100km,  129km,  167km,  215km,  278km,  359km,  464km,  599km,  774km,  1000km,  1290km,  1670km,  2150km,  2780km,  3590km,  4640km,  5990km,  7740km,  10000km,  12900km,  16700km,  21500km,  27800km,  35900km,  46400km,  59900km,  77400km,

Compiled at 17:04:51 on Dec  8 2022.
  Version 3.1.1

Using spherical coordinates.
 Nproc(time, depth) = (1, 1)

Scale 1 of 27 (100 km)
....|....|....|....|

Scale 2 of 27 (129 km)
....|....|....|....|

Scale 3 of 27 (167 km)
....|....|....|....|

Scale 4 of 27 (215 km)
....|....|....|....|

Scale 5 of 27 (278 km)
....|....|....|....|

Scale 6 of 27 (359 km)
....|....|....|....|

Scale 7 of 27 (464 km)
....|....|....|....|

Scale 8 of 27 (599 km)
....|....|....|....|

Scale 9 of 27 (774 km)
....|....|....|....|

Scale 10 of 27 (1000 km)
....|....|....|....|

Scale 11 of 27 (1290 km)
....|....|....|....|

Scale 12 of 27 (1670 km)
....|....|....|....|

Scale 13 of 27 (2150 km)
....|....|....|....|

Scale 14 of 27 (2780 km)
....|....|....|....|

Scale 15 of 27 (3590 km)
....|....|....|....|

Scale 16 of 27 (4640 km)
....|....|....|....|

Scale 17 of 27 (5990 km)
....|....|....|....|

Scale 18 of 27 (7740 km)
....|....|....|....|

Scale 19 of 27 (10000 km)
....|....|....|....|

Scale 20 of 27 (12900 km)
....|....|....|....|

Scale 21 of 27 (16700 km)
....|....|....|....|

Scale 22 of 27 (21500 km)
....|....|....|....|

Scale 23 of 27 (27800 km)
....|....|....|....|

Scale 24 of 27 (35900 km)
....|....|....|....|

Scale 25 of 27 (46400 km)
....|....|....|....|

Scale 26 of 27 (59900 km)
....|....|....|....|

Scale 27 of 27 (77400 km)
....|....|....|....|

Process completed.

Start-up time  = 0.08102444186807
Filtering time = 272.5161228143
   (clock resolution = 3.84612007634e-10)
++ cd ..
++ KINDS=("full" "toroidal" "potential")
++ for KIND in '"${KINDS[@]}"'
++ python ../../../PythonTools/OutputHandling/merge_postprocess_results.py --file_pattern 'outputs/postprocess_full_*nc' --output_filename RESULTS_full.nc
Attempting to merge all files matching pattern outputs/postprocess_full_*nc into output file RESULTS_full.nc.
  Identified 27 files for merging.
  Identified 5 dimensions and 29 variables to copy into merged file.
  .. copying dimension time
  .. copying dimension depth
  .. copying dimension latitude
  .. copying dimension longitude
  .. copying dimension region
  Preparing to copy variables...
  .... | .... | .... | ....
Done.
++ for KIND in '"${KINDS[@]}"'
++ python ../../../PythonTools/OutputHandling/merge_postprocess_results.py --file_pattern 'outputs/postprocess_toroidal_*nc' --output_filename RESULTS_toroidal.nc
Attempting to merge all files matching pattern outputs/postprocess_toroidal_*nc into output file RESULTS_toroidal.nc.
  Identified 27 files for merging.
  Identified 5 dimensions and 29 variables to copy into merged file.
  .. copying dimension time
  .. copying dimension depth
  .. copying dimension latitude
  .. copying dimension longitude
  .. copying dimension region
  Preparing to copy variables...
  .... | .... | .... | ....
Done.
++ for KIND in '"${KINDS[@]}"'
++ python ../../../PythonTools/OutputHandling/merge_postprocess_results.py --file_pattern 'outputs/postprocess_potential_*nc' --output_filename RESULTS_potential.nc
Attempting to merge all files matching pattern outputs/postprocess_potential_*nc into output file RESULTS_potential.nc.
  Identified 27 files for merging.
  Identified 5 dimensions and 29 variables to copy into merged file.
  .. copying dimension time
  .. copying dimension depth
  .. copying dimension latitude
  .. copying dimension longitude
  .. copying dimension region
  Preparing to copy variables...
  .... | .... | .... | ....
Done.
yguo20ncsu commented 1 year ago

Hi Ben,

When I run the small tutorial on the sphere, I always get stuck at:

Start-up time  = 0.03308053314686
Filtering time = 270.9834141545
   (clock resolution = 1e-09)

I (of course) can't find the files RESULTS_full.nc etc. Do you have any insights on this? It seems Nora had the same issue; I'm not sure if she got it resolved. Thank you very much!

Thanks, Yiming

bastorer commented 1 year ago

That's very peculiar! It sounds like the bash for loop might not be running.

Could you 1) confirm whether you have an outputs directory that was created with a bunch of files that look like postprocess_..._##km.nc, and 2) try manually running one of the python merging routines to make the RESULTS_full.nc file?

The command for 2 should be

python ../../../PythonTools/OutputHandling/merge_postprocess_results.py --file_pattern "outputs/postprocess_full_*nc" --output_filename "RESULTS_full.nc"
yguo20ncsu commented 1 year ago
  1. The outputs directory with the postprocess...##km.nc files exists.
  2. Maybe the error comes from this keyword missing when the variables are created:

Traceback (most recent call last):
  File "../../../PythonTools/OutputHandling/merge_postprocess_results.py", line 144, in
    nc_var_objs[varname].setncattr(attr, all_vars[varname].getncattr( attr ) )
  File "netCDF4/_netCDF4.pyx", line 4123, in netCDF4._netCDF4.Variable.setncattr
AttributeError: _FillValue attribute must be set when variable is created (using fill_value keyword to createVariable)

bastorer commented 1 year ago

Okay, it sounds like the coarse-graining part itself runs, which is good; it's just the output handling that fails.

Could you tell me which versions of netcdf and netCDF4 (the Python library) you have? I would have hoped it wasn't that sensitive to those, but maybe it is.

yguo20ncsu commented 1 year ago

Hi Ben. For your reference, here are the versions I have: netcdf (4.8.1) and netCDF4 (1.5.3).

bastorer commented 1 year ago

Thanks! I'm at AGU this week but will try digging into this tonight.

bastorer commented 1 year ago

Okay, it looks like somewhere in the change to v1.5, netCDF4 started requiring fill_value to be set at the time of variable creation.

I've updated the merging routines and tested them on v1.5.8 (what I had installed on my laptop), and they run as expected. When you get a chance, can you try pulling the latest commits and running the merge step again?
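
For reference, the change amounts to passing fill_value when each variable is created, rather than setting _FillValue afterwards with setncattr. A rough sketch of the pattern (file and variable names here are placeholders, not the actual merge script):

import netCDF4 as nc

src = nc.Dataset("outputs/postprocess_full_100km.nc", "r")   # placeholder input file
dst = nc.Dataset("RESULTS_full.nc", "w")

var  = src.variables["example_var"]                          # placeholder variable name
fill = getattr(var, "_FillValue", None)

# Copy the dimensions that the variable needs.
for dim in var.dimensions:
    dst.createDimension(dim, len(src.dimensions[dim]))

# In netCDF4-python >= 1.5 the fill value must be supplied at creation time;
# calling setncattr("_FillValue", ...) afterwards raises an AttributeError.
out = dst.createVariable("example_var", var.dtype, var.dimensions, fill_value=fill)

# Copy the remaining attributes and the data.
for attr in var.ncattrs():
    if attr != "_FillValue":
        out.setncattr(attr, var.getncattr(attr))
out[:] = var[:]

src.close()
dst.close()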

yguo20ncsu commented 1 year ago

Hi Ben,

Thanks for updating the package. Unfortunately, when running ./run_all_steps.sh it still gets stuck at the same place. But when I manually run the python merging routines, I can get the merged data files this time.

Yiming

bastorer commented 1 year ago

@yguo20ncsu okay, great! I'll probably just replace the bash for loop with the three explicit python merge commands.

If you're able to get the merged data files though then it sounds like everything is running :-) !

bastorer commented 1 year ago

@yguo20ncsu to test if it is just the loop not working, could you try running a bash script with the following contents:

KINDS=("full" "toroidal" "potential")

for KIND in "${KINDS[@]}"
do
    echo ${KIND}
done

Have you run into any other problems running the code / tutorials?

Thanks!

bastorer commented 1 year ago

@NoraLoose I've removed the bash for loops and just replaced them with manual repetitions of the command, since the loops don't seem to be as portable as I'd hoped!

yguo20ncsu commented 1 year ago

@bastorer Hi Ben, actually the loop works fine in this test bash script. I don't have any other problems running the code so far (I have also replaced the loop with all of the commands written out manually). Thank you so much for your help!

NoraLoose commented 1 year ago

@bastorer I tried running ./Tutorial/Spherical_Demo/Low_Resolution/run_all_steps.sh again, but still have the same problem. Does this file still need to be updated?

bastorer commented 1 year ago

The bash loop has been replaced by just directly calling the three python commands. Are you getting any specific errors from the python scripts crashing?

NoraLoose commented 1 year ago

No crashing, I simply get stuck after:

Scale 27 of 27 (77400 km)
....|....|....|....|

Process completed.

Start-up time  = 0.02932177111506
Filtering time = 272.1048071757
   (clock resolution = 1e-09)
NoraLoose commented 1 year ago

@yguo20ncsu, does this still happen to you too, when running ./Tutorial/Spherical_Demo/Low_Resolution/run_all_steps.sh?

yguo20ncsu commented 1 year ago

Yes, I also still get stuck at the same place when running run_all_steps.sh. Ben mentioned that it is probably something wrong with the bash loop where all the nc files in the "outputs" directory are merged into RESULTS_*.nc. Below is how to work around it manually, as Ben suggested.

  1. I commented out the loop in run_all_steps.sh:

    KINDS=("full" "toroidal" "potential")
    for KIND in "${KINDS[@]}"
    do
    done

  2. Instead of using the bash loop, manually run the three python merging routines:

    python ../../../PythonTools/OutputHandling/merge_postprocess_results.py --file_pattern "outputs/postprocess_full_*nc" --output_filename "RESULTS_full.nc"
    python ../../../PythonTools/OutputHandling/merge_postprocess_results.py --file_pattern "outputs/postprocess_toroidal_*nc" --output_filename "RESULTS_toroidal.nc"
    python ../../../PythonTools/OutputHandling/merge_postprocess_results.py --file_pattern "outputs/postprocess_potential_*nc" --output_filename "RESULTS_potential.nc"

  3. Then you will see the merged nc files (RESULTS_*.nc).

bastorer commented 1 year ago

The new version of run_all_steps.sh should have those three python commands written out without the bash loop, which I thought would have solved the problem.

It's odd that it doesn't print the cd .. that happens after running coarse-graining but before python (the set -ex at the top should make it print out what's being executed). For some reason it just doesn't execute anything after coarse-graining.

bastorer commented 1 year ago

After a bit of reading, I suspect it might be set -e that's the problem. Ideally, that would halt the script whenever an error occurs. However, it sounds like it can be a little overzealous in what it classifies as an 'error', and those classifications can change between versions of bash.

If you agree (and if replacing set -ex with set -x fixes the problem for you), I'll replace the set -ex with set -x in the bash scripts in the tutorial.

NoraLoose commented 1 year ago

Hmm, I replaced set -ex with set -x in line 5 of ./Tutorial/Spherical_Demo/Low_Resolution/run_all_steps.sh, but it did not fix the problem. :/

bastorer commented 1 year ago

Hmm, darn, I was hoping that would do it haha. I have to admit, I'm really not sure what could be causing the issue at this point.

bastorer commented 1 year ago

Hi Nora,

As far as I can tell, this is an issue with bash scripting that I'm honestly not sure how to solve. All of the parts themselves run correctly; the final python scripts just need to be called separately.

If I add a note in the tutorial documentation highlighting this (essentially saying that if the RESULTS_*.nc files are missing, run <insert python code here>), would that be sufficient?

It's not ideal, but I'm not sure what else to do for the bash script.

Thanks!

NoraLoose commented 1 year ago

Sure, I'm okay with any solution that works as long as it's documented. :)

bastorer commented 1 year ago

Documentation updated :-)

NoraLoose commented 1 year ago

Okay great, I'm closing this issue then.