ufs-community / ufs-srweather-app

UFS Short-Range Weather Application
Other
56 stars 119 forks source link

[production/AQM.v7] Finalizing the ecflow to support the AQMv7 implementation #942

Closed JianpingHuang-NOAA closed 1 year ago

JianpingHuang-NOAA commented 1 year ago

The purpose of this issue is to collaborate with our EIB ecflow expert, Lin Gan, in reviewing and updating his previous ecflow configuration. This update is essential to include the changes made over the past several months into the ecflow and ensure the AQMv7 package meets the NCO implementation requirements.

The official package in support of the AQMv7 implementation is

https://github.com/ufs-community/ufs-srweather-app/tree/production/AQM.v7

But it requires to merge the PR 940 (https://github.com/ufs-community/ufs-srweather-app/pull/940)

While Chan-Hoo is reviewing the PR, Lin, can you start to review the package that is being tested through near real-time runs

https://github.com/JianpingHuang-NOAA/ufs-srweather-app/tree/feature/aqmv7_IT_test

You can find the local copy at

/lfs/h2/emc/physics/noscrub/jianping.huang/nwdev/packages/aqm.v7.0.84 (Dogwood)

and set up an eclfow test?

@chan-hoo @lgannoaa

lgannoaa commented 1 year ago

@chan-hoo , In review of the AQMv7 implementation package I found this version of the ecflow workflow is different compared to my previous merged PR #812 This version contains missing $HOMEaqm/ecf (ecflow workflow directory). Instead I found the $HOMEaqm/parm/ecflow directory.

(1) $HOMEaqm/parm/ecflow does not look like an acceptable location for production ecflow workflow in NCO code delivery. What is the justification for store AQM production ecflow workflow into a different location? (2) What is the justification for a new directory named include_tmpl in the same code structure level as ecflow definition and scripts directory? (3) What is the justification for a new directory named env in the same code structure level as ecflow definition and scripts directory? (4) Has NCO in agreement to use $HOMEaqm/parm/ecflow, $HOMEaqm/parm/ecflow/include_tmpl, $HOMEaqm/parm/ecflow/env in AQM implementation? Please forward the NCO consent email to me. (5) If NCO has agreed with this new ecflow structure, please provide step-by-step instruction on how to initialize it after getting the package from GitHub. (6) If there is no NCO consent on the $HOMEaqm/parm/ecflow approach, please reverse engineering it and use $HOMEaqm/ecf directory as ecflow workflow location.

@JianpingHuang-NOAA I can not find PR/issue that modified the ecflow workflow solution into the new structure noted above. Please search and let me know why this modification was approved.

Thanks

chan-hoo commented 1 year ago

@JianpingHuang-NOAA, please add [production/AQM.v7] to your title in any issues for AQM.v7.

chan-hoo commented 1 year ago

@lgannoaa, $HOMEaqm/parm/ecflow is the templates for ecFlow. When you generate a workflow, the ecf directory will be created in the home directory like $HOMEaqm/ecf. The directory will contain defs, env, include (not include_tmpl), and scripts.

lgannoaa commented 1 year ago

@chan-hoo , please your input with match item number form each of my question above. For example, (1) $HOMEaqm/parm/ecflow does not look like an acceptable location for production ecflow workflow in NCO code delivery. What is the justification for store AQM production ecflow workflow into a different location? Your answer ... (2) What is the justification for a new directory named include_tmpl in the same code structure level as ecflow definition and scripts directory? Your answer ...

Thanks

chan-hoo commented 1 year ago

@lgannoaa, I think my above comments answer the whole your items.

lgannoaa commented 1 year ago

@JianpingHuang-NOAA Please bring this issue to the management. @chan-hoo refused to provide answer to the question above. Hence, the ecflow development is paused.

chan-hoo commented 1 year ago

@lgannoaa, You didn't understand my comments. I didn't refuse it. Your questions are duplicated.

chan-hoo commented 1 year ago

@lgannoaa, if you need more info, please set up a tag-up. I'll explain it.

chan-hoo commented 1 year ago

@MatthewPyle-NOAA, @aerorahul, I think we should talk about this with @lgannoaa for clarification. What do you think about this?

MatthewPyle-NOAA commented 1 year ago

One suggestion I have would be to move the ecflow generating material from parm/ to a new util/ directory under the package. Numerous production systems utilize a "util" space. The HREF has a util/script/ space to house a template/script system to generate AWIPS parameter files, so is somewhat analogous to what is being done with ecflow for AQM.

chan-hoo commented 1 year ago

@MatthewPyle-NOAA, thanks for the suggestion! @lgannoaa, do you agree with that? or isn't it acceptable yet?

lgannoaa commented 1 year ago

I sent an email to management request my question to be answered as requested. We need the answer each of my question with a match item number from each of my question above. For example, (1) $HOMEaqm/parm/ecflow does not look like an acceptable location for production ecflow workflow in NCO code delivery. What is the justification for store AQM production ecflow workflow into a different location? Your answer ... (2) What is the justification for a new directory named include_tmpl in the same code structure level as ecflow definition and scripts directory? Your answer ...

chan-hoo commented 1 year ago

@lgannoaa, okay. Let's wait for the management response.

JianpingHuang-NOAA commented 1 year ago

@chan-hoo @MatthewPyle-NOAA , Can you help to answer the following questions from Lin ?

The normal location for ecflow workflow in NCO is under HOME/ecf directory. The main concern with Chan-Hoo's approach is: (1) There is no agreement that NCO will accept this kind approach. (2) This approach limited the ability for NCO to modify ecflow workflow job manually. (3) There is no document on how AQM will run ecflow use this new approach (i.e. create_ecflow_scripts.py).

The location of the ecflow workflow should be fixed in $HOMEaqm/ecf unless agreed by NCO. Here is why I asked the question above: (1) $HOMEaqm/parm/ecflow does not look like an acceptable location for production ecflow workflow in NCO code delivery. What is the justification for store AQM production ecflow workflow into a different location? (We need to ensure NCO is in agreement for this new approach. Does create_ecflow_scripts.py need to be run every cycle?) (2) What is the justification for a new directory named include_tmpl in the same code structure level as ecflow definition and scripts directory? (include_tmpl is not a standard name for ecflow workflow) (3) What is the justification for a new directory named env in the same code structure level as ecflow definition and scripts directory? (ecflow workflow used by NCO does not have a env directory) (4) Has NCO in agreement to use $HOMEaqm/parm/ecflow, $HOMEaqm/parm/ecflow/include_tmpl, $HOMEaqm/parm/ecflow/env in AQM implementation? Please forward the NCO consent email to me. (We need agreement in writing) (5) If NCO has agreed with this new ecflow structure, please provide step-by-step instructions on how to initialize it after getting the package from GitHub. (Chan-Hoo needs to provide step-by-step instruction if this solution is approved. He did not provide as requested) (6) If there is no NCO consent on the $HOMEaqm/parm/ecflow approach, please reverse engineer it and use $HOMEaqm/ecf directory as ecflow workflow location. (Necessary action to move forward as needed)

chan-hoo commented 1 year ago

@JianpingHuang-NOAA, can you explain each item in detail? (1) What does "this kind approach" mean? and What is the agreement by NCO? (2) Please be specific the case that NCO modifies the ecFlow workflow and why "this approach" limits the ability? (3) I provided the quick start guide for the ecFlow option many times (https://drive.google.com/file/d/1zyCvHmuL_LWx0Yto6B1GXF04m-n6Kbs2/view?usp=drive_link ; section 1.3.2). I'd like to ask you "Have you ever read through the document before?".

lgannoaa commented 1 year ago

I asked the step-by-step document from the question 5 above. It was not answered. Instead of answer the question as requested by me and Jianping. I see question in return.

Here is my justification: (1) What does "this kind approach" mean? and What is the agreement by NCO? This kind approach is referenced to using create_ecflow_scripts.py dynamically create ecflow workflow. Locating the ecflwo workflow under $HOMEaqm/parm/ecflow location. This approach need NCO's approval to implement in production. Therefore, I requested @chan-hoo to justify and provide the proof that NCO has the agreement.

(2) Please be specific the case that NCO modifies the ecFlow workflow and why "this approach" limits the ability? It hardwired job names. For example, grp_prep = ["jget_extrn_ics", "jget_extrn_lbcs", "jics", "jlbcs", "jmake_ics", "jmake_lbcs"] grp_product = ["jbias_correction_o3", "jbias_correction_pm25", "jpost_stat_o3", "jpost_stat_pm25", "jpre_post_stat"] NCO will not able to manually modify AQM ecflow jobs with this hardwired approach.

(3) I provided the quick start guide for the ecFlow option many times (https://drive.google.com/file/d/1zyCvHmuL_LWx0Yto6B1GXF04m-n6Kbs2/view?usp=drive_link ; section 1.3.2). I'd like to ask you "Have you ever read through the document before?". This is not a code delivery document. It has developer personal Chan-Hoo's name and intended for used in develop branch. It contain mixed information that is not intended for use as step-by-step instruction for AQM NCO code delivery. We need an EMC official code delivery instruction on AQM production package.

chan-hoo commented 1 year ago

@lgannoaa, (1) As I mentioned above, the directory $HOMEaqm/parm/ecflow contains the templates for ecFlow. When you generate a workflow, the script create_ecflow_scripts.py creates $HOMEaqm/ecf and its sub-directories such as def, include, and scripts from these templates. (2) The script create_ecflow_scripts.py is used only when an ecFlow workflow is initially generated. You don't have to use this script when you modify the ecFlow workflow. In other applications such as global_workflow, I can see the entire ecflow scripts are built-in. How do you modify the workflow in this case? I guess you should add a new scripts for a new task. This should be the same in AQM.v7. (3) I have been never requested to provide the official code delivery instruction before. @JianpingHuang-NOAA is the project lead for AQM. I think he should provide this document (At least he should have asked me about this).

JianpingHuang-NOAA commented 1 year ago

@lgannoaa Can you follow Section 1.3,2 of (https://drive.google.com/file/d/1zyCvHmuL_LWx0Yto6B1GXF04m-n6Kbs2/view?usp=drive_link ; section 1.3.2) to set up the ecflow for a test? Once this is done, I will include them in the implementation instruction as part of the code delivery package. Regarding the way of using the script create_ecflow_scripts.py to create $HOMEaqm/ecf at the beginning of the package set up. I assume this should be fine. Can Matt and Rahu confirm this is ok for NCO? @MatthewPyle-NOAA @aerorahul

lgannoaa commented 1 year ago

Looks like there is no agreement from NCO about this new approach. @JianpingHuang-NOAA I recommend to communicate with NCO to get agreement in writing. Please let NCO know that with this approach, ecflow workflow jobs will be reset by create_ecflow_scripts.py everytime package updated that require generate_FV3LAM_wflow.py to be run.

lgannoaa commented 1 year ago

@JianpingHuang-NOAA generate_FV3LAM_wflow.py failed following the developer's document Guide_SRW_APP_Jeon.pdf: aqm.v7.0.84/ush> python3 generate_FV3LAM_wflow.py Traceback (most recent call last): File "generate_FV3LAM_wflow.py", line 37, in from set_FV3nml_sfc_climo_filenames import set_FV3nml_sfc_climo_filenames File "/lfs/h2/emc/ptmp/lin.gan/ecflow_aqm/para/output/prod/today/test1/aqm.v7.0.84/ush/set_FV3nml_sfc_climo_filenames.py", line 28, in from set_namelist import set_namelist File "/lfs/h2/emc/ptmp/lin.gan/ecflow_aqm/para/output/prod/today/test1/aqm.v7.0.84/ush/set_namelist.py", line 74, in import f90nml ModuleNotFoundError: No module named 'f90nml'

lgannoaa commented 1 year ago

@JianpingHuang-NOAA In review of the /lfs/h2/emc/physics/noscrub/jianping.huang/nwdev/packages/aqm.v7.0.84 used by AQM official real-time test package on Dogwood, I see two slinked file to fix directory one in $HOMEaqm/fix the other is under $HOMEaqm/ush/fix. These two are identical link to /lfs/h2/emc/physics/noscrub/UFS_SRW_App/aqm.v7/fix. It indicated that package is not ready. Would it be ok to remove the create_ecflow_scripts.py solution, so we can move forward with ecflow workflow development using PR 812 as base? Thanks

JianpingHuang-NOAA commented 1 year ago

@lgannoaa $HOMEaqm/ush/fix was created during the test but I forgot to delete it. It is now deleted. PR812 has been merged into the workflow, right? @chan-hoo

chan-hoo commented 1 year ago

@lgannoaa, the EMC management is on your side :) :) Please let me know what you want for ecFlow. Do you want me to move all the ecFlow directories to HOMEaqm like the global workflow?

lgannoaa commented 1 year ago

@chan-hoo EMC management make a decision based on the best interest for the entire organization. We are here to serve. We work as a team to ensure management's vision is done right.

I am making modification and test from the production/AQM.v7 branch. Please let me know whenever production/AQM.v7 branch is updated going forward. Thanks

chan-hoo commented 1 year ago

@lgannoaa, got it. If you are making change for ecFlow, I wouldn't have any updates from my side. I'll wait for your PR.

JianpingHuang-NOAA commented 1 year ago

@lgannoaa production/AQM.v7 branch has been updated this afternoon. It is now ready for you to create a feature/branch to finalize the ecflow for AQMv7