ufs-community / UFS_UTILS

Utilities for the NCEP models.

The new function of chgres_cube to output the results in netcdf 4 format #689

Closed TingLei-NOAA closed 1 year ago

TingLei-NOAA commented 2 years ago

Currently, chgres_cube writes its results in netCDF-4 classic format with a hardwired setup. This issue is opened to request a function that generates netCDF-4 files. There are two reasons for this request:

  1. The FV3-LAM cold start files generated by chgres_cube are read in and updated by GSI. The GSI IO interface for FV3-LAM (including the cold start files generated by chgres_cube) currently uses the parallel IO of netCDF-4. When netCDF-4 classic files are used with GSI, some minor differences are generated in the final analysis fields. Hence, chgres_cube is expected to create netCDF-4 files. It would also be very helpful if this new function could take care of the chunk size/shape in the generated netCDF-4 files, because this has a significant impact on the performance of GSI parallel IO for FV3-LAM, and it could also affect IO for the FV3 model.
  2. For the FV3-LAM forecast model, cold start files in netCDF-4 classic were found to cause a slower initialization than runs using restart files in netCDF-4 (though the reasons are still to be identified).

Hence, it would be very helpful if chgres_cube could generate files in netCDF-4 format. I am also cc'ing our colleagues directly involved in this issue, Shun Liu, Ming Hu (@hu5970), and Eric Rodger, who can clarify the problem.
GeorgeGayno-NOAA commented 1 year ago

@TingLei-NOAA I was able to write netcdf 4 files. There is a way to adjust the cache size, but I just used the defaults for now. How can we test this?

TingLei-NOAA commented 1 year ago

@GeorgeGayno-NOAA Great! If you need me to test this new function, which branch should I use?

GeorgeGayno-NOAA commented 1 year ago

@TingLei-NOAA Use this branch: https://github.com/GeorgeGayno-NOAA/UFS_UTILS/tree/feature/netcdf4

TingLei-NOAA commented 1 year ago

George, thank you! I am also cc'ing Ming, Shun, and others. We will keep you posted through this issue. Ting


TingLei-NOAA commented 1 year ago

Hi George,

The changes look good to me. There is no code to control the chunk size, right? The chunk size will be decided by the system default, right?

Thanks, Ming

GeorgeGayno-NOAA commented 1 year ago

When using netcdf4 files, you can set the cache_size, cache_nelems and cache_preemption arguments in the call to nf90_create. I am using the default values because I don't know how to set them for your process. Do you have an idea of how to set these arguments?

See https://docs.unidata.ucar.edu/netcdf-fortran/current/f90_datasets.html#f90-nf90_create

TingLei-NOAA commented 1 year ago

Hi George, Dusan set up the chunk sizes in the FMS library, as described in this excerpt from his email:

      I made some changes to FMS to explicitly set the chunk sizes of each variable to be equal to its dimension lengths.

      Please see my branch https://github.com/DusanJovic-NOAA/FMS/tree/chunks

He needed to define the chunk sizes for each variable, as he did in netcdf_io.F90 in the FMS library (one copy of the code on Hera is /scratch2/NCEPDEV/fv3-cam/Ting.Lei/dr-dusan//FMS/fms2_io/netcdf_io.F90, around line 928):

      if (present(dimensions)) then
        allocate(dimids(size(dimensions)))
        allocate(dimlens(size(dimensions)))
        ! Look up the id and length of every dimension of this variable.
        do i = 1, size(dimids)
          dimids(i) = get_dimension_id(fileobj%ncid, trim(dimensions(i)), msg=append_error_msg)
          dimlens(i) = get_dimension_len(fileobj%ncid, dimids(i), msg=append_error_msg)
        enddo
        ! Define the variable with one chunk spanning its full extent.
        err = nf90_def_var(fileobj%ncid, trim(variable_name), vtype, dimids, varid, chunksizes=dimlens)
        deallocate(dimids)
        deallocate(dimlens)

In the above, the chunk for each variable is defined as the whole block of the multi-dimensional array. This kind of setup gives us a significant speedup in GSI processing of these files. Thank you! Ting


GeorgeGayno-NOAA commented 1 year ago

@TingLei-NOAA I added chunking to the atmospheric file at 2038e2e. Can you please test my branch and check for performance improvements?

Here is a check (using ncdump -h -s) of one of the wind records. The chunk sizes are set to the length of each dimension:

float v_s(lev, latp, lon) ;
      v_s:coordinates = "geolon_s geolat_s" ;
      v_s:_Storage = "chunked" ;
      v_s:_ChunkSizes = 64, 97, 96 ;
      v_s:_Endianness = "little" ;
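
As a rough cross-check of the header above, the number of bytes held by one chunk can be estimated from `_ChunkSizes`. This is a minimal Python sketch and not part of chgres_cube; the helper name `chunk_bytes` is hypothetical, and the 4-byte element size is an assumption based on the variable being declared `float`.

```python
import math
import re

# Header text as printed by `ncdump -h -s` for the v_s record above.
HEADER = '''
float v_s(lev, latp, lon) ;
      v_s:coordinates = "geolon_s geolat_s" ;
      v_s:_Storage = "chunked" ;
      v_s:_ChunkSizes = 64, 97, 96 ;
      v_s:_Endianness = "little" ;
'''


def chunk_bytes(header: str, var: str, elem_size: int = 4) -> int:
    """Return the uncompressed size in bytes of one chunk of `var`.

    `elem_size` is the size of one element; 4 is assumed for a CDL `float`.
    """
    pattern = rf"{re.escape(var)}:_ChunkSizes\s*=\s*([0-9,\s]+);"
    match = re.search(pattern, header)
    if match is None:
        raise ValueError(f"no _ChunkSizes attribute found for {var}")
    dims = [int(n) for n in match.group(1).split(",")]
    return math.prod(dims) * elem_size


print(chunk_bytes(HEADER, "v_s"))  # 64 * 97 * 96 * 4 = 2383872
```

For this C96 test the single chunk is a little over 2 MB, so the whole record fits in one chunk.
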
TingLei-NOAA commented 1 year ago

@GeorgeGayno-NOAA That is great! Just one question: how should we define parameters like i_target_out? I couldn't find that in your PR.

GeorgeGayno-NOAA commented 1 year ago

That is the 'i' dimension of the output grid. That is set by the user. My test used a C96 grid.

TingLei-NOAA commented 1 year ago

George, thanks for your answer. I think this is exactly what we need. What is your plan for pushing it to the main branch? Or should we use this branch for now and wait until the change is merged into the main branch? Thanks a lot for any further clarification. Ting


GeorgeGayno-NOAA commented 1 year ago

@TingLei-NOAA We need to prove that these changes improve performance before merging. And I only added chunking to the atmospheric file so far. The surface and lateral boundary files do not have chunking.

Come up with a testing strategy and I can help you.

TingLei-NOAA commented 1 year ago

George, got it! I will set up a comparison case to show its benefit for GSI as soon as possible and let you know. Thank you! Ting


TingLei-NOAA commented 1 year ago

Ting,

Thank you for updating this. We will make sure to merge the changes into the RRFS repo.

Shun

TingLei-NOAA commented 1 year ago

Hi George, I am trying to test the new chgres_cube in GSL's RRFS 3-km CONUS runs, but it seems the namelist used there is not compatible with the new chgres_cube. Would you please point me to one that works with your branch? Thank you! Ting


GeorgeGayno-NOAA commented 1 year ago

What namelist error are you getting? Can I look at the log file and scripts?

hu5970 commented 1 year ago

Ting,

Please delete "fix_dir_input_grid" and try.

Thanks, Ming

TingLei-NOAA commented 1 year ago

George, I am rerunning jobs to finish a chgres_cube case. After it is finished, I will send the links you need if it still fails with Ming's fix. Ming, thanks. I will see if it fixes the problem. Regards, Ting


TingLei-NOAA commented 1 year ago

@GeorgeGayno-NOAA An update: @hu5970's suggestion does work (chgres_cube now runs smoothly in the workflow, thanks). I tested two versions of your changes:

1) The first change: the output is changed from netCDF-4 classic to netCDF-4, but with no chunk sizes set from the i_target_out parameters.
2) The second change: on top of 1, the chunks are set according to each variable's 3D dimensions.

Here are the findings. For 1, the layout in the generated gfs_data* file is:

float t(lev, lat, lon) ;
      t:coordinates = "geolon geolat" ;
      t:_Storage = "contiguous" ;
      t:_Endianness = "little" ;

When this file was used for the GSI analysis, the wall clock time was:

*RESOURCE STATISTICS***
The total amount of wall time = 308.359866
The total amount of time in user mode = 222.475745
The total amount of time in sys mode = 15.427660
The maximum resident set size (KB) = 1759124
Number of page faults without I/O activity = 682758
Number of page faults with I/O activity = 27
Number of times filesystem performed INPUT = 1719232
Number of times filesystem performed OUTPUT = 118512
Number of Voluntary Context Switches = 196078
Number of InVoluntary Context Switches = 379
*END OF RESOURCE STATISTICS***

For the second change, the layout in the gfs_data file is:

float t(lev, lat, lon) ;
      t:coordinates = "geolon geolat" ;
      t:_Storage = "chunked" ;
      t:_ChunkSizes = 66, 1092, 1820 ;
      t:_Endianness = "little" ;

The GSI run using this gfs_data file as input gives the wall clock time:

*RESOURCE STATISTICS***
The total amount of wall time = 320.800557
The total amount of time in user mode = 225.983825
The total amount of time in sys mode = 15.965445
The maximum resident set size (KB) = 1757124
Number of page faults without I/O activity = 680563
Number of page faults with I/O activity = 19
Number of times filesystem performed INPUT = 1738576
Number of times filesystem performed OUTPUT = 118512
Number of Voluntary Context Switches = 195925
Number of InVoluntary Context Switches = 382

In summary, to my surprise, the chunk setup in the second case (chunks matching each variable's 3D shape) shows no benefit over the "contiguous" storage of the first case in terms of GSI performance when the files are used as background files. Please let me know if you need any further tests. Thank you so much for the development work demonstrated in this issue. Ting
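
The two timings above can be put side by side with a few lines of arithmetic. This is only an illustrative Python sketch: the numbers are copied from the two RESOURCE STATISTICS blocks and the `_ChunkSizes` line, while the helper names and the 4-byte size assumed for a `float` element are not from the thread.

```python
import math

# Wall times in seconds, copied from the two RESOURCE STATISTICS blocks above.
WALL_CONTIGUOUS = 308.359866
WALL_CHUNKED = 320.800557

# Chunk shape of t(lev, lat, lon) in the second test; 4 bytes per float assumed.
CHUNK_SHAPE = (66, 1092, 1820)
FLOAT_BYTES = 4


def relative_change(baseline: float, other: float) -> float:
    """Percent change of `other` relative to `baseline`."""
    return (other - baseline) / baseline * 100.0


def chunk_mib(shape, elem_size: int) -> float:
    """Uncompressed size of one chunk in MiB."""
    return math.prod(shape) * elem_size / 2**20


change = relative_change(WALL_CONTIGUOUS, WALL_CHUNKED)
print(f"chunked vs contiguous wall time: {change:+.2f}%")
print(f"one chunk of t: {chunk_mib(CHUNK_SHAPE, FLOAT_BYTES):.1f} MiB")
```

The chunked run comes out roughly 4% slower, and each 3D variable is stored as a single chunk of about 500 MiB. Each configuration was timed once here, so a difference of this size may not be distinguishable from run-to-run variability.
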

GeorgeGayno-NOAA commented 1 year ago

@TingLei-NOAA So, should we close this issue? Should we update chgres to output netcdf4 (without adding chunking)?

edwardhartnett commented 1 year ago

Contiguous storage will generally be faster than chunked storage to write, but does not allow compression. Are you sure you don't want compression?

Switching from netCDF-4 to netCDF-4 classic and back will have no impact on performance. The only difference is that the netCDF-4 classic format will not allow you to create anything from the enhanced data model - that is, a netcdf-4 classic file is a netCDF-4/HDF5 file that does not use any of the new types, or multiple unlimited dimensions. It uses only the classic model of netCDF.

TingLei-NOAA commented 1 year ago

@edwardhartnett Thanks a lot for your comments and explanation. We need netCDF-4 because the current GSI parallel IO for FV3-LAM is parallelized on top of HDF5, so it only works with netCDF-4. As for whether we will need compression in the near future, I have no idea; @hu5970 and Shun Liu would be in a better position to answer. @GeorgeGayno-NOAA, for the purpose of this issue, yes, it can be closed. The decision on whether to include the second change (the chunk size setup) is up to you, given Ed's information. Thanks!

GeorgeGayno-NOAA commented 1 year ago

@TingLei-NOAA You said "we need netcdf 4" for the GSI. If I close this issue, chgres will continue to output "netcdf4 classic". Is that what you want?

TingLei-NOAA commented 1 year ago

@GeorgeGayno-NOAA Sorry for the confusion I caused. Your first change (making the output netCDF-4 with contiguous storage) is definitely what we need. Thanks.


GeorgeGayno-NOAA commented 1 year ago

@TingLei-NOAA Ok. We can always revisit chunking later.

GeorgeGayno-NOAA commented 1 year ago

Completed the updates to add chunking to all output files - atmosphere, surface and LBC - at 9e662d7. Will keep this branch in my fork in case we want to revisit chunking in the future.

Opened a new branch and modified the nf90_create calls to output netcdf4 - 05b976e.

GeorgeGayno-NOAA commented 1 year ago

Compiled the branch at 05b976e on Cactus and ran the consistency tests. All tests failed. None of the data values were different. Only the global attributes. Example:

DIFFER : FILE FORMATS : NC_FORMAT_NETCDF4 <> NC_FORMAT_NETCDF4_CLASSIC

GeorgeGayno-NOAA commented 1 year ago

Compiled the branch at 05b976e on Hera, Jet and Orion. All tests failed as they did on Cactus. This is the expected result. Will submit a PR.

GeorgeGayno-NOAA commented 1 year ago

@TingLei-NOAA Please review the PR - #704.

edwardhartnett commented 1 year ago

OK, I'm not sure if this is relevant but there is some confusion here about netCDF-4 vs. netCDF-4 classic.

A netCDF-4 classic file is a netCDF-4 file that adheres to the netCDF classic data model. A netCDF-4 classic file is a netCDF-4 file and can easily be read by any netCDF-4 application.

So you do not need to turn off the NC_CLASSIC_MODEL flag in order to improve performance. If you take away that flag, netCDF will allow you to create elements of the enhanced model in the file. For example, in a file with NC_CLASSIC_MODEL, there can only be one unlimited dimension. If you try and create a second unlimited dimension, you will get an error.

But create a file without NC_CLASSIC_MODEL and you will be able to create as many unlimited dimensions as you want.

In both cases, a netCDF4/HDF5 file results, and both files can be read by any netCDF program.

So the difference between classic and not is simply that classic files restrict what you can add to the file, in a way that exactly matches the behavior of netCDF classic. Without the NC_CLASSIC_MODEL, netCDF allows you to use features that were not present in classic netCDF, including multiple unlimited dimensions, user-defined types, and unsigned integer types.

But there is no performance difference between netCDF-4 files created with or without NC_CLASSIC_MODEL.
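
One way to see the point above that both cases produce a netCDF4/HDF5 file is to look at the first bytes of a file. The following is a minimal Python sketch using only the standard library. The function names are hypothetical, and it assumes the HDF5 signature sits at offset 0, which is the usual case for netCDF-4 output. It deliberately cannot tell netCDF-4 from netCDF-4 classic, since both use the same HDF5 container.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte signature at the start of an HDF5 file

# Version byte that follows the b"CDF" magic in the non-HDF5 netCDF formats.
CLASSIC_FAMILY = {
    1: "netCDF classic (CDF-1)",
    2: "netCDF 64-bit offset (CDF-2)",
    5: "netCDF 64-bit data (CDF-5)",
}


def classify_magic(head: bytes) -> str:
    """Classify a file from its leading bytes."""
    if head[:8] == HDF5_SIGNATURE:
        return "HDF5 container (netCDF-4 or netCDF-4 classic)"
    if head[:3] == b"CDF" and len(head) >= 4 and head[3] in CLASSIC_FAMILY:
        return CLASSIC_FAMILY[head[3]]
    return "unknown"


def sniff_format(path: str) -> str:
    """Read the first 8 bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return classify_magic(f.read(8))


print(classify_magic(HDF5_SIGNATURE))
print(classify_magic(b"CDF\x01"))
```

Telling netCDF-4 apart from netCDF-4 classic requires a netCDF-aware tool, for example the format check that produced the `NC_FORMAT_NETCDF4 <> NC_FORMAT_NETCDF4_CLASSIC` line earlier in this thread.
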

TingLei-NOAA commented 1 year ago

@edwardhartnett Thanks a lot for the information, very helpful! As stated in the issue, the FV3-LAM GSI needs netCDF-4 because its current parallel IO is based on parallel netCDF-4, which doesn't support handling netCDF classic files.
Thanks