jhardenberg / smmregrid

A compact regridder using sparse matrix multiplication
Apache License 2.0
8 stars 0 forks source link

3D regridding #12

Closed jhardenberg closed 1 year ago

jhardenberg commented 1 year ago

This is an implementation of 3D regridding. When vert_coord is specified 3D weights can be generated and used. The weights are stored in a 3d array with the same number of levels as the original data to be regridded. It supports parallel processing (we can specify a number of processors nproc) for weight generation. This is needed since the process can be extremely time consuming and at the same time memory intensive. Regridding itself, once the weights have been generated, is extremely fast.

Solves #7

jhardenberg commented 1 year ago

The test is failing, but it does not fail if I run it locally on my machine. The error message contains: _FAILED tests/basic_test.py::test_plev_gaussian[con] - ValueError: Dimensions {'num_links'} do not exist. Expected one or more of Frozen({'src_grid_rank': 2, 'dst_grid_rank': 2, 'src_grid_size': 131072, 'dst_grid_size': 64800, 'numLinks': 378448, 'numwgts': 1}) I wonder....is a different version of CDO creating weights where num_links is called numLinks ?

jhardenberg commented 1 year ago

Confirmed: The tests pull in CDO 2.2.0 which now has a dimension `numLinks' instead of 'num_links' which cdo 2.1.1 was using !

jhardenberg commented 1 year ago

this should fix it ....

oloapinivad commented 1 year ago

So I added new fesom3d file to test the 3d level interpolation, and unfortunately on my machine the test is not passed when using a conservative remapping. On the other hand, it works if we are using nearest neighbour. This might be a problem on the file properties, but we should double check it.

oloapinivad commented 1 year ago

I might have found the issue: apparently the mask is not applied in the smmregrid remapping. As you can see below, the netcdf file generated with smmregrid is not having any missing value. CDO is also not doing a great job since the mask is the same for all the levels, but at least is putting NaN where it should. Need to understand why the mask is not propagated to all the layers.

Files are on levante: /home/b/b382076/smmregrid

[b382076@levante0 smmregrid]$ cdo info smm.nc 
Warning (cdfInqContents): Coordinates variable variable can't be assigned!
Warning (cdfInqContents): Coordinates variable time can't be assigned!
    -1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter ID
     1 : 1986-12-31 23:45:00     2.5    64800       0 :     -1.8583      8.5943      29.782 : -1            
     2 : 1986-12-31 23:45:00     7.5    64800       0 :     -1.8579      8.5634      29.743 : -1            
     3 : 1986-12-31 23:45:00      15    64800       0 :     -1.8572      8.4829      29.737 : -1            
     4 : 1986-12-31 23:45:00      25    64800       0 :     -1.8891      8.3123      29.772 : -1            
     5 : 1986-12-31 23:45:00      35    64800       0 :     -1.9698      8.0983      29.848 : -1            
     6 : 1986-12-31 23:45:00      45    64800       0 :     -1.9717      7.8931      29.867 : -1            
     7 : 1986-12-31 23:45:00      55    64800       0 :     -1.9649      7.7073      29.653 : -1            
     8 : 1986-12-31 23:45:00      65    64800       0 :     -1.9316      7.5341      29.381 : -1            
     9 : 1986-12-31 23:45:00      75    64800       0 :     -1.8682      7.3743      29.106 : -1            
    10 : 1986-12-31 23:45:00      85    64800       0 :     -1.8797      7.2592      28.964 : -1            
    11 : 1986-12-31 23:45:00      95    64800       0 :     -1.8890      7.1679      28.629 : -1            
    12 : 1986-12-31 23:45:00   107.5    64800       0 :     -1.8894      7.0171      27.890 : -1            
    13 : 1986-12-31 23:45:00     125    64800       0 :     -1.8556      6.7721      26.437 : -1            
    14 : 1986-12-31 23:45:00   147.5    64800       0 :     -1.8560      6.4505      24.660 : -1            
    15 : 1986-12-31 23:45:00     175    64800       0 :     -1.8315      6.0740      23.638 : -1            
    16 : 1986-12-31 23:45:00     210    64800       0 :     -1.7406      5.6692      22.056 : -1            
    17 : 1986-12-31 23:45:00     255    64800       0 :     -1.7368      5.2361      20.197 : -1            
    18 : 1986-12-31 23:45:00     310    64800       0 :     -1.7269      4.7805      19.198 : -1            
    19 : 1986-12-31 23:45:00     375    64800       0 :     -1.6926      4.3251      18.713 : -1            
    20 : 1986-12-31 23:45:00     450    64800       0 :     -1.6529      3.8692      17.744 : -1            
    21 : 1986-12-31 23:45:00     535    64800       0 :     -1.6541      3.4431      16.821 : -1            
    22 : 1986-12-31 23:45:00     630    64800       0 :    -0.57297      3.0463      15.154 : -1            
    23 : 1986-12-31 23:45:00     735    64800       0 :    -0.75147      2.6816      12.663 : -1            
    24 : 1986-12-31 23:45:00     850    64800       0 :    -0.80117      2.3456      10.665 : -1            
    25 : 1986-12-31 23:45:00     975    64800       0 :    -0.85103      2.0424      10.353 : -1            
    26 : 1986-12-31 23:45:00    1110    64800       0 :    -0.95007      1.7964      9.8227 : -1            
    27 : 1986-12-31 23:45:00    1255    64800       0 :     -1.0251      1.5809      8.9991 : -1            
    28 : 1986-12-31 23:45:00    1415    64800       0 :     -1.1271      1.3782      8.0320 : -1            
    29 : 1986-12-31 23:45:00    1600    64800       0 :     -1.1997      1.1908      6.9279 : -1            
    30 : 1986-12-31 23:45:00    1810    64800       0 :     -1.2550      1.0260      5.7755 : -1            
    31 : 1986-12-31 23:45:00    2035    64800       0 :     -1.2895     0.88942      4.5920 : -1            
    32 : 1986-12-31 23:45:00    2275    64800       0 :     -1.3196     0.77188      3.7286 : -1            
    33 : 1986-12-31 23:45:00    2525    64800       0 :     -1.3540     0.65899      3.2751 : -1            
    34 : 1986-12-31 23:45:00    2775    64800       0 :     -1.3814     0.56712      2.9536 : -1            
    35 : 1986-12-31 23:45:00    3025    64800       0 :     -1.3983     0.49131      2.7294 : -1            
    36 : 1986-12-31 23:45:00    3275    64800       0 :    -0.88223     0.41555      2.5675 : -1            
    37 : 1986-12-31 23:45:00    3525    64800       0 :    -0.88830     0.32844      2.2401 : -1            
    38 : 1986-12-31 23:45:00    3775    64800       0 :    -0.93917     0.26508      2.0859 : -1            
    39 : 1986-12-31 23:45:00    4025    64800       0 :     -1.0316     0.18784      1.9207 : -1            
    40 : 1986-12-31 23:45:00    4275    64800       0 :     -1.1215     0.13231      1.8011 : -1            
    41 : 1986-12-31 23:45:00    4525    64800       0 :     -1.1943    0.090269      1.7431 : -1            
    42 : 1986-12-31 23:45:00    4775    64800       0 :     -1.1988    0.065693      1.6709 : -1            
    43 : 1986-12-31 23:45:00    5025    64800       0 :     -1.2062    0.044022      1.5195 : -1            
    44 : 1986-12-31 23:45:00    5275    64800       0 :    -0.56853    0.022048      1.2176 : -1            
    45 : 1986-12-31 23:45:00    5525    64800       0 :      0.0000   0.0053951      1.0966 : -1            
    46 : 1986-12-31 23:45:00    5825    64800       0 :      0.0000      0.0000      0.0000 : -1            
    47 : 1986-12-31 23:45:00    6125    64800       0 :      0.0000      0.0000      0.0000 : -1            
oloapinivad commented 1 year ago

I think that the problem is here:

# this section is used to create a target mask initializing the CDO weights
        if not vert_coord:
            self.weights = mask_weights(self.weights, self.weights_matrix)
            self.masked = check_mask(self.weights)
        else:
            self.masked = False
        self.space_dims = space_dims
oloapinivad commented 1 year ago

I added also nemo 3d data, which are not working in both conservative and nearest neighbour (likely for the mask issue)

oloapinivad commented 1 year ago

So at least locally the tests are passed, with a couple of modification to the way the masks are computed. However, the code is currently a REAL MESS so that we need to clean it up, and uniform the different 2d/3d calls. Also, it would be great if we can avoid specifying vert_coord but the interpolation do it by himself. Need to double check also the speed now. We are close anyway!

oloapinivad commented 1 year ago

I made a lot of cleaning, including automatic vert_coord discovery and splitting the regrid.py in multiple files. the generation is still very much cdo-dependent, but it is a bit easier to browse now.

We need to explore if we can exploit xarray ufunc to speed up some of the 3d computations

oloapinivad commented 1 year ago

So I had to modify a few things to be able to initialize the regridder using the weights. This actually should be a new family of tests which is not taken into consideration so far

Big difference so far is that I set the vert_coord also into the weights file so that it can be automatically interpreted when loading the weights

oloapinivad commented 1 year ago

It does not seem to be super efficient with FESOM... need probably to check the process...

NVars NRecords CDO CDO (NoLoad) SMM (Dataset) SMM (DataArray) SMM (DataArray+NoLoad) SMM (Dataset+Write) SMM (DataArray+Write)
so3d-nemo.nc 1 (2, 18, 292, 362) 1 0.989918 0.029745 0.0292497 0.0126237 0.0345753 0.0343045
tas-ecearth.nc 1 (12, 256, 512) 1 0.956647 0.300138 0.232395 0.0717874 0.31714 0.300978
tas-healpix2.nc 1 (12, 12288) 1 0.930404 0.225058 0.154463 0.0380838 0.245379 0.238894
onlytos-ipsl.nc 1 (12, 332, 362) 1 0.950344 0.307111 0.236941 0.0715755 0.320038 0.302833
2t-era5.nc 1 (12, 73, 144) 1 0.922386 0.198507 0.141997 0.0386355 0.246714 0.211525
tos-fesom.nc 1 (12, 126859) 1 0.970594 0.217977 0.188369 0.0419147 0.239474 0.242006
ua-ecearth.nc 1 (2, 19, 256, 512) 1 0.877348 0.430238 0.37449 0.136358 0.658567 0.612724
mix-cesm.nc 4 (12, 192, 288) 1 0.842065 0.499367 0.13384 0.0380408 0.740233 0.194893
temp3d-fesom.nc 1 (1, 47, 3140) 1 0.825355 2.79075 2.73592 1.53475 3.16227 3.13012
oloapinivad commented 1 year ago

@jhardenberg, would you like to review this and if you are happy then merge? So that we can try to export in other projects...

jhardenberg commented 1 year ago

Hi, I am doing some final tests myself and will give my thumbs up later

jhardenberg commented 1 year ago

@oloapinivad one silly question: how do I access the log or change log level?

oloapinivad commented 1 year ago

I am not using any internal logger, but the main one. So you can change the root level of logging. This is something which needs to be updated with a proper logger for the class as done elsewhere

jhardenberg commented 1 year ago

The fact that the python 3.7 test is failing is a bit of a mistery....the logs actually seem to indicate that it is running with python 3.10 ???? platform linux -- Python 3.10.11, pytest-7.3.1, pluggy-1.0.0

oloapinivad commented 1 year ago

Flake8 apparently installs python3.10, WTF

jhardenberg commented 1 year ago

All looks good. @oloapinivad should we merge? The speed issues which you noticed can be examined in a separate PR