noaa-ocs-modeling / EnsemblePerturbation

perturbation of coupled model input over a space of input variables
https://ensembleperturbation.readthedocs.io

Using HAFS ensemble in SCHISM ensemble workflow #112

Closed SorooshMani-NOAA closed 3 months ago

SorooshMani-NOAA commented 9 months ago

This ticket tracks the discussion about how to use the HAFS results for the SCHISM workflow. Relevant tickets:

SorooshMani-NOAA commented 9 months ago

I've downloaded one day of HAFS results during Idalia, and I'm trying to figure out how the required data for sflux files can be extracted from it.

SorooshMani-NOAA commented 9 months ago

I wrote a basic script for translating HAFS grb2 files into sflux format for SCHISM. I need some more information about how dates and ensemble members are organized in the HAFS results before I can proceed further and set up a run (or an ensemble of runs). I emailed my questions to Zhan and Jiayi.

SorooshMani-NOAA commented 9 months ago

More info as I'm working through HAFS to sflux:

From @yichengt900:

Here is my old matlab script to convert gdas grb2nc for schism: https://github.com/schism-dev/schism/blob/master/src/Utility/Sflux_nc/grb2nc.m. It has been a while, but if my memory serves me right, the date doesn't necessarily have to be based on 2001; you just need to provide the correct time unit attribute.

And about the number of times per file:

I think it is 1000 based on the code: max_file_times = 1000 !max. # of time records with each file

In the current form of the script, all the attributes from original HAFS data arrays are kept.
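
For reference, a minimal sketch of what "providing the correct time unit attribute" could look like when writing the sflux netCDF with xarray (the units string, the base_date values, and the file names below are illustrative assumptions, not taken from the script):

import xarray as xr

ds = xr.open_dataset('sflux_air_1.0001.nc')  # hypothetical existing sflux file
# reference the time axis to the simulation start instead of a hard-coded 2001 base date
ds['time'].attrs['base_date'] = [2023, 8, 28, 0]  # assumed sflux-style base_date attribute
ds.to_netcdf(
    'sflux_air_1.0001.fixed.nc',
    encoding={'time': {'units': 'days since 2023-08-28 00:00:00', 'dtype': 'float64'}},
)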

SorooshMani-NOAA commented 9 months ago
from argparse import ArgumentParser
from pathlib import Path
from datetime import datetime, timedelta

import numpy as np
import xarray as xr

CYCLE = 6 # hr

def check_files_exist(path, filenames):
    # Raise if any of the expected GRIB2 files is missing from `path`
    files_in_path = {p.name for p in path.iterdir()}
    missing = []
    for fnm in filenames:
        if fnm not in files_in_path:
            missing.append(fnm)

    if len(missing) > 0:
        raise IOError(f"Cannot find all required datafiles! Missing: <{missing}>!")

def snap_to_previous_cycle(date):
    # Snap to the most recent HAFS cycle (00/06/12/18Z) at or before `date`
    snapped_date = datetime(
        date.year, date.month, date.day, tzinfo=date.tzinfo
    ) + timedelta(hours=(date.hour // CYCLE) * CYCLE)

    return snapped_date

def check_data_availability(path, start, end, fcst_at=None):
    # Verify that all GRIB2 files needed to cover [start, end] exist in `path`:
    # analysis (f000) files for every cycle up to the forecast cycle, then
    # hourly forecast files from that cycle out to `end`.
    if fcst_at is None:
        fcst_at = start

    fcst_at_cyc = snap_to_previous_cycle(fcst_at)

    filenames = []
    date = start  # assumed to fall on a cycle boundary
    while date <= fcst_at_cyc:
        yyyymmddhh = date.strftime('%Y%m%d%H')
        filenames.append(f"00l.{yyyymmddhh}.hfsa.parent.atm.f000.grb2")
        date += timedelta(hours=CYCLE)

    yyyymmddhh = fcst_at_cyc.strftime('%Y%m%d%H')
    last_fcst_hr = int(np.ceil((end - fcst_at_cyc).total_seconds() / 3600))
    filenames.extend(
        [
            f"00l.{yyyymmddhh}.hfsa.parent.atm.f{hr:03d}.grb2"
            for hr in range(last_fcst_hr + 1)
        ]
    )

    check_files_exist(path, filenames)

def hafs_to_sflux(path):
    # Mean-sea-level fields (pressure)
    ds_msl = xr.open_dataset(
        path,
        engine='cfgrib',
        filter_by_keys={
            'typeOfLevel': 'meanSea',
        }
    )
    ds_msl = ds_msl.drop_vars('meanSea')

    # 2 m above-ground fields (temperature, specific humidity)
    ds_2ag = xr.open_dataset(
        path,
        engine='cfgrib',
        filter_by_keys={
            'typeOfLevel': 'heightAboveGround',
            'topLevel': 2,
        }
    )
    ds_2ag = ds_2ag.drop_vars('heightAboveGround')

    # 10 m above-ground fields (wind components)
    ds_10ag = xr.open_dataset(
        path,
        engine='cfgrib',
        filter_by_keys={
            'typeOfLevel': 'heightAboveGround',
            'topLevel': 10,
        }
    )
    ds_10ag = ds_10ag.drop_vars('heightAboveGround')

    # sflux expects 2-D lon/lat arrays
    mgrid = np.meshgrid(ds_msl.longitude, ds_msl.latitude)

    data = {
        'lon': (('latitude', 'longitude'), mgrid[0], ds_msl.longitude.attrs),
        'lat': (('latitude', 'longitude'), mgrid[1], ds_msl.latitude.attrs),
        'uwind': ds_10ag.u10.expand_dims('valid_time'),
        'vwind': ds_10ag.v10.expand_dims('valid_time'),
        'prmsl': ds_msl.prmsl.expand_dims('valid_time'),
        'spfh': ds_2ag.sh2.expand_dims('valid_time'),
        'stmp': ds_2ag.t2m.expand_dims('valid_time'),
    }

    new_ds = xr.Dataset(data_vars=data)
    # Switch to the sflux dimension names and make `valid_time` the time coordinate
    new_ds = new_ds.swap_dims(
        {'longitude': 'nx_grid', 'latitude': 'ny_grid'}
    ).reset_coords(
        ['longitude', 'latitude', 'step', 'time'],
        drop=True
    ).rename(
        {'valid_time': 'time'}
    )

    return new_ds

def all_hafs_to_sflux(paths):
    all_sflux = []
    for path in paths:
        all_sflux.append(hafs_to_sflux(path))

    # TODO: Limit to 1000 (MAX FILE TIME)
    combined_sflux = xr.combine_by_coords(
        all_sflux,
        data_vars='different',
    )
    return combined_sflux

if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('--out', '-o', type=Path, default='.')
    parser.add_argument('files', nargs='+', type=Path)

    args = parser.parse_args()

    ds = all_hafs_to_sflux(args.files)
    ds.to_netcdf(args.out / 'sflux_air_1.0001.nc')
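
To address the # TODO about the 1000-records-per-file limit, one possible approach would be a sketch like the following (the sflux_air_1.NNNN.nc numbering is assumed from the output name used above):

def write_sflux_in_chunks(ds, out_dir, max_times=1000):
    # Split the combined dataset into files of at most `max_times` time records
    for i, first in enumerate(range(0, ds.sizes['time'], max_times), start=1):
        chunk = ds.isel(time=slice(first, first + max_times))
        chunk.to_netcdf(Path(out_dir) / f'sflux_air_1.{i:04d}.nc')
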
SorooshMani-NOAA commented 8 months ago

For now the plan is to set up SCHISM with HAFS data (separate from the NHC ensemble effort). @saeed-moghimi-noaa and I need to have a short follow-up meeting to discuss how the data must be used (single cycle with multiple forecasts vs. multiple cycles with only the first forecast).

SorooshMani-NOAA commented 7 months ago

A script has been developed to read the grib files from their cloud location on S3 and write sflux files for each ensemble member (combined along the time dimension). The logic is to get forecast 000 of member 00 for all of the cycles prior to the perturbation, then, after the perturbation, get all the forecast hours of a single cycle for each ensemble member. After some preliminary testing, I need to use these generated sfluxes to set up an ensemble using coupledmodeldriver, or, if that requires too much development, to use them for a manual ensemble setup for now!
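
For reference, a rough sketch of that key-building logic (the herc/<member>/<cycle>/ layout is taken from the bucket listing further below; the storm ID, member count, hourly forecast spacing, and file-name pattern are placeholders reused from the script above):

from datetime import datetime, timedelta

CYCLE = 6      # hr
STORM = '00l'  # placeholder storm ID
N_MEMBERS = 30

def build_member_keys(start, perturb_cycle, end):
    # Pre-perturbation part shared by all members: f000 of member 00, every cycle
    common = []
    cyc = start
    while cyc < perturb_cycle:
        stamp = cyc.strftime('%Y%m%d%H')
        common.append(f'herc/00/{stamp}/{STORM}.{stamp}.hfsa.parent.atm.f000.grb2')
        cyc += timedelta(hours=CYCLE)

    # Post-perturbation part: all forecast hours of the perturbation cycle,
    # taken from each member's own directory
    stamp = perturb_cycle.strftime('%Y%m%d%H')
    n_hours = int((end - perturb_cycle).total_seconds() // 3600)
    keys = {}
    for member in range(1, N_MEMBERS + 1):
        keys[member] = common + [
            f'herc/{member:02d}/{stamp}/{STORM}.{stamp}.hfsa.parent.atm.f{hr:03d}.grb2'
            for hr in range(n_hours + 1)
        ]
    return keys

member_keys = build_member_keys(
    datetime(2023, 8, 28), datetime(2023, 8, 29), datetime(2023, 8, 30)
)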

SorooshMani-NOAA commented 6 months ago

I updated coupledmodeldriver to be able to set up SCHISM with user-provided sflux files. Doing this enables us to use ensembleperturbation to write out the whole ensemble. The single model setup using coupledmodeldriver was successful (pending a SCHISM run test). I'm also testing generating the full HAFS ensemble using this update (in a separate branch right now).

On a separate note, I realized that prate, dswrf and dlwrf are missing from the sflux I created. I need to add those as well! These variables can be found in the GRIB2 file at the layer read by:

ds = xr.open_dataset(
    '/path/to/00l.YYYYMMDDHH.hfsa.parent.atm.fCYC.grb2',
    engine='cfgrib',
    filter_by_keys={'typeOfLevel': 'surface', 'stepType': 'avg'}
)

Note that dswrf and dlwrf are also present at the layer with {'typeOfLevel': 'surface', 'stepType': 'instant'}. I'm still not sure which one to use for radiation.
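
To compare the two candidate layers, something along these lines could be used (the path is a placeholder):

import xarray as xr

path = '/path/to/00l.YYYYMMDDHH.hfsa.parent.atm.fCYC.grb2'  # placeholder
for step_type in ('avg', 'instant'):
    ds = xr.open_dataset(
        path,
        engine='cfgrib',
        filter_by_keys={'typeOfLevel': 'surface', 'stepType': step_type},
    )
    print(step_type, sorted(ds.data_vars))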

SorooshMani-NOAA commented 6 months ago

The generation of the sflux setup in both single and ensemble modes was successful (pending run tests). The updated HAFS data extraction code with the addition of precipitation and radiation was also successfully tested (pending inspection of the generated sflux).

I also need to address a memory usage issue in the HAFS extraction code. Potential solution at https://stackoverflow.com/questions/73287939/threadpoolexecutor-threads-futures-do-not-release-memory-when-completed-and-t
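
A sketch of the batching idea from that thread (assuming some per-file worker function, here called process_file), so that completed futures can be released instead of all being held at once:

from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import islice

def process_in_batches(process_file, paths, batch_size=8, max_workers=4):
    # Submit work in small batches so finished futures (and their results)
    # can be garbage-collected before the next batch is queued
    it = iter(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while batch := list(islice(it, batch_size)):
            futures = [pool.submit(process_file, p) for p in batch]
            for fut in as_completed(futures):
                yield fut.result()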

The current code can either use the local filesystem or pull files from the S3 bucket. Since HAFS is going to move their data, we might need to address downloading data from another location later as well.

Update: After further investigation, there doesn't seem to be a memory issue in the HAFS-to-sflux code after all! The next step is to test running the generated ensembles, and then try to run a full-length storm ensemble.

saeed-moghimi-noaa commented 6 months ago

Thanks @SorooshMani-NOAA

SorooshMani-NOAA commented 6 months ago

Trying to run SCHISM for a single setup case, I ran into issues with the generated sflux file. I will add more details as I learn more.

SorooshMani-NOAA commented 6 months ago

Note that using convert_calendar when combining all datasets to create a single sflux file is necessary. This ensures that the calendar is compatible with the pyschism requirement, i.e. it should be gregorian; proleptic_gregorian is not supported.
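
A minimal sketch of that conversion (file names are placeholders):

import xarray as xr

ds = xr.open_dataset('sflux_air_1.0001.nc')  # hypothetical combined sflux
ds = ds.convert_calendar('standard')         # 'standard'/'gregorian' instead of proleptic_gregorian
# if the file is rewritten, the calendar may also need to be pinned in the encoding
ds.to_netcdf('sflux_air_1.0001.gregorian.nc', encoding={'time': {'calendar': 'standard'}})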

We can escalate this issue to fix pyschism SfluxDataset to support other calendars as well.

SorooshMani-NOAA commented 6 months ago

Even after fixing the date issue, the single run is still failing in the first time step with an error:

 23: ABORT:  QUICKSEARCH: Cannot find a vert. level:  -249.56895039521086        6.9526567718092286E-310   6.9526567718102167E-310  -258.64307520892299                             NaN
SorooshMani-NOAA commented 6 months ago

The unperturbed version of the run (single run from HAFS sflux) went through. I haven't looked at the results yet!

SorooshMani-NOAA commented 6 months ago

In order to test an ensemble run I need to wait for the HAFS storage ticket to be addressed first.

saeed-moghimi-noaa commented 6 months ago

Good progress @SorooshMani-NOAA . Thanks for the update!

SorooshMani-NOAA commented 4 months ago

The HAFS dataset seems to be on NODD now; I'll need to check the files there to see if I can set up a run or not.

https://noaa-nws-hafs-pds.s3.amazonaws.com/index.html

SorooshMani-NOAA commented 4 months ago

I'm running into issues running locally. I'll try again on a PW node.

SorooshMani-NOAA commented 4 months ago

I just noticed that for Idalia, ensemble 26 doesn't have any forecast on 20230829, see:

aws s3 ls --no-sign-request s3://noaa-nws-hafs-pds/herc/26/2023082900/

there are no grib files.

SorooshMani-NOAA commented 3 months ago

I created a test run from 08/28/2023 to 08/30/2023 with perturbation at 08/29/2023. That means that from the 28th to the 29th the sflux values come from the member-00 grib file at fcst000 on the given cycle datetime, and for the rest of the times until the end each sflux value comes from its own ensemble member grib file (01-30) on 08/29/2023 for the different fcst??? hours. Note that member 26 doesn't have any forecast data at the given perturbation cycle.

The aforementioned ensemble is running on Hercules right now. After its success I'll create another ensemble from 08/26 to 09/02 with perturbation at 08/28.

SorooshMani-NOAA commented 3 months ago

I made the same mistake with NaNs as in https://github.com/schism-dev/schism/issues/121. I'm writing a script to take care of fixing the sflux using the provided hgrid.
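
For context, a loose sketch of the kind of clean-up meant here (my assumption of the general idea, not the actual hgrid-based script): replace NaNs in the sflux fields with benign background values so SCHISM never samples a NaN:

import xarray as xr

ds = xr.open_dataset('sflux_air_1.0001.nc')  # hypothetical generated sflux
fill_values = {'prmsl': 101325.0, 'uwind': 0.0, 'vwind': 0.0, 'stmp': 288.0, 'spfh': 0.01}
for name, value in fill_values.items():
    ds[name] = ds[name].fillna(value)
ds.to_netcdf('sflux_air_1.0001.filled.nc')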

SorooshMani-NOAA commented 3 months ago

Another update: if I use the 00 hour of the aforementioned days for the start and end of the simulation, both to fetch the data and to set up the simulation, the run fails due to an additional timestep of data missing from the sflux. I'm not sure if I have a cut-off error somewhere, or if SCHISM requires that the sflux have additional timesteps after the end time.


SorooshMani-NOAA commented 3 months ago

All but one of the runs for 08/26 to 09/02, with perturbation at 08/28, went through successfully.

Note: after setting up the model using ensembleperturbation with the sflux file addition, I manually set iof_hydro(2) and iof_hydro(14) to 1 for all the runs. I also set rnday to 6.75 to avoid the issue at the last time step: https://github.com/noaa-ocs-modeling/EnsemblePerturbation/issues/112#issuecomment-2020624792

I also noticed that ensemble member 24 doesn't have forecast data past hour 78 for cycle 20230828: https://noaa-nws-hafs-pds.s3.amazonaws.com/index.html#herc/24/2023082800/

I now need to validate the run and then figure out how to generate the probability fields using the IDL code.

SorooshMani-NOAA commented 3 months ago

Looking at the sfluxes for validation, it seems that they are generated correctly: I get the storm field as I expected, there's zero difference until 08/28, and then there's some divergence:

[Images: landfall of the 00 member; difference of 00 and 05 at 08/28; difference of 00 and 05 at 08/30T6]
SorooshMani-NOAA commented 3 months ago

Although the sflux files look OK, the results from SCHISM don't look right. I need to check with the VIMS team to see how to debug this:

[Images: sflux velocity field; sflux pressure field; SCHISM pressure; colorbar]
SorooshMani-NOAA commented 3 months ago

Another strange point is the range I get for pressure from schism out2d:

[Image: out2d pressure range]

SorooshMani-NOAA commented 3 months ago

Fei suggested subtracting 360 from the longitudes in the generated sflux files; I'll try that and report back.

SorooshMani-NOAA commented 3 months ago

Adjusting the longitude to be -180 to 180 resolved the issue!
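
For reference, the adjustment boils down to something like this (variable and file names are assumed from the conversion script above):

import xarray as xr

ds = xr.open_dataset('sflux_air_1.0001.nc')  # hypothetical generated sflux
ds['lon'] = ds['lon'].where(ds['lon'] <= 180, ds['lon'] - 360)  # 0..360 -> -180..180
ds.to_netcdf('sflux_air_1.0001.lonfix.nc')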


Now I need to find out how to post process the results!

saeed-moghimi-noaa commented 3 months ago

@feiye-vims

Hi Fei,

Would you please look at this issue.

Thanks, -Saeed

feiye-vims commented 3 months ago

Hi Saeed,

I looked at the issue last week and it has been resolved.

Best, Fei
