Closed: SorooshMani-NOAA closed this issue 3 months ago.
I've downloaded one day of HAFS results during Idalia, and I'm trying to figure out how the data required for `sflux` files can be extracted from it.

I wrote a basic script for translating HAFS grb2 files into `sflux` format for SCHISM. I need some more information about how dates and ensemble members are organized in the HAFS results before I can proceed further and set up a run (or an ensemble of runs). I emailed my questions to Zhan and Jiayi.
More info as I'm working through HAFS to sflux:
From @yichengt900:
Here is my old matlab script to convert gdas grb2nc for schism: https://github.com/schism-dev/schism/blob/master/src/Utility/Sflux_nc/grb2nc.m. It has been a while, but if my memory serves me right, the date doesn't necessarily have to be based on 2001; you just need to provide the correct time unit attribute.
And about the number of times per file:

I think it is 1000, based on the code: `max_file_times = 1000 !max. # of time records with each file`
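To make the quoted point about the time unit concrete, here is a minimal, hypothetical sketch of attaching the time attributes on an xarray time axis (the `units` string and the `base_date` attribute follow the usual sflux convention; the names and dates here are my own illustrative assumptions):

```python
import numpy as np
import xarray as xr

# Hypothetical sketch: the base year itself doesn't matter as long as the
# time coordinate carries a consistent units string and, per the usual
# sflux convention, a base_date attribute.
base = np.datetime64('2023-08-28T00:00:00', 'ns')
times = base + np.arange(0, 24, 6).astype('timedelta64[h]')

ds = xr.Dataset(coords={'time': ('time', times)})
ds['time'].encoding['units'] = 'days since 2023-08-28 00:00:00'
ds['time'].attrs['base_date'] = np.array([2023, 8, 28, 0], dtype=np.int32)
```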
In its current form, the script keeps all the attributes from the original HAFS data arrays.
```python
from argparse import ArgumentParser
from pathlib import Path
from datetime import datetime, timedelta

import numpy as np
import xarray as xr

CYCLE = 6  # hr


def check_files_exist(path, filenames):
    files_in_path = {str(p.name) for p in path.iterdir()}
    missing = []
    for fnm in filenames:
        if fnm not in files_in_path:
            missing.append(fnm)
    if len(missing) > 0:
        raise IOError(f"Cannot find all required datafiles! Missing: <{missing}>!")


def snap_to_previous_cycle(date):
    snapped_date = datetime(
        date.year, date.month, date.day, tzinfo=date.tzinfo
    ) + timedelta(hours=np.floor(date.hour / CYCLE) * CYCLE)
    return snapped_date


def check_data_availability(path, start, end, fcst_at=None):
    if fcst_at is None:
        fcst_at = start
    fcst_at_cyc = snap_to_previous_cycle(fcst_at)
    filenames = []
    date = start
    while date <= fcst_at_cyc:
        yyyymmddhh = date.strftime('%Y%m%d%H')
        filenames.append(f"00l.{yyyymmddhh}.hfsa.parent.atm.f000.grb2")
        date += timedelta(hours=CYCLE)  # advance to the next cycle
    yyyymmddhh = fcst_at_cyc.strftime('%Y%m%d%H')
    last_fcst_hr = np.ceil((end - fcst_at_cyc).total_seconds() / 3600)
    # integer hours so that the f{hr:03} field zero-pads correctly
    hours = np.arange(int(last_fcst_hr) + 1)
    filenames.extend(
        [
            f"00l.{yyyymmddhh}.hfsa.parent.atm.f{hr:03}.grb2"
            for hr in hours
        ]
    )
    check_files_exist(path, filenames)


def hafs_to_sflux(path):
    # the cfgrib engine is needed for filter_by_keys
    ds_msl = xr.open_dataset(
        path,
        engine='cfgrib',
        filter_by_keys={
            'typeOfLevel': 'meanSea',
        },
    )
    ds_msl = ds_msl.drop_vars('meanSea')
    ds_2ag = xr.open_dataset(
        path,
        engine='cfgrib',
        filter_by_keys={
            'typeOfLevel': 'heightAboveGround',
            'topLevel': 2,
        },
    )
    ds_2ag = ds_2ag.drop_vars('heightAboveGround')
    ds_10ag = xr.open_dataset(
        path,
        engine='cfgrib',
        filter_by_keys={
            'typeOfLevel': 'heightAboveGround',
            'topLevel': 10,
        },
    )
    ds_10ag = ds_10ag.drop_vars('heightAboveGround')
    mgrid = np.meshgrid(ds_msl.longitude, ds_msl.latitude)
    data = {
        'lon': (('latitude', 'longitude'), mgrid[0], ds_msl.longitude.attrs),
        'lat': (('latitude', 'longitude'), mgrid[1], ds_msl.latitude.attrs),
        'uwind': ds_10ag.u10.expand_dims('valid_time'),
        'vwind': ds_10ag.v10.expand_dims('valid_time'),
        'prmsl': ds_msl.prmsl.expand_dims('valid_time'),
        'spfh': ds_2ag.sh2.expand_dims('valid_time'),
        'stmp': ds_2ag.t2m.expand_dims('valid_time'),
    }
    new_ds = xr.Dataset(data_vars=data)
    new_ds = new_ds.swap_dims(
        {'longitude': 'nx_grid', 'latitude': 'ny_grid'}
    ).reset_coords(
        ['longitude', 'latitude', 'step', 'time'],
        drop=True,
    ).rename(
        {'valid_time': 'time'}
    )
    return new_ds


def all_hafs_to_sflux(paths):
    all_sflux = []
    for path in paths:
        all_sflux.append(hafs_to_sflux(path))
    # TODO: Limit to 1000 (MAX FILE TIME)
    combined_sflux = xr.combine_by_coords(
        all_sflux,
        data_vars='different',
    )
    return combined_sflux


if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('--out', '-o', type=Path, default='.')
    parser.add_argument('files', nargs='+', type=Path)
    args = parser.parse_args()
    ds = all_hafs_to_sflux(args.files)
    ds.to_netcdf(args.out / 'sflux_air_1.0001.nc')
```
For now the plan is to set up SCHISM with HAFS data (separate from the NHC ensemble effort). @saeed-moghimi-noaa and I need to have a short follow-up meeting to discuss how the data must be used (single cycle with multiple forecasts vs. multiple cycles with the first forecast only).
A script is developed to read the grib files from their cloud location on S3 and write `sflux` files for each ensemble member (combined through the `time` dim). The logic is to get forecast `000` of member `00` for all of the cycles prior to the perturbation; then, after the perturbation, get all the forecast hours of a single cycle for each ensemble member. After some preliminary testing, I need to use these generated sfluxes to set up an ensemble using `coupledmodeldriver`, or, if that requires too much development, for now just use them to manually set up an ensemble!
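A hedged sketch of that selection logic (the bucket layout and the `00l.…` file naming are inferred from S3 paths that appear later in this thread; the 6-hourly forecast-hour stride and the `ensemble_keys` helper name are my own assumptions):

```python
from datetime import datetime, timedelta

CYCLE = 6  # hr
BUCKET = 's3://noaa-nws-hafs-pds/herc'  # layout inferred from the S3 listings in this thread


def ensemble_keys(start, perturb_cycle, end, members=range(1, 31)):
    """List grib keys: f000 of member 00 for every cycle before the
    perturbation, then all forecast hours of the perturbation cycle
    for each perturbed member."""
    keys = []
    cycle = start
    while cycle < perturb_cycle:
        ymdh = cycle.strftime('%Y%m%d%H')
        keys.append(f'{BUCKET}/00/{ymdh}/00l.{ymdh}.hfsa.parent.atm.f000.grb2')
        cycle += timedelta(hours=CYCLE)
    last_hr = int((end - perturb_cycle).total_seconds() // 3600)
    ymdh = perturb_cycle.strftime('%Y%m%d%H')
    for mem in members:
        for hr in range(0, last_hr + 1, CYCLE):
            keys.append(
                f'{BUCKET}/{mem:02}/{ymdh}/00l.{ymdh}.hfsa.parent.atm.f{hr:03}.grb2'
            )
    return keys
```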
I updated `coupledmodeldriver` to be able to set up SCHISM with user-provided `sflux` files. Doing this enables us to use ensemble perturbation to write out the whole ensemble. The single-model setup using `coupledmodeldriver` was successful (pending a SCHISM run test). I'm also testing generating the full HAFS ensemble using this update (in a separate branch right now).
On a separate note, I realized that `prate`, `dswrf`, and `dlwrf` are missing from the sflux I created. I need to add those as well! These variables can be found in the GRIB2 file at the layer read by:
```python
ds = xr.open_dataset(
    '/path/to/00l.YYYYMMDDHH.hfsa.parent.atm.fCYC.grb2',
    engine='cfgrib',
    filter_by_keys={'typeOfLevel': 'surface', 'stepType': 'avg'},
)
```
Note that `dswrf` and `dlwrf` are also present at the layer with `{'typeOfLevel': 'surface', 'stepType': 'instant'}`. I'm still not sure which one to use for radiation.
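A hedged sketch of how those fields would then be split out, assuming the usual SCHISM convention of separate `sflux_prc_*` (precipitation) and `sflux_rad_*` (radiation) files; the tiny synthetic grid and values below are purely illustrative:

```python
import numpy as np
import xarray as xr

# Illustrative stand-in for the surface/avg layer read above.
time = np.array(['2023-08-28T00', '2023-08-28T06'], dtype='datetime64[ns]')
shape = (len(time), 3, 4)  # (time, ny_grid, nx_grid)
dims = ('time', 'ny_grid', 'nx_grid')

surf = xr.Dataset(
    {
        'prate': (dims, np.zeros(shape)),        # precipitation rate, kg/m^2/s
        'dlwrf': (dims, np.full(shape, 400.0)),  # downward longwave, W/m^2
        'dswrf': (dims, np.full(shape, 200.0)),  # downward shortwave, W/m^2
    },
    coords={'time': time},
)

# Precipitation would go in sflux_prc_1.0001.nc, radiation in sflux_rad_1.0001.nc.
prc = surf[['prate']]
rad = surf[['dlwrf', 'dswrf']]
```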
The generation of the sflux setup in both single and ensemble modes was successful (pending run tests). The updated HAFS data-extraction code, with the addition of precipitation and radiation, was also successfully tested (pending inspection of the generated flux files).
I also need to address a memory-usage issue in the HAFS extraction code. Potential solution at https://stackoverflow.com/questions/73287939/threadpoolexecutor-threads-futures-do-not-release-memory-when-completed-and-t
The current code can either use the local filesystem or pull files from the S3 bucket. Since HAFS is going to move their data, we might need to support downloading from another location later as well.
Update: After further investigation, there doesn't seem to be a memory issue in the HAFS-to-Sflux code after all! The next step is to test running the generated ensembles, and then try to run a full-length storm ensemble.
Thanks @SorooshMani-NOAA
Trying to run SCHISM for a single setup case, I ran into issues with the generated `sflux` file. I will add more details as I learn more.
Note that using `convert_calendar` when combining all datasets to create a single sflux file is necessary. This ensures that the calendar is compatible with the `pyschism` requirement; e.g. it should be `gregorian`; `proleptic_gregorian` is not supported.

We can escalate this issue to fix `pyschism`'s `SfluxDataset` to support other calendars as well.
Even after fixing the date issue, the single run is still failing at the first time step with an error:

```
23: ABORT: QUICKSEARCH: Cannot find a vert. level: -249.56895039521086 6.9526567718092286E-310 6.9526567718102167E-310 -258.64307520892299 NaN
```
The unperturbed version of the run (single run from HAFS sflux) went through. I haven't looked at the results yet!
In order to test an ensemble run I need to wait for the HAFS storage ticket to be addressed first.
Good progress @SorooshMani-NOAA. Thanks for the update!
The HAFS dataset seems to be on NODD now; I'll need to check the files there to see whether I can set up a run or not.
I'm running into issues running locally. I'll try again on a PW node.
I just noticed that for Idalia, ensemble 26 doesn't have any forecast on 20230829; see:

```shell
aws s3 ls --no-sign-request s3://noaa-nws-hafs-pds/herc/26/2023082900/
```

There are no `grib` files.
I created a test run from 08/28/2023 to 08/30/2023 with perturbation at 08/29/2023. That means that from the 28th to the 29th the sflux values come from member `00`'s grib file at `fcst000` for the given cycle datetime; for the rest of the times until the end, each sflux value comes from its ensemble member's grib file (`01`-`30`) on 08/29/2023 at the different `fcst???` hours. Note that `26` doesn't have any data for the forecast at the given perturbation time.
The aforementioned ensemble is running on Hercules right now. After its success I'll create another ensemble from 08/26 to 09/02 with perturbation at 08/28.
I made the same mistake with NaNs as in https://github.com/schism-dev/schism/issues/121. I'm writing a script to take care of fixing the sflux using the provided hgrid.
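The fix script isn't finished yet, but the idea is roughly this (a crude sketch: a real version would only patch values at nodes actually covered by the provided hgrid, and would likely use nearest-neighbor values rather than the field mean used here for brevity):

```python
import numpy as np

# Toy sflux slab with a hole: (time, ny_grid, nx_grid)
vals = np.array([[[1.0, np.nan], [3.0, 4.0]]])

# Replace NaNs so SCHISM never reads one; here crudely with the field mean.
mask = np.isnan(vals)
vals[mask] = np.nanmean(vals)
```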
Another update: If I use the 00th hour of the aforementioned days for the start and end of the simulation, both to fetch the data and to set up the simulation, the run fails due to an additional timestep of data missing from the sflux. I'm not sure whether I have a cut-off error somewhere, or whether SCHISM requires that sflux have additional timesteps after the end time.
All but one of the runs for 08/26 to 09/02, with perturbation at 08/28 went through successfully.
Note: after setting up the model using `ensembleperturbation` with the `sflux` file addition, I manually updated `iof_hydro(2)` and `iof_hydro(14)` for all the runs to be `1`. Other than that, I set `rnday` to `6.75` to avoid the issue at the last time step: https://github.com/noaa-ocs-modeling/EnsemblePerturbation/issues/112#issuecomment-2020624792
I also noticed that ensemble `24` doesn't have anything past hour `78` of the forecast for cycle `20230828`: https://noaa-nws-hafs-pds.s3.amazonaws.com/index.html#herc/24/2023082800/
I now need to validate the run and then figure out how to generate the probability fields using the IDL code.
Looking at sfluxes for validation, it seems that they are generated correctly: I get the storm field as expected, there's zero difference until 08/28, and then there's some divergence:
(Figures: landfall of member 00; difference of members 00 and 05 at 08/28; difference of members 00 and 05 at 08/30T06.)
Although the sflux files look OK, the results from SCHISM don't look right; I need to check with the VIMS team to see how to debug this:
(Figures: sflux velocity field; sflux pressure field; SCHISM pressure; colorbar.)
Another strange point is the range I get for pressure from the SCHISM `out2d` output:
Fei suggested subtracting 360 from the longitudes in the generated sflux files; I'll try that and report back.
Adjusting the longitude to be -180 to 180 resolved the issue!
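For the record, the adjustment amounts to wrapping the longitude array into [-180, 180); a sketch on a toy array (the `lon` values are illustrative):

```python
import numpy as np

# Wrap longitudes from [0, 360) into [-180, 180).
lon = np.array([[250.0, 359.5], [10.0, 180.0]])
lon_wrapped = ((lon + 180.0) % 360.0) - 180.0
```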
Now I need to find out how to post process the results!
@feiye-vims
Hi Fei,
Would you please look at this issue.
Thanks, -Saeed
Hi Saeed,
I looked at the issue last week and it has been resolved.
Best, Fei
This ticket tracks discussion about how to use the HAFS results in the SCHISM workflow. Relevant tickets: