Include preprocessing steps into the dataset creation?

### Reorganize the temporal dimension/coordinate

#### Add the *time* dimension

Originally, the time information is encoded in the variables **ymd** and **tod**, and the **sample** index is the time step count. **ymd** holds the date: the leading digit is the index of the year, the next two digits are the month, and the last two digits are the day of the month. **tod** is the time of day in seconds.

I also came up with a more efficient way of creating the time dimension (no need for the for-loop):

```python
import cftime
import xarray as xr

def ymd_tod_to_date(ymd: int, tod: int) -> dict:
    """Decode ymd (date) and tod (seconds in the day) into date parts."""
    year = ymd // 10000
    month = ymd % 10000 // 100
    day = ymd % 10000 % 100
    hour = tod // 3600
    minute = tod % 3600 // 60
    return dict(year=year, month=month, day=day, hour=hour, minute=minute)

# Decode only the first sample; int() turns the 0-d arrays into plain Python ints
start_date_dict = ymd_tod_to_date(int(ds['ymd'][0]), int(ds['tod'][0]))
start_date = cftime.DatetimeNoLeap(start_date_dict['year'], start_date_dict['month'], start_date_dict['day'], start_date_dict['hour'], start_date_dict['minute'])
# Build the whole axis in one call: one timestamp every 1200 s (20 min)
time = xr.cftime_range(start=start_date, freq='1200S', periods=len(ds.ymd))
ds = ds.assign_coords(sample=time).rename({'sample': 'time'}).drop_vars(['tod', 'ymd'])
# Check the current time dimension, read the timestep
ds.time.values[0:5]
```

instead of

```python
# decode the date parts for all sample points
year = ds['ymd'] // 10000
month = ds['ymd'] % 10000 // 100
day = ds['ymd'] % 10000 % 100
hour = ds['tod'] // 3600
minute = ds['tod'] % 3600 // 60

# loop over all sample points and build one datetime per sample
t = []
for k in range(len(ds['ymd'])):
    t.append(cftime.DatetimeNoLeap(year[k], month[k], day[k], hour[k], minute[k]))

# add the time array to the 'sample' dimension; then, rename
ds['sample'] = t
ds = ds.rename({'sample': 'time'})

# now the 'time' dimension has replaced the 'sample' dimension
ds = ds.drop_vars(['tod', 'ymd'])

# Check the current time dimension, read the timestep
ds.time.values[0:5]
```
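One caveat: the range-based version builds the whole axis from the first sample alone, so it only matches the loop if every sample is exactly 1200 s after the previous one. Below is a minimal sketch of a check for that, in plain Python. The helper names `seconds_since_year_zero` and `is_regular` are hypothetical (not part of the notebook), and the sample values are made up for illustration.

```python
# Days per month in a 365-day "noleap" calendar (February always has 28 days).
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def seconds_since_year_zero(ymd: int, tod: int) -> int:
    """Convert an encoded (ymd, tod) pair to seconds on a noleap calendar."""
    year, month, day = ymd // 10000, ymd % 10000 // 100, ymd % 100
    day_of_year = sum(DAYS_IN_MONTH[: month - 1]) + (day - 1)
    return (year * 365 + day_of_year) * 86400 + tod


def is_regular(ymd_values, tod_values, step_seconds: int = 1200) -> bool:
    """Return True if consecutive samples are exactly step_seconds apart."""
    seconds = [
        seconds_since_year_zero(int(y), int(t))
        for y, t in zip(ymd_values, tod_values)
    ]
    return all(b - a == step_seconds for a, b in zip(seconds, seconds[1:]))


# Hypothetical values: three samples crossing midnight at the end of February.
print(is_regular([10228, 10228, 10301], [84000, 85200, 0]))  # True
# Same period with one sample missing, so the gap is 2400 s.
print(is_regular([10228, 10301], [84000, 0]))  # False
```

On the real dataset this would presumably be called as `is_regular(ds['ymd'].values, ds['tod'].values)` before **ymd** and **tod** are dropped. If it returns `False`, the loop version is the safer choice.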

Please let me know if I should make a PR for this.

sungdukyu / LEAP_REU_Dataset_Notebook
