noaa-oar-arl / monet

The Model and ObservatioN Evaluation Toolkit (MONET)
https://monet-arl.readthedocs.io

monet/monet_accessor.py: monet.to_ascii2nc_list: var column restricted to integers when it should be strings #151

Open Wesley-J-Davis opened 1 month ago

Wesley-J-Davis commented 1 month ago

monet/monet_accessor.py, line 277:

    df["ascii2nc_gribcode"] = int(grib_code)

According to the MET docs, the var column should come through as a string so that it can hold either GRIB codes or variable names. The table lists string as that column's data type.

(see Table 37.2): https://met.readthedocs.io/en/latest/Users_Guide/appendixF.html#python-embedding-for-point-observations

As it stands, to use this code for Python embedding I'm restricted to integers when defining variables, and my variables won't always be integers.
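
A minimal sketch of one point-observation row with var kept as a string. The helper name make_point_row is hypothetical, and the 11-field order is assumed from the header monet prints in its output (message type, station ID, valid time, lat, lon, msl, var, level, agl, qc, val); Table 37.2 in the MET docs remains the authoritative definition.

```python
from typing import Optional, Union


def make_point_row(
    message_type: str,
    station_id: str,
    valid_time: str,
    lat: float,
    lon: float,
    msl: Optional[float],
    var: Union[int, str],
    level: Optional[float],
    agl: Optional[float],
    qc: str,
    val: float,
) -> list:
    """Hypothetical helper: build one 11-field point-observation row.

    var is stored as a string so it can hold either a GRIB code (e.g. 2)
    or a variable name (e.g. "sensor_temperature(degrees_c)").
    """
    return [message_type, station_id, valid_time, lat, lon, msl,
            str(var), level, agl, qc, val]


# The Huambo observation from this issue, labelled by column name
# instead of by an arbitrary integer.
row = make_point_row("AERONET AOD15", "Huambo", "20170925_103207",
                     -12.8679, 15.7046, None,
                     "sensor_temperature(degrees_c)", 1000.0, None, "0", 34.4)
print(row)
```

Passing var=2 instead yields the same '2' string that monet emits today, so integer GRIB codes keep working under this layout.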

Here's an example of how I'm using monet and a line of output to illustrate:

import pandas as pd
import monetio  # provides the AERONET reader
import monet  # importing monet registers the .monet DataFrame accessor

dates = pd.date_range(start='2017-09-25', end='2017-09-26', freq='h')
df = monetio.aeronet.add_data(dates=dates, product='AOD15')
# print(df.columns.tolist())
print(df["sensor_temperature(degrees_c)"])
point_data = df.monet.to_ascii2nc_list(
    grib_code=2,
    height_msl=None,
    column="sensor_temperature(degrees_c)",
    message_type="AERONET AOD15",
    pressure=1000.0,  # level
    qc=None,
    height_agl=None)
print(point_data)

message type : station ID : valid time : lat : lon : msl : var : level : agl : qc : val
['AERONET AOD15', 'Huambo', '20170925_103207', -12.8679, 15.7046, None, '2', 1000.0, None, '0', 34.4]

From monet/monet_accessor.py

def to_ascii2nc_df(
    self,
    grib_code=126,
    height_msl=0.0,
    column="aod_550nm",
    message_type="ADPUPA",
    pressure=1000.0,
    qc=None,
    height_agl=None,
):
    df = self._obj
    df["ascii2nc_time"] = df.time.dt.strftime("%Y%m%d_%H%M%S")
    df["ascii2nc_gribcode"] = int(grib_code)

grib_code is what populates our var column, and it won't always be an integer. For instance, I pulled in AERONET data using monetio, and when converting that data to the list MET requires, I'd like to pick from the column names produced when I first create the dataframe using:

df = monetio.aeronet.add_data(dates=dates, product='AOD15')
print(df.columns.tolist())

Since these variables don't have GRIB codes, I can't translate them exactly into the MET output. The docs state that the var column should contain strings, so the following changes ought to remedy it:

267        ascii2nc_var=126,
277        df["ascii2nc_var"] = str(ascii2nc_var)
307                "ascii2nc_var",
322                ascii2nc_var="var",
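
A minimal pandas sketch of what the start of the method might look like after those changes. add_ascii2nc_columns is a hypothetical stand-in, not MONET's code: only the time and var columns are shown, and the rest of the real method (renaming, astype, list conversion) is omitted. Unlike the original, which writes into self._obj, the sketch copies the frame so the caller's data is not mutated; that is a design choice of the sketch.

```python
import pandas as pd


def add_ascii2nc_columns(df: pd.DataFrame, ascii2nc_var=126) -> pd.DataFrame:
    """Hypothetical stand-in for the proposed line 277 change."""
    out = df.copy()  # avoid mutating the caller's dataframe
    out["ascii2nc_time"] = out["time"].dt.strftime("%Y%m%d_%H%M%S")
    # str() instead of int(): GRIB codes and variable names are both accepted.
    out["ascii2nc_var"] = str(ascii2nc_var)
    return out


obs = pd.DataFrame({
    "time": pd.to_datetime(["2017-09-25 10:32:07"]),
    "sensor_temperature(degrees_c)": [34.4],
})
print(add_ascii2nc_columns(obs, "sensor_temperature(degrees_c)"))
print(add_ascii2nc_columns(obs, 2))  # integer GRIB codes still work, stored as "2"
```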

bbakernoaa commented 1 month ago

I think we need a little more information here. The GRIB code should map to a specific variable; we should look at the GRIB tables to see what the variables are. MONET reads directly from the AERONET server, so we aren't reading GRIB data at all. We could keep an internal mapping to the GRIB table, but that mapping isn't available from the AERONET website.
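
One way to picture the internal-mapping idea: a lookup from dataframe column names to GRIB identifiers that falls back to the column name when no entry exists. This is a sketch under assumptions: resolve_var is a hypothetical helper, the fallback behaviour is this sketch's choice rather than anything MONET does, and the example entry pairs a made-up column key "ozone" with the 4.233.0 identifier for ozone.

```python
from typing import Mapping, Optional


def resolve_var(column: str, grib_map: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: pick the var string for a dataframe column.

    Use the GRIB identifier from grib_map when the column has an entry;
    otherwise fall back to the column name itself.
    """
    if grib_map and column in grib_map:
        return str(grib_map[column])
    return column


grib_map = {"ozone": "4.233.0"}  # hypothetical column key
print(resolve_var("ozone", grib_map))      # mapped GRIB identifier
print(resolve_var("aod_550nm", grib_map))  # no entry, falls back to the name
```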

Wesley-J-Davis commented 1 month ago
def to_ascii2nc_df(
    self,
    grib_code=126,
    height_msl=0.0,
    column="aod_550nm",
    message_type="ADPUPA",
    pressure=1000.0,
    qc=None,
    height_agl=None,
):

Of all these arguments, grib_code and message_type are the only ones that don't refer back to the AERONET dataframe.

The user picks this grib_code out of thin air. Any integer will do, and it gets passed through to the output's var column.

But it demands an integer and that's the bug.

It should operate like the message_type argument and accept whatever is passed in.

This required integer, from line 277,

    df["ascii2nc_gribcode"] = int(grib_code)

is later converted to a string anyway at line 330,

    out = out.astype(dict(typ=str, sid=str, vld=str, var=str, qc=str))

since line 322 maps that column to var:

    ascii2nc_gribcode="var",
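
A small stand-alone sketch of why the int() step only gets in the way. current_var and proposed_var are hypothetical names that approximate the two code paths on a single value; they are not MONET functions.

```python
def current_var(grib_code):
    """Approximates today's path: int() at line 277, then astype(str) at line 330."""
    return str(int(grib_code))


def proposed_var(var):
    """Proposed path: pass the value through unchanged and store it as a string."""
    return str(var)


for value in (2, "4.233.0", "sensor_temperature(degrees_c)"):
    try:
        current = current_var(value)
    except ValueError as err:
        current = f"ValueError: {err}"
    print(repr(value), "| current:", current, "| proposed:", proposed_var(value))
```

For integer codes both paths give the same string, so dropping int() changes nothing for existing callers; it only stops rejecting string identifiers.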

It would be better if the column string that gets passed in were also used to populate the var column of the out dataframe. Right now, that string is only used to look up data in the AERONET dataframe that monetio created: the values in df[column] fill the val column.

If that column string were used to populate the var column, each value would retain its relationship to the column it came from.
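
A minimal sketch of that idea for several columns at once, emitting one row per observation and column with the column name as var. rows_for_columns is hypothetical, and the record keys "siteid", "time" (already formatted as YYYYMMDD_HHMMSS), "latitude" and "longitude" are assumptions about the monetio AERONET frame; records could come from something like df.to_dict("records") after formatting the time column.

```python
from typing import Iterable, List, Mapping, Sequence


def rows_for_columns(
    records: Iterable[Mapping],
    columns: Sequence[str],
    message_type: str,
    level: float,
    qc: str = "0",
) -> List[list]:
    """Hypothetical helper: one row per (record, column), var = column name.

    msl and agl are left as None, as in the example output in this issue.
    """
    rows = []
    for rec in records:
        for col in columns:
            rows.append([
                message_type, rec["siteid"], rec["time"],
                rec["latitude"], rec["longitude"], None,
                col,       # var: the source column name
                level, None, qc,
                rec[col],  # val: the value read from that column
            ])
    return rows
```

Because var and val come from the same column lookup, each value stays tied to the column it came from, which is the property described above.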

At the very least, line 277 should be changed to remove the int requirement.

That would allow me to define my variables manually without being restricted to using integers.

bbakernoaa commented 1 month ago

These two don't refer back to the AERONET dataframe because they aren't defined there; as you mentioned, they are defined by the user. grib_code was made an integer because all GRIB codes are integers. If, for instance, you want to pass the full table -> variable number, like 4.233.0 for ozone (https://www.nco.ncep.noaa.gov/pmb/docs/grib2/grib2_doc/grib2_table4-233.shtml), then I understand what you mean. We could certainly change this to pass the input through unmodified and output the result as a string.