pysat / pysatNASA

pysat support for NASA Instruments
BSD 3-Clause "New" or "Revised" License
21 stars 7 forks source link

BUG: ICON metadata has 'Valid_Range' as an array input #99

Closed jklenzing closed 2 years ago

jklenzing commented 2 years ago

Describe the bug Some of the ICON data products have 'Valid_Range' in addition to 'Valid_Min' and 'Valid_Max'. This is stored as an array, which is not allowed in pysat meta. Affects MIGHTI LOS winds and vector winds.

To Reproduce

import pysat
import warnings
warnings.simplefilter('always', UserWarning)
los = pysat.Instrument('icon', 'mighti', tag = 'los_wind_green', inst_id='a')
los.load(2020, 1)

Expected behavior Data should load without warnings.

Screenshots n/a

Desktop (please complete the following information):

Additional context Uses load_netcdf routine from pysat. Can we tell pysat which variables to drop? These are not used since min and max already exist in the metadata.

jklenzing commented 2 years ago

Pinging @rstoneback.

rstoneback commented 2 years ago

Valid_Range is an optional part of the ICON standard. It may be used in the file as well as the Valid_Min and Valid_Max. I don't think we currently have support for not loading particular metadata information.

jklenzing commented 2 years ago

Currently, because this is an array the info is dropped and warnings are raised. Options:

aburrell commented 2 years ago

I think it should be possible at the moment to specify array meta data types? This would likely tie-in to pysat issue https://github.com/pysat/pysat/issues/940

jklenzing commented 2 years ago

Currently we drop these and warn the user (unless it's an array of strings)

https://github.com/pysat/pysat/blob/e2930e881ebd41da0d3bc19885e2db7687a2fe40/pysat/_meta.py#L404-L407

aburrell commented 2 years ago

I think it should be ok (as in, we should change it to be allowed), so long as this is the type expected by MetaLabels.

aburrell commented 2 years ago

Ok, I tried to make this happen, but the pandas DataFrame doesn't like it. You can test in out in the pysat branch meta_labels_type. I am not sure how we ended up with OMNI array meta data, but not being able to add array meta data?

You can see it here: https://github.com/pysat/pysat/pull/961

jklenzing commented 2 years ago

OK, this makes sense. We would probably have to move away from pandas to manage the metadata to make that work.

What if we just converted arrays to strings? That would keep us from losing information.

aburrell commented 2 years ago

You could also establish a string format for array data that would allow it to be parsed and written as an array with an additional routine.

rstoneback commented 2 years ago

We can easily run into trouble trying to mangle array metadata into something consistent.

Pandas doesn't like to take array data within a single 'cell'. It complains if it knows that's what you are doing. You can do things it doesn't like at assignment (like a series of series) when assigning a complete list of items that you can't do one at a time. That's my guess for the OMNI data.

rstoneback commented 2 years ago

After more thought, since the ICON file standard itself supports either an array for range or individual min/max, we should have built in automatic support for automatically dealing with those parts of the standard. If only range is present, break content into min and max. If both present, drop range without a warning. Even better if developers can leverage that functionality as well.

We could do something akin to the _meta_translation_table, or update that table for more functionality.

rstoneback commented 2 years ago

When we write a file, there are three places where metadata has a chance to be modified.

When we load a file we can

We should probably have the same level of support for dealing with metadata as it is loaded. a _import_meta_processing that receives a dict with all metadata from the file, perhaps after an inverse pass through _meta_translation_table, should provide developers with the freedom needed to tweak whatever is being loaded.

aburrell commented 2 years ago

This is the bug I encountered in the old pysat Constellation custom function example. Would be good to get this fixed, since ICON data is SUPER important right now.

rstoneback commented 2 years ago

I've been working with it and the console is bombarded with

/Users/russellstoneback/Code/pysat/pysat/_meta.py:430: UserWarning: Metadata with type <class 'str'>does not match expected type <class 'numpy.float64'>. Dropping input for 'ICON_L27_UTC_Time' with key 'Valid_Min'
  warnings.warn(''.join((
/Users/russellstoneback/Code/pysat/pysat/_meta.py:430: UserWarning: Metadata with type <class 'str'>does not match expected type <class 'float'>. Dropping input for 'ICON_L27_UTC_Time' with key 'ValidMax'
  warnings.warn(''.join((
/Users/russellstoneback/Code/pysat/pysat/_meta.py:430: UserWarning: Metadata with type <class 'str'>does not match expected type <class 'float'>. Dropping input for 'ICON_L27_UTC_Time' with key 'ValidMin'
  warnings.warn(''.join((
/Users/russellstoneback/Code/pysat/pysat/_meta.py:430: UserWarning: Metadata with type <class 'str'>does not match expected type <class 'numpy.float64'>. Dropping input for 'ICON_L27_UTC_Time' with key 'Valid_Max'
  warnings.warn(''.join((
/Users/russellstoneback/Code/pysat/pysat/_meta.py:430: UserWarning: Metadata with type <class 'str'>does not match expected type <class 'numpy.float64'>. Dropping input for 'ICON_L27_UTC_Time' with key 'Valid_Min'
  warnings.warn(''.join((
/Users/russellstoneback/Code/pysat/pysat/_meta.py:430: UserWarning: Metadata with type <class 'str'>does not match expected type <class 'float'>. Dropping input for 'ICON_L27_UTC_Time' with key 'ValidMax'

I've been thinking we need some kind of function to alter metadata to calm down the warnings. Then with your recent comment I see I had similar thoughts some time back. I could make this next on the list.

rstoneback commented 2 years ago

I have a pull request up that silences warnings from ICON instruments. Many of them were driven by the fill value being a string but we have a single type check for all fill values. There are some mechanisms to make the warnings go away now, or to have an area to process the metadata as needed, but the underlying pandas storage has some limitations. Fully sorting out meta will have to deal with that.

rstoneback commented 2 years ago

Ok, I tried to make this happen, but the pandas DataFrame doesn't like it. You can test in out in the pysat branch meta_labels_type. I am not sure how we ended up with OMNI array meta data, but not being able to add array meta data?

You can see it here: pysat/pysat#961

I wonder if we put the columns in 'object' mode and stored the type as its own column we may be able to get pandas to accept the data and we could use the stored type as needed for defaults and such.

jklenzing commented 2 years ago

Validated this was fixed by #100.