spedas / pyspedas

Python-based Space Physics Environment Data Analysis Software
https://pyspedas.readthedocs.io/
MIT License

cdflib returning extra dimension on string-valued CDF variables? #943

Open jameswilburlewis opened 3 months ago

jameswilburlewis commented 3 months ago

After adding more attribute structure and ISTP compliance checking, I'm seeing a lot of cases like this:

21-Jul-24 12:30:07: Variable mms1_aspoc_status time-varying DEPEND_1 attribute mms1_aspoc_lbl has 1 times, but data has 86400 times. Ignoring.

The attribute values are usually something like [['x', 'y', 'z']]. But spot-checking a few of them with skteditor, they appear to be defined as 1-D arrays, like ['x', 'y', 'z'], which is more in line with what we'd expect to see as label metadata for a 3-element vector-valued data variable. I suspect cdflib is introducing an extra dimension, but only for string-valued variables. If those variables happen to be used as DEPEND_N values, the extra dimension makes them look like they're supposed to be time-varying. (String-valued DEPEND_N variables are actually not ISTP compliant, but it's so common on 1-D variables that we just ignore it, to avoid getting spammed with non-compliance warnings.)
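
To illustrate why the extra dimension trips the check, here is a minimal NumPy sketch. The function name `apparent_record_count` is hypothetical and is not pyspedas code; it only assumes, as the log message suggests, that the first axis of a multi-dimensional DEPEND_N array is read as the time axis.

```python
import numpy as np

def apparent_record_count(depend_values):
    """Hypothetical helper: the number of 'times' a checker would see if it
    treats axis 0 of a multi-dimensional array as the record (time) axis.
    Returns None for scalars and 1-D arrays, which are not time-varying."""
    arr = np.asarray(depend_values)
    if arr.ndim < 2:
        return None
    return arr.shape[0]

labels_as_defined = ['x', 'y', 'z']      # 1-D, as seen in skteditor
labels_as_returned = [['x', 'y', 'z']]   # 1x3, with the extra leading dimension

print(apparent_record_count(labels_as_defined))   # None
print(apparent_record_count(labels_as_returned))  # 1
```

With the 1x3 array, the labels look like a time-varying quantity with a single record, which cannot match a data variable that has 86400 records.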

I need to make some test CDFs and see what IDL and cdflib do with them, to confirm that this is a cdflib issue -- if so, I'll file an issue over on their GitHub.

jameswilburlewis commented 3 months ago

Sometimes the extra dimension seems to be trailing, in this case 4x1:

21-Jul-24 22:41:48: Download complete: cluster_data/c1/pp/asp/2004/c1_pp_asp_20040405_v01.cdf
21-Jul-24 22:41:49: Variable Status__C1_PP_ASP time-varying DEPEND_1 attribute L_Status has 4 times, but data has 21382 times. Ignoring.

And sometimes leading, in this case 1x3:

21-Jul-24 22:48:29: Variable Data_Pressure_GSE__C1_CP_PEA_MOMENTS time-varying DEPEND_1 attribute Data_Pressure_GSE__C1_CP_PEA_MOMENTS_REPRESENTATION_1 has 1 times, but data has 62 times. Ignoring.
21-Jul-24 22:48:29: Variable Data_Pressure_GSE__C1_CP_PEA_MOMENTS time-varying DEPEND_2 attribute Data_Pressure_GSE__C1_CP_PEA_MOMENTS_REPRESENTATION_2 has 1 times, but data has 62 times. Ignoring.

jameswilburlewis commented 3 months ago

There is already an open issue on the cdflib issue tracker, but no signs of recent activity:

https://github.com/MAVENSDC/cdflib/issues/172

jameswilburlewis commented 3 months ago

For now, we're working around the issue by looking up the original array dimensions and reshaping the returned data to match.
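
A rough sketch of that kind of workaround, using NumPy only. The function name `restore_declared_shape` and the `dim_sizes` parameter are hypothetical; this is not the actual pyspedas implementation. It assumes the declared dimension sizes have already been read from the CDF's variable metadata.

```python
import math
import numpy as np

def restore_declared_shape(values, dim_sizes):
    """Hypothetical helper: reshape `values` to the dimensions declared in the
    CDF, but only when the element counts agree. If they differ, the array may
    be genuinely record-varying, so it is returned unchanged."""
    arr = np.asarray(values)
    expected_size = math.prod(dim_sizes)  # the product of an empty list is 1
    if arr.size != expected_size:
        return arr
    return arr.reshape(tuple(dim_sizes))

# Leading extra dimension (1x3) and trailing extra dimension (4x1)
leading = np.array([['x', 'y', 'z']])
trailing = np.array([['a'], ['b'], ['c'], ['d']])

print(restore_declared_shape(leading, [3]).shape)   # (3,)
print(restore_declared_shape(trailing, [4]).shape)  # (4,)
```

Reshaping to the declared dimensions, rather than squeezing out every length-1 axis, handles both the leading and trailing cases and leaves arrays that really do vary with time untouched.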