rweigel / cdawmeta

BSD 3-Clause "New" or "Revised" License

DEPEND_0s that vary with time, but not within a file: FA_ESA_L2_EEB #4

Open rweigel opened 10 months ago

rweigel commented 10 months ago

Parameter data has DEPEND_1 and DEPEND_2 of pitch_angle_median and energy_median, respectively, both of which have RecVariance of VARY.

They have DEPEND_0s of compno_64 and compno_96, each with RecVariance of NOVARY.

I don't understand why the DEPEND_1 and DEPEND_2 have RecVariances of VARY.

From https://spdf.gsfc.nasa.gov/pub/software/cdf/doc/cdf360/cdf360crm.pdf

4.9 Record/Dimension Variances

Record and dimension variances affect how variable data values are physically stored.

VARY True record or dimension variance.

NOVARY False record or dimension variance.

If a variable has a record variance of VARY, then each record for that variable is physically stored. If the record variance is NOVARY, then only one record is physically stored. (All of the other records are virtual and contain the same values.)

If a variable has a dimension variance of VARY, then each value/subarray along that dimension is physically stored. If the dimension variance is NOVARY, then only one value/subarray along that dimension is physically stored. (All other values/subarrays along that dimension are virtual and contain the same values.)
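
To make the quoted passage concrete, here is a minimal sketch of how record variance affects reads. The `Variable` class is a toy stand-in invented for illustration, not the CDF library's API, and the bin values are made up.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Variable:
    """Toy stand-in for a CDF variable (hypothetical, not the CDF API)."""
    name: str
    rec_vary: bool                 # True = VARY, False = NOVARY
    stored: List[Sequence[float]]  # records that are physically stored

    def record(self, index: int) -> Sequence[float]:
        if self.rec_vary:
            # VARY: every record is physically stored, so index directly.
            return self.stored[index]
        # NOVARY: only one record is stored; every other record is a
        # virtual copy of it.
        return self.stored[0]


# Illustrative values only.
bins = Variable("compno_64", rec_vary=False, stored=[[1, 2, 3]])
first = bins.record(0)
later = bins.record(500)
```

With NOVARY, `first` and `later` are the same stored record, which is why a NOVARY DEPEND cannot express bins that change with time inside one file.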

jbfaden commented 10 months ago

My recollection is that Nand didn't support time-varying DEPEND_1 and DEPEND_2. (In fact I think it's a 2.0 server, so we hadn't introduced that yet.) I will verify this.

rweigel commented 10 months ago

Your recollection is correct.

This is a case where it looks like he is supporting something that is time varying because of the RecVariance of VARY. But in fact, the bins are not time varying.

jbfaden commented 10 months ago

Autoplot is also having trouble with this one, probably for the same reasons.

jbfaden commented 10 months ago

What I see is that the master file is marked as compressed (its magic number equals CDF3_COMPRESSED_MAGIC), but it then fails to decompress as gzip. That explains why Autoplot is failing, and I'll have a look at the HAPI server code directly. It also uses Nand's CDF library, but possibly a different version that doesn't show this bug.
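
A quick way to check the compression marker without any CDF library is to read the first eight bytes of the file. This is a sketch only: the constant values below are as I recall them from the CDF internal format description and should be verified before relying on them, and `describe_magic` is a hypothetical helper name.

```python
import struct

# Assumed values (recalled from the CDF internal format description).
CDF3_MAGIC = 0xCDF30001
UNCOMPRESSED_MAGIC = 0x0000FFFF
COMPRESSED_MAGIC = 0xCCCC0001


def describe_magic(header: bytes) -> str:
    """Classify a CDF by its first 8 bytes (two big-endian 32-bit words)."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes")
    first, second = struct.unpack(">II", header[:8])
    if first != CDF3_MAGIC:
        return "not CDF3"
    if second == COMPRESSED_MAGIC:
        return "CDF3 compressed"
    if second == UNCOMPRESSED_MAGIC:
        return "CDF3 uncompressed"
    return "CDF3 unknown"
```

Usage would be along the lines of `describe_magic(open(path, "rb").read(8))`. It only reports what the header claims; it does not test whether the payload actually decompresses.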

berniegsfc commented 10 months ago

According to the master cdf, depend_1/2 are record/time varying. But the data cdf (or at least the one I randomly picked) indicates that they are not. So I guess this is one of those datasets where the depend variable doesn't vary within a single cdf but does vary from cdf to cdf. Since nand's server is version 2, it could not support varying bins. I don't know if he intentionally ignored the master or not. Maybe he thought treating it as non-varying, so that data for that variable was returned, was better than not serving the variable or dataset.
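
The three cases (constant, varying only between files, varying within a file) can be told apart with a small check. This is a sketch under the assumption that the DEPEND records have already been read into plain lists per file; `classify_depend` and the input layout are hypothetical.

```python
from typing import Dict, List, Sequence


def classify_depend(files: Dict[str, List[Sequence[float]]]) -> str:
    """files maps a file name to the DEPEND records read from that file."""
    firsts = []
    for records in files.values():
        if not records:
            continue
        first = list(records[0])
        # Any record that differs from the first means variation in a file.
        if any(list(r) != first for r in records[1:]):
            return "varies within a file"
        firsts.append(first)
    # Each file is internally constant; compare the files with each other.
    if any(f != firsts[0] for f in firsts[1:]):
        return "varies between files only"
    return "constant"
```

Only the "varies within a file" case truly needs time-varying bins in the served output; the "varies between files only" case still changes bins over a multi-file request.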

rweigel commented 10 months ago

My recollection is that Nand drops variables with time-varying bins, but I need to print a list of variables to double check. So this is a case where I figured he'd just drop the variable.
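
A sketch of what "drop variables with time-varying bins" could look like, based only on the master's RecVariance flags. The dictionary layout is hypothetical; the names and VARY/NOVARY values are the ones reported earlier in this issue. This is not Nand's code.

```python
from typing import Dict

# Hypothetical metadata layout; values as reported in this issue.
variables: Dict[str, dict] = {
    "data": {"DEPEND_1": "pitch_angle_median", "DEPEND_2": "energy_median"},
    "pitch_angle_median": {"RecVariance": "VARY"},
    "energy_median": {"RecVariance": "VARY"},
    "compno_64": {"RecVariance": "NOVARY"},
    "compno_96": {"RecVariance": "NOVARY"},
}


def has_varying_depend(name: str, meta: Dict[str, dict]) -> bool:
    """True if any DEPEND_1..3 of `name` is flagged RecVariance VARY."""
    for key in ("DEPEND_1", "DEPEND_2", "DEPEND_3"):
        dep = meta[name].get(key)
        if dep is not None and meta.get(dep, {}).get("RecVariance") == "VARY":
            return True
    return False


dropped = [n for n in variables if has_varying_depend(n, variables)]
```

By this rule `data` would be dropped, which is why it is surprising that the old server kept it.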

For DEPEND_1, which is pitch_angle_median, Nand reports time variation in a single file:

1996-09-16T22:43:56.521Z,1.284297e+02,1.171797e+02,1.059297e+02,9.467969e+01,8.342969e+01,7.217969e+01,6.092969e+01,4.967969e+01,3.842969e+01,2.717969e+01,1.592969e+01,4.679688e+00,3.534297e+02,3.421797e+02,3.309297e+02,3.196797e+02,3.084297e+02,2.971797e+02,2.859297e+02,2.746797e+02,2.634297e+02,2.521797e+02,2.409297e+02,2.296797e+02,2.184297e+02,2.071797e+02,1.959297e+02,1.846797e+02,1.734297e+02,1.621797e+02,1.509297e+02,1.396797e+02,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31
1996-09-16T22:43:56.690Z,1.396797e+02,1.284297e+02,1.171797e+02,1.059297e+02,9.467969e+01,8.342969e+01,7.217969e+01,6.092969e+01,4.967969e+01,3.842969e+01,2.717969e+01,1.592969e+01,4.679688e+00,3.534297e+02,3.421797e+02,3.309297e+02,3.196797e+02,3.084297e+02,2.971797e+02,2.859297e+02,2.746797e+02,2.634297e+02,2.521797e+02,2.409297e+02,2.296797e+02,2.184297e+02,2.071797e+02,1.959297e+02,1.846797e+02,1.734297e+02,1.621797e+02,1.509297e+02,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31,-1.0E31

which appear to be from the same file based on

    {
      "Name": "https://cdaweb.gsfc.nasa.gov/sp_phys/data/fast/esa/l2/eeb/1996/09/fa_esa_l2_eeb_19960916224356_00287_v02.cdf",
      "MimeType": "application/x-cdf",
      "StartTime": "1996-09-16T22:43:56.000Z",
      "EndTime": "1996-09-16T22:44:41.000Z",
      "Length": 149856,
      "LastModified": "2021-10-18T04:19:46.382Z"
    },
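
The "same file" claim can be checked directly from the timestamps. A minimal sketch, assuming StartTime and EndTime bound the records in the file and that no other file overlaps this interval; `in_file` is a hypothetical helper.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%fZ"


def in_file(timestamp: str, start: str, end: str) -> bool:
    """True if timestamp lies in [start, end] (all ISO 8601 with 'Z')."""
    t = datetime.strptime(timestamp, FMT)
    return datetime.strptime(start, FMT) <= t <= datetime.strptime(end, FMT)


# StartTime/EndTime from the file listing; timestamps from the two records.
start, end = "1996-09-16T22:43:56.000Z", "1996-09-16T22:44:41.000Z"
records = ["1996-09-16T22:43:56.521Z", "1996-09-16T22:43:56.690Z"]
same_file = all(in_file(t, start, end) for t in records)
```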

Either way, I think I understand the motivation for the VARY/NOVARY now. What I don't understand is exactly why Nand kept this variable when in other cases variables with time-varying DEPENDs are dropped. But there is a long list of questions of this nature that we've decided not to pursue.

rweigel commented 3 weeks ago

@jbfaden - The new server will need to handle this. Here is what Autoplot gives:

Screenshot 2024-08-01 at 12 04 58 PM

jbfaden commented 3 weeks ago

That some CDFs have this problem rings a bell. I know I needed a number of kludges in Autoplot to get all (many) of the CDFs to work. I'll have a look at this.

rweigel commented 3 weeks ago

Ideally these issues would be documented somewhere so we don't need to rediscover. Perhaps in this issue tracker?

jbfaden commented 3 weeks ago

We could tag the tickets with a project like "cdaweb-new" or whatever the server ID is.

jbfaden commented 2 weeks ago

I verified that Autoplot can read this with a warning, and that the data definitely appears mangled.

Looking at the 80th record shows that Autoplot might be figuring out how to flip the dimensions (https://cdaweb.gsfc.nasa.gov/sp_phys/data/fast/esa/l2/eeb/1997/01/fa_esa_l2_eeb_19970101071903_01437_v02.cdf?energy_full[80]):

image

berniegsfc commented 2 days ago

What version of skteditor were you getting the depend dimension problem with? I just downloaded https://cdaweb.gsfc.nasa.gov/pub/software/cdawlib/0MASTERS/fa_esa_l2_eeb_00000000_v01.cdf and opened it with skteditor 1.3.7.1 (cdf library 3.9.1_0) and there is no complaint about data's depend 1 and 2 dimensions. I see the following:

###############################
Compliance Check for /home/btharris/fa_esa_l2_eeb_00000000_v01.cdf
CDF File Version: 3.9.0
File Last Leap Second: 2015-07-01
Majority: Row
/home/btharris/fa_esa_l2_eeb_00000000_v01.cdf is not ISTP-Compliant.
Global errors:
    Warning: CDF is set for row major array variables and column major is recommended.
    Logical_file_id should be 'fast_l2_eeb_fa_esa__v01'.  It is ' '.
    Logical_file_id has no entries.
The following variables are not ISTP-compliant:
    header_bytes
        VALIDMIN data type 'CDF_INT2' did not match variable data type 'CDF_UINT1'.
        VALIDMIN data type has been changed to 'CDF_UINT1'.
        VALIDMAX data type 'CDF_INT2' did not match variable data type 'CDF_UINT1'.
        VALIDMAX data type has been changed to 'CDF_UINT1'.
        FILLVAL data type 'CDF_INT2' did not match variable data type 'CDF_UINT1'.
        FILLVAL data type has been changed to 'CDF_UINT1'.
    data_quality
        VALIDMIN data type 'CDF_INT2' did not match variable data type 'CDF_UINT1'.
        VALIDMIN data type has been changed to 'CDF_UINT1'.
        VALIDMAX data type 'CDF_INT2' did not match variable data type 'CDF_UINT1'.
        VALIDMAX data type has been changed to 'CDF_UINT1'.
        FILLVAL data type 'CDF_INT2' did not match variable data type 'CDF_UINT1'.
        FILLVAL data type has been changed to 'CDF_UINT1'.
    nbins
        VALIDMIN data type 'CDF_INT2' did not match variable data type 'CDF_UINT1'.
        VALIDMIN data type has been changed to 'CDF_UINT1'.
        VALIDMAX data type 'CDF_INT2' did not match variable data type 'CDF_UINT1'.
        VALIDMAX data type has been changed to 'CDF_UINT1'.
        FILLVAL data type 'CDF_INT2' did not match variable data type 'CDF_UINT1'.
        FILLVAL data type has been changed to 'CDF_UINT1'.
    nenergy
        VALIDMIN data type 'CDF_INT2' did not match variable data type 'CDF_UINT1'.
        VALIDMIN data type has been changed to 'CDF_UINT1'.
        VALIDMAX data type 'CDF_INT2' did not match variable data type 'CDF_UINT1'.
        VALIDMAX data type has been changed to 'CDF_UINT1'.
        FILLVAL data type 'CDF_INT2' did not match variable data type 'CDF_UINT1'.
        FILLVAL data type has been changed to 'CDF_UINT1'.
    mode_ind
        VALIDMIN data type 'CDF_INT2' did not match variable data type 'CDF_UINT1'.
        VALIDMIN data type has been changed to 'CDF_UINT1'.
        VALIDMAX data type 'CDF_INT2' did not match variable data type 'CDF_UINT1'.
        VALIDMAX data type has been changed to 'CDF_UINT1'.
        FILLVAL data type 'CDF_INT2' did not match variable data type 'CDF_UINT1'.
        FILLVAL data type has been changed to 'CDF_UINT1'.
    data
        DISPLAY_TYPE attribute value 'plasmagram>THUMBSIZE>166>xsz=4,ysz=7>xx=pitch_angle_median,y=energy_median, z=data' is not all lower case.
        DISPLAY_TYPE attribute value changed to 'plasmagram>thumbsize>166>xsz=4,ysz=7>xx=pitch_angle_median,y=energy_median, z=data'.
        DISPLAY_TYPE error: invalid keyword '166'
        VALIDMIN data type 'CDF_INT2' did not match variable data type 'CDF_UINT1'.
        VALIDMIN data type has been changed to 'CDF_UINT1'.
        VALIDMAX data type 'CDF_INT2' did not match variable data type 'CDF_UINT1'.
        VALIDMAX data type has been changed to 'CDF_UINT1'.
        FILLVAL data type 'CDF_INT2' did not match variable data type 'CDF_UINT1'.
        FILLVAL data type has been changed to 'CDF_UINT1'.
    eflux
        DISPLAY_TYPE attribute value 'plasmagram>THUMBSIZE>166>xsz=4,ysz=7>xx=pitch_angle_median,y=energy_median, z=eflux' is not all lower case.
        DISPLAY_TYPE attribute value changed to 'plasmagram>thumbsize>166>xsz=4,ysz=7>xx=pitch_angle_median,y=energy_median, z=eflux'.
        DISPLAY_TYPE error: invalid keyword '166'
    eflux_movie
        DISPLAY_TYPE attribute value 'plasma_movie>THUMBSIZE>166>xsz=4,ysz=7>xx=pitch_angle_median,y=energy_median, z=eflux_movie' is not all lower case.
        DISPLAY_TYPE attribute value changed to 'plasma_movie>thumbsize>166>xsz=4,ysz=7>xx=pitch_angle_median,y=energy_median, z=eflux_movie'.
        DISPLAY_TYPE error: invalid keyword 'thumbsize'
    eflux_byE_atA
        DISPLAY_TYPE attribute value 'spectrogram>y=energy_median, z=eflux_byE_atA(2,*), z=eflux_byE_atA(8,*), z=eflux_byE_atA(14,*), z=eflux_byE_atA(20,*), z=eflux_byE_atA(26,*), z=eflux_byE_atA(32,*), z=eflux_byE_atA(50,*)' is not all lower case.
        DISPLAY_TYPE attribute value changed to 'spectrogram>y=energy_median, z=eflux_bye_ata(2,*), z=eflux_bye_ata(8,*), z=eflux_bye_ata(14,*), z=eflux_bye_ata(20,*), z=eflux_bye_ata(26,*), z=eflux_bye_ata(32,*), z=eflux_bye_ata(50,*)'.
    eflux_byA_atE
        DISPLAY_TYPE attribute value 'spectrogram>y=compno_64, z=eflux_byA_atE(*,2), z=eflux_byA_atE(*,8), z=eflux_byA_atE(*,14), z=eflux_byA_atE(*,20), z=eflux_byA_atE(*,26), z=eflux_byA_atE(*,32), z=eflux_byA_atE(*,38), z=eflux_byA_atE(*,44), z=eflux_byA_atE(*,76)' is not all lower case.
        DISPLAY_TYPE attribute value changed to 'spectrogram>y=compno_64, z=eflux_bya_ate(*,2), z=eflux_bya_ate(*,8), z=eflux_bya_ate(*,14), z=eflux_bya_ate(*,20), z=eflux_bya_ate(*,26), z=eflux_bya_ate(*,32), z=eflux_bya_ate(*,38), z=eflux_bya_ate(*,44), z=eflux_bya_ate(*,76)'.
    pitch_angle_median
        unrecognized virtual variable FUNCTION 'arr_slice'
    energy_median
        unrecognized virtual variable FUNCTION 'arr_slice'
    energy_full
        DISPLAY_TYPE attribute value 'plasmagram>THUMBSIZE>166>xsz=4,ysz=7>xx=pitch_angle_median,y=energy_median, z=energy_full' is not all lower case.
        DISPLAY_TYPE attribute value changed to 'plasmagram>thumbsize>166>xsz=4,ysz=7>xx=pitch_angle_median,y=energy_median, z=energy_full'.
        DISPLAY_TYPE error: invalid keyword '166'
    denergy_full
        DISPLAY_TYPE attribute value 'plasmagram>THUMBSIZE>166>xsz=4,ysz=7>xx=pitch_angle_median,y=energy_median, z=denergy_full' is not all lower case.
        DISPLAY_TYPE attribute value changed to 'plasmagram>thumbsize>166>xsz=4,ysz=7>xx=pitch_angle_median,y=energy_median, z=denergy_full'.
        DISPLAY_TYPE error: invalid keyword '166'
    pitch_angle
        DISPLAY_TYPE attribute value 'plasmagram>THUMBSIZE>166>xsz=4,ysz=7>xx=pitch_angle_median,y=energy_median, z=pitch_angle' is not all lower case.
        DISPLAY_TYPE attribute value changed to 'plasmagram>thumbsize>166>xsz=4,ysz=7>xx=pitch_angle_median,y=energy_median, z=pitch_angle'.
        DISPLAY_TYPE error: invalid keyword '166'
    domega
        DISPLAY_TYPE attribute value 'plasmagram>THUMBSIZE>166>xsz=4,ysz=7>xx=pitch_angle_median,y=energy_median, z=domega' is not all lower case.
        DISPLAY_TYPE attribute value changed to 'plasmagram>thumbsize>166>xsz=4,ysz=7>xx=pitch_angle_median,y=energy_median, z=domega'.
        DISPLAY_TYPE error: invalid keyword '166'
    compno_96
        UNITS (and UNIT_PTR) attribute is missing.
        Missing DISPLAY_TYPE attribute.  (No such entry for specified attribute.)
        Created DISPLAY_TYPE attribute and set its initial value to 'time_series'
        LABL_PTR_1 attribute is missing.  (No such entry for specified attribute.)
        FILLVAL value of '0' is non-standard.
            The recommended value is '-32768'.
    compno_64
        UNITS (and UNIT_PTR) attribute is missing.
        Missing DISPLAY_TYPE attribute.  (No such entry for specified attribute.)
        Created DISPLAY_TYPE attribute and set its initial value to 'time_series'
        LABL_PTR_1 attribute is missing.  (No such entry for specified attribute.)
        FILLVAL value of '0' is non-standard.
            The recommended value is '-32768'.
    Epoch
        ISTP epoch variable 'epoch' is missing the TIME_BASE attribute.
        The TIME_BASE attribute has been added.

What am I doing wrong?

rweigel commented 1 day ago

I just tried the file you linked to with the same SKTEditor 1.3.5 I used previously and got the same result as you. I don't know what is going on. I've attached a screenshot of what I saw previously, which says the master CDF file version is 3.8.8. I don't know where I got the master CDF from; I would think directly from the same URL. (Note that at https://cdaweb.gsfc.nasa.gov/pub/software/cdawlib/0MASTERS/, it says fa_esa_l2_eeb_00000000_v01.cdf was last modified on August 2nd, 2024.) It is possible that the screenshot on the left is from a master, and the one on the right is from a non-master CDF (which I just verified shows the error).

Screenshot 2024-08-20 at 12 48 45 PM

berniegsfc commented 1 day ago

I can reproduce the skteditor messages with the last "data" cdf. However, according to

$ cdfdump -dump metadata -vars data,energy_channel,pitch_angle_bin fa_esa_l2_eeb_20090430074129_51314_v02.cdf 
Dumping cdf from "fa_esa_l2_eeb_20090430074129_51314_v02.cdf"...
File Info
=========================================
CDF File:     fa_esa_l2_eeb_20090430074129_51314_v02.cdf
Version:      3.6.2

Common Data Format (CDF)
(C) Copyright 1990-2016 NASA/GSFC
Space Physics Data Facility
NASA/Goddard Space Flight Center
Greenbelt, Maryland 20771 USA
(Internet -- GSFC-CDF-SUPPORT@LISTS.NASA.GOV)

Format:       SINGLE
Encoding:     IBMPC
Majority:     ROW
NumrVars:     0
NumzVars:     61
NumAttrs:     54 (26 global, 28 variable)
Compression:  None
Checksum:     None
LeapSecondLastUpdated:     20170101

Global Attributes (26 attributes)
=========================================
...deleted...

Variable Attributes (28 attributes)
=========================================
...deleted...

Variable Information 
===========================================================
data                  CDF_UINT1/1   2:[64,96]   T/TT
energy_channel        CDF_INT4/1    1:[96]  F/T
pitch_angle_bin       CDF_INT4/1    1:[64]  F/T

data (No: 20) (Recs: 2154) (Compression: GZIP.6 BlockingFactor: 166)
----
Attribute Entries:
     CATDESC         (CDF_CHAR/66): "Burst Electron Raw Counts data with dimensions (96, 64, NUM_DISTS)"
     DISPLAY_TYPE    (CDF_CHAR/81): "plasmagram>THUMBSIZE>166>xsz=4,ysz=7>xx=pitch_angle_median,y=energy_median,z=data"
     FIELDNAM        (CDF_CHAR/25): "Burst Electron Raw Counts"
     UNITS           (CDF_CHAR/6): "Counts"
     DEPEND_TIME     (CDF_CHAR/9): "time_unix"
     DEPEND_0        (CDF_CHAR/5): "epoch"
     VAR_TYPE        (CDF_CHAR/4): "data"
     COORDINATE_SYSTEM (CDF_CHAR/6): "sensor"
     SCALETYP        (CDF_CHAR/3): "log"
     LABLAXIS        (CDF_CHAR/25): "Burst Electron Raw Counts"
     MONOTON         (CDF_CHAR/5): "FALSE"
     VAR_NOTES       (CDF_CHAR/4): "None"
     FILLVAL         (CDF_INT2/1): 255
     FORMAT          (CDF_CHAR/2): "I4"
     VALIDMIN        (CDF_INT2/1): 0
     VALIDMAX        (CDF_INT2/1): 255
     DEPEND_1        (CDF_CHAR/14): "energy_channel"
     DEPEND_2        (CDF_CHAR/15): "pitch_angle_bin"

it seems like skteditor is correct. The data cdf depends are wrong and the master cdf has the corrected depends, which is why you should use the master's depends. Maybe Tami knows the history of the master.
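
The mismatch is visible from the shapes in the cdfdump output alone. A minimal sketch, assuming the usual ISTP convention that DEPEND_i describes the i-th dimension as listed by cdfdump (majority is ignored here); `depend_mismatches` is a hypothetical helper.

```python
from typing import Dict, List

# Shapes copied from the "Variable Information" section of the dump.
shapes: Dict[str, List[int]] = {
    "data": [64, 96],
    "energy_channel": [96],
    "pitch_angle_bin": [64],
}


def depend_mismatches(var: str, depends: Dict[int, str],
                      shapes: Dict[str, List[int]]) -> List[str]:
    """One message per DEPEND_i whose length differs from dimension i."""
    problems = []
    for i, dep in sorted(depends.items()):
        expected = shapes[var][i - 1]  # DEPEND_1 -> first dimension
        actual = shapes[dep][-1]       # bin count of the DEPEND variable
        if actual != expected:
            problems.append(
                f"DEPEND_{i}={dep}: length {actual}, expected {expected}")
    return problems


# As written in the data file, then with the two DEPENDs swapped.
as_written = depend_mismatches(
    "data", {1: "energy_channel", 2: "pitch_angle_bin"}, shapes)
swapped = depend_mismatches(
    "data", {1: "pitch_angle_bin", 2: "energy_channel"}, shapes)
```

As written in the data file both DEPENDs are flagged; with the order swapped, the lengths line up.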

berniegsfc commented 1 day ago

Tami sent the change log for the master and it includes

revision 1.7
date: 2016/03/16 20:45:08;  author: ryurow;  state: Exp;  lines: +39150 -70496
Update for new version of data set.
----------------------------
revision 1.6
date: 2016/01/26 20:30:10;  author: ryurow;  state: Exp;  lines: +70438 -39167
reverted to old version 1.4
----------------------------
revision 1.5
date: 2016/01/26 03:36:17;  author: ryurow;  state: Exp;  lines: +39176 -70447
Updated "depend_1" and "depend_2" vAttributes of virtual variables so that can be plotted.
----------------------------
revision 1.4
date: 2016/01/21 20:37:33;  author: mcguire;  state: Exp;  lines: +2 -2
minor edit

plus a lot more. Revision 1.5 would seem to be when the depends were reversed, but then that change was immediately reverted in 1.6. We'd probably have to see the diffs to be sure. But the history doesn't seem too important now. Just use the current master.

jbfaden commented 1 day ago

I'm pretty sure when reading from CDAWeb, Autoplot would use the DEPEND_1 and DEPEND_2 from each file, even though a skeleton is available. This is a bug, and I will see if I can easily resolve this.
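
One way the fix could look, as a sketch only (this is not Autoplot's code): take the per-file attributes and let the master override the DEPEND_* entries. `resolve_depends` is a hypothetical name; the attribute values are the ones quoted in this issue.

```python
from typing import Dict

DEPEND_KEYS = ("DEPEND_0", "DEPEND_1", "DEPEND_2", "DEPEND_3")


def resolve_depends(master: Dict[str, str],
                    data_file: Dict[str, str]) -> Dict[str, str]:
    """Copy the data file's attributes, then apply the master's DEPEND_*."""
    resolved = dict(data_file)
    for key in DEPEND_KEYS:
        if key in master:
            resolved[key] = master[key]
    return resolved


# Master values as reported at the top of this issue; file values from
# the cdfdump output.
master_attrs = {"DEPEND_1": "pitch_angle_median", "DEPEND_2": "energy_median"}
file_attrs = {"DEPEND_0": "epoch", "DEPEND_1": "energy_channel",
              "DEPEND_2": "pitch_angle_bin", "UNITS": "Counts"}
resolved = resolve_depends(master_attrs, file_attrs)
```

Attributes the master does not define (here `DEPEND_0` and `UNITS`) fall through from the data file unchanged.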