sunpy / drms

Access HMI, AIA and MDI data with Python from public JSOC DRMS servers
https://docs.sunpy.org/projects/drms/en/stable/
BSD 2-Clause "Simplified" License

`ValueError` in queries with `MISSING` values #98

Closed: PaulJWright closed this issue 1 year ago

PaulJWright commented 1 year ago

Describe the bug

Certain drms queries raise a ValueError, e.g.:

keys = client.query('hmi.M_720s[2011.04.14_00:30:00/6h@2h]',
               key=drms.const.all)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[162], line 1
----> 1 keys = client.query('hmi.M_720s[2011.04.14_00:30:00/6h@2h]',
      2                key=drms.const.all)
      4 print(keys[['DATE__OBS','QUALITY']])

File ~/.pyenv/versions/arccnet/lib/python3.9/site-packages/drms/client.py:1072, in Client.query(self, ds, key, seg, link, convert_numeric, skip_conversion, pkeys, rec_index, n)
   1070         res_key = pd.DataFrame()
   1071     if convert_numeric:
-> 1072         self._convert_numeric_keywords(ds, res_key, skip_conversion)
   1073     res.append(res_key)
   1075 if seg is not None:

File ~/.pyenv/versions/arccnet/lib/python3.9/site-packages/drms/client.py:654, in Client._convert_numeric_keywords(self, ds, kdf, skip_conversion)
    652     if idx.any():
    653         k_idx = kdf.columns.get_loc(k)
--> 654         kdf[kdf.columns[k_idx]] = kdf[kdf.columns[k_idx]].apply(int, base=16)
    655 if k in num_keys:
    656     kdf[k] = _pd_to_numeric_coerce(kdf[k])

File ~/.pyenv/versions/arccnet/lib/python3.9/site-packages/pandas/core/series.py:4626, in Series.apply(self, func, convert_dtype, args, **kwargs)
   4516 def apply(
   4517     self,
   4518     func: AggFuncType,
   (...)
   4521     **kwargs,
   4522 ) -> DataFrame | Series:
   4523     """
   4524     Invoke function on values of Series.
   4525 
   (...)
   4624     dtype: float64
   4625     """
-> 4626     return SeriesApply(self, func, convert_dtype, args, kwargs).apply()

File ~/.pyenv/versions/arccnet/lib/python3.9/site-packages/pandas/core/apply.py:1025, in SeriesApply.apply(self)
   1022     return self.apply_str()
   1024 # self.f is Callable
-> 1025 return self.apply_standard()

File ~/.pyenv/versions/arccnet/lib/python3.9/site-packages/pandas/core/apply.py:1076, in SeriesApply.apply_standard(self)
   1074     else:
   1075         values = obj.astype(object)._values
-> 1076         mapped = lib.map_infer(
   1077             values,
   1078             f,
   1079             convert=self.convert_dtype,
   1080         )
   1082 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1083     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1084     #  See also GH#25959 regarding EA support
   1085     return obj._constructor_expanddim(list(mapped), index=obj.index)

File ~/.pyenv/versions/arccnet/lib/python3.9/site-packages/pandas/_libs/lib.pyx:2834, in pandas._libs.lib.map_infer()

File ~/.pyenv/versions/arccnet/lib/python3.9/site-packages/pandas/core/apply.py:133, in Apply.__init__.<locals>.f(x)
    132 def f(x):
--> 133     return func(x, *args, **kwargs)

ValueError: invalid literal for int() with base 16: 'MISSING'
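The failure can be reproduced without contacting JSOC. The sketch below needs only pandas and mimics the `.apply(int, base=16)` call from client.py line 654 on a hypothetical keyword column that mixes a hex-encoded value with the string `'MISSING'`:

```python
import pandas as pd

# Hypothetical keyword column: one record with a hex-encoded value and one
# record where JSOC reported the keyword as MISSING.
quality = pd.Series(["0xc0020000", "MISSING"], dtype=object)

try:
    # Same call as drms 0.6.3 client.py line 654: int(x, base=16) on every value.
    quality.apply(int, base=16)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # invalid literal for int() with base 16: 'MISSING'
```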

To Reproduce

In the simplest case, requesting 6 hours of data starting at 2011.04.14_00:30:00, at a 2-hour cadence, for all keys (drms.const.all) raises the above error.

import drms

client = drms.Client(debug=True, verbose=True, email=<email_address>)

keys = client.query('hmi.M_720s[2011.04.14_00:30:00/6h@2h]',
               key=drms.const.all)

As might be expected, the following queries complete successfully:

keys = client.query('hmi.M_720s[2011.04.14_00:30:00/6h@1h][? (QUALITY!=0) ?]',
               key=drms.const.all)

print(keys[['DATE__OBS','QUALITY']])

  DATE__OBS     QUALITY
0   MISSING  3221356544
1   MISSING  3221356544
2   MISSING  3221356544

keys = client.query('hmi.M_720s[2011.04.14_00:30:00/6h@1h][? (QUALITY=0) ?]',
               key=drms.const.all)

print(keys[['DATE__OBS','QUALITY']])

                 DATE__OBS  QUALITY
0  2011-04-14T03:34:20.00Z        0
1  2011-04-14T04:34:20.00Z        0
2  2011-04-14T05:34:20.00Z        0
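Assuming JSOC reports QUALITY as a hex string (which is what the base-16 conversion in the traceback is for), the value 3221356544 in the QUALITY!=0 output above is a bit mask. A quick check in plain Python; the hex string here is derived from the integer, not copied from JSOC output:

```python
quality = 3221356544  # QUALITY value from the QUALITY!=0 query output

# Hex form of the value, as a base-16 keyword string would presumably encode it.
quality_hex = hex(quality)

# Positions of the set bits (bit 0 is the least significant).
set_bits = [bit for bit in range(32) if (quality >> bit) & 1]

print(quality_hex)  # 0xc0020000
print(set_bits)     # [17, 30, 31]

# Round trip: the same parse drms applies with int(x, base=16).
assert int(quality_hex, base=16) == quality
```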

The following, however, raises the same ValueError:

keys = client.query('hmi.M_720s[2011.04.14_00:30:00/6h@1h][? (QUALITY<65536) ?]',
               key=drms.const.all)

print(keys[['DATE__OBS','QUALITY']])
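The traceback shows that the base-16 conversion only runs when `idx.any()` is true. Assuming `idx` flags values that start with `'0x'` (the line defining it is not shown above), the sketch below reproduces that guard with a hypothetical helper, `convert_hex_column`. It suggests why a column that is entirely `MISSING`, or entirely hex, passes through, while a column mixing the two fails, so success would depend on which records a given query happens to return:

```python
import pandas as pd


def convert_hex_column(col: pd.Series) -> pd.Series:
    """Hypothetical stand-in for the drms 0.6.3 logic shown in the traceback.

    Assumes ``idx`` marks values starting with '0x'; if any do, every value
    in the column (not only the flagged ones) is parsed as base-16.
    """
    idx = col.str.startswith("0x")
    if idx.any():
        return col.apply(int, base=16)
    return col


def fails(values) -> bool:
    """Return True if converting ``values`` raises ValueError."""
    try:
        convert_hex_column(pd.Series(values, dtype=object))
    except ValueError:
        return True
    return False


print(fails(["MISSING", "MISSING"]))        # False: guard skips the conversion
print(fails(["0x00000000", "0x00000000"]))  # False: every value parses
print(fails(["0x00000000", "MISSING"]))     # True: 'MISSING' is not hex
```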

Screenshots

The JSOC query of hmi.M_720s[2011.04.14_00:30:00/6h@2h] points to the issue. [Screenshot of the JSOC query result]

System Details

==============================
sunpy Installation Information
==============================

General
#######
OS: Mac OS 13.3.1
Arch: 64bit, (arm)
sunpy: 4.1.6
Installation path: /Users/pjwright/.pyenv/versions/arccnet/lib/python3.9/site-packages/sunpy-4.1.6.dist-info

Required Dependencies
#####################
astropy: 5.2.2
numpy: 1.24.3
packaging: 23.1
parfive: 2.0.2

Optional Dependencies
#####################
asdf: 2.15.0
asdf-astropy: 0.4.0
beautifulsoup4: 4.12.2
cdflib: 0.4.9
dask: 2023.5.0
drms: 0.6.3
glymur: 0.12.5
h5netcdf: 1.1.0
h5py: 3.8.0
lxml: 4.9.2
matplotlib: 3.7.1
mpl-animators: 1.1.0
pandas: 2.0.1
python-dateutil: 2.8.2
reproject: 0.10.0
scikit-image: 0.20.0
scipy: 1.9.1
sqlalchemy: 2.0.13
tqdm: 4.65.0
zeep: 4.2.1

Installation method

pip

PaulJWright commented 1 year ago

@alasdairwilson notes that this may have been introduced between versions 0.6.2 and 0.6.3.

PaulJWright commented 1 year ago

Might look into this over the weekend. Here is the relevant line in client.py: https://github.com/sunpy/drms/blame/8b2b8666e336bb28f3260ea6fe6f38722f96dd5b/drms/client.py#L654
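One possible fix, sketched here under the assumption that the conversion should only touch values that actually start with `'0x'`, is to parse just the flagged rows and let `pd.to_numeric(..., errors="coerce")` turn `MISSING` into `NaN`. The helper name `convert_hex_keyword` is hypothetical and is not taken from the drms code base:

```python
import pandas as pd


def convert_hex_keyword(col: pd.Series) -> pd.Series:
    """Hypothetical fix: parse only '0x'-prefixed strings as base-16 integers.

    Any other value (for example 'MISSING') is left for pd.to_numeric to
    coerce, which turns non-numeric strings into NaN instead of raising.
    """
    col = col.copy()
    # astype(str) keeps the mask purely boolean even if some values are not strings.
    idx = col.astype(str).str.startswith("0x")
    if idx.any():
        # Convert only the flagged rows; the rest of the column is untouched.
        col.loc[idx] = col.loc[idx].map(lambda x: int(x, base=16))
    return pd.to_numeric(col, errors="coerce")


quality = pd.Series(["0x00000000", "MISSING", "0xc0020000"], dtype=object)
result = convert_hex_keyword(quality)
print(result.tolist())  # [0.0, nan, 3221356544.0]
```

Because `MISSING` becomes `NaN`, the column ends up as float64 rather than an integer dtype; keeping integers would need a nullable dtype such as pandas' `Int64`.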