hapi-server / tools-python

Additional tools to support hapiclient, including merge, etc
BSD 3-Clause "New" or "Revised" License

Bug - cannot convert float NaN to integer in merge_hapi #4

Open tinsmcl1 opened 1 month ago

tinsmcl1 commented 1 month ago

merge_hapi() throws an error when the resulting dataframe contains a column of ints and NaNs because Python does not allow converting NaN to int.

Reproduce error:

from hapiclient import hapi
from hapiplot import hapiplot
import hapitools

opts = {'logging': False, 'usecache': True, 'cachedir': './hapicache' }
start = '2013-01-01T00:00:54Z'
stop = '2013-01-01T06:00:54.000Z'
serverA, datasetA, parametersA = 'https://cdaweb.gsfc.nasa.gov/hapi', 'OMNI2_H0_MRG1HR', 'DST1800'
serverB, datasetB, parametersB = "https://imag-data.bgs.ac.uk/GIN_V1/hapi", "cki/best-avail/PT1M/hdzf", "Field_Vector"

dataA, metaA = hapi(serverA, datasetA, parametersA, start, stop, **opts)
dataB, metaB = hapi(serverB, datasetB, parametersB, start, stop, **opts)
dataB = dataB[80:100]
newAB, metaAB = hapitools.merge_hapi(dataA, metaA, dataB, metaB, round_to_sec = True, fill_nan=False)
hapiplot(newAB, metaAB)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[2], line 14
     12 dataB, metaB = hapi(serverB, datasetB, parametersB, start, stop, **opts)
     13 dataB = dataB[80:100]
---> 14 newAB, metaAB = hapitools.merge_hapi(dataA, metaA, dataB, metaB, round_to_sec = True, fill_nan=False)
     15 hapiplot(newAB, metaAB)

File ~/GitHub/hapi-server/tools-python/src/hapitools.py:123, in merge_hapi(data1, meta1, data2, meta2, how, round_to_sec, fill_nan)
    121 dt = merge_dtypes(data1, data2, trim='Time')
    122 new_data = new_df.to_records(index=False, column_dtypes={"Time": "S30"})
--> 123 new_data = np.array([tuple([nparray_unpack_to_list(e) for e in elm]) for elm in new_data], dtype=dt)
    124 new_data = np.array([tuple(i) for i in new_data], dtype=dt)
    126 return new_data, new_meta

ValueError: cannot convert float NaN to integer

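The same error can likely be triggered without any network access. The sketch below is not taken from hapitools; it only assumes that the merged dtype contains an integer field and that one row carries a NaN for it. The field names and values are illustrative.

```python
import numpy as np

# Structured dtype with one integer field (names are illustrative only).
dt = np.dtype([("Time", "S30"), ("DST1800", "<i4")])

# The second row has NaN where an integer is expected.
rows = [
    (b"2013-01-01T00:00:00Z", 5.0),
    (b"2013-01-01T01:00:00Z", float("nan")),
]

try:
    np.array(rows, dtype=dt)
    error_message = None
except ValueError as err:
    # NumPy cannot store NaN in an integer field.
    error_message = str(err)

print(error_message)
```
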
tinsmcl1 commented 1 month ago

Possible solution: Use float data type instead of int
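A minimal sketch of that idea, assuming the merged dtype is a NumPy structured dtype like the one returned by merge_dtypes. The helper name promote_int_fields_to_float is hypothetical and not part of hapitools.

```python
import numpy as np


def promote_int_fields_to_float(dt):
    """Return a copy of structured dtype `dt` with integer fields as float64.

    Hypothetical helper, not part of hapitools.
    """
    new_fields = []
    for name in dt.names:
        field = dt[name]
        # Sub-array fields expose (base dtype, shape) through .subdtype.
        base, shape = field.subdtype if field.subdtype else (field, ())
        if np.issubdtype(base, np.integer):
            base = np.dtype("float64")
        new_fields.append((name, base, shape) if shape else (name, base))
    return np.dtype(new_fields)


# Illustrative dtype: one integer field and one float vector field.
dt = np.dtype([("Time", "S30"), ("DST1800", "<i4"), ("Field_Vector", "<f8", (4,))])
float_dt = promote_int_fields_to_float(dt)

# NaN is now storable because DST1800 is float64.
merged = np.array(
    [(b"2013-01-01T01:20:00Z", float("nan"), [1.0, 2.0, 3.0, 4.0])],
    dtype=float_dt,
)
```

If this route were taken, the parameter type in the merged metadata would presumably also need to change from integer to double so that data and metadata stay consistent.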

rweigel commented 1 month ago

Which dataset has type int and fill of NaN? This is an error. An int data type should have an int fill value.

https://github.com/hapi-server/data-specification/blob/master/hapi-dev/HAPI-data-access-spec-dev.md#368-fill-details

tinsmcl1 commented 1 month ago

It's not that a dataset has type int and a fill of NaN. It's that merging two datasets can introduce NaNs into the data. For example, when the datasets have different times and parameters, not every parameter will have data at every time. As a result, an integer parameter in the HAPI ndarray may end up with some NaN values, which NumPy does not appear to allow for integer types.

I need to clarify: are NaNs allowed as elements in a HAPI ndarray, or are those elements always set to the fill value?
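A small illustration of how the NaNs arise, assuming the merge behaves like an outer join in pandas on the time column. The column names and values are made up for the example.

```python
import pandas as pd

# Dataset A: hourly integer parameter.
a = pd.DataFrame(
    {"Time": ["2013-01-01T00:00:00Z", "2013-01-01T01:00:00Z"], "DST1800": [-3, -5]}
)
# Dataset B: float parameter on a different set of times.
b = pd.DataFrame(
    {"Time": ["2013-01-01T01:00:00Z", "2013-01-01T01:01:00Z"], "H": [1.0, 2.0]}
)

# An outer join keeps every time from both inputs; values missing from one
# side become NaN, and pandas upcasts the integer column to float64.
merged = a.merge(b, on="Time", how="outer")
print(merged.dtypes)
```

Converting `merged` back to records with the original integer dtype for DST1800 is the step that fails.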

rweigel commented 1 month ago

The spec says, "For integers, string fill values must correspond to an integer value that is small enough to fit into a 4-byte signed integer.", so NaNs are technically not allowed for integer types. However, I suspect there are cases where fill was set to NaN in the HAPI metadata for an integer type, and it seems the verifier does not check for this. This will probably cause hapiclient to throw an error, although I have not encountered it.
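One way to stay within that rule would be to replace the NaNs introduced by the merge with the parameter's integer fill value before casting back to int. This is only a sketch: fill_int_column is a hypothetical helper, and the fill string "-99999" is an invented example rather than a value from either dataset.

```python
import numpy as np
import pandas as pd

INT32 = np.iinfo(np.int32)


def fill_int_column(series, fill):
    """Replace NaN with an integer fill value and cast to int32.

    `fill` is the fill string from the metadata. int() raises ValueError
    for a non-integer string such as "NaN", which flags invalid metadata.
    Hypothetical helper, not part of hapitools.
    """
    fill_value = int(fill)
    if not INT32.min <= fill_value <= INT32.max:
        raise ValueError("fill does not fit into a 4-byte signed integer")
    return series.fillna(fill_value).astype("int32")


# Column as it might look after a merge: float64 with a NaN gap.
column = pd.Series([-3.0, float("nan"), -5.0])
filled = fill_int_column(column, "-99999")
```
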