smerckel / dbdreader

A reader for binary data files created by Slocum ocean gliders (AUVs)
GNU General Public License v3.0

field m_time_til_wpt "inf" values ignored for dbd files #8

Closed: jr3cermak closed this issue 2 years ago

jr3cermak commented 2 years ago

I discovered a small edge case in dbdreader. The problem appears confined to a single field, m_time_til_wpt, but it may apply to any field whose value is infinite (inf). As a result, the data read for that field is short by a few values, depending on the number of "inf" values encountered. The existing dbd file (amadeus-2014-204-05-000.dbd) can be used to demonstrate the bug.

Using the Slocum binary (dbd2asc), the first 24 values are:

m_present_time m_time_til_wpt
timestamp s
8 4
1406221416.56702 0
1406221444.0452 NaN
1406221448.99835 NaN
1406221453.9433 NaN
1406221462.78418 NaN
1406221467.7373 NaN
1406221472.7084 NaN
1406221477.66046 NaN
1406221482.61267 NaN
1406221487.82343 93665.8
1406221492.99747 51991
1406221497.87772 37061.5
1406221502.74731 inf
1406221507.86731 91327.3
1406221513.19812 209718
1406221518.0636 52789.6
1406221522.92633 inf
1406221527.80518 inf
1406221532.69302 115338
1406221537.30292 367537
1406221541.79474 194995
1406221546.28998 65410.9
1406221550.77869 47719.9
1406221555.27179 61500.2

Using the reader:

$ python dbd_test_3.py 
m_present_time m_time_til_wpt
1406221416.567017 0.000000
1406221444.045197 nan
1406221448.998352 nan
1406221453.943298 nan
1406221462.784180 nan
1406221467.737305 nan
1406221472.708405 nan
1406221477.660461 nan
1406221482.612671 nan
1406221487.823425 93665.804688
1406221492.997467 51991.015625
1406221497.877716 37061.460938
1406221502.747314 91327.265625
1406221507.867310 209717.953125
1406221513.198120 52789.554688
1406221518.063599 115338.468750
1406221522.926331 367537.343750
1406221527.805176 194994.812500
1406221532.693024 65410.914062
1406221537.302917 47719.914062
1406221541.794739 61500.207031
1406221546.289978 66404.945312
1406221550.778687 48512.917969
1406221555.271790 33195.523438
1406221559.755737 67394.851562

When an "inf" is encountered, that value is discarded and the next valid value is placed at the time position of the skipped "inf" record. This systematically shifts the data for this column relative to its timestamps.
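To make the shift concrete, here is a minimal sketch in plain Python that models the effect on a few rows copied from the dbd2asc output above. The function `drop_inf_and_pack` is a hypothetical model of the observed behaviour, not dbdreader's actual code; it only assumes that "inf" samples are dropped and the survivors are packed against the leading timestamps.

```python
import math

# Rows 10-19 of the dbd2asc output above (m_present_time, m_time_til_wpt).
times = [
    1406221487.82343, 1406221492.99747, 1406221497.87772, 1406221502.74731,
    1406221507.86731, 1406221513.19812, 1406221518.0636, 1406221522.92633,
    1406221527.80518, 1406221532.69302,
]
values = [
    93665.8, 51991.0, 37061.5, math.inf,
    91327.3, 209718.0, 52789.6, math.inf,
    math.inf, 115338.0,
]


def drop_inf_and_pack(times, values):
    """Hypothetical model of the bug: discard inf samples, then pair the
    remaining values with the leading timestamps."""
    kept = [v for v in values if not math.isinf(v)]
    # zip stops at the shorter sequence, so the column also ends up short.
    return list(zip(times, kept))


def keep_inf(times, values):
    """Expected behaviour: every timestamp keeps its own value, inf included."""
    return list(zip(times, values))


buggy = drop_inf_and_pack(times, values)
expected = keep_inf(times, values)

for (t, v_buggy), (_, v_expected) in zip(buggy, expected):
    print("%f  buggy=%f  expected=%f" % (t, v_buggy, v_expected))
```

In this model the value 91327.3 lands on timestamp 1406221502.74731 instead of 1406221507.86731, which matches the reader output shown above.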

Of all the glider deployment data we have at this point, this is the only field where I encounter this issue.

Here is the Python script to reproduce the issue: dbd_test_3.py

import os, sys

import dbdreader

'''
As seen via dbd2asc (Slocum binaries)

$ bin/dbd2asc -c ~/src/dbdreader/data/cac ~/src/dbdreader/data/amadeus-2014-204-05-000.dbd | bin/dba_sensor_filter m_present_time m_time_til_wpt
dbd_label: DBD_ASC(dinkum_binary_data_ascii)file
encoding_ver: 2
num_ascii_tags: 14
all_sensors: 0
filename: amadeus-2014-204-5-0-sf
the8x3_filename: 07160000
filename_extension: dbd
filename_label: amadeus-2014-204-5-0-dbd(07160000)
mission_name: MICRO.MI
fileopen_time: Thu_Jul_24_17:03:02_2014
sensors_per_cycle: 2
num_label_lines: 3
num_segments: 1
segment_filename_0: amadeus-2014-204-5-0
m_present_time m_time_til_wpt 
timestamp s 
8 4 
1406221416.56702 0 
1406221444.0452 NaN 
1406221448.99835 NaN 
1406221453.9433 NaN 
1406221462.78418 NaN 
1406221467.7373 NaN 
1406221472.7084 NaN 
1406221477.66046 NaN 
1406221482.61267 NaN 
1406221487.82343 93665.8 
1406221492.99747 51991 
1406221497.87772 37061.5 
1406221502.74731 inf 
1406221507.86731 91327.3 
1406221513.19812 209718 
1406221518.0636 52789.6 
1406221522.92633 inf 
1406221527.80518 inf 
1406221532.69302 115338 
1406221537.30292 367537 
1406221541.79474 194995 
1406221546.28998 65410.9 
1406221550.77869 47719.9 
1406221555.27179 61500.2 
'''

# Read using dbdreader

dbdFp = dbdreader.DBD("/home/cermak/src/dbdreader/data/amadeus-2014-204-05-000.dbd", cacheDir="/home/cermak/src/dbdreader/data/cac")

# Read all fields

dbdData = dbdFp.get(*dbdFp.parameterNames, return_nans=True)

# Show first 25 rows of m_present_time m_time_til_wpt

tIndex = dbdFp.parameterNames.index('m_present_time')
fIndex = dbdFp.parameterNames.index('m_time_til_wpt')

print("m_present_time m_time_til_wpt")
for r in range(0,25):
    print("%f %f" % (dbdData[tIndex][1][r], dbdData[fIndex][1][r]))

dbdFp.close()

smerckel commented 2 years ago

Thanks, jr3cermak, for spotting this bug. dbdreader now returns inf for every inf value in the binary file.

Also, your script revealed another bug: if all variables were read using the .get() method, the time vectors got messed up. In your example script you worked around that by extracting the time vector from the "values" vector of the (times, values) tuple for m_present_time. This is now fixed as well.
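For anyone who wants to check their own data after updating, a small sanity check on a (times, values) pair might look like the sketch below. `summarize_pair` is a hypothetical helper, not part of dbdreader, and the sample data is taken from the dbd2asc listing above; in practice the pair would come from a `.get()` call with `return_nans=True`, as in the script.

```python
import math


def summarize_pair(times, values):
    """Hypothetical helper (not part of dbdreader): confirm that a
    (times, values) pair is aligned and count NaN, inf and finite samples."""
    if len(times) != len(values):
        raise ValueError(
            "length mismatch: %d times vs %d values" % (len(times), len(values))
        )
    n_nan = sum(1 for v in values if math.isnan(v))
    n_inf = sum(1 for v in values if math.isinf(v))
    return {
        "n": len(values),
        "nan": n_nan,
        "inf": n_inf,
        "finite": len(values) - n_nan - n_inf,
    }


# Sample rows from the dbd2asc listing above.
times = [1406221416.56702, 1406221444.0452, 1406221487.82343,
         1406221502.74731, 1406221507.86731]
values = [0.0, math.nan, 93665.8, math.inf, 91327.3]

summary = summarize_pair(times, values)
print(summary)
```

If the inf count is zero for a file where dbd2asc reports "inf" entries, the installed reader is probably still dropping them.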