Closed jr3cermak closed 1 year ago
Sample datasets: https://nasfish.fish.washington.edu/echotools/datasets/dbdreader/dbdreader_20230430.zip
I believe the glider in question is a Slocum G3. Will continue to investigate as well. This is a useful tool for climbing around the binary data looking at the data records: https://hexed.it/
Results from python supplied script:
$ python ./cmpData.py
===> ascii/unit_507_2022_078_0_0_sbd.dat
===> unit_507-2022-078-0-0
Result from merge_dba:
m_present_time m_altitude
3 1647787992.10013 52.7949
18 1647788315.34152 52.7949
206 1647789281.2258 94.1087
238 1647789440.33847 71.7375
263 1647789559.90067 56.3846
282 1647789649.58444 46.0085
300 1647789722.63565 37.4689
312 1647789778.15817 31.6313
322 1647789821.28613 27.6911
328 1647789855.35016 24.5446
334 1647789885.24399 21.6618
338 1647789906.44281 19.8181
693 1647793734.86829 94.3285
706 1647793897.18198 72.9194
715 1647794024.84744 58.9133
724 1647794123.18723 48.5177
733 1647794200.28311 40.5177
739 1647794260.27444 34.4115
745 1647794307.54941 28.9792
750 1647794346.17276 24.9158
754 1647794376.31052 21.8657
758 1647794398.061 20.1404
Result from dbdreader, get_list m_altitude:
m_present_time m_altitude
1647788315.3415222 52.7948723
1647789281.2257996 94.1086655
1647789440.3384705 71.7374878
1647789559.9006653 56.3846169
1647789649.5844421 46.0085487
1647789722.6356506 37.4688644
1647789778.1581726 31.6312580
1647789821.2861328 27.6910858
1647789855.3501587 24.5445671
1647789885.2439880 21.6617832
1647789906.4428101 19.8180714
1647793734.8682861 94.3284531
1647793897.1819763 72.9194107
1647794024.8474426 58.9133072
1647794123.1872253 48.5177040
1647794200.2831116 40.5177040
1647794260.2744446 34.4114761
1647794307.5494080 28.9792423
1647794346.1727600 24.9157505
1647794376.3105164 21.8656902
1647794398.0610046 20.1404152
===> ascii/unit_507_2022_078_0_1_sbd.dat
===> unit_507-2022-078-0-1
Result from merge_dba:
m_present_time m_altitude
3 1647797262.02112 20.1404
Result from dbdreader, get_list m_altitude:
m_present_time m_altitude
===> ascii/unit_507_2022_078_0_2_sbd.dat
===> unit_507-2022-078-0-2
Result from merge_dba:
m_present_time m_altitude
3 1647797523.63519 20.1404
269 1647799166.7489 93.7827
305 1647799337.60144 74.7497
332 1647799470.16937 61.5226
355 1647799577.30692 50.7631
375 1647799662.93506 43.1514
391 1647799732.08231 37.5751
405 1647799788.09076 32.1587
417 1647799835.50098 28.0134
426 1647799873.98682 24.8877
433 1647799904.10349 22.5629
439 1647799925.58719 21.2222
447 1647799947.7655 19.8462
802 1647803534.71719 13.6276
Result from dbdreader, get_list m_altitude:
m_present_time m_altitude
1647799166.7489014 93.7826614
1647799337.6014404 74.7496948
1647799470.1693726 61.5225868
1647799577.3069153 50.7631264
1647799662.9350586 43.1514053
1647799732.0823059 37.5750923
1647799788.0907593 32.1587296
1647799835.5009766 28.0134315
1647799873.9868164 24.8876686
1647799904.1034851 22.5628815
1647799925.5871887 21.2222214
1647799947.7655029 19.8461533
1647803534.7171936 13.6275949
===> ascii/unit_507_2022_078_0_3_sbd.dat
===> unit_507-2022-078-0-3
Result from merge_dba:
m_present_time m_altitude
3 1647805656.99963 13.6276
Result from dbdreader, get_list m_altitude:
m_present_time m_altitude
===> ascii/unit_507_2022_078_0_4_sbd.dat
===> unit_507-2022-078-0-4
Result from merge_dba:
m_present_time m_altitude
3 1647805913.8418 13.6276
310 1647807770.36847 94.5678
347 1647807945.22488 77.7473
378 1647808090.59744 65.5983
405 1647808210.89642 55.3773
426 1647808308.9747 47.5531
444 1647808390.28259 42.4994
459 1647808458.91168 38.1319
474 1647808519.4924 33.5287
483 1647808566.51901 28.5751
491 1647808605.02762 24.6654
499 1647808635.14502 21.4286
504 1647808656.79953 19.3919
812 1647811881.11328 94.1331
827 1647812051.53488 73.4225
838 1647812179.75671 59.16
846 1647812277.96017 48.0134
855 1647812355.0513 39.6093
862 1647812415.10129 33.8181
868 1647812462.21277 30.1575
874 1647812505.2868 26.9585
878 1647812539.61273 24.2271
883 1647812569.69446 21.7875
886 1647812591.33939 20.3626
889 1647812613.59735 19.6349
Result from dbdreader, get_list m_altitude:
m_present_time m_altitude
1647807770.3684692 94.5677643
1647807945.2248840 77.7472534
1647808090.5974426 65.5982895
1647808210.8964233 55.3772888
1647808308.9747009 47.5531120
1647808390.2825928 42.4993896
1647808458.9116821 38.1318665
1647808519.4924011 33.5286942
1647808566.5190125 28.5750923
1647808605.0276184 24.6654453
1647808635.1450195 21.4285717
1647808656.7995300 19.3919411
1647811881.1132812 94.1330872
1647812051.5348816 73.4224701
1647812179.7567139 59.1599503
1647812277.9601746 48.0134315
1647812355.0513000 39.6092796
1647812415.1012878 33.8180695
1647812462.2127686 30.1575089
1647812505.2868042 26.9584866
1647812539.6127319 24.2271061
1647812569.6944580 21.7875462
1647812591.3393860 20.3626366
1647812613.5973511 19.6349201
===> ascii/unit_507_2022_078_0_5_sbd.dat
===> unit_507-2022-078-0-5
Result from merge_dba:
m_present_time m_altitude
3 1647815396.73434 19.6349
Result from dbdreader, get_list m_altitude:
m_present_time m_altitude
===> ascii/unit_507_2022_078_0_6_sbd.dat
===> unit_507-2022-078-0-6
Result from merge_dba:
m_present_time m_altitude
3 1647815654.83646 19.6349
286 1647817406.83234 94.8938
321 1647817581.85333 79.1966
350 1647817726.98529 65.4371
372 1647817846.62964 55.663
392 1647817940.35114 47.6337
409 1647818021.61417 41.8694
426 1647818098.89392 36.707
437 1647818154.73715 31.779
446 1647818197.41885 27.6923
454 1647818232.05643 24.5665
460 1647818261.78415 21.293
464 1647818283.13773 19.873
783 1647821757.74164 94.6728
799 1647821928.4733 74.3297
811 1647822061.01889 60.4823
824 1647822168.23782 51.022
832 1647822253.58789 43.9597
840 1647822326.47488 38.0354
846 1647822386.16583 33.2357
852 1647822433.4245 29.5238
856 1647822471.92181 26.453
860 1647822506.22208 23.9096
864 1647822531.99493 21.9512
866 1647822553.21835 20.9048
869 1647822575.05814 20.3529
Result from dbdreader, get_list m_altitude:
m_present_time m_altitude
1647817406.8323364 94.8937759
1647817581.8533325 79.1965790
1647817726.9852905 65.4371185
1647817846.6296387 55.6630020
1647817940.3511353 47.6337013
1647818021.6141663 41.8693542
1647818098.8939209 36.7069588
1647818154.7371521 31.7789993
1647818197.4188538 27.6923084
1647818232.0564270 24.5665455
1647818261.7841492 21.2930412
1647818283.1377258 19.8730164
1647821757.7416382 94.6727753
1647821928.4732971 74.3296738
1647822061.0188904 60.4822960
1647822168.2378235 51.0219765
1647822253.5878906 43.9597054
1647822326.4748840 38.0354080
1647822386.1658325 33.2356529
1647822433.4244995 29.5238094
1647822471.9218140 26.4529915
1647822506.2220764 23.9096451
1647822531.9949341 21.9511604
1647822553.2183533 20.9047623
1647822575.0581360 20.3528690
===> ascii/unit_507_2022_078_0_7_sbd.dat
===> unit_507-2022-078-0-7
Result from merge_dba:
m_present_time m_altitude
3 1647825477.68988 20.3529
Result from dbdreader, get_list m_altitude:
m_present_time m_altitude
===> ascii/unit_507_2022_078_0_8_sbd.dat
===> unit_507-2022-078-0-8
Result from merge_dba:
m_present_time m_altitude
3 1647825745.16934 20.3529
286 1647827447.84875 94.2759
319 1647827618.26187 75.7827
347 1647827755.00137 62.0159
371 1647827866.345 52.6691
391 1647827956.66812 45.6886
408 1647828033.78882 39.8816
424 1647828098.60324 35.4396
437 1647828154.23065 31.1795
449 1647828197.57965 28.1831
459 1647828236.18674 25.4481
467 1647828266.46698 23.3175
474 1647828292.55502 21.409
480 1647828314.03809 20.0989
860 1647832184.2858 94.221
876 1647832359.24014 76.7314
890 1647832500.42477 64.0843
900 1647832615.9827 54.8486
909 1647832714.33966 48.5299
917 1647832795.50861 42.7045
925 1647832864.11362 37.917
931 1647832923.93713 33.7045
938 1647832975.67633 30.1197
945 1647833018.73175 26.3724
949 1647833053.03363 23.851
952 1647833078.86322 22.1734
956 1647833100.62231 21.1795
959 1647833122.40887 19.6838
Result from dbdreader, get_list m_altitude:
m_present_time m_altitude
1647827447.8487549 94.2759476
1647827618.2618713 75.7826614
1647827755.0013733 62.0158730
1647827866.3450012 52.6691093
1647827956.6681213 45.6886444
1647828033.7888184 39.8815613
1647828098.6032410 35.4395599
1647828154.2306519 31.1794872
1647828197.5796509 28.1831493
1647828236.1867371 25.4481068
1647828266.4669800 23.3174610
1647828292.5550232 21.4090347
1647828314.0380859 20.0989017
1647832184.2857971 94.2210007
1647832359.2401428 76.7313766
1647832500.4247742 64.0842514
1647832615.9826965 54.8485947
1647832714.3396606 48.5299149
1647832795.5086060 42.7045174
1647832864.1136169 37.9169731
1647832923.9371338 33.7045174
1647832975.6763306 30.1196575
1647833018.7317505 26.3724060
1647833053.0336304 23.8510380
1647833078.8632202 22.1733818
1647833100.6223145 21.1794872
1647833122.4088745 19.6837616
===> ascii/unit_507_2022_078_0_9_sbd.dat
===> unit_507-2022-078-0-9
Result from merge_dba:
m_present_time m_altitude
3 1647835760.89847 19.6838
Result from dbdreader, get_list m_altitude:
m_present_time m_altitude
Did some tracing into the C code, I am wondering if there is an initial mismatch:
static unsigned char read_known_cycle(FILE *fd)
{
    // the first 2 bytes are:
    //   s Cycle Tag (this is an ASCII s char).
    //   a One byte integer.
    // but just skip over them
    int pos = ftell(fd);
    fseek(fd, pos + 2, 0);
    // followed by, the value we want to check for:
    //   0x1234 Two byte integer.
    // which is 4660
    unsigned short two_byte_int;
    fread((void*)(&two_byte_int), sizeof(two_byte_int), 1, fd);
    //printf("two_byte_int : %d\n", two_byte_int);
    // the next 12 bytes are:
    //   123.456 Four byte float.
    //   123456789.12345 Eight byte double.
    // but by this point we already know the byte order, so just skip the bytes
    pos = ftell(fd);
    fseek(fd, pos + 13, 0);
The first three items are correct: detect and skip over the first two bytes as the start of the cycle header (sa), and the next two bytes help determine the byte order. The last comment says to then skip 12 bytes, but the skip is pos + 13, not the pos + 14 I would expect (12 bytes to skip + the two-byte integer)?
That is as far as I have traced at the moment. Using the binary snooper I can clearly see the first record of data in the sbd that is currently skipped.
Hi @jr3cermak,
Thank you for doing such a thorough investigation and providing the test data and scripts. Yes, dbdreader skips the first line of data. On purpose.
What I noticed when I started coding dbdreader a long time ago is that all parameters are set as "UPDATED" in the first state bytes section of each file. I also noticed that usually the second entry was some considerable time later. My assessment is that it is very unlikely that all parameters are measured at the time of file creation. Nevertheless, they all get published in the dbd file, and I suspect they take whatever value is in memory. Either these values are nonsense, or they were measured some (long) time before. In either case these data points have no scientific value, so the first data line is skipped.
As for the number of bytes to skip being 12, 13 or 14, 13 is the correct number. In earlier versions of dbdreader, I would skip 17 bytes, which I found out by trial and error, as only then the decoding of the state bytes would make sense. Also these bytes would always be the same. The Glider manual which described the data format, did not mention to skip these bytes, though. Later, with the arrival of G3 gliders, that use little endian byte order as opposed to the big endian of persistor based gliders, @erinaceous used these bytes to test the byte order, since these 17 bytes are composed of 12 34 as two bytes, and 4 bytes representing a float 123.456 and 8 bytes representing a double 123456789.12345. The next byte is always 'd' or 64 in hex. This makes the 17 bytes that I needed to skip initially.
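The 12/13/14 question comes down to byte accounting, which can be sketched as follows. The field layout below is an assumption taken from the comments in read_known_cycle and from the description above (tag, one-byte integer, 0x1234, float, double, then the 'd' tag), not from the official format specification. It shows that the whole block is 17 bytes, and that 13 bytes remain once the two-byte integer has been read, because ftell() already includes the 4 bytes consumed so far.

```python
import struct

# Assumed layout of the "known bytes" block, as described in this thread.
FIELDS = [
    ("cycle tag 's'", "c"),
    ("one-byte integer", "c"),
    ("two-byte integer 0x1234", "h"),
    ("four-byte float 123.456", "f"),
    ("eight-byte double 123456789.12345", "d"),
    ("tag 'd' of the first data cycle", "c"),
]


def field_offsets(fields):
    """Return (name, offset, size) for fields packed back to back."""
    out = []
    offset = 0
    for name, code in fields:
        # The "<" prefix selects standard sizes without alignment padding.
        size = struct.calcsize("<" + code)
        out.append((name, offset, size))
        offset += size
    return out


offsets = field_offsets(FIELDS)
total = sum(size for _, _, size in offsets)

# File position right after fread() of the two-byte integer:
# 2 skipped bytes + 2 bytes read.
pos_after_int = 2 + 2
remaining = total - pos_after_int

for name, offset, size in offsets:
    print(f"offset {offset:2d}, size {size}: {name}")
print("total bytes:", total)
print("bytes left after the two-byte integer:", remaining)
```

Under this layout the second fseek skips the float (4), the double (8) and the one-byte 'd' tag, which is where the 13 comes from.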
So in summary, I don't think that this is an issue, unless it is your opinion that you require the first data line as well, even though it does not contain any useful information.
Thank you @smerckel for the confirmation that the first record is intentionally skipped at the moment.
Let me do some checking with my upstream processing to see what is typically done with the first record.
There might be a desire to have two additional options: (1) allow passing all values to reproduce the original slocum binary behavior; (2) at least pass along the timestamp with the remaining requested columns filled with nan or missing values.
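The two options, next to the current behaviour, could look roughly like this. This is only a sketch on plain Python lists: the function name and the policy strings are hypothetical and are not part of dbdreader. The sample values are the first three merge_dba rows of segment unit_507-2022-078-0-2 listed above.

```python
import math


def apply_first_record_policy(times, values, policy="drop"):
    """Apply a first-record policy to parallel lists (hypothetical helper).

    "drop": discard the first record (the current dbdreader behaviour).
    "keep": pass everything through, as the slocum binaries do.
    "mask": keep the first timestamp but replace its value with NaN.
    """
    if policy not in ("drop", "keep", "mask"):
        raise ValueError(f"unknown policy: {policy!r}")
    if not times:
        return [], []
    if policy == "drop":
        return list(times[1:]), list(values[1:])
    if policy == "keep":
        return list(times), list(values)
    return list(times), [math.nan] + list(values[1:])


# First three merge_dba rows of segment 0-2 from the output above.
t = [1647797523.63519, 1647799166.7489, 1647799337.60144]
alt = [20.1404, 93.7827, 74.7497]

t_drop, alt_drop = apply_first_record_policy(t, alt, "drop")
t_keep, alt_keep = apply_first_record_policy(t, alt, "keep")
t_mask, alt_mask = apply_first_record_policy(t, alt, "mask")
print(len(t_drop), len(t_keep), len(t_mask))
```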
@smerckel I disagree that the first line never has value and am of the opinion that it should be an optional choice to keep or drop the first line, but default to dropping. The initialization line is useful for diagnostic purposes more so than science purposes. I think it makes sense to almost always drop the first line of the science files, as pretty much all initialization values for the instruments are zero (although an exception may be something like the card data space used). However the flight files have some use of reading the initialization values, especially if someone ever needs to review data from only 1 segment (therefore only 1 dbd/sbd file) for diagnostics. Many sensors/variables only have a value in that first initialization line (e.g. m_science_on, m_why_started, u_alpha_system_clock_lags_gps, etc.) and not again for the rest of the file. Therefore it may be useful to see what values certain variables initialize with before starting the segment. I would strongly encourage you to make it an option to keep the initialization line.
As for the first 17 bytes, according to Dave Pingal's (of TWR) original binary reading python package (he called pyslocum, but I don't think it was ever published anywhere), he used the first 17 bytes (or 16 plus a tag) to determine the endian-ness of the binary file. I believe this was specifically helpful for G3 vs G2 data across different platforms. Here is his check_binary method that checks the endian-ness from a dbdfile class:
    def check_binary(self):
        test_pat = self.ifile.read(16)
        (tag, byte1, byte2, byte4, byte8) = struct.unpack('>cchfd', test_pat)
        (_, _, byte2, _, _) = struct.unpack('<cchfd', test_pat)
        if byte2 == 4660:
            endian = '<'
        else:
            endian = '>'
        return endian
and yes, I am aware that he has a similar line twice that overwrites byte2.
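For comparison, here is a standalone variant of the same idea without the overwritten unpack. It is a sketch, not the pyslocum or dbdreader implementation: the function names are made up for illustration, and the cross-check against the float and double is an addition that the method above does not do. The headers are synthetic, built with struct.pack, so no glider file is needed to run it.

```python
import struct

KNOWN_INT = 0x1234  # 4660
KNOWN_FLOAT = 123.456
KNOWN_DOUBLE = 123456789.12345


def detect_byte_order(header: bytes) -> str:
    """Return '<' or '>' for the byte order that decodes the known values."""
    if len(header) < 16:
        raise ValueError("need at least 16 bytes")
    for order in ("<", ">"):
        _tag, _one, two_byte_int, flt, dbl = struct.unpack(
            order + "cchfd", header[:16]
        )
        if two_byte_int != KNOWN_INT:
            continue
        # Extra sanity check: the float and double should decode as well.
        if abs(flt - KNOWN_FLOAT) < 1e-3 and abs(dbl - KNOWN_DOUBLE) < 1e-6:
            return order
    raise ValueError("known-bytes pattern not found")


def make_header(order: str) -> bytes:
    """Build a synthetic 16-byte known-bytes block in the given byte order."""
    return struct.pack(
        order + "cchfd", b"s", b"a", KNOWN_INT, KNOWN_FLOAT, KNOWN_DOUBLE
    )


print(detect_byte_order(make_header("<")))
print(detect_byte_order(make_header(">")))
```

Unlike the original, this raises an error instead of falling back to '>' when neither byte order matches, which may or may not be what you want for damaged files.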
@s-pearce, thanks for the explanation. I can see the point of having the ability to extract the first line too. I have coded that now, and you can set the behaviour by the class variable SKIP_INITIAL_LINE for the DBD class. In this way, you can set the behaviour of DBD and MultiDBD by setting one variable once, which then affects all future calls to the get*() methods.
The change is applied to the master branch for now. I made some other changes to bring uniformity in the return values of the get* methods. After updating the manual, I think the code can then be released as version 0.5.0.
Thank you @jr3cermak, @s-pearce for the feedback.
Ooh, could I recommend you make that an instance variable rather than a class-level variable?
example: Using dbdreader in multi-threaded code which is handling reading slocum data as it comes off of multiple different gliders, so getting .tbd and .sbd out of order and/or instantiating MultiDBDs for different gliders concurrently. I could see wanting to enable getting the initial line for the extra engineering info on one glider but not another, in parallel.
I think I would agree with @erinaceous. I just tried out the new MultiDBD with two instances of the same set of sbd files. Changing the SKIP_INITIAL_LINE variable in between creation of the 2 instances changes both instances to use the new value of SKIP_INITIAL_LINE, unless I've already read out the variables I want from the first instance. Presumably because the instance exists as a buffered object. I think an instance variable might prevent this potentially unexpected behavior.
set_a = MultiDBD("ce_382-2022-024-4-[0-7].sbd", cacheDir=LCACHEDIR)
DBD.SKIP_INITIAL_LINE = False
set_b = MultiDBD("ce_382-2022-024-4-[0-7].sbd", cacheDir=LCACHEDIR)
ta, deptha = set_a.get("m_depth")
tb, depthb = set_b.get("m_depth")
len(ta) == len(tb)
Out[93]: True
but if I assign the variables before changing SKIP_INITIAL_LINE:
set_a = MultiDBD("ce_382-2022-024-4-[0-7].sbd", cacheDir=LCACHEDIR)
ta, deptha = set_a.get("m_depth")
DBD.SKIP_INITIAL_LINE = False
set_b = MultiDBD("ce_382-2022-024-4-[0-7].sbd", cacheDir=LCACHEDIR)
tb, depthb = set_b.get("m_depth")
len(ta) == len(tb)
Out[107]: False
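The two results above are what you would get if the class variable is looked up at the moment get() is called rather than when the instance is created. The toy class below reproduces that pattern without any glider files. It is an assumption about the mechanism, not dbdreader's actual code, and ToyReader is a made-up name.

```python
class ToyReader:
    """Toy stand-in for a reader with a class-level policy flag."""

    SKIP_INITIAL_LINE = True

    def __init__(self, records):
        self._records = list(records)

    def get(self):
        # The flag is consulted here, at read time, not in __init__.
        start = 1 if ToyReader.SKIP_INITIAL_LINE else 0
        return self._records[start:]


records = [0.0, 1.0, 2.0, 3.0]

# Case 1: both reads happen after the flag was changed.
ToyReader.SKIP_INITIAL_LINE = True
set_a = ToyReader(records)
ToyReader.SKIP_INITIAL_LINE = False
set_b = ToyReader(records)
same_length_late_read = len(set_a.get()) == len(set_b.get())

# Case 2: the first instance is read before the flag is changed.
ToyReader.SKIP_INITIAL_LINE = True
set_a = ToyReader(records)
a = set_a.get()
ToyReader.SKIP_INITIAL_LINE = False
set_b = ToyReader(records)
b = set_b.get()
same_length_early_read = len(a) == len(b)

print(same_length_late_read, same_length_early_read)
```

In case 1 the first instance silently picks up the new setting, which is the surprise an instance variable would avoid.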
Also thanks for adding in this feature. It is much appreciated.
@s-pearce : the behaviour you describe was actually intentional. Setting DBD.SKIP_INITIAL_LINE sets how the DBD instance will treat the initial lines of the binary files to be read (until its value is changed again). I considered this more as a policy: set once, preferably at the top, and then process the dbd files.
I can see the point of @erinaceous too, although it is a bit hypothetical. Making SKIP_INITIAL_LINE an instance variable means that there is more fine grained control on how the dbd files are read, at the expense of more coding on the user's side.
I could add a keyword skip_initial_line to both the DBD and MultiDBD class constructors, which then sets the behaviour for all get() methods invoked for this instance. You could then also directly set the attribute during the life time of the instance. I would prefer this over making skip_initial_line a keyword to all get() methods.
The example above would then be something like:
set_a = MultiDBD("ce_382-2022-024-4-[0-7].sbd", cacheDir=LCACHEDIR)
set_b = MultiDBD("ce_382-2022-024-4-[0-7].sbd", cacheDir=LCACHEDIR, skip_initial_line=False)
ta, deptha = set_a.get("m_depth")
tb, depthb = set_b.get("m_depth")
len(ta) == len(tb) # => False
Perhaps one of you guys may have a better alternative for the keyword name?
Because the individual DBD instances are created in the constructor of MultiDBD, changing the behaviour of each DBD requires a new method to MultiDBD, something like set_skip_initial_line(boolean value).
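A minimal sketch of that design, using toy classes so it runs without dbdreader or any data files. ToyDBD and ToyMultiDBD are hypothetical stand-ins; only the keyword name skip_initial_line and the method name set_skip_initial_line come from the proposal above, and the real implementation may differ.

```python
class ToyDBD:
    """Toy stand-in for one file reader with an instance-level flag."""

    def __init__(self, records, skip_initial_line=True):
        self._records = list(records)
        self.skip_initial_line = skip_initial_line

    def get(self):
        if self.skip_initial_line:
            return self._records[1:]
        return list(self._records)


class ToyMultiDBD:
    """Toy stand-in for a reader that owns several ToyDBD instances."""

    def __init__(self, segments, skip_initial_line=True):
        # The keyword is passed down to each reader at construction time.
        self._dbds = [ToyDBD(s, skip_initial_line) for s in segments]

    def set_skip_initial_line(self, value):
        # Propagate a later change to every reader that was already created.
        for dbd in self._dbds:
            dbd.skip_initial_line = bool(value)

    def get(self):
        out = []
        for dbd in self._dbds:
            out.extend(dbd.get())
        return out


segments = [[0, 1, 2], [3, 4]]
set_a = ToyMultiDBD(segments)
set_b = ToyMultiDBD(segments, skip_initial_line=False)
print(len(set_a.get()), len(set_b.get()))

set_a.set_skip_initial_line(False)
print(len(set_a.get()))
```

With this layout two instances can hold different settings side by side, at the cost of one more argument to pass around.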
The commit 9c0c949 is working perfectly for our needs.
I am looking into the possibility that dbdreader might be skipping the first data record. I am constructing a test package which I will share soon of 9 pairs of science and glider data segments. I will also share code which I am using to demonstrate the issue at hand.
In a nutshell, using the slocum linux binaries to convert an sbd and a tbd file and then performing a merge reveals that the first record might be missing from the dbdreader decoded messages.
Here are the first couple of records as decoded using the slocum linux binaries:
Here are the first couple of records from dbdreader:
All is as it should be, starting with the 2nd record of the slocum binary decoder (1st record of dbdreader).
I will post a link soon to the dataset, cache files and code that I am working with. The general pattern seems to persist with any pair of science and glider files, so you should be able to reproduce with existing sample data.
The ultimate goal is to allow us to replace the slocum binary tools with this library in its entirety. I just noticed this behaviour and want to see whether it is a bug or whether there is a reason the first record is skipped.