pyoceans / python-ctd

Tools to load hydrographic data into pandas DataFrame
https://pyoceans.github.io/python-ctd/
BSD 3-Clause "New" or "Revised" License

Parse hexfile #36

Closed by ocefpaf 5 years ago

ocefpaf commented 6 years ago

Add capabilities to load directly from the seabird HEX file.

richardsc commented 6 years ago

Interesting to see this pop up here, as @dankelley and I have just been looking at this for the oce package (in R). In fact, Dan made a sandbox repo to play around with it, but I guess it's private so I can't post the link. He could add you to that, or I don't really know why it isn't public so maybe opening it up is easier.

Anyway, we discussed this a fair bit, and the big hurdle that I see is that while the HEX file contains the raw data (as recorded by the sensors), it has to be processed in conjunction with the CON (or for newer instruments XMLCON) files, which contain all the sensor calibration coefficients, and then the correct formulae have to be applied based on the specific sensor, etc. To my knowledge there is no published file format for CON/XMLCON files (though obviously the latter are just xml), and there is no definitive resource for the calibration formulae (other than digging through manuals, etc).

While it would be nice to be able to read raw files without using the SBE software (m$oft windoze only), I'm skeptical that it's really worth the work.

Just my $0.02, though :smile:

dankelley commented 6 years ago

I've got my R code working with xmlcon files (well, a single file in my possession) but am finding it hard to decode temperature in a test file (although I can decode conductivity and pressure very well).
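As a minimal sketch of what "decoding" a scan involves, the snippet below splits one line of hex characters into fixed-width fields and converts each field to an integer. The function name `split_scan`, the field layout (three 3-byte big-endian words), and the sample scan are all hypothetical, chosen only for illustration; real layouts differ by instrument, and the raw integers still need calibration before they mean anything.

```python
def split_scan(scan, widths):
    """Split a string of hex characters into integers.

    `widths` gives the size of each field in bytes (2 hex characters
    per byte). The layout is an assumption supplied by the caller; it
    is not read from the file.
    """
    raw = bytes.fromhex(scan.strip())
    if sum(widths) > len(raw):
        raise ValueError("layout is longer than the scan")
    values = []
    start = 0
    for width in widths:
        # Interpret each field as an unsigned big-endian integer.
        values.append(int.from_bytes(raw[start:start + width], byteorder="big"))
        start += width
    return values


# Hypothetical scan: three 3-byte words.
example = split_scan("0A1B2C00FF10123456", [3, 3, 3])
```

Getting the integers out is the easy part; the hard part described in this thread is knowing the correct layout and the formula that turns each integer into a physical quantity.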

My repo is not public because it contains two PNG files that explain the hex format, and I don't know where I got them so I don't know if they are public or some deep, dark SBE secret.

ocefpaf commented 6 years ago

Thanks @richardsc!

The idea is not to read the most raw hex file and apply the calibration from the CON/XMLCON files, but a pre-processed hex. The reason is b/c even though most people save it to cnv after the pre-processing step, some do save it in hex and that one is virtually the same as the cnv.

> I've got my R code working with xmlcon

Wow! That is awesome!! I'll keep an eye on the oce package and see how you do it when you are able to open it. Ideally we could try to write something in C/C++ so both R and Python could use it.

dankelley commented 6 years ago

@ocefpaf do you have docs on the hex format? From what I've read (in various SBE manuals) there are several variants, and they encode the data differently. But I don't know how, just given a file, to know the encoding that was used. The xmlcon file seems to be general, i.e. not tied to a particular .hex file, and therefore it seems necessary to find a way to determine which format of hex it is, just from the hex file itself (maybe by counting characters on the first line, etc).
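The character-counting idea could be sketched roughly as follows. This assumes that header lines start with `*` and that every remaining non-empty line is one scan of hex characters; both are assumptions about the file layout, and the function name `scan_lengths` and the sample lines are invented for illustration.

```python
from collections import Counter


def scan_lengths(lines):
    """Count how many hex characters each data line has.

    Assumes header lines start with '*' and that every other
    non-empty line is one scan (assumptions, not a documented format).
    """
    counts = Counter()
    for line in lines:
        text = line.strip()
        if not text or text.startswith("*"):
            continue
        # Ignore anything that is not purely hexadecimal.
        if all(ch in "0123456789abcdefABCDEF" for ch in text):
            counts[len(text)] += 1
    return counts


# Hypothetical file content.
sample = [
    "* hypothetical header line",
    "*END*",
    "0A1B2C00FF10123456",
    "0A1B2D00FF11123457",
]
lengths = scan_lengths(sample)
```

A single dominant length suggests a fixed scan size (18 characters, i.e. 9 bytes, in this made-up sample). That narrows the candidate formats but does not by itself say which sensors the bytes belong to.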

ocefpaf commented 6 years ago

> @ocefpaf do you have docs on the hex format?

Unfortunately no. I do know some people who have some knowledge and my goal is to ask them for help.

> From what I've read (in various SBE manuals) there are several variants, and they encode the data differently. But I don't know how, just given a file, to know the encoding that was used.

Yep. That is a big issue. There is an old matlab library that deals with some of those variants, I'll try to find it and xref here.

richardsc commented 6 years ago

> The idea is not to read the most raw hex file and apply the calibration from the CON/XMLCON files, but a pre-processed hex. The reason is b/c even though most people save it to cnv after the pre-processing step, some do save it in hex and that one is virtually the same as the cnv.

Ok, that is a different use case than what @dankelley and I have been looking at, and does sound easier, provided you can be confident that the file format is the same.

Dan, to be clear what Philippe is referring to here is the ability to save a HEX file from the SBE software instead of a CNV. I have never encountered this usage in the past, and I'm still not convinced putting effort into trying to handle it generally is really worth it. At the worst someone who has a Windows computer (or even a windows VM or Wine) can run the SBE software (in batch mode if there are lots of files) to convert to a CNV.

ocefpaf commented 6 years ago

> I'm still not convinced putting effort into trying to handle it generally is really worth

I kind of agree.

> I have never encountered this usage in the past

I encountered one use case for that in my previous work, where we had tons of pre-processed hex files and no Windows machine or SBE software to convert them to CNV.

dankelley commented 6 years ago

I tried using "wine" on my macos machine, and it seems to let me run the SBE software without difficulty. And that software is available (closed source) for zero cost from seabird. Therefore, as a practical matter, there is no need to try to write code to handle .hex files. There are several advantages to using the seabird software.

  1. It saves effort in reading SBE documentation for various instruments, because different instruments seem to have different systems for encoding electrical signals as hex characters.

  2. It saves effort in decoding .xmlcon files. (Just forget about .con files, for which I've never seen documentation.) Although you can read .xmlcon files because xml is self-describing, this doesn't tell you what formulae to apply to go from engineering units to science units. For example, there are several formulae that have been proposed to deal with oxygen, and it seems that the seabird software lets the operator choose between them. But some of these formulae are in publications that I cannot get online (old Deep-Sea Research papers are unavailable to me, with my university's subscription). Even worse, I am getting the impression that seabird has devised an oxygen formula of their own, and that the software provides flags to tailor this formula (to handle a time-dependent effect).
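To illustrate the second point, here is a rough sketch of how far "xml is self-describing" gets you: it walks an XML document with the standard library and collects every leaf element whose text parses as a number. The function name `numeric_leaves` and the tag names in the sample are invented for illustration and are not taken from a real .xmlcon file.

```python
import xml.etree.ElementTree as ET


def numeric_leaves(xml_text):
    """Return {path: float} for every leaf element with numeric text.

    Repeated paths overwrite each other, so this is only a sketch.
    """
    root = ET.fromstring(xml_text)
    found = {}

    def walk(element, path):
        children = list(element)
        if not children:
            try:
                found[path] = float(element.text)
            except (TypeError, ValueError):
                pass  # empty or non-numeric text
            return
        for child in children:
            walk(child, f"{path}/{child.tag}")

    walk(root, root.tag)
    return found


# Hypothetical fragment; tag names are invented for illustration.
sample = """
<Config>
  <Sensor>
    <Name>temperature</Name>
    <A>1.5e-3</A>
    <B>2.0</B>
  </Sensor>
</Config>
"""
coefficients = numeric_leaves(sample)
```

This recovers the coefficient values, but nothing in the file says which formula they belong in, which is exactly the gap described above.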

I have made quite a lot of progress by fiddling around with a particular test case, but I have given up on decoding the oxygen to a value that matches what is in the .cnv file. However, that .cnv file was developed by an operator who was dealing with software that would recognize a particular instrument, and was making choices on software flags that were in the convention of a scientific group with experience working with that instrument, probably in a similar oceanographic domain.

My conclusion is that there is no point in trying to write code to handle .hex files. It would be nice to work with only open-source software and to be free of mswindows ugliness, but even if the code seemed to work on the .hex data, how could you know whether to trust it?

I've spent time this weekend going down this rabbit hole, and I advise just walking by.

ocefpaf commented 6 years ago

Thanks for sharing the investigation on the HEX format. This is definitely not a priority as I only had use for this once and it is clearly something that is not worth pursuing, unless someone has tons of free time and is willing to suffer the messy seabird formats :smile:

ocefpaf commented 5 years ago

Closing this for now. I don't really plan to dig into the hex format.

paleolimbot commented 3 years ago

I agree that it's a rabbit hole not worth pursuing... but as googling led me here, I thought I'd post the limited headway I've made!

https://gist.github.com/paleolimbot/2200742882f125a1afcf0bd9fbca29a6#gistcomment-3540247