somts / odf-ctd-proc

ODF CTD processing c. 2016
BSD 3-Clause "New" or "Revised" License

Streamlines processing workflow. #3

Closed webbpinner closed 8 years ago

webbpinner commented 8 years ago

Ok, this was a big one... I saw a todo list in testing.py noting that that code needed to be integrated into run.py (or at least called by run.py). I also saw that testing.py was reading a csv and then converting it to a dataframe. The web_viewer code also builds a dataframe from the csv to do the format translation and collect stats. All of these hex --> csv --> dataframe --> json conversions are really inefficient, so I tweaked everything to work directly in dataframes. (I didn't touch testing.py.)

Now when the raw data is initially parsed, it's returned as a dataframe (raw_df). That raw dataframe can be saved as a csv (via the new -r flag) and/or passed to a new version of the cnv handler (cnv_handler_1), which now returns a separate dataframe containing the scientific values (rare_df).

The web_viewer code has been tweaked to take this rare dataframe as its input.

I've also added a -p flag for specifying an ini file to be used with the processing code. My hope was that by building a rare_df file and a python object with the sensor data (rawConfig), it would simplify the integration process.

Taking this approach of moving directly to dataframes and then passing dataframes around should simplify adding additional steps in the future.
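As a rough sketch of that dataframe-to-dataframe flow (the parser body and calibration factors below are made up for illustration; only the names raw_df, rare_df, and cnv_handler_1 come from this PR):

```python
import io
import pandas as pd

def parse_hex(hex_lines):
    """Illustrative stand-in for the raw parser; real code decodes SBE .hex frames."""
    rows = [{"temp_counts": int(h[:4], 16), "cond_counts": int(h[4:8], 16)}
            for h in hex_lines]
    return pd.DataFrame(rows)                      # raw_df

def cnv_handler_1(raw_df):
    """Illustrative stand-in: convert raw counts to scientific values (rare_df)."""
    rare_df = pd.DataFrame()
    rare_df["temperature_C"] = raw_df["temp_counts"] * 0.001     # fake calibration
    rare_df["conductivity_Sm"] = raw_df["cond_counts"] * 0.0001  # fake calibration
    return rare_df

raw_df = parse_hex(["0FA01388", "0FB01390"])
buf = io.StringIO()
raw_df.to_csv(buf, index=False)                    # what the -r flag would persist
rare_df = cnv_handler_1(raw_df)
```

The key point is that every stage hands off a dataframe, so inserting a new processing step is just another function taking and returning a dataframe.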

One caveat... this broke adding time and position to the converted data; I'll need to look at that tomorrow. The good news is that by converting straight to a dataframe, these columns can be added as meaningful datatypes (datetime).
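For example, once the data lives in a dataframe, a time column can be attached as a real datetime64 dtype rather than a string (the column names here are illustrative, not from the actual code):

```python
import pandas as pd

df = pd.DataFrame({"scan": [1, 2, 3]})
# Attach time as a proper datetime64 column instead of a text column.
df["time"] = pd.to_datetime(
    ["2016-11-08 13:05:00", "2016-11-08 13:05:01", "2016-11-08 13:05:02"]
)
# With a real datetime dtype, time arithmetic is trivial.
elapsed = df["time"] - df["time"].iloc[0]
```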

I hope this helps. -W

cschatzm commented 8 years ago

Awesome, that was indeed the next logical step. There are notable problems with filtering data in a Pandas dataframe. Fortunately, there are readily available functions to convert dataframes to numpy.ndarray. I'll migrate testing.py to run.py and incorporate the converters tomorrow.
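For reference, the dataframe-to-ndarray hop is a one-liner in pandas, so numpy-based filtering code can stay unchanged (the column names below are invented for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"pressure": [1.0, 2.0, 3.0],
                   "temperature": [10.5, 10.4, 10.2]})

# .values hands back a plain numpy.ndarray (later pandas releases also
# offer .to_numpy()), so existing numpy filtering routines apply directly.
arr = df.values
mask = arr[:, 0] > 1.5          # e.g. keep scans deeper than 1.5 dbar
filtered = arr[mask]
```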

Joseph and I spoke about having two different scripts to manage data flow. One would check cruise directory files and automatically process raw hex through time- and pressure-sequenced stages to finished data sets. The other would manage problematic station data via command line options, hopefully incorporating the -p flag.

Thanks


webbpinner commented 8 years ago

A logical division of labor using multiple scripts makes a lot of sense from a supportability standpoint. Also, given the amount of progress made on bottle.py and testing.py leveraging .converted files, I think my original idea was off-base.

So something like:

...and as I've stated before, this project belongs to you guys/gals. I'm very interested in leveraging it on multiple vessels/vehicles and thus trying to help where I can. However, if you ever need me to step aside just let me know. -W

webbpinner commented 8 years ago

So now it's pretty straightforward to interact with SBE files.

Run the following to create ./GS3601101_converted.csv:

python run.py ./GS3601101.hex ./GS3601101.xmlcon

Use this code to import the data in that file into a python script:

import pandas as pd
import converter_scaffolding as cnv

df = cnv.importConvertedData('./GS3601101_converted.csv')

df is a dataframe; the dtypes should be correct for the boolean and datetime columns.
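As a generic sketch of how an importer can restore those dtypes on read (this is not the actual importConvertedData implementation, and the column names are made up):

```python
import io
import pandas as pd

# Illustrative stand-in for a converted csv (column names are invented).
csv_text = (
    "scan,pump_on,time\n"
    "1,True,2016-11-10 08:00:00\n"
    "2,False,2016-11-10 08:00:01\n"
)

# parse_dates restores the datetime column; pandas infers True/False as bool.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])
```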

asx- commented 8 years ago

So it's been a long night, let me get settled and then get to work merging this in :)

webbpinner commented 8 years ago

I’ve got another major update in the works that REALLY streamlines parsing raw data and exporting/importing converted data. Will make importing converted data into your scripts a 3-line affair (and even the datatypes are correct).

Should be posting to Github in about an hour… working on the documentation.

This is awesome stuff! Thanks for getting this project off the ground, I’ve been unsuccessfully trying to hack SBE .hex files for years.



webbpinner commented 8 years ago

Take a good hard look at these changes before accepting. There's a big architectural change with converter_scaffolding.py.

To better explain the changes, I added two sample scripts.

sampleImport.py - Sample Script for importing converted SBE Data

usage: sampleImport.py [-h] [-d] converted File

positional arguments:
  converted File  the converted data file to process

optional arguments:
  -h, --help      show this help message and exit
  -d, --debug     display debug messages

sampleConvert.py - Sample Script for converting raw SBE Data

usage: sampleConvert.py [-h] [-d] hex File XMLCON File

positional arguments:
  hex File     the .hex data file to process
  XMLCON File  the .XMLCON data file to process

optional arguments:
  -h, --help   show this help message and exit
  -d, --debug  display debug messages
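For anyone wiring this interface into their own tools, the usage above maps onto a plain argparse setup along these lines (a sketch only; the conversion logic is omitted and the attribute names are assumptions, not taken from sampleConvert.py):

```python
import argparse

parser = argparse.ArgumentParser(
    description="Sample Script for converting raw SBE Data")
parser.add_argument("hexFile", metavar="hex File",
                    help="the .hex data file to process")
parser.add_argument("xmlconFile", metavar="XMLCON File",
                    help="the .XMLCON data file to process")
parser.add_argument("-d", "--debug", action="store_true",
                    help="display debug messages")

# Parse a sample command line (normally parse_args() reads sys.argv).
args = parser.parse_args(["./GS3601101.hex", "./GS3601101.xmlcon", "-d"])
```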

-W

cschatzm commented 8 years ago

Amazing. I'm really happy with the rapid progress. I should state that I have been under the weather this week (flu) and have exams as well. Hopefully, I can resume collaborating Monday. Thanks for all the feedback, Webb, and I'm looking forward to playing with the new programs.


asx- commented 8 years ago

I'm going to pull this in a little while with a few notes and a few changes:

A verbal description of a roadmap is not a roadmap, so I'll try to put one together this weekend so you have something to look at, Webb. I'll also open issues so we don't lose track of the smaller things; all of you should feel free to open issues as well.

PS I'll close this PR after merging it in later tonight unless anyone has a reason to keep it open.

webbpinner commented 8 years ago

Joseph - Sounds good. I suspected what I submitted would need to be changed, given I don't know the full scope of what you are trying to achieve. Now that there's a mechanism to move data from a .hex file to/from programs via dataframe or csv, it should accelerate any downstream development.

I got the idea for the new header format from something I saw in testing.py... so thanks for that.

Things like supporting multiple converted csv formats should be easy enough to integrate via a flag or constant (e.g. --format default|CCHDO|etc.). I'm happy to work on these core functionalities, so if you need something please just ask.
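Such a flag could be a one-line argparse addition (a sketch; only the CCHDO format name appears in this thread, the rest is illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
# choices constrains the accepted values and documents them in --help.
parser.add_argument("--format", choices=["default", "CCHDO"],
                    default="default",
                    help="converted csv output format")

args = parser.parse_args(["--format", "CCHDO"])
```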

Courtney - Good luck with your exams, hope you feel better.