somts / odf-ctd-proc

ODF CTD processing c. 2016
BSD 3-Clause "New" or "Revised" License

Streamlines processing workflow. #3

Closed webbpinner closed 8 years ago

webbpinner commented 8 years ago

Ok, this was a big one... I saw a todo list in testing.py noting that that code needed to be integrated into run.py (or at least called by run.py). I also saw that testing.py was reading a csv and then converting it to a dataframe. The web_viewer code also builds a dataframe from the csv to do the format translation and collect stats. All of these hex --> csv --> dataframe --> json conversions are really inefficient, so I tweaked everything to work directly in dataframes. (I didn't touch testing.py.)

Now when the raw data is initially parsed, it's returned as a dataframe (raw_df). That raw dataframe can be saved as a csv (via the new -r flag) and/or passed to a new version of the cnv handler (cnv_handler_1), which now returns a separate dataframe containing the scientific values (rare_df).

The web_viewer code has been tweaked to take this rare dataframe as its input.

I've also added a -p flag for specifying an ini file to be used with the processing code. My hope was that by building a rare_df file and a python object with the sensor data (rawConfig), it would simplify the integration process.

Taking this approach of moving directly to dataframes and then passing dataframes around should simplify adding additional steps in the future.
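As a rough sketch of that dataframe-to-dataframe flow (the parser body and calibration factors below are made up for illustration; only the names raw_df, rare_df, and cnv_handler_1 come from this PR):

```python
import io
import pandas as pd

def parse_hex(hex_lines):
    """Illustrative stand-in for the raw parser; real code decodes SBE .hex frames."""
    rows = [{"temp_counts": int(h[:4], 16), "cond_counts": int(h[4:8], 16)}
            for h in hex_lines]
    return pd.DataFrame(rows)                      # raw_df

def cnv_handler_1(raw_df):
    """Illustrative stand-in: convert raw counts to scientific values (rare_df)."""
    rare_df = pd.DataFrame()
    rare_df["temperature_C"] = raw_df["temp_counts"] * 0.001     # fake calibration
    rare_df["conductivity_Sm"] = raw_df["cond_counts"] * 0.0001  # fake calibration
    return rare_df

raw_df = parse_hex(["0FA01388", "0FB01390"])
buf = io.StringIO()
raw_df.to_csv(buf, index=False)                    # what the -r flag would persist
rare_df = cnv_handler_1(raw_df)
```

The key point is that every stage hands off a dataframe, so inserting a new processing step is just another function taking and returning a dataframe.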

One caveat... this broke adding time and position to the converted data; I'll need to look at that tomorrow. The good news is that by converting straight to a dataframe, these columns can be added as meaningful datatypes (datetime).
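For example, once the data lives in a dataframe, a time column can be attached as a real datetime64 dtype rather than a string (the column names here are illustrative, not from the actual code):

```python
import pandas as pd

df = pd.DataFrame({"scan": [1, 2, 3]})
# Attach time as a proper datetime64 column instead of a text column.
df["time"] = pd.to_datetime(
    ["2016-11-08 13:05:00", "2016-11-08 13:05:01", "2016-11-08 13:05:02"]
)
# With a real datetime dtype, time arithmetic is trivial.
elapsed = df["time"] - df["time"].iloc[0]
```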

I hope this helps. -W

cschatzm commented 8 years ago

Awesome, that was indeed the next logical step. There are notable problems with filtering data in a Pandas dataframe. Fortunately, there are readily available functions to convert dataframes to numpy.ndarray. I'll migrate testing.py to run.py and incorporate the converters tomorrow.
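For reference, the dataframe-to-ndarray hop is a one-liner in pandas, so numpy-based filtering code can stay unchanged (the column names below are invented for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"pressure": [1.0, 2.0, 3.0],
                   "temperature": [10.5, 10.4, 10.2]})

# .values hands back a plain numpy.ndarray (later pandas releases also
# offer .to_numpy()), so existing numpy filtering routines apply directly.
arr = df.values
mask = arr[:, 0] > 1.5          # e.g. keep scans deeper than 1.5 dbar
filtered = arr[mask]
```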

Joseph and I spoke about having two different scripts to manage data flow. One would check cruise directory files and automatically process raw hex through time- and pressure-sequenced stages to finished data sets. The other would manage problematic station data via command line options, hopefully incorporating the -p flag.

Thanks


webbpinner commented 8 years ago

A logical division of labor using multiple scripts makes a lot of sense from a supportability standpoint. Also, given the amount of progress made on bottle.py and testing.py leveraging .converted files, I think my original idea was off-base.

So something like:

...and as I've stated before, this project belongs to you guys/gals. I'm very interested in leveraging it on multiple vessels/vehicles and thus trying to help where I can. However, if you ever need me to step aside just let me know. -W

webbpinner commented 8 years ago

So now it's pretty straightforward to interact with SBE files.

Run the following to create ./GS3601101_converted.csv:

python run.py ./GS3601101.hex ./GS3601101.xmlcon

Use this code to import the data in that file into a python script:

import pandas as pd
import converter_scaffolding as cnv

df = cnv.importConvertedData('./GS3601101_converted.csv')

df is a dataframe; the dtypes should be correct for the boolean and datetime columns.
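As a generic sketch of how an importer can restore those dtypes on read (this is not the actual importConvertedData implementation, and the column names are made up):

```python
import io
import pandas as pd

# Illustrative stand-in for a converted csv (column names are invented).
csv_text = (
    "scan,pump_on,time\n"
    "1,True,2016-11-10 08:00:00\n"
    "2,False,2016-11-10 08:00:01\n"
)

# parse_dates restores the datetime column; pandas infers True/False as bool.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])
```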

asx- commented 8 years ago

So it's been a long night, let me get settled and then get to work merging this in :)

webbpinner commented 8 years ago

I’ve got another major update in the works that REALLY streamlines parsing raw data and exporting/importing converted data. Will make importing converted data into your scripts a 3-line affair (and even the datatypes are correct).

Should be posting to Github in about an hour… working on the documentation.

This is awesome stuff! Thanks for getting this project off the ground, I’ve been unsuccessfully trying to hack SBE .hex files for years.



webbpinner commented 8 years ago

Take a good hard look at these changes before accepting. There's a big architectural change with converter_scaffolding.py.

To better explain the changes, I added two sample scripts.

sampleImport.py - Sample Script for importing converted SBE Data

usage: sampleImport.py [-h] [-d] converted File

positional arguments:
  converted File  the converted data file to process

optional arguments:
  -h, --help      show this help message and exit
  -d, --debug     display debug messages

sampleConvert.py - Sample Script for converting raw SBE Data

usage: sampleConvert.py [-h] [-d] hex File XMLCON File

positional arguments:
  hex File     the .hex data file to process
  XMLCON File  the .XMLCON data file to process

optional arguments:
  -h, --help   show this help message and exit
  -d, --debug  display debug messages
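For anyone wiring this interface into their own tools, the usage above maps onto a plain argparse setup along these lines (a sketch only; the conversion logic is omitted and the attribute names are assumptions, not taken from sampleConvert.py):

```python
import argparse

parser = argparse.ArgumentParser(
    description="Sample Script for converting raw SBE Data")
parser.add_argument("hexFile", metavar="hex File",
                    help="the .hex data file to process")
parser.add_argument("xmlconFile", metavar="XMLCON File",
                    help="the .XMLCON data file to process")
parser.add_argument("-d", "--debug", action="store_true",
                    help="display debug messages")

# Parse a sample command line (normally parse_args() reads sys.argv).
args = parser.parse_args(["./GS3601101.hex", "./GS3601101.xmlcon", "-d"])
```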

-W

cschatzm commented 8 years ago

Amazing. I'm really happy with the rapid progress. I should state that I have been under the weather this week (flu) and have exams as well. Hopefully, I can resume collaborating Monday. Thanks for all the feedback, Webb, and I'm looking forward to playing with the new programs.


asx- commented 8 years ago

I'm going to pull this in a little while with a few notes and a few changes:

A verbal description of a roadmap is not a roadmap, so I'll try to put one together this weekend so you have something to look at, Webb. I'll also open issues so we don't lose track of the smaller things; all of you should feel free to open issues as well.

PS I'll close this PR after merging it in later tonight unless anyone has a reason to keep it open.

webbpinner commented 8 years ago

Joseph - Sounds good. I suspected what I submitted would need to be changed, given I don't know the full scope of what you are trying to achieve. Now that there's a mechanism to move data from a .hex file to/from programs via dataframe or csv, it should accelerate any downstream development.

I got the idea for the new header format from something I saw in testing.py... so thanks for that.

Things like supporting multiple converted csv formats should be easy enough to integrate via a flag or constant (e.g. --format default|CCHDO|etc.). I'm happy to work on these core functionalities, so if you need something please just ask.
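Such a flag could be a one-line argparse addition (a sketch; only the CCHDO format name appears in this thread, the rest is illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
# choices constrains the accepted values and documents them in --help.
parser.add_argument("--format", choices=["default", "CCHDO"],
                    default="default",
                    help="converted csv output format")

args = parser.parse_args(["--format", "CCHDO"])
```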

Courtney - Good luck with your exams, hope you feel better.