iosonobert / pIMOS

pIMOS

A Python implementation of IMOS tools, focussed on the needs of UWA Ocean Dynamics. See the IMOS Wiki for background.

The system consists of:

  1. A Deployment Database
    1. IMOS database is in MS Access
    2. The pIMOS database will be in PostgreSQL. It will be inspired by the IMOS database, but there is no requirement for a matching API/structure.
  2. Instrument specific NetCDF templates that are 'as CF Compliant as possible'.
    1. Only instruments used by UWA Ocean Dynamics will be coded, not all of the IMOS instruments [yet].
  3. Parsers to convert each instrument file into compliant NetCDF
    1. pIMOS contains a series of classes and superclasses which wrap xarray datasets and handle all of the conventions and metadata tracking
  4. Pre-processing
    1. IMOS call things like bin mapping and coordinate transforms 'pre-processing'
    2. This will all exist in pIMOS [so far as it is relevant to UWA instruments].
      1. The xarray Dataset wrapper will handle tracking of any pre-processing in the Comments global/variable attributes of the netCDF dataset.
      2. The ultimate goal is to have the terminology and method names as close to the IMOS terminology as possible.
  5. Quality Control
    1. IMOS outlines many QC procedures
    2. pIMOS will implement all that are relevant to UWA instruments.
      1. The xarray Dataset wrapper will handle tracking of any QC in the Comments global/variable attributes of the netCDF dataset.
      2. The ultimate goal is to have the terminology and method names as close to the IMOS terminology as possible.

** Initial focus will be on moored instruments, extending to profiling [Solander CTD and VMP] when time allows.
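The comment tracking described for pre-processing and QC can be sketched roughly as below. This is a minimal illustration, not the pIMOS wrapper itself: `CommentTrackingWrapper`, `add_comment` and `apply` are hypothetical names, and the attribute name `comment` is an assumption. The wrapper only needs an object with a dict-like `.attrs` (as `xarray.Dataset` has), so a plain stand-in object is used here to keep the sketch dependency-free.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


class CommentTrackingWrapper:
    """Hypothetical wrapper; not the actual pIMOS class."""

    def __init__(self, ds):
        # ds is anything exposing a dict-like .attrs (xarray.Dataset does).
        self.ds = ds
        self.ds.attrs.setdefault("comment", "")

    def add_comment(self, author, text):
        """Append a timestamped entry to the global comment attribute."""
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        entry = f"[{stamp}] {author}: {text}"
        existing = self.ds.attrs["comment"]
        self.ds.attrs["comment"] = f"{existing}\n{entry}" if existing else entry

    def apply(self, step, author, description):
        """Run an in-place processing step, then record that it happened."""
        step(self.ds)
        self.add_comment(author, description)


# Stand-in for a dataset: a data dict plus an attrs dict (made-up values).
ds = SimpleNamespace(attrs={}, data={"pressure": [10.0, 10.2, 10.1]})


def remove_offset(d):
    # Illustrative pre-processing step: subtract a constant offset.
    d.data["pressure"] = [p - 10.0 for p in d.data["pressure"]]


wrapped = CommentTrackingWrapper(ds)
wrapped.apply(remove_offset, "pIMOS", "Subtracted a constant 10.0 dbar offset from pressure")
print(ds.attrs["comment"])
```

Routing every step through one method means the processing history cannot drift out of sync with the data, which is the point of letting the wrapper own the metadata.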

Examples

General examples of processing files and the features of the package are found in the Notebooks folder.

Examples of archiving entire experiments are found in the Experiments folder, e.g.:

Database

Here is a draft notebook for generating a postgres database and adding some data. Ignore all the django stuff in that folder - that's another rabbit hole we don't need right now.
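As a rough idea of what a deployment table might hold, here is a small sketch. The schema, column names and inserted values are all made up for illustration and are not taken from the IMOS database or the draft notebook. The standard-library `sqlite3` module is used as a stand-in so the sketch runs without a server; the real database is intended to be PostgreSQL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per instrument deployment.
conn.execute(
    """
    CREATE TABLE deployment (
        id            INTEGER PRIMARY KEY,
        experiment    TEXT NOT NULL,
        instrument    TEXT NOT NULL,
        serial_number TEXT NOT NULL,
        latitude      REAL,
        longitude     REAL,
        time_in       TEXT,
        time_out      TEXT
    )
    """
)

# Placeholder values only; not a real deployment.
conn.execute(
    "INSERT INTO deployment "
    "(experiment, instrument, serial_number, latitude, longitude, time_in, time_out) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("EXAMPLE", "thermistor", "0000", -20.0, 115.0,
     "2000-01-01T00:00:00Z", "2000-02-01T00:00:00Z"),
)
conn.commit()

# Look up every instrument deployed in a given experiment.
rows = conn.execute(
    "SELECT instrument, serial_number FROM deployment WHERE experiment = ?",
    ("EXAMPLE",),
).fetchall()
print(rows)
```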

Python code

Code falls into 6 categories:

  1. Raw file readers
  2. The actual zutils.xrwrap object [common for ALL* instruments]
    • This class controls all the QA/QC codes, CF conventions, etc.
    • Each instrument will naturally inherit from this class, adding unique features if necessary
  3. Code to parse from various data formats into the common data object
  4. 'Pre-processing' code
  5. QC code
  6. External processing libraries
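The inheritance pattern in category 2 might look something like the sketch below. `BaseInstrument` and `ThermistorString` are hypothetical names and do not reflect the real `zutils.xrwrap` API; plain dicts stand in for the wrapped dataset, and the `CF-1.8` conventions string is an assumed value.

```python
class BaseInstrument:
    """Hypothetical stand-in for the common wrapper; not the real zutils.xrwrap."""

    instrument_type = "generic"

    def __init__(self, data, attrs=None):
        self.data = dict(data)
        self.attrs = dict(attrs or {})
        # Shared conventions are set once, in the base class (assumed value).
        self.attrs.setdefault("Conventions", "CF-1.8")
        self.attrs["instrument_type"] = self.instrument_type

    def variables(self):
        """Return the variable names in a stable order."""
        return sorted(self.data)


class ThermistorString(BaseInstrument):
    """Instrument-specific subclass: inherits the conventions, adds one feature."""

    instrument_type = "thermistor"

    def mean_temperature(self):
        temps = self.data["temperature"]
        return sum(temps) / len(temps)


# Made-up temperatures for illustration.
t = ThermistorString({"temperature": [20.0, 21.0, 22.0]})
print(t.attrs["Conventions"], t.attrs["instrument_type"], t.mean_temperature())
```

Keeping conventions and metadata handling in the base class means a new instrument only has to add what is unique to it.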

1. Raw file readers

Where possible, these should read directly from binary files and avoid complex dependencies or other human interference. Dependencies should be kept to a minimum. If code is borrowed, acknowledge it. Dependence on Dolfyn, for example, is not ideal as Kilcher is always changing his API, but the author must be acknowledged.
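A dependency-free binary reader can be built on the standard-library `struct` module. The sketch below assumes a made-up fixed-length record layout (timestamp, temperature, pressure); it does not correspond to any real instrument format, and `read_records` is a hypothetical helper.

```python
import io
import struct

# Hypothetical record layout, little-endian with no padding:
# uint32 timestamp (s), float32 temperature (degC), float32 pressure (dbar).
RECORD = struct.Struct("<Iff")


def read_records(stream):
    """Yield (time, temperature, pressure) tuples from a binary stream."""
    while True:
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break  # end of file, or a truncated trailing record
        yield RECORD.unpack(chunk)


# Build two fake records in memory instead of opening a real file.
raw = RECORD.pack(0, 20.5, 10.0) + RECORD.pack(60, 20.25, 10.5)
records = list(read_records(io.BytesIO(raw)))
print(records)
```

For a real file the same generator would be fed `open(path, "rb")` in place of the `io.BytesIO` object.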

** Profiling instruments [CTD/VMP] are only partially done and will be the lowest priority. These will be added once the rest of the system is complete.

2/3. Parse into xrwrap object

Here is a basic intro to the xarray wrapper and how to interface with the underlying NetCDF file.

Instrument specific examples are covered in the following notebooks:

Only the Seabird example runs through every file [for KISSME] on a loop. The others are just individual examples.

4/5. Preprocessing/QC code

To the extent that this exists it is all in those notebooks above. We have:

Down the line we should add:

** NOTE. This is a point where we may differ from IMOS. We will often 'replace' data removed in QC, rather than just flagging it. We need to think about how this is handled. Maybe we should either:

  1. Have a 'modified' flag, or;
  2. Just add a comment to the nc file that the change was made, without flagging exactly which values were changed.
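The two options can be compared with a small sketch. Everything here is illustrative: the flag labels are placeholders rather than IMOS flag values, `replace_spikes` is a hypothetical helper, and the simple neighbour-difference spike test is just one way to pick the points to replace.

```python
# Placeholder flag labels for illustration; not the IMOS flag table.
GOOD, MODIFIED = "good", "modified"


def replace_spikes(values, flags, threshold):
    """Replace interior points that differ from both neighbours by more
    than threshold with the neighbour mean, and flag them as modified.

    Neighbours are always taken from the original values, so a replaced
    point never influences the test for the next point.
    """
    new_values = list(values)
    new_flags = list(flags)
    for i in range(1, len(values) - 1):
        left, right = values[i - 1], values[i + 1]
        if abs(values[i] - left) > threshold and abs(values[i] - right) > threshold:
            new_values[i] = (left + right) / 2  # linear interpolation
            new_flags[i] = MODIFIED
    return new_values, new_flags


# Made-up temperature record with one obvious spike.
values = [20.0, 20.1, 35.0, 20.3, 20.4]
flags = [GOOD] * len(values)

new_values, new_flags = replace_spikes(values, flags, threshold=5.0)

# Option 1: the per-sample flags record exactly which values changed.
print(new_flags)

# Option 2: only a summary comment is kept.
n_changed = new_flags.count(MODIFIED)
comment = f"Replaced {n_changed} value(s) by linear interpolation during QC"
print(comment)
```

Option 1 keeps the change traceable per sample, while option 2 loses the locations but needs no extra flag value.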

6. External processing code

Other repos exist for actual calculations, and these will be appropriately linked to this repo with examples.

Process Levels

IMOS also defines Process(ing) Levels. In a nutshell, these describe how much processing has been performed, and the levels are:

We must decide:

  1. Whether we agree with these designations for our purposes [e.g. I'm not sure how you can call something QCd until there has been some attempt to interpret it]
  2. How far I am personally expected to take this archiving
  3. What to do, going forward, with everyone's Derived & Interpreted Products.

Dependencies

Current state of play:

Known Issues

Making a list here rather than using the issues register in case I need to kill this repo and recreate.