terraref / computing-pipeline

Pipeline to Extract Plant Phenotypes from Reference Data

Provide Opendap / THREDDS server for netcdf / hdf5 data #155

Closed: dlebauer closed this issue 7 years ago

dlebauer commented 8 years ago

Description

Provide access to netcdf data via THREDDS Server

The first two files to post would be /projects/arpae/met/narr/all.nc and /projects/arpae/met/cruncep/all.nc. These could be provided within /projects/arpae/terraref/derived_data/, but they don't fit the 'site' concept: they are North American (narr) and global (cruncep) products.

We should also provide access to the environmental and hyperspectral data via this endpoint.

Context

In issue #89 we concluded that it would be straightforward to deploy an OPeNDAP server on ROGER to give users access to netCDF data and subsetting.

The met data will be useful for researchers evaluating environmental influences and predicting biomass production in different regions.

We can add other sensor data streams here:

Ideally the workflow would be something like

  1. query site geometry from betydb
  2. use this information + time domain to subset met products
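
The two steps above could end in a request to the THREDDS NetCDF Subset Service (NCSS). The following is a minimal sketch, not a tested client: the host, dataset path, variable name, bounding box, and dates are all hypothetical placeholders, and it assumes the met product is served as a gridded dataset so that the NCSS parameters `var`, `west`/`south`/`east`/`north`, `time_start`/`time_end`, and `accept` apply.

```shell
# Build an NCSS request URL for one plot and time window.
# Arguments: dataset URL, variable, west, south, east, north, start, end.
ncss_url() {
    printf '%s?var=%s&west=%s&south=%s&east=%s&north=%s&time_start=%s&time_end=%s&accept=netcdf\n' \
        "$1" "$2" "$3" "$4" "$5" "$6" "$7" "$8"
}

# Hypothetical values: the bounding box would come from a BETYdb site
# geometry, and the dataset path from the THREDDS catalog once it exists.
ncss_url "https://example.org/thredds/ncss/met/narr/all.nc" air_temperature \
    -111.98 33.07 -111.97 33.08 2016-06-01T00:00:00Z 2016-06-30T23:59:59Z
```

The printed URL could then be fetched with a tool such as `curl -o subset.nc "<url>"`.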

Further Suggestions / Request for Feedback

First, estimate how difficult it will be to set this up, then we can prioritize.

ghost commented 7 years ago

@dlebauer - is this a priority for the V0 release?

ghost commented 7 years ago

@robkooper - this has been transferred to you

ghost commented 7 years ago

Already existing API clients for this

max-zilla commented 7 years ago

Other steps:

@robkooper @dlebauer let's review this issue and make sure we have the use case defined.

dlebauer commented 7 years ago

@max-zilla

what netCDF files do we want query-able?

We can start with the hyperspectral level 1 (reflectance) and level 2 (indices) files. The use case is 'compute plot-level statistics' (starting with the mean).

We may also consider converting geotiff to netcdf (gdal_translate seems to do this). This would facilitate subsetting and standardize the geospatial query workflow?
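
As a rough illustration of the conversion mentioned above, GDAL's netCDF driver can be selected with `-of netCDF`. The helper names `nc_name` and `tif_to_nc` are made up for this sketch, and it assumes the input file name has an extension such as `.tif`; whether the resulting files carry the metadata THREDDS needs has not been checked here.

```shell
# Derive the output name by swapping the extension: plot.tif -> plot.nc
nc_name() {
    printf '%s.nc\n' "${1%.*}"
}

# Convert one GeoTIFF to netCDF (requires gdal_translate on the PATH).
tif_to_nc() {
    gdal_translate -of netCDF "$1" "$(nc_name "$1")"
}

# Hypothetical usage:
#   tif_to_nc /path/to/plot.tif    # writes /path/to/plot.nc
```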

how do we get necessary data from BETYdb to execute the query?

The easiest way to get plot boundaries from BETYdb is with the API call https://terraref.ncsa.illinois.edu/bety/api/beta/sites?key=9999999999999999999999999999999999999999. See API documentation and / or ask @gsrohde for how to query a specific subset by partial name matching (I think something like &sitename~Season+2)

Open questions @jterstriep, @yanliu-chn, @max-zilla:

gsrohde commented 7 years ago

@max-zilla The API documentation for querying is here: https://pecan.gitbooks.io/betydb-data-access/content/API/beta_API.html. See the section "Matching using regular expressions" for use of the "=~" operator.
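
Putting the two comments above together, a site query with partial name matching might be built as follows. This is only a sketch: it assumes the "=~" operator is written as `sitename=~<regex>` in the query string, the function name `bety_sites_url` and the `BETYDB_KEY` variable are hypothetical, and the pattern `Season+2` (with `+` standing for a URL-encoded space) is the guess from earlier in this thread.

```shell
# Build a BETYdb beta API URL that lists sites whose name matches a regex.
# Arguments: API key, regular expression (already URL-encoded).
bety_sites_url() {
    printf 'https://terraref.ncsa.illinois.edu/bety/api/beta/sites?key=%s&sitename=~%s\n' \
        "$1" "$2"
}

# Hypothetical usage, with the key taken from the environment:
#   curl -s "$(bety_sites_url "$BETYDB_KEY" 'Season+2')"
```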

dlebauer commented 7 years ago

@jterstriep could you please meet with @robkooper to flesh out the steps required to implement this?

ghost commented 7 years ago

Need use cases to proceed (organized by hierarchy; is grouping needed?). Organize by day.

Start with hyperspectral files.

The THREDDS server needs to be configured or a Java script written first: about 1 week of work.

dlebauer commented 7 years ago

@ashiklom could you define a few use cases?

ghost commented 7 years ago

@jterstriep are there other options? is there a new THREDDS version available? @czender?

dlebauer commented 7 years ago

THREDDS 4.6.8 was released Jan 9 2017 https://github.com/Unidata/thredds/releases/tag/v4.6.8

More info http://www.unidata.ucar.edu/software/thredds/current/tds/TDS.html

ashiklom commented 7 years ago

Off the top of my head...

dlebauer commented 7 years ago

@robkooper and @jterstriep could you please provide an ETA, convert this to an epic and create smaller issues if necessary?

robkooper commented 7 years ago

Thredds is running, not configured yet https://terraref.ncsa.illinois.edu/thredds/

dlebauer commented 7 years ago

Awesome! If it's not a (big) distraction, could you set up a /samples/ folder with a VNIR dataset that I could demo next week?

dlebauer commented 7 years ago

FYI One use case is remote visualization. I'm going to check it out today https://www.giss.nasa.gov/tools/panoply/

dlebauer commented 7 years ago

@robkooper does this need to be broken down into smaller issues?

ghost commented 7 years ago

no. rob

dlebauer commented 7 years ago

Problem

There is no easy way to update the server as new files come in.

Rob can you please contact the developers?

dlebauer commented 7 years ago

@robkooper can you start with something that is static by the end of April?

ghost commented 7 years ago

use THREDDS 4.6, not 5.0.

Rob is still working on this.

max-zilla commented 7 years ago

@robkooper has been building the PEcAn VM; hopefully this week he can then talk to @jdmaloney about a good way to list .nc files on ROGER.

max-zilla commented 7 years ago

@robkooper and @jdmaloney should discuss.

robkooper commented 7 years ago

The following script runs every day at midnight; the list of files is created by JD.

See https://terraref.ncsa.illinois.edu/thredds/catalog.html

#!/bin/bash

cat << EOF
<?xml version="1.0" encoding="UTF-8"?>
<catalog name="THREDDS Server Default Catalog : You must change this to fit your server!"
         xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0
           http://www.unidata.ucar.edu/schemas/thredds/InvCatalog.1.0.6.xsd">

  <service name="all" base="" serviceType="compound">
    <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />
    <service name="dap4" serviceType="DAP4" base="/thredds/dap4/" />
    <service name="http" serviceType="HTTPServer" base="/thredds/fileServer/" />
    <!--service name="wcs" serviceType="WCS" base="/thredds/wcs/" /-->
    <!--service name="wms" serviceType="WMS" base="/thredds/wms/" /-->
    <service name="ncss" serviceType="NetcdfSubset" base="/thredds/ncss/" />
  </service>

  <datasetRoot path="uamac" location="/media/roger/sites/ua-mac/Level_1/hyperspectral"/>

  <dataset name="TERRA" ID="TERRA">

EOF

# INPUTS
echo '    <dataset name="UAMac" ID="UAMac">'
IFS=$'\n'
LAST_DATE=""
LAST_TIME=""
sort /media/roger/sites/ua-mac/Level_1/hyperspectral/nc_files | while read -r X; do
    # strip all whitespace, then drop the first two characters of each entry
    # (presumably a leading "./" from the file listing)
    X="${X//[[:space:]]/}"
    X="${X:2}"

    # first path component is the date; open a new date group when it changes
    DATE=$( echo "$X" | cut -d "/" -f1 )
    if [ "$DATE" != "$LAST_DATE" ]; then
      if [ "$LAST_DATE" != "" ]; then
        echo '        </dataset>'
        echo '      </dataset>'
      fi
      LAST_TIME=""
      LAST_DATE="$DATE"
      echo "      <dataset name=\"${DATE}\" ID=\"${DATE}\">"
    fi

    # second path component is the scan directory; open a new group when it changes
    TIME=$( echo "$X" | cut -d "/" -f2)
    if [ "$TIME" != "$LAST_TIME" ]; then
      if [ "$LAST_TIME" != "" ]; then
        echo '        </dataset>'
      fi
      LAST_TIME="$TIME"
      NAME=$( echo "$TIME" | cut -d "_" -f3 | tr "-" ":" )
      echo "        <dataset name=\"${NAME}\" ID=\"${TIME}\">"
    fi

    # third path component is the file name; emit the leaf dataset entry
    NAME=$( echo "$X" | cut -d "/" -f3 )
    echo "          <dataset name=\"${NAME}\" ID=\"${X}\" urlPath=\"uamac/${X}\" serviceName=\"all\">"
    echo '          </dataset>'
done
# close the last scan and date groups (assumes the file list was not empty)
echo '        </dataset>'
echo '      </dataset>'
echo '    </dataset>'

# FOOTER
echo '  </dataset>'
echo '</catalog>'
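
To illustrate how a catalog entry maps to access URLs, the sketch below combines the service bases declared in the catalog (`/thredds/dodsC/`, `/thredds/fileServer/`, `/thredds/ncss/`) with the `uamac/` urlPath written by the script. The function name `dataset_urls` and the sample entry are hypothetical, and it assumes entries in `nc_files` look like `./DATE/TIME/FILE.nc`; unlike the script, it strips a leading `./` only when one is present.

```shell
# Print the OPeNDAP, HTTP download, and NCSS URLs for one file-list entry.
dataset_urls() {
    # remove whitespace, then a leading "./" if the entry has one
    path=$(printf '%s' "$1" | tr -d '[:space:]')
    path=${path#./}
    for base in dodsC fileServer ncss; do
        printf 'https://terraref.ncsa.illinois.edu/thredds/%s/uamac/%s\n' "$base" "$path"
    done
}

# Hypothetical entry; real names come from the nc_files listing.
dataset_urls "./2017-04-15/2017-04-15__11-00-00-000/sample.nc"
```

The first URL printed (the `dodsC` one) is the form an OPeNDAP client such as Panoply or a netCDF library would open.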