terraref / computing-pipeline

Pipeline to Extract Plant Phenotypes from Reference Data
BSD 3-Clause "New" or "Revised" License

Extractors for meteorological data #156

Closed dlebauer closed 7 years ago

dlebauer commented 8 years ago

Description

We have one script to process the environmental logger data.

@robkooper Can we use BrownDog / PEcAn infrastructure?

How should these files be uploaded? They are small, so it is not necessarily worth setting up a Globus endpoint just for them if they can be downloaded via FTP.


(The task list and useful information are appended below.)

Tasks

{
   "WindDir":{
      "unit":"degrees",
      "sample_method":"Smp"
   },
   "PAR_ref":{
      "unit":"umol/s/m^2",
      "sample_method":"Smp"
   },
   "BattV":{
      "unit":"Volts",
      "sample_method":"Smp"
   },
   "TIMESTAMP":{
      "unit":"TS",
      "sample_method":""
   },
   "Rain_mm_Tot":{
      "unit":"mm",
      "sample_method":"Tot"
   },
   "Pyro":{
      "unit":"W/m^2",
      "sample_method":"Smp"
   },
   "RECORD":{
      "unit":"RN",
      "sample_method":""
   },
   "AirTC":{
      "unit":"Deg C",
      "sample_method":"Smp"
   },
   "WS_ms":{
      "unit":"meters/second",
      "sample_method":"Smp"
   },
   "RH":{
      "unit":"%",
      "sample_method":"Smp"
   },
   "PTemp_C":{
      "unit":"Deg C",
      "sample_method":"Smp"
   }
}
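As later comments show, this per-column metadata corresponds to rows 2-4 of the TOA5 header in each .dat file. A minimal sketch of deriving it from a file, assuming that header layout; read_column_metadata and the file name are illustrative:

import csv
import json

def read_column_metadata(path):
    # TOA5 header rows: 1 station info, 2 column names, 3 units, 4 sample method.
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)                      # skip the station/environment row
        names, units, methods = next(reader), next(reader), next(reader)
    return {name: {"unit": unit, "sample_method": method}
            for name, unit, method in zip(names, units, methods)}

print(json.dumps(read_column_metadata("weather.dat"), indent=3))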
Zodiase commented 7 years ago

I'm a bit confused about the goal of this issue.

From the description I see these independent threads:

  1. Upload met data
  2. Process uploaded met data
  3. Provide met data on Roger

(1) doesn't seem to be what an extractor should be responsible for; how is the extractor going to be triggered?

For (2), what are the requirements for the processing?

How is (3) related to extractors?

I'm not sure how or where to start. Any pointers?

dlebauer commented 7 years ago

For MAC weather station

  1. csv files (*.dat) are on Roger here: /projects/arpae/terraref/sites/ua-mac/raw_data/weather
    • variable names and units are in the header rows; see terraref/reference-data#48
    • there is one folder per day with 24 files (1/h)
    • the extractor should be launched when a folder has 24 files
  2. An example of one dataset is https://terraref.ncsa.illinois.edu/clowder/datasets/57e115724f0cb775be69a949
  3. convert to JSON and insert into the Clowder PostGIS sensor database (ask @caicai89- and @robkooper for details)

The API is documented here: https://terraref.ncsa.illinois.edu/clowder/assets/docs/api/index.html#!/datasets/addMetadata

But ask @max-zilla, @robkooper and @caicai89- for details.

The file format and variable names / units should follow specifications for PEcAn here: https://pecan.gitbooks.io/pecan-documentation/content/developers_guide/Adding-an-Input-Converter.html

dlebauer commented 7 years ago

For schema see #130

Roughly, the schema looks like:

[Image: clowder postgis schema-2 (ERD of the Clowder PostGIS schema)]

dlebauer commented 7 years ago

Here is an example of a record from one time point from here: https://greatlakesmonitoring.org/clowder/api/geostreams/datapoints?geocode=40.4868888889%2C-84.4817222222%2C0&since=2008-09-22+05%3A00%3A00&until=2014-07-03+19%3A00%3A00&format=json:

{
    "id": 1863734,
    "created": "2014-11-04T00:48:22Z",
    "start_time": "2008-09-22T10:00:00Z",
    "end_time": "2008-09-22T10:00:00Z",
    "properties": {
        "source": "http://www.heidelberg.edu/sites/default/files/dsmith/files/ChickasawData.xlsx",
        "srp-load": 0.4982,
        "Silica, mg/L": 9.09,
        "Sulfate, mg/L": 261.2,
        "nitrogen-load": 0.03,
        "Chloride, mg/L": 268.6,
        "phosphorus-load": 0.684,
        "SS, mg/L (suspended solids)": 12.3,
        "TKN, mg/L (Total Kjeldahl nitrogen)": 1.193
    },
    "type": "Feature",
    "geometry": {
        "type": "Point",
        "coordinates": [
            -84.4817222222,
            40.4868888889,
            0
        ]
    },
    "stream_id": "7263",
    "sensor_id": "899",
    "sensor_name": "Chickasaw"
}
dlebauer commented 7 years ago

Here is the geostreams schema geostream.sql.txt

robkooper commented 7 years ago

Just keep in mind you do not have access to the database, all operations have to be done through the API.

We discussed this in the past, and the thinking is to have sites represented as sensors in Clowder (ua-mac, ksu, etc.), then have each instrument represented as a stream (VNIR, MET, stereo), and finally have each dataset, or in this case the actual values, represented as datapoints.
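A minimal sketch of creating that hierarchy over the API, assuming the geostreams endpoints referenced later in this thread; the exact payload fields and response shapes are assumptions to confirm with the Clowder team:

import requests

CLOWDER = "https://terraref.ncsa.illinois.edu/clowder"
KEY = {"key": "SECRET_KEY"}   # placeholder API key
POINT = {"type": "Point", "coordinates": [-111.9751, 33.0745, 0]}

# One sensor per site (ua-mac, ksu, ...).
r = requests.post(CLOWDER + "/api/geostreams/sensors", params=KEY, json={
    "name": "ua-mac", "type": "Feature", "geometry": POINT, "properties": {}})
r.raise_for_status()
sensor_id = r.json()["id"]    # assumes the response echoes back an id

# One stream per instrument on the site (VNIR, MET, stereo, ...).
r = requests.post(CLOWDER + "/api/geostreams/streams", params=KEY, json={
    "name": "ua-mac MET", "sensor_id": str(sensor_id), "type": "Feature",
    "geometry": POINT, "properties": {}})
r.raise_for_status()
stream_id = r.json()["id"]    # datapoints are then posted against this stream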

ghost commented 7 years ago

@dlebauer - is this different from https://github.com/terraref/computing-pipeline/issues/115? How?

dlebauer commented 7 years ago

@rachelshekar #115 just covers the environmental logger that is on the LemnaTec gantry/scanner; this issue is for met data more generally. The goal is to get it into a consistent format in Clowder, then create an extractor that converts the Clowder datastream to netCDF.

dlebauer commented 7 years ago

@robkooper I mostly wanted the SQL schema file to define the data model (since it is slightly different from the ERD diagram above).

max-zilla commented 7 years ago

The goal is to insert data into PostGIS and convert to netCDF via an extractor.

Zodiase commented 7 years ago

@max-zilla Do you know what the extractor should subscribe to in order to monitor new met data files? I was thinking about subscribing to any new files in any dataset (with *.dataset.file.added), count the .dat files in that dataset and process them once I count 24. Do you have a better way of doing this?
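A minimal sketch of that triggering strategy, assuming the pyclowder2 Extractor interface; the message payload fields and helper names are assumptions to check against the pyclowder docs:

import pyclowder.datasets
from pyclowder.extractors import Extractor
from pyclowder.utils import CheckMessage

class MetDATExtractor(Extractor):
    def __init__(self):
        Extractor.__init__(self)
        self.setup()

    def check_message(self, connector, host, secret_key, resource, parameters):
        # On each *.dataset.file.added event, count the .dat files in the
        # dataset and only download once the full day (24 hourly files) is there.
        files = pyclowder.datasets.get_file_list(connector, host, secret_key,
                                                 resource['id'])
        dat_files = [f for f in files if f['filename'].endswith('.dat')]
        return CheckMessage.download if len(dat_files) == 24 else CheckMessage.ignore

    def process_message(self, connector, host, secret_key, resource, parameters):
        pass  # parse the downloaded .dat files and post datapoints here

if __name__ == "__main__":
    MetDATExtractor().start()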

Zodiase commented 7 years ago

@robkooper Could you explain a bit more about where (a specific dataset?) the extractor should get data from and where the output should go to?

dlebauer commented 7 years ago

@Zodiase for the weather station at UA-MAC outside the scanner, the extractor should get data from the 'weather' directory (terraref/sites/ua-mac/raw_data/weather/) and insert it into the geostreams API.

Zodiase commented 7 years ago

@dlebauer Do you know how to use pyclowder to achieve that?

robkooper commented 7 years ago

You will need to register with clowder and say you are interested in a specific mimetype of files. The file will be downloaded and you are given a pointer to the file on disk. You can now work with that file and write the results in any location and notify clowder (using pyclowder) about this.

This might be a good point to add some functionality to pyclowder2 to deal with geostreams and make it easier for you.

Zodiase commented 7 years ago

@robkooper I know about the overall process, but I don't know exactly how to save the results back to Clowder. How can I use pyclowder to "insert into the geostreams API", and is there a specific location the results should go?

I think what @dlebauer wants is to process those .dat files on a dataset basis (from what I have found, each dataset contains one day of data split across 24 files). For that I'll just subscribe to dataset file added events unless anyone has a better way of doing it.

max-zilla commented 7 years ago

@Zodiase for the TERRA project our extractors have to be slightly more careful than others, because we want to write the output files to a specific location on Roger. However I don't think that matters here since we don't have output files, just insertion into geostreams database.

I know @caicai89- has been looking at geostreams API, and I need to update Clowder to support more complex geometries than points here - https://github.com/terraref/computing-pipeline/issues/157. I am not going to get to this until next week.

dlebauer commented 7 years ago

@max-zilla this issue does not require inserting polygons ... are the other geostreams API endpoints available? (I don't see them here: https://terraref.ncsa.illinois.edu/clowder/assets/docs/api/index.html.)

Zodiase commented 7 years ago

@dlebauer Could you help me understand the format of the .dat files?

First 7 lines of any .dat file:

"TOA5","WeatherStation","CR1000","39656","CR1000.Std.29","CPU:F13WeatherStation.CR1","39725","SecData"
"TIMESTAMP","RECORD","BattV","PTemp_C","AirTC","RH","Pyro","PAR_ref","WindDir","WS_ms","Rain_mm_Tot"
"TS","RN","Volts","Deg C","Deg C","%","W/m^2","umol/s/m^2","degrees","meters/second","mm"
"","","Smp","Smp","Smp","Smp","Smp","Smp","Smp","Smp","Tot"
"2016-08-30 00:06:24",7276223,12.61,27.37,26.74,27.48,0,0,65,2.45,0
"2016-08-30 00:06:25",7276224,12.61,27.37,26.71,27.42,0,0,65,2.83,0
"2016-08-30 00:06:26",7276225,12.6,27.37,26.71,27.42,0,0,74,2.36,0
...

The data part looks like a typical CSV file, and it looks like there are 11 columns. But what are the 4 lines above the data? Which one should I use as the column header? I tried to make sense of these 4 lines, and it looks to me like the second line should be the column header and the third line the units. The first and the fourth make no sense to me.

max-zilla commented 7 years ago

@Zodiase I am not positive, but I think this might help: https://www.manualslib.com/manual/538296/Campbell-Cr9000.html?page=43

It doesn't really explain the first line; I think that's just some information on the sensor/weather station that collected the data. If you look on page 42 of that link (the one before), I think it describes these: Station Name, Logger Serial Number, etc.

The fourth line looks like a description of how the data was collected:

  • Smp = sampled
  • Tot = total
  • Avg (from the manual link) = average, etc.
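Given that layout, a minimal sketch of reading the data portion into per-row dicts keyed by the second header row; read_records is illustrative:

import csv

def read_records(path):
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    names = rows[1]               # second header row: column names
    for row in rows[4:]:          # data starts on the fifth row
        yield dict(zip(names, row))

# e.g. {'TIMESTAMP': '2016-08-30 00:06:24', 'RECORD': '7276223',
#       'BattV': '12.61', ...}   (all values arrive as strings)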

dlebauer commented 7 years ago

That looks correct.

Zodiase commented 7 years ago

@dlebauer The extractor code I've worked on so far can be triggered and parses the raw input files without issues. The next step is to compose the JSON output you want. My understanding is that a data row such as "2016-08-30 00:06:24",7276223,12.61,27.37,26.74,27.48,0,0,65,2.45,0 should be converted into one single JSON document, and I know that last time we discussed that each column would become one attribute in the properties field. But what about the rest of the JSON, the coordinates in the geometry for instance? Could you give a thorough example of the output JSON you want from an input such as the data row above?

dlebauer commented 7 years ago
{
    "id": 12345,
    "created": "2016-08-30 00:06:24 -08:00Z",
    "start_time": "2016-08-30 00:06:24 -08:00Z",
    "end_time": "2016-08-30 00:06:24 -08:00Z",
    "properties": {
        "source": "http://terraref.ncsa.illinois.edu/clowder/datasets/xyz123abc456",
        "air_temperature, K": "285.12",
        "relative_humidity, %": "27.37",
        "surface_downwelling_shortwave_flux_in_air, W m-2": 26.74,
        "surface_downwelling_photosynthetic_photon_flux_in_air, mol m-2 s-1": "0.02674",
        "wind_to_direction, degrees": 65,
        "wind_speed, m/s": 2.45,
        "precipitation_rate, mm/s": 0
    },
    "type": "Feature",
    "geometry": {
        "type": "Point",
        "coordinates": [
            -111.975071584772,
            33.074518691823,
            353.38
        ]
    },
    "stream_id": "123",
    "sensor_id": "123",
    "sensor_name": "UA-MAC F13 Weather Station"
}
dlebauer commented 7 years ago

@Andrade-Pedro could you please confirm?

dlebauer commented 7 years ago

@robkooper could you please review the json sample I posted above - is this what we should post at each time point (1/s)?

Zodiase commented 7 years ago

@dlebauer So I see that you are mapping the source columns to new property names.

Now I assume source and sensor_name are constant (for this extractor).

But where did air_temperature, K, surface_downwelling_photosynthetic_photon_flux_in_air, mol m-2 s-1, coordinates, stream_id and sensor_id come from?

Andrade-Pedro commented 7 years ago

@dlebauer I edited your comment above to answer questions on weather table

dlebauer commented 7 years ago

@Zodiase my apologies if I didn't line them up right. Here is the correct mapping:

source       destination
AirTC        air_temperature
RH           relative_humidity
Pyro         surface_downwelling_shortwave_flux_in_air
PAR_ref      surface_downwelling_photosynthetic_photon_flux_in_air
WindDir      wind_to_direction
WS_ms        wind_speed
Rain_mm_Tot  precipitation_rate

Note that, for rain, I am translating Rain_mm_Tot to 'precipitation_rate', but this requires the assumption that the time step is 1 s.
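To make the mapping concrete, a minimal sketch of the conversion for one parsed record; rec is assumed to be a dict of strings keyed by the source column names, and the unit conversions assume the units in the file header plus the 1 s time step noted above:

def to_cf_properties(rec):
    return {
        "air_temperature": float(rec["AirTC"]) + 273.15,   # deg C -> K
        "relative_humidity": float(rec["RH"]),             # %
        "surface_downwelling_shortwave_flux_in_air":
            float(rec["Pyro"]),                            # W/m^2
        "surface_downwelling_photosynthetic_photon_flux_in_air":
            float(rec["PAR_ref"]) / 1e6,                   # umol -> mol
        "wind_to_direction": float(rec["WindDir"]),        # degrees
        "wind_speed": float(rec["WS_ms"]),                 # m/s
        "precipitation_rate": float(rec["Rain_mm_Tot"]),   # mm over 1 s -> mm/s
    }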

dlebauer commented 7 years ago

@Zodiase FYI the geostreams API is not documented, but there is some useful information in #101 starting with https://github.com/terraref/computing-pipeline/issues/101#issuecomment-225702337
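Pending proper documentation, a minimal sketch of posting one datapoint, assuming the POST endpoint mirrors the GET output shown earlier in the thread; the host, key and ids are placeholders:

import requests

CLOWDER = "https://terraref.ncsa.illinois.edu/clowder"
datapoint = {
    "start_time": "2016-08-30T00:06:24-07:00",
    "end_time": "2016-08-30T00:06:24-07:00",
    "type": "Feature",
    "geometry": {"type": "Point",
                 "coordinates": [-111.975071584772, 33.074518691823, 353.38]},
    "properties": {"air_temperature": 299.89, "wind_speed": 2.45},
    "stream_id": "123"
}
r = requests.post(CLOWDER + "/api/geostreams/datapoints",
                  params={"key": "SECRET_KEY"}, json=datapoint)
r.raise_for_status()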

robkooper commented 7 years ago

@dlebauer looks good. created should be now(), start/end should be timestamp of when the sensor was read (make sure you are in the right timezone).
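A minimal sketch of that timestamp handling. The station timestamps carry no zone info; UA-MAC is in Arizona, which stays at UTC-7 year-round (no DST), so a fixed offset is assumed here:

from datetime import datetime, timedelta, timezone

ARIZONA = timezone(timedelta(hours=-7))

def parse_station_time(ts):
    # Attach the fixed Arizona offset to a naive station timestamp.
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=ARIZONA)

start_time = parse_station_time("2016-08-30 00:06:24").isoformat()
created = datetime.now(timezone.utc).isoformat()   # "created should be now()"
print(start_time)   # 2016-08-30T00:06:24-07:00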

robkooper commented 7 years ago

The geostreams API is available on terra/clowder-dev; for example, here are the sensors.

Zodiase commented 7 years ago

@robkooper When I visit http://localhost:9000/api/geostreams/sensors on my local testing environment, it says "Geostreaming not enabled". Is there a way to enable it?

My local testing environment is set up with (in the Docker Compose file):

clowder:
  image: ncsa/clowder:develop
  environment:
    RABBITMQ_URI: "amqp://guest:guest@rabbitmq:5672/%2f"
  ports:
    - "9000:9000"
  links:
    - mongo
    - rabbitmq
robkooper commented 7 years ago

You need PostgreSQL as well as a database setup; that is not normally done during docker-compose. You should be able to use terraref/clowder-dev for testing.

Zodiase commented 7 years ago

@robkooper So is there a way for me to quickly test my code that may or may not make irreversible changes to the testing environment?

Zodiase commented 7 years ago

@dlebauer Could you check if the following looks like what you need? Fields marked with "???" are still being worked on. The extractor is doing some processing on the properties to meet "CF conventions as used by PEcAn" and it's also converting wind_direction to eastward_wind and northward_wind.

From a sample source data:

"TOA5","WeatherStation","CR1000","39656","CR1000.Std.29","CPU:F13WeatherStation.CR1","39725","SecData"
"TIMESTAMP","RECORD","BattV","PTemp_C","AirTC","RH","Pyro","PAR_ref","WindDir","WS_ms","Rain_mm_Tot"
"TS","RN","Volts","Deg C","Deg C","%","W/m^2","umol/s/m^2","degrees","meters/second","mm"
"","","Smp","Smp","Smp","Smp","Smp","Smp","Smp","Smp","Tot"
"2016-08-30 00:06:24",7276223,12.61,27.37,26.74,27.48,0,0,65,2.45,0
"2016-08-30 00:06:25",7276224,12.61,27.37,26.71,27.42,0,0,65,2.83,0
"2016-08-30 00:06:26",7276225,12.6,27.37,26.71,27.42,0,0,74,2.36,0

Generated (in order):

[
   {
      "stream_id":"???",
      "sensor_id":"???",
      "end_time":"2016-08-30T00:06:24",
      "created":"2016-10-13T16:55:03.295514",
      "geometry":{
         "type":"Point",
         "coordinates":[
            "???",
            "???",
            "???"
         ]
      },
      "start_time":"2016-08-30T00:06:24",
      "sensor_name":"???",
      "type":"Feature",
      "id":"???",
      "properties":{
         "precipitation_rate":0.0,
         "wind_speed":2.45,
         "surface_downwelling_shortwave_flux_in_air":0.0,
         "northward_wind":1.0354147412647137,
         "relative_humidity":27.48,
         "air_temperature":299.89,
         "eastward_wind":2.2204540782397926,
         "surface_downwelling_photosynthetic_photon_flux_in_air":0.0
      }
   },
   {
      "stream_id":"???",
      "sensor_id":"???",
      "end_time":"2016-08-30T00:06:25",
      "created":"2016-10-13T16:55:03.303521",
      "geometry":{
         "type":"Point",
         "coordinates":[
            "???",
            "???",
            "???"
         ]
      },
      "start_time":"2016-08-30T00:06:25",
      "sensor_name":"???",
      "type":"Feature",
      "id":"???",
      "properties":{
         "precipitation_rate":0.0,
         "wind_speed":2.83,
         "surface_downwelling_shortwave_flux_in_air":0.0,
         "northward_wind":1.1960096807261795,
         "relative_humidity":27.42,
         "air_temperature":299.85999999999996,
         "eastward_wind":2.564851037313719,
         "surface_downwelling_photosynthetic_photon_flux_in_air":0.0
      }
   },
   {
      "stream_id":"???",
      "sensor_id":"???",
      "end_time":"2016-08-30T00:06:26",
      "created":"2016-10-13T16:55:03.303591",
      "geometry":{
         "type":"Point",
         "coordinates":[
            "???",
            "???",
            "???"
         ]
      },
      "start_time":"2016-08-30T00:06:26",
      "sensor_name":"???",
      "type":"Feature",
      "id":"???",
      "properties":{
         "precipitation_rate":0.0,
         "wind_speed":2.36,
         "surface_downwelling_shortwave_flux_in_air":0.0,
         "northward_wind":0.650504159728118,
         "relative_humidity":27.42,
         "air_temperature":299.85999999999996,
         "eastward_wind":2.2685776024144326,
         "surface_downwelling_photosynthetic_photon_flux_in_air":0.0
      }
   }
]

Updated to add start_time, end_time and created. Updated to use ISO 8601 for start_time, end_time and created to conform to the example mentioned in https://github.com/terraref/computing-pipeline/issues/156#issuecomment-248988618.
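For reference, the wind decomposition used above: treating WindDir as the compass direction the wind blows toward (wind_to_direction), the components below reproduce the sample values in the generated output:

import math

def wind_components(wind_speed, wind_to_direction_deg):
    theta = math.radians(wind_to_direction_deg)
    return wind_speed * math.sin(theta), wind_speed * math.cos(theta)

eastward, northward = wind_components(2.45, 65)
print(eastward, northward)   # ~2.2205, ~1.0354, as in the first record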

Zodiase commented 7 years ago

Almost all done, except I'm getting 500 errors when inserting datapoints.

Zodiase commented 7 years ago

@max-zilla The basic version of this extractor is done and the code is at https://github.com/terraref/extractors-meterological/tree/localdev. However, I'm still getting 502 Bad Gateway errors and have not been able to test it against the clowder-dev instance so far. Could you check it out?

max-zilla commented 7 years ago

Closing this; #173 covers the other half of met extractor work.

dlebauer commented 7 years ago

@Zodiase before we close this ... is the extractor actually running?

We can upscale the met data to 15 minutes.

Zodiase commented 7 years ago

@dlebauer I can't really tell if the extractor is running or not. I don't know much (anything) about deployment. @max-zilla Is the extractor running?

max-zilla commented 7 years ago

@dlebauer @Zodiase the extractor is complete and ready, but it isn't deployed on production yet; @robkooper and I were waiting until after the event last week to install the PostGIS plugin on production.

Postgres is on that machine now so I can get the geostreams stuff set up this week and deploy on production.

dlebauer commented 7 years ago

@Zodiase can we upscale these to 5 minutes before inserting into the database? Everything should be averaged, except precipitation, which should be summed.

Zodiase commented 7 years ago

@dlebauer OK, I can make that change. How do you want to partition the data records into 5-minute chunks? For example, if the first record in the first file in the dataset is at 00:01:12, does the first 5-minute block run from 00:01:12 to 00:06:12, from 00:00:00 to 00:05:00, or from 00:01:00 to 00:06:00? The latter two styles may fragment blocks: the last few records in the last file of another dataset may also fall in the 00:00:00 to 00:05:00 range, depending on how the source data are partitioned into files and datasets, so we may end up with two 00:00:00 to 00:05:00 blocks with different values. The first style just looks very messy.

Zodiase commented 7 years ago

@dlebauer I just glanced at one dataset's files, and apparently neither the files nor the dataset has a clean temporal cut-off, so the latter two styles would definitely create issues. Another strategy would be to use 00:01:12 as the start time of the block and keep adding data records until the next one falls outside 00:06:12, using the time of the last added record as the end time of the block (so the end time isn't necessarily 00:06:12). That way at least we can cleanly partition the data records in one dataset without worrying about the overlap issue. But the problem with that is the results won't all span 5 minutes (some might be shorter than 5 minutes, but none will be longer).
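For illustration, a minimal sketch of boundary-aligned binning with partial first/last bins: floor each timestamp to its enclosing 5-minute boundary, then average every variable except precipitation, which is summed (per the earlier comment). The function names are illustrative:

from collections import defaultdict

def bin_start(ts):
    # Floor a datetime to its enclosing 5-minute boundary.
    return ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)

def aggregate(records):
    # records: iterable of (datetime, properties dict) pairs
    bins = defaultdict(list)
    for ts, props in records:
        bins[bin_start(ts)].append(props)
    out = {}
    for start, group in sorted(bins.items()):
        out[start] = {
            key: (sum(p[key] for p in group) if key == "precipitation_rate"
                  else sum(p[key] for p in group) / len(group))
            for key in group[0]
        }
    return out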

Zodiase commented 7 years ago

I've implemented the aggregation logic and the code is currently in this branch: https://github.com/terraref/extractors-meterological/tree/5-min-aggregation

@max-zilla Could you test it? I only added some testing code in parser.py and played with some test data (also included in the branch) locally. I think it would be better to test from a deployed extractor.

To change aggregation options:

The test data I played with yields this result:

[
   {
      "geometry":{
         "type":"Point",
         "coordinates":[
            33.0745666667,
            -111.9750833333,
            0
         ]
      },
      "start_time":"2016-08-30T00:06:24-07:00",
      "type":"Feature",
      "end_time":"2016-08-30T00:10:00-07:00",
      "properties":{
         "precipitation_rate":0.0,
         "wind_speed":1.6207870370370374,
         "surface_downwelling_shortwave_flux_in_air":0.0,
         "northward_wind":0.07488770951583902,
         "relative_humidity":26.18560185185185,
         "air_temperature":300.17606481481516,
         "eastward_wind":1.571286062845733,
         "surface_downwelling_photosynthetic_photon_flux_in_air":0.0
      }
   },
   {
      "geometry":{
         "type":"Point",
         "coordinates":[
            33.0745666667,
            -111.9750833333,
            0
         ]
      },
      "start_time":"2016-08-30T00:10:00-07:00",
      "type":"Feature",
      "end_time":"2016-08-30T00:15:00-07:00",
      "properties":{
         "precipitation_rate":0.0,
         "wind_speed":1.4256666666666669,
         "surface_downwelling_shortwave_flux_in_air":0.0,
         "northward_wind":-0.05141511827670856,
         "relative_humidity":24.226333333333386,
         "air_temperature":300.8981666666665,
         "eastward_wind":1.394382855930334,
         "surface_downwelling_photosynthetic_photon_flux_in_air":0.0
      }
   },
   {
      "geometry":{
         "type":"Point",
         "coordinates":[
            33.0745666667,
            -111.9750833333,
            0
         ]
      },
      "start_time":"2016-08-30T00:15:00-07:00",
      "type":"Feature",
      "end_time":"2016-08-30T00:20:00-07:00",
      "properties":{
         "precipitation_rate":0.0,
         "wind_speed":1.3858783783783772,
         "surface_downwelling_shortwave_flux_in_air":0.0,
         "northward_wind":-0.09425296463470188,
         "relative_humidity":23.29226351351351,
         "air_temperature":301.213952702703,
         "eastward_wind":1.348590540556527,
         "surface_downwelling_photosynthetic_photon_flux_in_air":0.0
      }
   },
   {
      "geometry":{
         "type":"Point",
         "coordinates":[
            33.0745666667,
            -111.9750833333,
            0
         ]
      },
      "start_time":"2016-08-30T00:20:00-07:00",
      "type":"Feature",
      "end_time":"2016-08-30T00:25:00-07:00",
      "properties":{
         "precipitation_rate":0.0,
         "wind_speed":0.8310000000000005,
         "surface_downwelling_shortwave_flux_in_air":0.0,
         "northward_wind":-0.35657497924484793,
         "relative_humidity":22.633933333333335,
         "air_temperature":301.50973333333326,
         "eastward_wind":0.7049300737104702,
         "surface_downwelling_photosynthetic_photon_flux_in_air":0.0
      }
   },
   {
      "geometry":{
         "type":"Point",
         "coordinates":[
            33.0745666667,
            -111.9750833333,
            0
         ]
      },
      "start_time":"2016-08-30T00:25:00-07:00",
      "type":"Feature",
      "end_time":"2016-08-30T00:30:00-07:00",
      "properties":{
         "precipitation_rate":0.0,
         "wind_speed":0.6694000000000001,
         "surface_downwelling_shortwave_flux_in_air":0.0,
         "northward_wind":-0.585180649157013,
         "relative_humidity":25.478600000000007,
         "air_temperature":301.2232333333329,
         "eastward_wind":0.30741648387327564,
         "surface_downwelling_photosynthetic_photon_flux_in_air":0.0
      }
   },
   {
      "geometry":{
         "type":"Point",
         "coordinates":[
            33.0745666667,
            -111.9750833333,
            0
         ]
      },
      "start_time":"2016-08-30T00:30:00-07:00",
      "type":"Feature",
      "end_time":"2016-08-30T00:35:00-07:00",
      "properties":{
         "precipitation_rate":0.0,
         "wind_speed":0.6296666666666666,
         "surface_downwelling_shortwave_flux_in_air":0.0,
         "northward_wind":-0.42173249926348644,
         "relative_humidity":26.469933333333355,
         "air_temperature":300.85969999999907,
         "eastward_wind":0.45458948531155813,
         "surface_downwelling_photosynthetic_photon_flux_in_air":0.0
      }
   },
   {
      "geometry":{
         "type":"Point",
         "coordinates":[
            33.0745666667,
            -111.9750833333,
            0
         ]
      },
      "start_time":"2016-08-30T00:35:00-07:00",
      "type":"Feature",
      "end_time":"2016-08-30T00:40:00-07:00",
      "properties":{
         "precipitation_rate":0.0,
         "wind_speed":0.8663333333333328,
         "surface_downwelling_shortwave_flux_in_air":0.0,
         "northward_wind":-0.6006981174489593,
         "relative_humidity":24.133233333333333,
         "air_temperature":300.97440000000034,
         "eastward_wind":0.5790642074746596,
         "surface_downwelling_photosynthetic_photon_flux_in_air":0.0
      }
   },
   {
      "geometry":{
         "type":"Point",
         "coordinates":[
            33.0745666667,
            -111.9750833333,
            0
         ]
      },
      "start_time":"2016-08-30T00:40:00-07:00",
      "type":"Feature",
      "end_time":"2016-08-30T00:45:00-07:00",
      "properties":{
         "precipitation_rate":0.0,
         "wind_speed":1.1200666666666672,
         "surface_downwelling_shortwave_flux_in_air":0.0,
         "northward_wind":-1.0444193473063164,
         "relative_humidity":21.460900000000024,
         "air_temperature":301.59006666666653,
         "eastward_wind":0.3707760504240207,
         "surface_downwelling_photosynthetic_photon_flux_in_air":0.0
      }
   },
   {
      "geometry":{
         "type":"Point",
         "coordinates":[
            33.0745666667,
            -111.9750833333,
            0
         ]
      },
      "start_time":"2016-08-30T00:45:00-07:00",
      "type":"Feature",
      "end_time":"2016-08-30T00:50:00-07:00",
      "properties":{
         "precipitation_rate":0.0,
         "wind_speed":1.3106333333333342,
         "surface_downwelling_shortwave_flux_in_air":0.0,
         "northward_wind":-1.249505862534591,
         "relative_humidity":21.709133333333313,
         "air_temperature":301.60549999999927,
         "eastward_wind":0.38198168724184367,
         "surface_downwelling_photosynthetic_photon_flux_in_air":0.0
      }
   },
   {
      "geometry":{
         "type":"Point",
         "coordinates":[
            33.0745666667,
            -111.9750833333,
            0
         ]
      },
      "start_time":"2016-08-30T00:50:00-07:00",
      "type":"Feature",
      "end_time":"2016-08-30T00:55:00-07:00",
      "properties":{
         "precipitation_rate":0.0,
         "wind_speed":1.297633333333334,
         "surface_downwelling_shortwave_flux_in_air":0.0,
         "northward_wind":-1.253133336504686,
         "relative_humidity":21.457600000000024,
         "air_temperature":301.69336666666703,
         "eastward_wind":0.324976158201803,
         "surface_downwelling_photosynthetic_photon_flux_in_air":0.0
      }
   },
   {
      "geometry":{
         "type":"Point",
         "coordinates":[
            33.0745666667,
            -111.9750833333,
            0
         ]
      },
      "start_time":"2016-08-30T00:55:00-07:00",
      "type":"Feature",
      "end_time":"2016-08-30T01:00:00-07:00",
      "properties":{
         "precipitation_rate":0.0,
         "wind_speed":1.3804999999999998,
         "surface_downwelling_shortwave_flux_in_air":0.0,
         "northward_wind":-1.3556587631934873,
         "relative_humidity":21.25273333333331,
         "air_temperature":301.7047000000008,
         "eastward_wind":0.23843479144932786,
         "surface_downwelling_photosynthetic_photon_flux_in_air":0.0
      }
   },
   {
      "geometry":{
         "type":"Point",
         "coordinates":[
            33.0745666667,
            -111.9750833333,
            0
         ]
      },
      "start_time":"2016-08-30T01:00:00-07:00",
      "type":"Feature",
      "end_time":"2016-08-30T01:05:00-07:00",
      "properties":{
         "precipitation_rate":0.0,
         "wind_speed":1.5816666666666679,
         "surface_downwelling_shortwave_flux_in_air":0.0,
         "northward_wind":-1.5763178363241495,
         "relative_humidity":22.110499999999984,
         "air_temperature":301.4501999999997,
         "eastward_wind":0.11952470035541446,
         "surface_downwelling_photosynthetic_photon_flux_in_air":0.0
      }
   },
   {
      "geometry":{
         "type":"Point",
         "coordinates":[
            33.0745666667,
            -111.9750833333,
            0
         ]
      },
      "start_time":"2016-08-30T01:05:00-07:00",
      "type":"Feature",
      "end_time":"2016-08-30T01:08:23-07:00",
      "properties":{
         "precipitation_rate":0.0,
         "wind_speed":1.682058823529412,
         "surface_downwelling_shortwave_flux_in_air":0.0,
         "northward_wind":-1.6719726912984594,
         "relative_humidity":23.09779411764704,
         "air_temperature":301.1047058823543,
         "eastward_wind":-0.14332925981518027,
         "surface_downwelling_photosynthetic_photon_flux_in_air":0.0
      }
   }
]

Notice all the data entries are in clean 5-minute chunks, except for the first one and the last one (since some data in other datasets may belong to the same 5-minute chunks).

max-zilla commented 7 years ago

I'm pulling this code today while updating the extractor and will test, @Zodiase

ghost commented 7 years ago

@max-zilla - please update

max-zilla commented 7 years ago

@Zodiase I integrated your code and deployed to the extractor VM, but the hardware failure this week delayed my ability to test. Assigning this to myself so I can close once I confirm everything's good, but it's 99% ready.

max-zilla commented 7 years ago

Remember UIUC and Kansas; Charlie may have code to pull from netCDF already.

Zodiase commented 7 years ago

@max-zilla How are datasets from UIUC and Kansas different from MAC datasets? Would the current message subscription (*.dataset.files.added) work for them? Do they have different schemas? And why would the extractor need code to pull data from netCDF? Shouldn't it only pull from clowder datasets and put into geostream?