terraref / reference-data

Coordination of Data Products and Standards for TERRA reference data
https://terraref.org
BSD 3-Clause "New" or "Revised" License

Changes to data stream from environmental sensors #26

Closed dlebauer closed 8 years ago

dlebauer commented 8 years ago

The first environmental data samples (e.g. 2016-02-15_21-20-08_enviromentlogger.json.txt) are in a json key:value format.

I propose the following changes:

  1. write one file per hour (rather than every 2 minutes)
  2. use variable names and units defined in #3 to avoid confusion
  3. write all variables except downwelling spectrum into a table (csv or netcdf) with a time stamp or dimension
    • add ambient CO2 from the moving sensor to this file.
    • co-locate CO2 sensor with other met sensors (or is there a reason to have it on the bay?)
  4. write a separate file to contain the downwelling spectral radiance, which should be a file similar to format in #14 (but lacking the x and y dimensions)
    • Note that currently the file has variables spectrum and wavelength but nothing measuring irradiance
  5. ensure that files are valid json (see below)
  6. Please restrict the text meta-data files to ASCII. Some '?'s appear where the original character seems to have been μ; "umol" or "micromol" would work.

@markus-radermacher-lemnatec

yanliu-chn commented 8 years ago

Is there a way to get the geo location of the weather station or sensor, and link station id with the observations?

dlebauer commented 8 years ago

@markus-radermacher-lemnatec

It appears that the json files are invalid -

[2016-04-11]$ jsonlint 2016-04-11_01-17-18_enviromentlogger.json 
[Error: Parse error on line 4151:
    "environment_
--------------------^
Expecting 'EOF', got ',']

One way to fix this:

  1. enclose contents with [] (make file begin with [, and end with ])
  2. separate elements with a ,: change }{ to },{
echo "[`cat 2016-04-11_01-17-18_enviromentlogger.json` ]" | sed 's/}{/},{/g' > test.json
jsonlint test.json
## OK

However, we should discuss the more general changes in the logger above before implementing this simple fix.

dlebauer commented 8 years ago

The CO2 sensor data have .bil extensions and each observation is written into a separate file; these observations should have timestamp + concentration and should be saved as a time series at daily or hourly time steps.

czender commented 8 years ago

irradiance/spectrum as a .bil file might be helpful for other purposes, but a 1-D json array would be easier for the hyperspectral workflow.

markus-radermacher-lemnatec commented 8 years ago

regarding 1) It's just a setting within the software that can easily be changed to 1h or 1d. Right now the time between two files should be long enough not to create too many files and short enough to ensure that, in case of a problem, not too many data points are lost. The environment logger software in its current state is very simple, but it works well and I prefer to keep it as it is for now. I will set the timespan of each file to 1h, ok?

markus-radermacher-lemnatec commented 8 years ago

regarding 2) Could you provide an example of your preferred environment logger output file? Then we will change the output according to your template. Right now the output file format is just a first suggestion.

markus-radermacher-lemnatec commented 8 years ago

regarding 3) If you prefer to have the spectrum to be written into a separate file, please provide an example file.

CO2 sensor: right now it is located inside the camera box, thus by definition it is not an ambient sensor. Its distance to the canopy is roughly 2 m during measurements (except fluorescence: then it is closer to 1 m). The other ambient sensors are mounted on top of the gantry, facing the open sky, roughly 5 m above even the fully grown plants.

It is possible to get its values from both software parts, the environment logger and the moving sensor used in the gantry script. What do you need?

markus-radermacher-lemnatec commented 8 years ago

regarding 4)

Using the same extension bin for most of the sensors is just for consistency. The idea behind that is that the file extension should not be used directly, instead there is a description of the file in the meta data, "Output data format": "text/xml" in this case.

I guess you have to set up a data workflow for each sensor individually, so this should not be a problem. If this .bin extension is a problem let me know and we work out a solution for that.

markus-radermacher-lemnatec commented 8 years ago

regarding 5)

What's the difference from 3)? Could you give a clear definition?

markus-radermacher-lemnatec commented 8 years ago

regarding 6)

Could you send me the original file? Because of the ongoing repair of the gantry, remote access is shut off. Most probably the software was terminated during file writing, but I will check.

dlebauer commented 8 years ago

@markus-radermacher-lemnatec

czender commented 8 years ago

@markus-radermacher-lemnatec @dlebauer @FlyingWithJerome Jerome is working to parse the environmental sensor (ES) data for combination with the hyperspectral imager data. He finds that the ES data are not properly formatted (below). Can LemnaTec please address this so we do not need to write workarounds? Thanks! Jerome says:

Right now I'm working on parsing the environmental logger output, and I noticed one problem.

Each JSON file includes multiple JSON objects, but they are simply concatenated instead of being placed in a JSON array. If I read the file directly, Python ignores everything but the last object, since this layout is not legal under the JSON RFC.

As a workaround, I wrote a formatting function that re-formats the JSON file into a JSON array. It works, but I am concerned about runtime efficiency, since the re-formatting costs 0.3 ms per file according to timing in bash.

So could you ask the environmental logger personnel to export a JSON array in each file instead of multiple JSON objects? It would directly improve runtime efficiency, especially when we have many JSON files and each file has over 161,800 lines.
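For readers hitting the same problem, here is a minimal sketch of one way to read back-to-back JSON objects without rewriting the file first. It is not the reformatting function Jerome describes; it uses the standard library's `json.JSONDecoder.raw_decode`, and the function name `parse_concatenated_json` and the sample payload are hypothetical. `json.loads` itself does not accept several top-level values in one string, so the objects are decoded one at a time.

```python
import json


def parse_concatenated_json(text):
    """Return a list of the top-level JSON values found back to back in text."""
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the decoded value and the index where it stopped.
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    return values


# Hypothetical two-object sample in the concatenated layout described above.
sample = (
    '{"environment_sensor_set_reading": {"t": 1}}\n'
    '{"environment_sensor_set_reading": {"t": 2}}'
)
readings = parse_concatenated_json(sample)
print(len(readings))
```

This keeps every reading in memory as a list, which is roughly what a proper JSON array in the file would give directly.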

dlebauer commented 8 years ago

@czender is more than sed 's/}{/},{/g' required to correct the error?

@markus-radermacher-lemnatec what is the timeline for fixing this?

czender commented 8 years ago

@FlyingWithJerome will answer your question @dlebauer i'm just a messenger :)

FlyingWithJerome commented 8 years ago

@dlebauer @czender Sorry, I just noticed this discussion thread. I reformatted them like this: [ {"environment_sensor_set_reading": {...}} , { "environment_sensor_set_reading": {...}}, {"environment_sensor_set_reading" : {...}} , ...... ] so yes, as you mentioned above, adding square brackets and commas would fix this problem.

TinoDornbusch commented 8 years ago

@dlebauer , I cannot do anything about it. I need to push Markus.

markus-radermacher-lemnatec commented 8 years ago

@dlebauer the output format for the environment logger will be changed as suggested to a json list; the fix will be available from the 3rd of May on.

Ndrey commented 8 years ago

@dlebauer Hi, this is André from LemnaTec. Since Markus is ill this week, I will jump in to fulfill our promise. Did I understand correctly that, as a first step, you are happy if the environment logging json files are valid and formatted as json arrays?

dlebauer commented 8 years ago

@Ndrey yes, it is okay if the environment logging is provided as valid json files. It isn't clear what you mean by as arrays (could you paste an example?) but even key-value pairs are okay.

That is okay for format, but for content and frequency please see additional comments above.

Ndrey commented 8 years ago

@dlebauer By json array I only meant putting the elements in square brackets and separating them with commas, as suggested.

Markus already changed it to a lower frequency, so each hour a new file is created instead of each 2 minutes, but the file size is then around 170 MB each...

czender commented 8 years ago

@Ndrey @dlebauer to be a bit more general, please ensure that all sensor files indicated (by .json suffix) as storing JSON are actually valid JSON. Once we convert them to netCDF, their size will be significantly reduced.

markus-radermacher-lemnatec commented 8 years ago

See the updated output of the environment logger, now in a more structured format. Could you confirm that the layout of the new format fits your needs and meets the json standard? After your confirmation I will update the software on the gantry system.

dlebauer commented 8 years ago

  1. Where is spectral flux?
  2. Does it pass the jsonlint validator?

On Wed, May 4, 2016 at 8:42 AM markus-radermacher-lemnatec <notifications@github.com> wrote:

See below the updated output of the environment logger, now in a more structured format. Could you confirm that the layout of the new format fits your needs and meets the json standard? After your confirmation I will update the software on the gantry system.

{ "environment_sensor_fixed_infos": [ { "par_sensor": { "fixed_info_0": "...", "fixed_info_1": "..." }, "weather_station": { "fixed_info_0": "...", "fixed_info_1": "..." }, "spectrometer": { "fixed_info_0": "...", "fixed_info_1": "..." } } ], "environment_sensor_readings": [ { "timestamp": "2016.05.04-15:00:04", "weather_station": { "sunDirection": { "value": "error", "unit": "error", "rawValue": "error" }, "airPressure": { "value": "error", "unit": "error", "rawValue": "error" }, "brightness": { "value": "error", "unit": "error", "rawValue": "error" }, "relHumidity": { "value": "error", "unit": "error", "rawValue": "error" }, "temperature": { "value": "error", "unit": "error", "rawValue": "error" }, "windDirection": { "value": "error", "unit": "error", "rawValue": "error" }, "precipitation": { "value": "error", "unit": "error", "rawValue": "error" }, "windVelocity": { "value": "error", "unit": "error", "rawValue": "error" } }, "sensor par": { "value": "error", "unit": "error", "rawValue": "error" }, "spectrometer": { "maxFixedIntensity": "123", "integration time in µs": "123", "wavelength": [ 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 ], "spectrum": [ 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 ] } }, { "timestamp": "2016.05.04-15:00:05", "weather_station": { "sunDirection": { "value": "error", "unit": "error", "rawValue": "error" }, "airPressure": { "value": "error", "unit": "error", "rawValue": "error" }, "brightness": { "value": "error", "unit": "error", "rawValue": "error" }, "relHumidity": { "value": "error", "unit": "error", "rawValue": "error" }, "temperature": { "value": "error", "unit": "error", "rawValue": "error" }, "windDirection": { "value": "error", "unit": "error", "rawValue": "error" }, "precipitation": { "value": "error", "unit": "error", "rawValue": "error" }, "windVelocity": { "value": "error", "unit": "error", "rawValue": "error" } }, 
"sensor par": { "value": "error", "unit": "error", "rawValue": "error" }, "spectrometer": { "maxFixedIntensity": "123", "integration time in µs": "123", "wavelength": [ 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 ], "spectrum": [ 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 ] } }, { "timestamp": "2016.05.04-15:00:05", "weather_station": { "sunDirection": { "value": "error", "unit": "error", "rawValue": "error" }, "airPressure": { "value": "error", "unit": "error", "rawValue": "error" }, "brightness": { "value": "error", "unit": "error", "rawValue": "error" }, "relHumidity": { "value": "error", "unit": "error", "rawValue": "error" }, "temperature": { "value": "error", "unit": "error", "rawValue": "error" }, "windDirection": { "value": "error", "unit": "error", "rawValue": "error" }, "precipitation": { "value": "error", "unit": "error", "rawValue": "error" }, "windVelocity": { "value": "error", "unit": "error", "rawValue": "error" } }, "sensor par": { "value": "error", "unit": "error", "rawValue": "error" }, "spectrometer": { "maxFixedIntensity": "123", "integration time in µs": "123", "wavelength": [ 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 ], "spectrum": [ 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 ] } } ] }
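As an illustration of how this structure could feed the table requested at the top of the thread (all variables except the spectrum, one row per timestamp), here is a minimal sketch. The key names `environment_sensor_readings`, `timestamp`, `weather_station`, `sensor par` and `value` come from the sample above; the function names, the `par` column name, the `degC` unit and the numeric values are assumptions for illustration only.

```python
import csv
import io
import json


def readings_to_rows(doc):
    """Flatten each reading into one dict: timestamp plus one column per scalar variable."""
    rows = []
    for reading in doc["environment_sensor_readings"]:
        row = {"timestamp": reading["timestamp"]}
        for name, field in reading.get("weather_station", {}).items():
            row[name] = field["value"]
        # The spectrometer arrays are deliberately left out of the table.
        row["par"] = reading.get("sensor par", {}).get("value")
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text; assumes at least one row, columns follow its key order."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Illustrative document in the proposed layout; the values are made up.
doc = json.loads("""
{"environment_sensor_readings": [
  {"timestamp": "2016.05.04-15:00:04",
   "weather_station": {"temperature": {"value": "21.5", "unit": "degC", "rawValue": "1"}},
   "sensor par": {"value": "258.7", "unit": "umol/(m^2*s)", "rawValue": "5.6"},
   "spectrometer": {"wavelength": [0.0], "spectrum": [0.0]}}
]}
""")
csv_text = rows_to_csv(readings_to_rows(doc))
print(csv_text)
```

Units are dropped here for brevity; a real table would need them in the header or in accompanying metadata.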


czender commented 8 years ago

please at the same time fix the spelling of the filenames from "...enviromentlogger.json" to "...environmentlogger.json".

TinoDornbusch commented 8 years ago

The output of the spectrometer is 'raw' counts.

You need to use the attached calibration files to convert it to units of µW m-2 s-1. Careful: you need to take the bandwidth of the chip (0.4 nm) into account if you want to convert to µmol m-2 s-1.

I added the calibration files to the gantry ftp:

/gantry_data/LemnaTec/EnvironmentLogger/CalibrationData
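The energy-to-photon conversion mentioned above can be sketched as follows. This is only an illustration of the standard relation (energy per mole of photons = h·c·N_A / λ), not the calibration procedure itself: the calibration files are not shown in this thread, so the sketch assumes the calibrated spectrum is already a spectral irradiance in W m-2 nm-1 per band, and the function name `photon_flux_umol` is hypothetical. Only the 0.4 nm bandwidth comes from the comment above.

```python
# Physical constants (exact SI values).
PLANCK = 6.62607015e-34       # J s
LIGHT_SPEED = 299792458.0     # m / s
AVOGADRO = 6.02214076e23      # 1 / mol


def photon_flux_umol(spectral_irradiance, wavelengths_nm, bandwidth_nm=0.4):
    """Integrate spectral irradiance (W m-2 nm-1, assumed) into photon flux (umol m-2 s-1)."""
    total = 0.0
    for irradiance, wavelength_nm in zip(spectral_irradiance, wavelengths_nm):
        # Energy arriving in this band: spectral irradiance times band width.
        energy_w_m2 = irradiance * bandwidth_nm
        # Energy carried by one mole of photons at this wavelength.
        joules_per_mol = PLANCK * LIGHT_SPEED * AVOGADRO / (wavelength_nm * 1e-9)
        total += energy_w_m2 / joules_per_mol * 1e6
    return total


# One band at 550 nm carrying 2.5 W m-2 nm-1 over 0.4 nm, i.e. 1 W m-2 (made-up input).
flux = photon_flux_umol([2.5], [550.0], bandwidth_nm=0.4)
print(round(flux, 3))
```

If the calibrated values turn out to be in µW rather than W, the result scales by 1e-6 accordingly.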

Ndrey commented 8 years ago

@dlebauer The file Markus pasted here passed the json validator from Newtonsoft as well as the http://jsonlint.com/ validator

max-zilla commented 8 years ago

@dlebauer @czender Here are the two Calibration files that Tino mentioned: Calibrations.zip

max-zilla commented 8 years ago

..and here is the latest EnvironmentalLogger json file, from this morning (5/5): 2016-05-05_07-20-52_enviromentlogger.json.zip

e: @czender doesn't look like the "enviROMent" typo was fixed yet, FYI.

dlebauer commented 8 years ago

@markus-radermacher-lemnatec @TinoDornbusch could you make sure to correct the spelling of enviROMent in the environmentlogger file?

TinoDornbusch commented 8 years ago

@dlebauer I have no access to the source code to make these changes. I will try to bother Markus on holiday.

dlebauer commented 8 years ago

@markus-radermacher-lemnatec @TinoDornbusch What is the status of the CO2 sensor - are you going to fix it on the gantry and combine the data stream with the environmental logger?

TinoDornbusch commented 8 years ago

Could someone help out with a batch rename script for ftp to rename the environment jsons?

dlebauer commented 8 years ago

@TinoDornbusch I've made a separate issue (#29) for the fixing and renaming of the logger files.

czender commented 8 years ago

@FlyingWithJerome Please alter EnvironmentalLoggerAnalyser.py to work with 2016-05-05_07-20-52_enviromentlogger.json as the new default file type. It should not have any JSON issues, and so should be opened read-only (not r+, which causes failure on the Roger computer since we don't have write permission on the input directory). Print a warning and exit if there is a JSON issue. Assume from now on that all logger files have valid JSON, meaning that old logger files will have been previously run through the batch script that @dlebauer mentions above to fix the JSON (maybe you can help him with that?)
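A minimal sketch of the behavior requested here (read-only open, warn and exit on invalid JSON) might look like the following. It is not taken from EnvironmentalLoggerAnalyser.py; the function name `load_logger_file` and the exit code are assumptions.

```python
import json
import sys


def load_logger_file(path):
    """Open the logger file read-only and return the parsed JSON, or warn and exit."""
    try:
        # Mode "r" needs no write permission on the input directory.
        with open(path, "r") as handle:
            return json.load(handle)
    except ValueError as err:
        # json raises ValueError (JSONDecodeError in Python 3) on malformed input.
        sys.stderr.write("WARNING: %s is not valid JSON: %s\n" % (path, err))
        sys.exit(1)
```

Catching `ValueError` keeps the sketch usable on both the Python 2.7 install shown in this thread and Python 3.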

FlyingWithJerome commented 8 years ago

@czender I just tested it, and EnvironmentalLoggerAnalyser.py can deal with it without any changes. The total running time is 26.894s and the output netCDF is 7.7MB. However, we should not remove the reformatting function or change the reading mode. 2016-05-05_07-20-52_enviromentlogger.json is not valid JSON. We still need to reformat the "environmental_sensor_set_reading" entries into an array. If we do not reformat it, we will lose 1757 of the 1758 readings, keeping only the last one.

dlebauer commented 8 years ago

@FlyingWithJerome we are going to fix all of the invalid files (#29) so you should not have to deal with them ...

FlyingWithJerome commented 8 years ago

@dlebauer I'm sorry and I just noticed that, thank you!

dlebauer commented 8 years ago

@FlyingWithJerome all of the environmental data through 2015-04-13 have been corrected. I've checked that these files are valid json. Could you please check that your script works with these?

TinoDornbusch commented 8 years ago

Spelling error is fixed from 6.5.2016 on.

FlyingWithJerome commented 8 years ago

@dlebauer Yes, I reran the script after removing the reformatting function. It works well, thank you! There is one wrinkle, which seems to be OS X only: the "sed" command needs one more option there, so I added an empty string to solve it: sed -i '' 's/}{/},{/g' $file

czender commented 8 years ago

Either the logger data on Roger have not been updated, or there is a problem with the updated files, or there is a problem with EnvironmentalLoggerAnalyser.py. Same problem with 2016-04-07 files:

zender@cg-gpu01:~$ python ${HOME}/terraref/computing-pipeline/scripts/hyperspectral/EnvironmentalLoggerAnalyser.py /projects/arpae/terraref/raw_data/ua-mac/EnvironmentLogger/2016-05-02/2016-05-02_12-10-52_enviromentlogger.json ~/rgr
Processing /projects/arpae/terraref/raw_data/ua-mac/EnvironmentLogger/2016-05-02/2016-05-02_12-10-52_enviromentlogger.json....
Traceback (most recent call last):
  File "/home/zender/terraref/computing-pipeline/scripts/hyperspectral/EnvironmentalLoggerAnalyser.py", line 212, in <module>
    fileInputLocation)
  File "/home/zender/terraref/computing-pipeline/scripts/hyperspectral/EnvironmentalLoggerAnalyser.py", line 101, in JSONHandler
    return json.loads(fileHandler.read()), wavelength, spectrum
  File "/sw/python-2.7.10/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/sw/python-2.7.10/lib/python2.7/json/decoder.py", line 369, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 4151 column 2 - line 614201 column 2 (char 101524 - 15025186)

dlebauer commented 8 years ago

@czender per https://github.com/terraref/reference-data/issues/26#issuecomment-217488670 above "all of the environmental data through 2015-04-13 have been corrected."

From the command you executed, it looks like you were trying to process data from 2016-05-02. These data are still transferring, and I will fix them after we get the last of the old malformed enviromentlogger files -

TinoDornbusch commented 8 years ago

@dlebauer ... The CO2 sensor moves and is hence a moving sensor. Relocating it to the top of the gantry and changing the data stream require time and work resources. I cannot do that.

I find measuring the CO2 concentration close to the canopy a valuable measurement.

TinoDornbusch commented 8 years ago

@dlebauer Of course we can implement this if you want, but it will take some time.

Moreover, we are in a measurement campaign and gantry downtimes should be minimal.

My suggestion would be to do that after the experiment along with other upgrades.

dlebauer commented 8 years ago

I find measuring the CO2 concentration close to the canopy a valuable measurement

That is what I am confused about: it is not clear what information this will provide. I don't think it will be possible to resolve plot-scale (~2x4 m plots) effects on atmospheric [CO2] at > 2 m above the canopy. LiCOR provides an excellent book explaining the theory of eddy covariance, but we are not set up to use that technique. I am afraid that the confounding effects of moving the sensor around will make it more difficult to estimate the ambient [CO2] above the canopy layer, which would otherwise be useful as a boundary condition / driver for crop modeling in the same way light, rain, temperature etc. can be used.

It isn't the most essential sensor so if it is too much trouble to move this year that is okay. But it would be nice if the files were written out as a time series and saved hourly with time, concentration, and position in x,y,z space rather than a new folder + metadata every 5 seconds.
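To make the requested layout concrete, here is a minimal sketch of writing CO2 observations as a single time series with time, concentration and position columns. The column names, the ppm unit, the ISO-style timestamps and the values are all assumptions for illustration; none of them come from the sensor output.

```python
import csv
import io


# Hypothetical column names for the time series file.
FIELDS = ["time", "co2_ppm", "x", "y", "z"]


def write_co2_series(observations, out):
    """Write observations (dicts keyed by FIELDS) as one CSV time series, sorted by time."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    # ISO-style timestamps sort correctly as plain strings.
    for obs in sorted(observations, key=lambda o: o["time"]):
        writer.writerow(obs)


# Illustrative values only; not measurements from the gantry.
observations = [
    {"time": "2016-05-10T12:00:05", "co2_ppm": 401.2, "x": 10.0, "y": 2.0, "z": 2.0},
    {"time": "2016-05-10T12:00:00", "co2_ppm": 400.8, "x": 9.5, "y": 2.0, "z": 2.0},
]
buffer = io.StringIO()
write_co2_series(observations, buffer)
print(buffer.getvalue())
```

One such file per hour would replace the per-observation folders while keeping the positional information.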

dlebauer commented 8 years ago

@TinoDornbusch

the environmental sensor data still writes out "?" which I suspect should be "micro"?

      "sensor par": {
        "value": "258.7112684654",
        "unit": "?mol/(m^2*s)",
        "rawValue": "5.65840556708583"
      },
      "spectrometer": {
        "maxFixedIntensity": "16383",
        "integration time in ?s": "5000",

TinoDornbusch commented 8 years ago

@dlebauer ...yes it is µ... I will have our IT guys fix this.

TinoDornbusch commented 8 years ago

you should get units of umol and us
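On the consumer side, the same substitution can be sketched in a few lines. This is an assumption about how one might normalize unit strings before writing ASCII-only metadata, not a description of the logger fix; the function name `ascii_units` is hypothetical. Note that it cannot recover text where the character has already been written out as "?".

```python
# MICRO SIGN (U+00B5) and GREEK SMALL LETTER MU (U+03BC) both appear in the wild.
MICRO_CHARS = ("\u00b5", "\u03bc")


def ascii_units(text):
    """Replace either micro character with ASCII 'u' so the text is safe for ASCII-only writers."""
    for ch in MICRO_CHARS:
        text = text.replace(ch, "u")
    return text


print(ascii_units("\u00b5mol/(m^2*s)"))
print(ascii_units("integration time in \u00b5s"))
```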

TinoDornbusch commented 8 years ago

@dlebauer CO2 sensor is now in the environmentlogger.json. I still have it implemented in the moving sensor data acquisition. Will remove if you do not wish to have positional information.

Sensor will be moved on top of the gantry during winter upgrades.

{ "environment_sensor_fixed_infos": { "par_sensor": { "manufacturer": "www.apogeeinstruments.com", "model": "SQ214", "location in gantry system": "on top" }, "co2_sensor": { "sensor manufacturer": "Vaisala", "model": "Carbocap CO2 Probe GMP343 A1C1B0N0N0B", "sensor serial number": "L3420008", "additional info": "SO 5530060878", "calbration date": "2015.08.18", "location in gantry system": "camera box", "analog digital interface": "WAGO 750-478"