oceaneos / MapBox

Apache License 2.0

Compute statistics on raw data #19

Open roblabs opened 8 years ago

roblabs commented 8 years ago

Related to #10

Raw image data is stored as CSV and as floating-point data. For the initial image data posted to Mapbox, we used 8-bit RGBA data.

Several statistics can be computed from the floating-point data, in which 99999 means no data.
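A minimal sketch of what "several statistics" could look like, written in plain Python rather than NumPy. It assumes the float grid has been loaded as nested lists and that 99999 is the only no-data marker; the function names are hypothetical. With NumPy the same filtering is a boolean mask such as `data[data != 99999]`.

```python
# Sketch: basic statistics over a float raster, skipping the no-data marker.
# Assumption: the grid is nested lists of floats; the real script uses NumPy.
NODATA = 99999.0  # value that marks "no data" in the float rasters


def valid_values(grid, nodata=NODATA):
    """Yield every cell value that is not the no-data marker."""
    for row in grid:
        for value in row:
            if value != nodata:
                yield value


def basic_stats(grid, nodata=NODATA):
    """Return count, min, max and mean of the valid (non-no-data) cells."""
    values = list(valid_values(grid, nodata))
    if not values:
        return {"count": 0, "min": None, "max": None, "mean": None}
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }


grid = [
    [0.25, 99999.0, 0.5],
    [99999.0, 1.5, 0.25],
]
print(basic_stats(grid))  # {'count': 4, 'min': 0.25, 'max': 1.5, 'mean': 0.625}
```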

(Image: raw-csv-data)

roblabs commented 8 years ago

@mriedijk

We now have a very efficient "Summation Statistic" that computes the sum of all chlorophyll concentration values below a threshold.

It takes less than 1 minute to run on all 170 monthly NASA GeoTIFFs, and is written in Python using NumPy (Numerical Python).
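The summation statistic can be sketched as follows. This is not the author's NumPy code; it is a plain-Python illustration assuming a flat sequence of float values, 99999 as no data, and strict "less than" comparisons (matching the "< 0.2" style column headers in the CSV output further down). The function name is hypothetical. In NumPy each column would be a masked sum such as `data[(data < t) & (data != 99999)].sum()`.

```python
# Sketch of a thresholded "Summation Statistic": for each threshold, the sum of
# all valid values strictly below it. Assumptions: flat list of floats,
# 99999 marks "no data".
NODATA = 99999.0


def threshold_sums(values, thresholds, nodata=NODATA):
    """Return [sum of valid values < t for t in thresholds]."""
    sums = [0.0] * len(thresholds)
    for v in values:
        if v == nodata:
            continue  # never include the no-data marker in any sum
        for i, t in enumerate(thresholds):
            if v < t:
                sums[i] += v
    return sums


thresholds = [0.2, 0.5, 1.0, 2.0, 99999.0]
values = [0.125, 0.25, 0.75, 1.5, 3.0, 99999.0]
print(threshold_sums(values, thresholds))  # [0.125, 0.375, 1.125, 2.625, 5.625]
```

Because every column counts values below a larger threshold than the one before, the sums can only grow from left to right, which is also the pattern in the real output rows.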

The web page needs JSON, which it can parse easily; the same results can also be written as CSV.

{
  "data": [
    {
      "MY1DMM_CHLORA_2002-07.FLOAT.TIFF": 435435.51885391586
    },
    {
      "MY1DMM_CHLORA_2002-08.FLOAT.TIFF": 479583.77517676633
    }
  ]
}

MY1DMM_CHLORA_2002-07.FLOAT.TIFF,435435.518854
MY1DMM_CHLORA_2002-08.FLOAT.TIFF,479583.775177
MY1DMM_CHLORA_2002-09.FLOAT.TIFF,554653.510914
MY1DMM_CHLORA_2002-10.FLOAT.TIFF,510659.170558
MY1DMM_CHLORA_2002-11.FLOAT.TIFF,478017.987422
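A sketch of how per-file sums could be written in the two layouts above, assuming the sums arrive as (filename, value) pairs in month order; the helper names are hypothetical. Rounding the CSV values to 12 significant digits (`.12g`) is an assumption chosen because it reproduces the sample values (e.g. 435435.51885391586 becomes 435435.518854); the original script may format them differently.

```python
# Sketch: emit per-file sums as the JSON and CSV layouts shown above.
import csv
import io
import json


def to_json(rows):
    """{"data": [{filename: value}, ...]} as in the web-page sample."""
    return json.dumps({"data": [{name: value} for name, value in rows]}, indent=2)


def to_csv(rows):
    """One "filename,value" line per file, value rounded to 12 significant digits."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for name, value in rows:
        writer.writerow([name, format(value, ".12g")])
    return buf.getvalue()


rows = [
    ("MY1DMM_CHLORA_2002-07.FLOAT.TIFF", 435435.51885391586),
    ("MY1DMM_CHLORA_2002-08.FLOAT.TIFF", 479583.77517676633),
]
print(to_json(rows))
print(to_csv(rows), end="")
```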
roblabs commented 8 years ago

I have verified that the summation on the floating-point data is correct.

I computed the thresholded sum for MY1DMM_CHLORA_2002-07 with:

(Image: sum-over-float-vs-csv)
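One way to repeat this float-versus-CSV check in code is sketched below. It assumes the CSV export is a plain comma-separated grid (one raster row per line, 99999 for no data) and that CSV values may be rounded, so the two sums are compared with a relative tolerance instead of exact equality. The layout, tolerance and function names are all assumptions, not taken from the thread.

```python
# Sketch: check that a thresholded sum over the float values agrees with the
# same sum computed from a CSV export of the same grid.
import csv
import io
import math

NODATA = 99999.0


def sum_below(values, threshold, nodata=NODATA):
    """Sum of valid values strictly below threshold."""
    return sum(v for v in values if v != nodata and v < threshold)


def csv_values(text):
    """Yield every numeric cell of a comma-separated grid."""
    for row in csv.reader(io.StringIO(text)):
        for cell in row:
            if cell.strip():
                yield float(cell)


def sums_agree(float_values, csv_text, threshold, rel_tol=1e-6):
    """True if both sources give (nearly) the same thresholded sum."""
    a = sum_below(float_values, threshold)
    b = sum_below(csv_values(csv_text), threshold)
    return math.isclose(a, b, rel_tol=rel_tol)


float_values = [0.125, 0.25, 99999.0, 1.5]
csv_text = "0.125,0.25\n99999,1.5\n"
print(sums_agree(float_values, csv_text, 1.0))  # True
```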

roblabs commented 8 years ago

The count of "no_data" values is now part of the calculation.

The following two computations give identical counts of values:

gdal_compute_sum.py --csv -t 0.2 0.5 1.0 2.0 99999.0 -f MY1DMM_CHLORA_2002-07.FLOAT.TIFF

file,concentration count, No Data count,< 0.2,< 0.5,< 1.0,< 2.0,< 99999.0
MY1DMM_CHLORA_2002-07.FLOAT.TIFF,2434211,4045789,173463.460798,306416.076544,435435.518854,524163.342494,1002490.80658

gdalinfo -hist MY1DMM_CHLORA_2002-07.FLOAT.TIFF

Count of values with concentration = 2434211
Count of values with NO_DATA = 4045789
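The two counts can be produced in a single pass, as in this sketch. It assumes nested lists of floats with 99999 as no data, and uses hypothetical names; the check at the end is that every cell lands in exactly one of the two counts.

```python
# Sketch: count valid cells and no-data cells in one pass and check that
# together they cover the whole raster.
NODATA = 99999.0


def count_cells(grid, nodata=NODATA):
    """Return (valid_count, nodata_count) for a 2-D grid."""
    valid = nodata_count = 0
    for row in grid:
        for value in row:
            if value == nodata:
                nodata_count += 1
            else:
                valid += 1
    return valid, nodata_count


grid = [
    [0.25, 99999.0, 0.5],
    [99999.0, 1.5, 99999.0],
]
valid, missing = count_cells(grid)
width, height = len(grid[0]), len(grid)
assert valid + missing == width * height  # every cell is counted exactly once
print(valid, missing)  # 3 3
```

For MY1DMM_CHLORA_2002-07 the two counts add up to 2434211 + 4045789 = 6,480,000 cells, which would be consistent with a 3600 × 1800 global grid; that grid size is inferred from the totals, not stated in the thread.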

Comparison on a larger sample of images (the 8-day files):

gdal_compute_sum.py --csv -t 0.2 0.5 1.0 2.0 99999.0 -f 8-day/*2016-02*

file,concentration count, No Data count,< 0.2,< 0.5,< 1.0,< 2.0,< 99999.0
MY1DMW_CHLORA_2016-02-02.FLOAT.TIFF,1820750,4659250,118293.041138,247811.236787,322842.600912,376689.25626,544178.926567
MY1DMW_CHLORA_2016-02-10.FLOAT.TIFF,1774697,4705303,118285.107791,242023.553839,322016.628214,380873.52156,575592.267158
MY1DMW_CHLORA_2016-02-18.FLOAT.TIFF,1992878,4487122,132491.14908,272383.637656,363530.543782,426993.083193,625581.012911
MY1DMW_CHLORA_2016-02-26.FLOAT.TIFF,1981620,4498380,130821.659828,260154.830973,335131.095167,386283.786183,640911.533983
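The batch run above could be structured roughly like this sketch, which expands a glob pattern and produces one row per file in the same column order as the CSV output. It is a hypothetical stand-in for gdal_compute_sum.py, whose source is not shown. The default reader uses GDAL's Python bindings (`gdal.Open`, `GetRasterBand(1).ReadAsArray()`), which require GDAL and NumPy; the reader is a parameter so the logic can be exercised without them.

```python
# Sketch: summarize every file matching a glob pattern into rows of
# [file, concentration count, no-data count, sum < t for each threshold].
import glob
import os

NODATA = 99999.0


def gdal_values(path):
    """Read band 1 of a raster with GDAL and return its cells as a flat list."""
    from osgeo import gdal  # lazy import: only needed when reading real files
    dataset = gdal.Open(path)
    return dataset.GetRasterBand(1).ReadAsArray().ravel().tolist()


def summarize(path, thresholds, read_values=gdal_values, nodata=NODATA):
    """One output row for a single file."""
    values = read_values(path)
    valid = [v for v in values if v != nodata]
    sums = [sum(v for v in valid if v < t) for t in thresholds]
    return [os.path.basename(path), len(valid), len(values) - len(valid), *sums]


def summarize_paths(paths, thresholds, read_values=gdal_values):
    return [summarize(p, thresholds, read_values) for p in paths]


def summarize_files(pattern, thresholds, read_values=gdal_values):
    """Expand a shell-style pattern (e.g. "8-day/*2016-02*") in sorted order."""
    return summarize_paths(sorted(glob.glob(pattern)), thresholds, read_values)


if __name__ == "__main__":
    for row in summarize_files("8-day/*2016-02*", [0.2, 0.5, 1.0, 2.0, 99999.0]):
        print(",".join(str(x) for x in row))
```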

# Count of values and no_data for each file, to compare with the output above
gdalinfo -hist MY1DMW_CHLORA_2016-02-02.FLOAT.TIFF
1820750 4659250

gdalinfo -hist MY1DMW_CHLORA_2016-02-10.FLOAT.TIFF
1774697 4705303

gdalinfo -hist MY1DMW_CHLORA_2016-02-18.FLOAT.TIFF
1992878 4487122

gdalinfo -hist MY1DMW_CHLORA_2016-02-26.FLOAT.TIFF
1981620 4498380
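The comparison above can also be automated. This sketch parses the `--csv` output pasted from this thread and checks it against the gdalinfo counts, keyed by the file names in the script output; the function names are hypothetical. It also collects the per-file totals of valid plus no-data cells as an extra sanity check.

```python
# Sketch: compare counts from gdal_compute_sum.py --csv with gdalinfo -hist counts.
import csv
import io

SCRIPT_OUTPUT = """\
file,concentration count, No Data count,< 0.2,< 0.5,< 1.0,< 2.0,< 99999.0
MY1DMW_CHLORA_2016-02-02.FLOAT.TIFF,1820750,4659250,118293.041138,247811.236787,322842.600912,376689.25626,544178.926567
MY1DMW_CHLORA_2016-02-10.FLOAT.TIFF,1774697,4705303,118285.107791,242023.553839,322016.628214,380873.52156,575592.267158
MY1DMW_CHLORA_2016-02-18.FLOAT.TIFF,1992878,4487122,132491.14908,272383.637656,363530.543782,426993.083193,625581.012911
MY1DMW_CHLORA_2016-02-26.FLOAT.TIFF,1981620,4498380,130821.659828,260154.830973,335131.095167,386283.786183,640911.533983
"""

# (concentration count, no-data count) as read off gdalinfo -hist
GDALINFO_COUNTS = {
    "MY1DMW_CHLORA_2016-02-02.FLOAT.TIFF": (1820750, 4659250),
    "MY1DMW_CHLORA_2016-02-10.FLOAT.TIFF": (1774697, 4705303),
    "MY1DMW_CHLORA_2016-02-18.FLOAT.TIFF": (1992878, 4487122),
    "MY1DMW_CHLORA_2016-02-26.FLOAT.TIFF": (1981620, 4498380),
}


def script_counts(text):
    """Map file -> (concentration count, no-data count) from the CSV output."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return {row[0]: (int(row[1]), int(row[2])) for row in reader if row}


def mismatches(computed, reference):
    """Files whose counts differ, or that appear in only one of the two tables."""
    names = set(computed) | set(reference)
    return sorted(n for n in names if computed.get(n) != reference.get(n))


counts = script_counts(SCRIPT_OUTPUT)
print(mismatches(counts, GDALINFO_COUNTS))  # []
totals = {sum(pair) for pair in counts.values()}
print(totals)  # {6480000}
```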