dpsnowden opened 8 years ago
Calculate these statistics on the long time series available in the DATA_GRIDDED directory and put the results on the web site.
As of July 10, 2018, there are 5 folders in DATA_GRIDDED, with the following file counts: KEO 9, PAPA 11, PIRATA 115, RAMA 126, TRITON 19.
Adding these up, there are 280 files according to the index file. The mode breakdown is 181 delayed mode and 99 real-time.
The breakdown of standard names is shown below. This is what's in the files; I haven't checked whether each name is CF compliant. Note that a standard name can recur within a single file (e.g. multiple depth axes).

height 922
depth 563
latitude 351
longitude 351
time 280
sea_water_temperature 153
surface_downwelling_shortwave_flux_in_air 150
air_temperature 146
relative_humidity 146
wind_speed 146
eastward_wind 145
northward_wind 145
wind_to_direction 145
rainfall_rate 141
sea_water_salinity 139
sea_water_sigma_theta 138
eastward_sea_water_velocity 127
northward_sea_water_velocity 127
direction_of_sea_water_velocity 102
sea_water_speed 102
air_pressure_at_sea_level 81
surface_downwelling_shortwave_flux_in_air_standard_deviation 53
surface_downwelling_longwave_flux_in_air 42
sea_water_electrical_conductivity 19
sea_water_sigma_t 19
upward_sea_water_velocity 19
sea_water_pressure 3
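A tally like the one above can be reproduced with a short script. Below is a minimal stdlib-only Python sketch, assuming the standard names have already been extracted from each file (e.g. by a netCDF reader); the filenames and name lists are hypothetical.

```python
from collections import Counter

def tally_standard_names(files):
    """Count standard_name occurrences across all files.

    files: mapping of filename -> list of standard_name attribute values
    found in that file. A name that recurs within one file (e.g. multiple
    depth axes) is counted once per occurrence, as in the tally above.
    """
    counts = Counter()
    for names in files.values():
        counts.update(names)
    return counts

# Hypothetical example: two files, one with two depth axes.
files = {
    "OS_KEO_example.nc": ["time", "depth", "depth", "sea_water_temperature"],
    "OS_PAPA_example.nc": ["time", "depth", "air_temperature"],
}
counts = tally_standard_names(files)
# counts["depth"] == 3, counts["time"] == 2
```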
Please let me know if additional, specific stats are desired, but this gives us a good idea of what's being measured. Next, I'll try to rework my script to give us an idea of the time spans of the files.
Here's a quick summary of DATA_GRIDDED. Others may want to check the work (zipped .m file). This is based on the index file from NDBC's ftp site (attached), shortened to include only DATA_GRIDDED files. It does not guarantee that data are present (variables could be NaN-filled). The distribution appears trimodal, with concentrations of time series at ~1, ~11, and ~18 yr. This is likely due to the operational lengths of the programs contained in DATA_GRIDDED.
Someone may want to write a script that actually opens each file and assesses the percentage of data present in each file. However, it's difficult to characterize gaps (instrument failure? depth mismatch between deployments requiring NaN-filling of a larger array? other?).
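The data-presence check is straightforward for any one variable once its values are in memory. Here is a stdlib-only Python sketch, with a plain list standing in for a variable read from a netCDF file; characterizing *why* the gaps exist would still need human judgment.

```python
import math

def percent_present(values):
    """Percentage of values that are actual data (not NaN and not None)."""
    if not values:
        return 0.0
    good = sum(
        1 for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    )
    return 100.0 * good / len(values)

# Hypothetical variable with a gap (e.g. instrument failure mid-record).
temperature = [12.1, 12.3, float("nan"), float("nan"), 11.9, 12.0]
print(round(percent_present(temperature), 2))  # 66.67
```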
Basic statistics:
Min file length = 0.02 yr
Max file length = 24.47 yr
Mean file length = 8.53 yr
Median file length = 8.64 yr
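These numbers come from the attached Matlab script; for reference, the same statistics can be computed from (start, end) pairs parsed out of the index file. A stdlib-only Python sketch (the dates below are made up, not taken from the real index):

```python
import statistics
from datetime import datetime

def length_in_years(start, end):
    """Span between two datetimes in Julian years (365.25 days)."""
    return (end - start).days / 365.25

# Hypothetical (start, end) pairs parsed from an index file.
spans = [
    (datetime(2018, 6, 1), datetime(2018, 6, 8)),   # ~0.02 yr
    (datetime(2000, 1, 1), datetime(2011, 1, 1)),   # ~11 yr
    (datetime(1994, 1, 1), datetime(2012, 1, 1)),   # ~18 yr
]
lengths = [length_in_years(s, e) for s, e in spans]
print(min(lengths), max(lengths))
print(statistics.mean(lengths), statistics.median(lengths))
```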
index.txt - Truncated Index File
DATA.m.zip - Matlab Script
Regards, Nathan
Why don't we calculate the number of years of data at all the sites, not just the ones that are currently presented as long time series files in the 'gridded' or 'product' directories? We have uploaded met data for Stratus and NTAS starting in 2000 and 2001, respectively. The time-merged versions of these will be submitted ... very soon, but the data is already there, in the 'data' directory.
Here's a short summary of DATA. Again, interpret cautiously and provide feedback; for example, the large number of 8-hr files biases the statistics. A few index file entries (~1%) were corrupt, or were not parsed correctly due to an incorrect number of entries.
DATA_writeup.docx - Variable Breakdown
oceansites_index.txt - DATA Index File
DATA.m.zip - Matlab Script
I'm curious about what techniques are used to combine multiple deployments into a single long time series file. In the past I've used Ferret because of its re-gridding functions. What is your tool of choice?
We do this 'manually' in Matlab.
As part of the process, we turn some of the fields (sensor heights, water depth, range ring size, serial numbers, surface current velocity depth, instrument model, deployment and recovery cruises, etc.) into arrays. I'm sure Ferret could do that too, but I'm not sure if our files are Ferret-friendly.
I don't do the actual merging, but I'm cc'ing the person who does, Kelan Huang, in case there's more to the process than just concatenating the data arrays.
Since we apply a single magnetic correction to each deployment, based on the center point of the deployment year, I'm thinking we have jumps in the values when there's a redeployment. I don't think we address that, but ... maybe we do.
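The jump Nan describes can be illustrated with a small sketch. This is stdlib-only Python, not WHOI's actual Matlab code; the rotation convention and the declination values are assumptions for illustration. Because each deployment gets one declination, a vector that is physically identical across a redeployment can show a small step in u/v at the splice.

```python
import math

def mag_to_true(u, v, decl_deg):
    """Rotate an (east, north) velocity from magnetic to true coordinates,
    for declination decl_deg positive east (assumed convention)."""
    th = math.radians(decl_deg)
    return (u * math.cos(th) + v * math.sin(th),
            -u * math.sin(th) + v * math.cos(th))

# Two hypothetical deployments, each corrected with a single declination
# taken at the center point of its deployment year.
dep1 = {"decl": 9.8, "uv": [(0.0, 0.50), (0.0, 0.50)]}
dep2 = {"decl": 10.1, "uv": [(0.0, 0.50), (0.0, 0.50)]}

merged = []
for dep in (dep1, dep2):
    merged.extend(mag_to_true(u, v, dep["decl"]) for u, v in dep["uv"])

# The same magnetic-frame vector maps to slightly different true-frame
# values across the deployment boundary: a small jump at the splice.
print(merged[1], merged[2])
```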
Regards - Nan
On 10/4/18 2:18 PM, Mike McCann wrote:
I'm curious about what techniques are used to combine multiple deployments into a single long time series file. In the past I've used Ferret because of its re-gridding functions. What is your tool of choice?
Thanks for sharing @ngalbraith !
This question seems to also apply to https://github.com/oceansites/dmt/issues/28 and https://github.com/oceansites/dmt/issues/46.
Long time series are one of the primary goals of OceanSITES. Can we calculate the number of products in the PRODUCT (or whatever replaces DATA_GRIDDED) directory that have a length greater than 2, 5, 10, or 20 years?
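Once per-product lengths in years are tabulated (Nathan's index-file script already produces these), the threshold counts are a one-liner. A stdlib-only Python sketch with made-up lengths:

```python
def count_longer_than(lengths_yr, thresholds=(2, 5, 10, 20)):
    """Number of products whose record length exceeds each threshold (years)."""
    return {t: sum(1 for length in lengths_yr if length > t)
            for t in thresholds}

# Hypothetical product lengths in years.
lengths = [0.5, 1.2, 3.0, 6.5, 11.0, 18.2, 24.5]
print(count_longer_than(lengths))  # {2: 5, 5: 4, 10: 3, 20: 1}
```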
@MBARIMike this is the application I was thinking of for the thredds_crawler script.