Explore and visualize US and Canadian buoy datasets

DaniJonesOcean commented 5 months ago

Issue Purpose: The purpose of this task is for you to become familiar with the buoy datasets curated by the US National Data Buoy Center (NDBC) and with the Canadian Maritime Data. You will create visualizations that show the time series of water temperatures and the geographic locations of the buoys.

Background: Our project requires an understanding of the water temperature dynamics and spatial distribution of observational data across the Great Lakes. The buoy datasets offer valuable information on both, so we need to know how to navigate and extract useful insights from them.

Tasks:

Dataset Familiarization:
- Access and open several CSV files from both datasets at SST-sensor-placement-input/NDBC_CSV for the NDBC and SST-sensor-placement-input/CMD_CSV for the Canadian dataset.
- Familiarize yourself with the structure and headers of the datasets, using the parameters description provided by NDBC when needed: NDBC Measurement Description.
Data Visualization — Time Series:
- Choose one buoy's data and create a time series plot of the water temperature (WTMP), noting any gaps within the record. Ensure the time is properly represented on the horizontal axis.
Data Visualization — Geographic Plot:
- Utilize the latitude and longitude information from the chosen buoy's data to plot its location on a map of the Great Lakes.
Data Interpretation and Report:
- Document any peculiarities, gaps, or patterns noticed in the water temperature time series.
- Reflect on the dataset's usability as a context set for DeepSensor training tasks.
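
As a starting point for the time series task, here is a minimal loading sketch using only the Python standard library. The column names `time` and `WTMP`, the ISO timestamp format, and the use of 999.0 as a missing-value sentinel are assumptions; check them against the actual headers in NDBC_CSV and CMD_CSV before relying on this.

```python
import csv
import io
import math
from datetime import datetime

# Assumption: 999.0 marks a missing water temperature in these files.
MISSING_SENTINELS = {999.0}


def load_wtmp(handle, time_col="time", temp_col="WTMP"):
    """Return a time-sorted list of (datetime, temperature) pairs.

    Blank, unparsable, or sentinel temperatures are stored as NaN so that
    gaps stay visible instead of being silently dropped.
    """
    series = []
    for row in csv.DictReader(handle):
        stamp = datetime.fromisoformat(row[time_col])
        try:
            value = float(row[temp_col])
        except (TypeError, ValueError):
            value = math.nan
        if value in MISSING_SENTINELS:
            value = math.nan
        series.append((stamp, value))
    series.sort(key=lambda pair: pair[0])
    return series


# Tiny made-up sample standing in for one buoy file.
sample = io.StringIO(
    "time,WTMP\n"
    "2005-07-01 01:00:00,18.4\n"
    "2005-07-01 00:00:00,18.2\n"
    "2005-07-01 02:00:00,999.0\n"
    "2005-07-01 03:00:00,\n"
)
series = load_wtmp(sample)
times = [t for t, _ in series]
temps = [v for _, v in series]
```

Passing `times` and `temps` to matplotlib's `plot` puts real datetimes on the horizontal axis, and the line breaks wherever a value is NaN, which makes gaps in the record easy to spot.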

Deliverable:

A very brief report detailing your exploration process and findings. A few comments and plots attached to this GitHub issue will be fine.
Include the water temperature time series plot and the geographic location plot of the buoys.

Expectations: The completion of this task should give you a solid understanding of the buoy datasets, their data structure, and how to visualize the contained information. Your report and visualizations will help us gauge the datasets’ potential utility as a context set for DeepSensor (e.g. using the buoy locations as a density channel).

eredding02 commented 5 months ago

@DaniJonesOcean I first looked at NDBC buoy 45001; its location is marked with a pink plus sign. This buoy has observations from 1981 to 2020. Its water temperature record includes many NaN values, which could have several causes: the data didn't pass a quality test, there were large jumps in temperature, the data integrity was suspect, or averages over longer intervals were reported instead of individual, shorter-interval values. In addition to the NaN values, there are gaps in the observations when buoys are taken out of the water for the winter. I have included a complete time series of SST from this buoy, with red dots marking NaNs. I have also included a shorter time series that shows the difference between NaN gaps (empty space) and gaps in observations (points connected by a straight line).
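
The two kinds of gap described above can also be separated programmatically. This is a rough sketch, not the code used for the attached plots; it assumes the record is a time-sorted list of `(datetime, value)` pairs and that the nominal reporting interval is one hour (`max_step` is a hypothetical parameter to adjust per buoy).

```python
import math
from datetime import datetime, timedelta


def classify_gaps(series, max_step=timedelta(hours=1)):
    """Split a time-sorted (datetime, value) series into two kinds of gap.

    Returns (nan_times, missing_spans):
      nan_times      timestamps that have a record but a NaN value
      missing_spans  (start, end) pairs where consecutive records are more
                     than max_step apart, i.e. nothing was reported at all
    """
    nan_times = [t for t, v in series if math.isnan(v)]
    missing_spans = [
        (t0, t1)
        for (t0, _), (t1, _) in zip(series, series[1:])
        if t1 - t0 > max_step
    ]
    return nan_times, missing_spans


# Made-up values: one NaN record, then a winter gap with no records.
series = [
    (datetime(2005, 11, 1, 0), 9.8),
    (datetime(2005, 11, 1, 1), math.nan),
    (datetime(2005, 11, 1, 2), 9.7),
    (datetime(2006, 4, 15, 0), 3.1),
]
nan_times, missing_spans = classify_gaps(series)
```

Here `nan_times` corresponds to the red dots, and `missing_spans` corresponds to the stretches that a line plot would bridge with a straight line.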

The quality varied across buoys. It seems that NDBC buoys (as opposed to those run by the Coast Guard or other organizations/institutions) had the most consistent data. Many buoys had abnormally large gaps or many NaN values, and some did not record SST at all. The Canadian buoys, operated by Environment and Climate Change Canada, seemed to have clean, consistent data, with close to no NaN values.

I have a lot of questions about how we will give buoy data to DeepSensor and how DeepSensor will handle it. I believe DeepSensor requires observations at a consistent interval (daily, hourly, etc.), but do they all have to cover the same time range? Many buoys have pre-1990 data, while some do not have observations until after 2000. Tom Andersson used sensor locations as a density channel, but the buoys in the Great Lakes vary a lot. My instinct is to select only buoys with consistent SST observations and use those as the density channel.
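
One possible way to make "consistent SST observations" concrete is a daily coverage score per buoy. The sketch below is only an illustration: the buoy ids, the 0.8 threshold, and the `(datetime, value)` record layout are all assumptions, and the date window should probably be limited to the deployment season since buoys are removed in winter.

```python
import math
from datetime import date, datetime


def daily_coverage(series, start, end):
    """Fraction of calendar days in [start, end] with at least one valid value."""
    total_days = (end - start).days + 1
    valid_days = {
        t.date()
        for t, v in series
        if not math.isnan(v) and start <= t.date() <= end
    }
    return len(valid_days) / total_days


def select_consistent(buoys, start, end, min_coverage=0.8):
    """Return the ids of buoys whose daily coverage meets the threshold."""
    return sorted(
        buoy_id
        for buoy_id, series in buoys.items()
        if daily_coverage(series, start, end) >= min_coverage
    )


# Hypothetical buoy ids and made-up values, for illustration only.
buoys = {
    "A": [(datetime(2005, 7, d, 12), 18.0 + d) for d in range(1, 5)],
    "B": [
        (datetime(2005, 7, 1, 12), 17.0),
        (datetime(2005, 7, 2, 12), math.nan),
    ],
}
start, end = date(2005, 7, 1), date(2005, 7, 4)
selected = select_consistent(buoys, start, end)
```

With these sample values, buoy "A" reports on all four days and is kept, while buoy "B" has one valid day out of four and is dropped.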

[Attached images: "download", "Screenshot 2024-06-17 at 9 47 24 AM"]

DaniJonesOcean commented 5 months ago

@eredding02 Amazing, thanks for your work here.

I like your thinking about using stations with consistent SST for the density channel. That's definitely one good way to go.

Another option might be to pick a particular year and frame the question this way: "given that the observing system consists of N buoys at locations {x_i} in 2005, what would have been the best place to situate new buoys for 2006?" (Here I'm assuming that the objective is to represent daily SST variability.) That's a kind of retrospective approach to network design that could be informative. Let's talk more when we meet this afternoon.
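
To make the retrospective framing concrete, here is a small sketch that picks the buoys reporting in a given year and counts them on a coarse latitude/longitude grid. Everything specific in it is hypothetical: the buoy ids, coordinates, and grid edges are made up, and the gridded count is just a way to inspect the observing system, not DeepSensor's own density-channel encoding.

```python
import math
from bisect import bisect_right
from datetime import datetime


def active_in_year(buoys, year):
    """Return ids of buoys with at least one valid (non-NaN) value in `year`."""
    return sorted(
        buoy_id
        for buoy_id, series in buoys.items()
        if any(t.year == year and not math.isnan(v) for t, v in series)
    )


def density_grid(locations, lat_edges, lon_edges):
    """Count buoys per grid cell; rows follow latitude, columns longitude.

    Cells are half-open [low, high); points outside the edges are ignored.
    """
    n_rows, n_cols = len(lat_edges) - 1, len(lon_edges) - 1
    grid = [[0] * n_cols for _ in range(n_rows)]
    for lat, lon in locations:
        row = bisect_right(lat_edges, lat) - 1
        col = bisect_right(lon_edges, lon) - 1
        if 0 <= row < n_rows and 0 <= col < n_cols:
            grid[row][col] += 1
    return grid


# Hypothetical buoys: id -> (lat, lon) and id -> [(datetime, SST), ...].
positions = {"X": (47.0, -88.0), "Y": (43.0, -79.0), "Z": (42.0, -80.0)}
records = {
    "X": [(datetime(2005, 7, 1), 12.0)],
    "Y": [(datetime(2005, 8, 1), 21.0)],
    "Z": [(datetime(2005, 7, 1), math.nan), (datetime(2006, 7, 1), 20.0)],
}
active = active_in_year(records, 2005)
grid = density_grid(
    [positions[b] for b in active],
    lat_edges=[41.0, 45.0, 49.0],
    lon_edges=[-93.0, -84.0, -75.0],
)
```

Repeating this for 2006 and comparing the two grids shows where the network actually grew, which can then be set against whatever placement the model suggests.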

Great stuff - feel free to close this issue.

great-lakes-ai-lab / GreatLakes-TempSensors

Explore and visualize US and Canadian buoy datasets #17