robbinscalebj / NeonPredictability


Running Notes from Zoom Calls #10

Open jpeters7 opened 4 months ago

jpeters7 commented 4 months ago

April 15, 2024 Call

The following are notes from a call with @robbinscalebj, @abbylewis, @colebrookson, @OlssonF, and myself on April 15, 2024.

Vision - how does forecasting across variables fit into the literature? What is exciting about ML ensembles (this is based on all the forecasts labeled tg in the forecast challenge)? Testing expectations for near-term forecasting - how it changes across horizons. How does it vary across scales?

Think we are challenging the assumption that the scale curves are negative exponential functions.

Method for comparing forecasts across variables.

Model agnostic benchmark of forecast skill across variables.

Will forecastability follow an exponential decay function? This is an interesting question. It assumes a model that incorporates initial conditions; the further you get from the initial conditions, the more uncertainty compounds. Abby - worried about overly pushing this point. Wonders if the ML models could be improved by including an initial condition, but thinks the point is valid.
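A minimal sketch of that hypothesized shape in R (all parameter values here are purely illustrative, not fitted):

```r
# Hypothetical illustration of forecast skill decaying exponentially with horizon
horizon <- 0:35                    # days ahead (illustrative range)
s0 <- 1                            # assumed skill at horizon zero
k  <- 0.1                          # assumed decay rate
skill <- s0 * exp(-k * horizon)    # negative exponential decay
plot(horizon, skill, type = "l",
     xlab = "Forecast horizon (days)", ylab = "Relative skill")
```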

Comparison across variables is what Abby is really interested in. What makes one variable more forecastable than another? Think this will also be examined in Cole’s work.

From existing forecast papers - there have been analyses within themes/variables:
- Ward et al. - forecast complexity analysis, all population dynamics
- Wheeler - Phenology Challenge
- Olsson - Aquatics Challenge

Compare across 3 variables that have high frequency data coming in - phenology, aquatics, terrestrial

Cole - forecastability across variables, particularly for variables that are remotely sensed or automatically collected at high frequency.

Cole and Shubhi have been talking about what can be done with model-generated data for population data.

Freya - using beetles and ticks could confound the analysis. You could lose ecosystem-level questions because you are more constrained by the sampling methods for beetles and ticks.

Looking at rate of decay across ecosystems. If there is a rate of decay, is it more sensitive to the weather forecasts?

Thinking about chlorophyll - there is a huge amount of difference in data availability because they pull out the sensors in the winter.

Can you standardize phenology and terrestrial across seasons?

What are the last 3 sentences of the conclusion? What are the main figures?

Comparison of forecasts within sites

Disentangling seasonality without digging in on smaller questions.

Clarify the clean steps to take and not get distracted from the wormhole questions.

Key questions Caleb is thinking about: we hypothesized that forecast skill 1) declines across forecast horizons for all variables, and 2) the rate of decline in forecast skill differs among variables due to differences in sensitivity to initial conditions, among other factors.

Think these two questions are enough to create a paper around.

Kathryn’s paper dug into the site differences - why do the sites differ? Different greenup times affect forecasts

Freya’s aquatics paper - the different models were more interesting. Couldn’t do the site level analysis.

Caleb can dig into the variable question. The models are robust - they do the same thing - so digging into the variable question is doable and is most exciting.

Spatial analyses would also potentially be cool, but as a secondary point. See how the variable analysis takes shape first.

The cool thing about having so many sites is that it gives more replication to do the variable analysis.

Could leverage the NEON design and do a subset where you cluster co-located sites. This is something to look into. Where are the sites co-located?

Cole and Shubhi have been thinking about: If you use a reanalysis product for climatic data, the grid cells encompass more than 1 NEON site. Could colocate sites in a grid model and use a met model as a null model for the grid cell.

If doing colocation to get a higher sample size of co-located sites, could zoom out and draw a grid. E.g., the ERA5 reanalysis product - use those lat/long grids to treat each grid cell as having the same climatic conditions. Those grid cells are 0.25°. Cole has a figure where he overlaid the grid - he will look for it.
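A rough sketch of the grid-cell colocation idea, assuming a hypothetical data frame `neon_sites` with columns `site_id`, `lat`, and `lon`:

```r
# Assign each NEON site to a 0.25-degree reanalysis grid cell and flag co-located sites
library(dplyr)

grid_res <- 0.25
site_grid <- neon_sites |>
  mutate(grid_lat = round(lat / grid_res) * grid_res,
         grid_lon = round(lon / grid_res) * grid_res) |>
  add_count(grid_lat, grid_lon, name = "n_in_cell")

# Grid cells containing more than one site are the candidates for colocation
filter(site_grid, n_in_cell > 1)
```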

Staying focused on the variable question - lets you control for site across variables. Could control for phenology and aquatics at Mountain Lake. This could be a wormhole. What percentage of sites do this?

Is there a simple/not wormhole thing - at locations of colocation, is the curve of decline similar compared to places without colocation?

Within variables vs across sites?

Does it differ more across variables or across sites? If the answer is that declines are more similar across sites than across variables, that would be really interesting. If they aren't, it makes the problem of doing a really good forecast more complicated: translating forecasts between variables is just as hard even if you don't have to make a spatial leap.

NEON Sites: table 1 here

How transferable is our understanding of forecasts across variables and sites?

Keep the paper thinking about the breadth that aligns with the Theory WG interests. This is useful for people who are experts in phenology or aquatics.

What are the take homes for transferability of forecasts across systems? Basic questions about forecastability and predictability.

Journal ideas: Ecology Letters, Frontiers in Ecology and the Environment, Ecological Applications? Eco App is where much ecoforecasting is located, but it would be nice to make it appealing to more basic ecology publications.

Aspect of framing and discussion - this can lay out a vision for what is next and what other people can do.

Is it too late to submit new models for this analysis?
Any class of empirical models can be thrown in. But will need to backfill.

There are data gaps. If you have a skill curve from a GAM and it is based on only 3 observations, then you probably don't want that, but if you have other observations it may still be usable.

Fit the GAM curve and then decompose the curve across early/middle/late horizons. If you are missing observations in a particular chunk, then it may or may not affect the curve.
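A minimal sketch of that idea, assuming a hypothetical data frame `scores` with `horizon` and `skill` columns for one forecast/site/variable combination:

```r
# Fit a GAM skill curve over horizon, then summarize it across early/middle/late chunks
library(mgcv)

fit <- gam(skill ~ s(horizon), data = scores)

pred_grid <- data.frame(horizon = seq(min(scores$horizon), max(scores$horizon), by = 1))
pred_grid$skill_hat <- predict(fit, newdata = pred_grid)
pred_grid$chunk <- cut(pred_grid$horizon, breaks = 3,
                       labels = c("early", "middle", "late"))

# Mean fitted skill within each horizon chunk
aggregate(skill_hat ~ chunk, data = pred_grid, FUN = mean)
```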

Can we let the seasonal aspect go so there are more forecasts for a given forecast, site, horizon combination?

Every forecast has a fitted GAM - could you do the GAMs across all the forecasts at that site? Every site has its own average forecast. But then you can't look at whether the slope differs over time; you can only look at whether it differs over variables and sites. Could potentially help average over missing data. Could have a hierarchical GAM - originally it was too big, so Caleb went the alternative route and made tons of tiny GAMs.
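A sketch of the "tons of tiny GAMs" route, again assuming a hypothetical `scores` data frame with `variable`, `site_id`, `horizon`, and `skill` columns:

```r
# One small GAM per variable x site combination instead of a single hierarchical model
library(mgcv)

groups <- split(scores, interaction(scores$variable, scores$site_id, drop = TRUE))
fits <- lapply(groups, function(d) gam(skill ~ s(horizon, k = 5), data = d))

# Fitted skill curves, one per group, ready for plotting or chunk-wise summaries
curves <- Map(function(f, nm) data.frame(group = nm,
                                         horizon = f$model$horizon,
                                         skill_hat = fitted(f)),
              fits, names(fits))
```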

Caleb will do cleaning. Input on GitHub issues will be useful.

Abby will look at GitHub issues, can maybe submit more models.

robbinscalebj commented 4 months ago

@jpeters7 you took notes!? You rock!

jpeters7 commented 3 months ago

May 7, 2024 Call

The following are notes from a call with @robbinscalebj, @abbylewis, @rqthomas, and myself on May 7, 2024.

Challenge CI is in transition. Chlorophyll-a forecast issues: we have old forecasts in one bucket and new forecasts in a new bucket. Think this issue has been fixed.
Anything submitted since January hasn't been scored. Need to decide how far back to score. It will take quite a bit of time to do all the scoring.

Will keep the automatic scoring going for the last few months. Then have dedicated shut-down time to do a one-off run of the past scoring. Coordinate with @rqthomas when we plan to do this.

Took the chla out of Freya's paper because it was wonky.

Are the targets okay now for the chla? Freya sent updated plots of chlorophyll on April 25 and now they are good to go.

Do we want to re-run the forecasts? One argument - this is a real-time forecast challenge and we acknowledge that issues will happen, so we could leave it in and acknowledge that. Another argument - results from today are influenced by bad data, and reviewers may ask why it wasn't fixed. Could have a paper on data issues - trials and tribulations of running a real-time forecasting challenge - but this doesn't fit with the current paper.

Design discussion of an algorithm - do you want some forgetfulness in your algorithm so historically bad data doesn't permeate through? These aren't theory group questions, but something to be aware of.

Before we resubmit chla forecasts - Quinn can brainstorm with Carl about the potential to restore offline as an isolated job. Hard part is retraining a local machine task.

Recapping questions/hypotheses - submitting 18 forecasts across all sites.

Hypotheses: There will be declines in skill across forecast horizons for all variables, and the rate of decline differs between variables.

What is the temporal and spatial variability (skill) and how does it vary between variables?

At what scales is ecology more or less predictable? This is why creating and analyzing near-term forecasts is important. What scales do we care about? Need to know when we are good or bad at making predictions. This analysis is part of the overall puzzle.

Caleb has tidied up the scripting workflow, so this will be helpful for future analyses. Calculated skill scores. Went back to the big GAMs.
The model fits a non-linear trend for a variable, including site-level deviations. The trend for forecast week includes a tensor interaction, and a factor smooth interaction captures the site-level deviations. Is there a modelID effect? The data is an ensemble. The CRPS for the model is calculated from the mean prediction and error across all models for any given date and is subtracted from the climatology CRPS.
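A minimal sketch of a GAM along those lines in mgcv (the data frame `skill_df` and its columns `crps_model`, `crps_climatology`, `horizon`, `forecast_week`, and `site_id` are assumptions, not the actual script):

```r
library(mgcv)

# Skill relative to climatology: model CRPS subtracted from the climatology CRPS
skill_df$skill <- skill_df$crps_climatology - skill_df$crps_model

# site_id must be a factor for the factor-smooth term
fit <- gam(skill ~ te(horizon, forecast_week) +      # non-linear trend with a tensor interaction
                   s(horizon, site_id, bs = "fs"),   # factor smooth: site-level deviations
           data = skill_df, method = "REML")
```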

Is there any filter for absurd forecasts? This was something Freya had to think about for the aquatics paper. The multi-model mean did the best if you aggregated all the way up with the nulls, but it didn't apply to the NEON Challenge forecasts - there were some really bad models.

There are no downsides to a "bad" forecast in the Challenge. So anyone can submit any forecasts to see what sticks.

With the "tg" models we are applying them across all the variables/sites and we aren't necessarily including forecasts submitted by other teams (e.g., forecasts submitted by a class or workshop as people are learning)

Doing a comparison of models: could do an ensemble of models with initial conditions vs those without.

Forecast week by site ID is included in the GAM to allow it to vary by site. Could change it to forecast by month or forecast by season - this will reduce the complexity. Or could do forecast by latitude to capture the differences in seasonality across latitude.

Handling temporal variation has been tough to nail down. Want to reduce complexity. The aquatics also gets weird because there is no winter at Toolik - there is only summer and it is short.

Looked at plots, which was fun. When looking across sites it was hard to see which sites were which colors, but could use plotly to hover over a line and see what site it is. For example, plotly::ggplotly(oxy_hoorizonxsite.p)
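A small sketch of that plotly idea (the data frame `skill_curves` and its columns are assumptions):

```r
# Interactive version of a skill-by-horizon plot so hovering reveals the site
library(ggplot2)
library(plotly)

p <- ggplot(skill_curves, aes(x = horizon, y = skill, colour = site_id, group = site_id)) +
  geom_line()

ggplotly(p)  # hover over a line to see which site it belongs to
```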

Abby - will continue to work through the Issues.

Big thing to work on now - getting the models retrained and resubmitted for chla

Abby will work on the issue of having >2000 forecasts being submitted each day.
Could use the submission check to verify that a model has submitted forecasts for the past month.

This is the dashboard for all the forecasts now: https://projects.ecoforecast.org/neon4cast-ci/performance.html

Check scores by going to the catalog here: https://projects.ecoforecast.org/neon4cast-ci/catalog.html Then go to models and select the model ID.

The catalog has everything - forecast summaries are where all forecasts are collapsed to one line (no ensemble, no parameters). If you want to plot forecasts, go to forecast summaries.

Won't have a problem analyzing forecasts on the back end. The most important thing is to make sure the forecasts are correct and what we want them to be.