mlap / neon4cast-aquatics

The aquatics group repo for ESPM 288

Modeling #8

Open mlap opened 3 years ago

mlap commented 3 years ago

The general classes of models that I imagine we will investigate over the term include the following. I'll provide a brief running example that hopefully conveys the gist of the different methods:

Mechanistic Models -- This is basically direct physical modeling of the system. For example, let's say I am making a forecasting model for the pressure of a gas in a box. A mechanistic model, which uses the ideal gas assumption, would be P = NRT / V, where N is the number of moles, R is the gas constant, T is the temperature, and V is the volume. So if I am given a temperature forecast for the box, and moles and volume are assumed to be constant, I can use this model to predict what the pressure will be over time.
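A minimal sketch of the mechanistic forecast described above. The temperature forecast, mole count, and volume are made-up illustrative values, and the function name is hypothetical:

```python
# Gas constant in J/(mol*K); with T in kelvin and V in m^3, pressure comes out in pascals.
R = 8.314462618


def pressure_pa(n_mol, temp_k, volume_m3):
    """Ideal gas law, P = NRT / V."""
    return n_mol * R * temp_k / volume_m3


# Assumed inputs (not from any real dataset): a three-step temperature
# forecast, with moles and volume held constant.
temperature_forecast_k = [290.0, 295.0, 300.0]
n_mol = 1.0
volume_m3 = 0.025

# The pressure forecast is just the model applied to each forecast temperature.
pressure_forecast_pa = [
    pressure_pa(n_mol, t, volume_m3) for t in temperature_forecast_k
]
print(pressure_forecast_pa)
```

There is nothing to fit here: all of the "knowledge" lives in the equation itself.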

Bayesian Models -- The general idea here is that you use probability distributions to model the data. In the gas example above, let's say we think that the pressure in the box is drawn from a normal (Gaussian) distribution with a mean of NRT / V and some standard deviation. The task for Bayesian modeling is first to estimate the parameters of the model from the past data, e.g. the standard deviation. Once we have estimated this parameter, and provided we are given a forecast for temperature, we can make our forecast by sampling from the distribution to get estimates of what the pressure will be.
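One way to sketch the Bayesian version, under some assumptions of mine: the mean is fixed at NRT / V, the unknown variance gets an inverse-gamma prior (a standard conjugate choice, with arbitrary prior values here), and the "past data" is simulated. All function names are hypothetical, and in practice a tool like Stan would handle the estimation:

```python
import random
import statistics

R = 8.314462618  # gas constant, J/(mol*K)


def mean_pressure_pa(n_mol, temp_k, volume_m3):
    """Mean of the normal distribution: the ideal gas law, NRT / V."""
    return n_mol * R * temp_k / volume_m3


def posterior_variance_params(residuals, prior_shape=2.0, prior_rate=1.0):
    """Inverse-gamma posterior for the variance of a normal with known mean.

    The prior values are arbitrary assumptions chosen for illustration.
    """
    shape = prior_shape + len(residuals) / 2
    rate = prior_rate + 0.5 * sum(r * r for r in residuals)
    return shape, rate


def sample_pressure_forecast(mean_pa, shape, rate, n_draws, rng):
    """Draw a variance from the posterior, then a pressure from the normal."""
    draws = []
    for _ in range(n_draws):
        # If X ~ Gamma(shape, scale=1/rate), then 1/X ~ Inverse-Gamma(shape, rate).
        variance = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        draws.append(rng.gauss(mean_pa, variance ** 0.5))
    return draws


rng = random.Random(0)
n_mol, volume_m3 = 1.0, 0.025

# Simulated past data (made up): ideal-gas pressure plus Gaussian noise.
past_temps_k = [280.0 + 0.5 * i for i in range(40)]
past_pressures_pa = [
    mean_pressure_pa(n_mol, t, volume_m3) + rng.gauss(0.0, 500.0)
    for t in past_temps_k
]

# Step 1: estimate the spread from how far past data sits from the model mean.
residuals = [
    p - mean_pressure_pa(n_mol, t, volume_m3)
    for p, t in zip(past_pressures_pa, past_temps_k)
]
shape, rate = posterior_variance_params(residuals)

# Step 2: given a forecast temperature of 300 K, sample the pressure forecast.
forecast_mean_pa = mean_pressure_pa(n_mol, 300.0, volume_m3)
forecast_draws_pa = sample_pressure_forecast(forecast_mean_pa, shape, rate, 2000, rng)
print(statistics.fmean(forecast_draws_pa), statistics.stdev(forecast_draws_pa))
```

The output is a whole set of draws rather than a single number, so the forecast carries its own uncertainty.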

Machine Learning (ML) Models -- ML is a huge field with tons of different algorithms that can be starkly different from one another. The general idea (not always true, though!) is that ML models have a bunch of parameters. During the training phase for the example above, you would feed past data on moles, temperature, and volume into the ML model, which outputs one real-valued number that we want to match the pressure. A training algorithm then updates the model's parameters so that its output better approximates the pressure of the gas in the box; for neural networks, this is typically gradient descent, with the gradients computed by an algorithm called backpropagation. Once our model works well on past data, we can make forecasts by, say, using a temperature forecast and plugging in the best estimates for moles and volume as inputs. The ML model will then directly output the prediction for pressure.
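A rough sketch of that training loop, with several simplifications of mine: the model is a single linear layer trained by plain gradient descent rather than a real neural network (backpropagation applies the same chain-rule idea layer by layer in deeper models), the inputs and output are log-transformed to keep the example tiny, and the training data is synthetic and noise-free. The names are hypothetical:

```python
import math
import random
import statistics

R = 8.314462618  # only used to generate the synthetic training data
rng = random.Random(0)

# Synthetic "past data" (made up): moles, temperature (K), volume (m^3).
inputs = [
    (rng.uniform(0.5, 2.0), rng.uniform(250.0, 350.0), rng.uniform(0.01, 0.05))
    for _ in range(200)
]
pressures = [n * R * t / v for n, t, v in inputs]

# Log-transform, then standardize the inputs so gradient descent behaves well.
log_x = [[math.log(value) for value in row] for row in inputs]
log_y = [math.log(p) for p in pressures]
means = [statistics.fmean(col) for col in zip(*log_x)]
stds = [statistics.pstdev(col) for col in zip(*log_x)]
y_mean = statistics.fmean(log_y)


def featurize(n_mol, temp_k, volume_m3):
    raw = (math.log(n_mol), math.log(temp_k), math.log(volume_m3))
    return [(r - m) / s for r, m, s in zip(raw, means, stds)]


features = [featurize(*row) for row in inputs]
targets = [y - y_mean for y in log_y]

# The model's parameters: one weight per input, plus a bias.
weights = [0.0, 0.0, 0.0]
bias = 0.0
learning_rate = 0.1

for epoch in range(500):
    grad_w = [0.0, 0.0, 0.0]
    grad_b = 0.0
    for x, y in zip(features, targets):
        error = sum(w * xi for w, xi in zip(weights, x)) + bias - y
        # Gradient of the mean squared error with respect to each parameter.
        for j in range(3):
            grad_w[j] += 2 * error * x[j] / len(features)
        grad_b += 2 * error / len(features)
    # Nudge each parameter downhill.
    weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    bias -= learning_rate * grad_b


def predict_pressure_pa(n_mol, temp_k, volume_m3):
    x = featurize(n_mol, temp_k, volume_m3)
    return math.exp(sum(w * xi for w, xi in zip(weights, x)) + bias + y_mean)


# Forecast: plug in a temperature forecast and best estimates for moles and volume.
forecast_pa = predict_pressure_pa(1.0, 300.0, 0.025)
print(forecast_pa)
```

The model is never told the ideal gas law; it only sees input/output pairs. That said, choosing to work in log space is itself a small piece of mechanistic insight, which is the kind of help described in the next paragraph.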

There is a lot more to the Bayesian and ML approaches, but hopefully this gives everyone a general roadmap of what is what to guide independent exploration. I think the progression from mechanistic to Bayesian to ML is natural, because in Bayesian methods you often need some understanding of the mechanisms to write down the model. With ML, meanwhile, you don't need any mechanistic understanding to do OK, but having some can potentially help you a lot. In our aquatics example, let's say mechanistic modeling tells us that the phytoplankton population is really important to DO (dissolved oxygen), but none of the drivers really capture it. If we can define some proxy variable to describe the phytoplankton population, then feeding this proxy variable into our ML model might lead to much better predictions in a shorter amount of time. I'll scavenge around for some links on these various topics and add them in a bit.

mlap commented 3 years ago

But there are way more methods out there! One reference we can use is the fable package, which has pre-packaged forecasting models. For reference, it can do everything listed here: http://fable.tidyverts.org/reference/index.html. And here is a textbook that goes over what most of the models in fable are doing: https://otexts.com/fpp3/.

mlap commented 3 years ago

I just came across this e-book, https://bookdown.org/marklhc/notes_bookdown/introduction.html, that seems to be a good general Bayesian resource -- I have not used it before, but skimming over the table of contents, it seems to explain everything one needs to get up and running. Lots of examples based on Stan here, which is nice.

For ML, the previously mentioned forecasting textbook has an okay section on neural network models, https://otexts.com/fpp3/nnetar.html, but I'll try to find some other decent e-book that is more comprehensive. Alternatively, I'd definitely recommend Medium or other ML blogs that you can find via googling. Most of these do a pretty good job of explaining concepts and providing realistic examples.