Closed rsangole closed 5 years ago
@andrew3cooper @kapelinskim6 @stephenhage - modellers: you may benefit from this brainstorming list I'm making for myself too...
Notes from reading this:
The modelling approach was in line with what I had in mind -- a regression model for predicting the number of mosquitoes + a classification model to predict WNV presence.
A few ideas:
I agree with the approach. I'm working in parallel on some of these issues.
I just constructed weekly time series at the trap level without imputation (a multiple time series object). It's informative. I repeated this with weekly time series at the community area level, which is also informative. We may be able to use these data without aggregating all the way up to the monthly level. We will benefit from some kind of clustering, as you previously suggested @rsangole. I'd like to make a quick attempt at this by simply combining adjacent communities when they have sparse data.
I'll see if I can push my R code and plots to GitHub on a work break later today.
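For reference, a minimal sketch of the two steps described above: weekly aggregation without imputation, and folding sparse communities into an adjacent one. It is in plain Python rather than R, and everything in it is an assumption made for illustration: the record layout, the trap and community labels, the adjacency map, the `min_obs` threshold, and the function names `weekly_series` and `merge_sparse`. It is not the code referred to in this thread.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (trap_id, community, collection_date, mosquito_count)
records = [
    ("T001", "A", date(2013, 6, 3), 12),
    ("T001", "A", date(2013, 6, 5), 7),
    ("T001", "A", date(2013, 6, 12), 30),
    ("T002", "B", date(2013, 6, 4), 2),
    ("T003", "C", date(2013, 6, 4), 9),
    ("T003", "C", date(2013, 6, 11), 14),
]


def weekly_series(records, key_index):
    """Sum counts per (key, ISO year, ISO week).

    Weeks with no collections are simply absent (no imputation).
    key_index=0 groups by trap, key_index=1 groups by community.
    """
    totals = defaultdict(int)
    for rec in records:
        iso = rec[2].isocalendar()
        totals[(rec[key_index], iso[0], iso[1])] += rec[3]
    return dict(totals)


def merge_sparse(records, adjacency, min_obs):
    """Map each community with fewer than min_obs records to its
    best-covered adjacent community; others map to themselves."""
    n_obs = defaultdict(int)
    for rec in records:
        n_obs[rec[1]] += 1
    mapping = {}
    for comm, n in n_obs.items():
        if n >= min_obs:
            mapping[comm] = comm
        else:
            neighbours = adjacency.get(comm, [])
            mapping[comm] = max(neighbours, key=lambda c: n_obs[c], default=comm)
    return mapping


# Assumed adjacency: community B borders A and C.
mapping = merge_sparse(records, {"B": ["A", "C"]}, min_obs=2)
merged = [(trap, mapping[comm], day, n) for trap, comm, day, n in records]

by_trap = weekly_series(records, key_index=0)
by_area = weekly_series(merged, key_index=1)
```

Merging into the neighbour with the most records is only one possible rule; a distance-based or cluster-based grouping would slot into `merge_sparse` the same way.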
Another quick note -- at a very high level -- I had assumed we'd be doing prediction using observed (future) weather data rather than forecasting without any future data. That's basically how Kaggle had the competition set up as well, even though the train/test splits were in alternating years.
I talked about using lagged trap results in another post. I think that's something we can come back to at the end as a stretch goal, to demonstrate that the visualization platform can give within-season forecasts 1 to x weeks ahead. There's real business (public health) value in doing so.
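A rough sketch of what lagged trap results could look like as features, assuming one trap's weekly counts keyed by week number within a season. The function name `add_lags`, the input layout, and the sample numbers are all hypothetical. Lags are looked up by calendar week rather than by row position, so a week with no collection yields a missing lag instead of silently borrowing an older value.

```python
def add_lags(series, max_lag):
    """Build one row per observed week with counts from the previous
    max_lag calendar weeks; a lag is None when that week has no data."""
    rows = []
    for week in sorted(series):
        lags = [series.get(week - k) for k in range(1, max_lag + 1)]
        rows.append({"week": week, "count": series[week], "lags": lags})
    return rows


# Hypothetical weekly counts for one trap; week 25 was not sampled.
trap_weekly = {23: 19, 24: 30, 26: 5}
rows = add_lags(trap_weekly, max_lag=2)
```

Here the row for week 26 has a missing 1-week lag and a 2-week lag of 30, which is the behaviour you would want if the series is left unimputed.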
Maintaining a task list for myself here.
Data Processing
EDA and Hypothesis Testing
Feature Engineering
Reading
Modeling
Use mlr correctly (especially the model comparison code)
Feature Reduction Activities
Reporting