seattleflu/incidence-mapper performs geospatial modeling of epidemiological and incidence data using methods based on R-INLA, and provides an API service to deliver modeled data to seattleflu-viz (to come).
R packages encapsulate key aspects of the workflow.
Incidence-mapper exists to perform three different classes of routine tasks on enrollment and pathogen incidence data.
Models are defined by:
- the pathogen, such as `FLU_A_H1` or `h3n2`, or `all` (for all samples regardless of pathogen), or `unknown` (the default I have until the taqman data is made available).
- the variables the model outputs are indexed by, such as `[encountered_week, residence_puma]` or `[flu_shot, residence_census_tract]`.
- the model type (`smooth` or `latent`), or, if that makes little sense on your end, the outcome you want models of, like `count` and `fraction` (observed), or `intensity` (latent).

All examples included in this commit are timeseries models by `encountered_week`. Both model types need not include time in general, but all the ones for May 22 will. To collapse over time to produce a static map, it is approximately valid to average outcomes over time. Long-term, there are more interesting, and more correct, things to do, but averaging is not bad and retains interesting meaning for people exploring the data.
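As a rough illustration of collapsing a timeseries model into a static map, the sketch below averages a modeled value over `encountered_week` for each residence location. The row layout, the PUMA codes, and the values are made up for the example and are not the repo's actual output format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical model output rows: (encountered_week, residence_puma, modeled value).
rows = [
    ("2019-W01", "11601", 4.0),
    ("2019-W02", "11601", 6.0),
    ("2019-W01", "11602", 1.0),
    ("2019-W02", "11602", 3.0),
]

def collapse_over_time(rows):
    """Average the modeled value over all weeks for each residence location."""
    by_location = defaultdict(list)
    for _week, location, value in rows:
        by_location[location].append(value)
    return {location: mean(values) for location, values in by_location.items()}

static_map = collapse_over_time(rows)
```

Each location ends up with one number, which is what a static choropleth needs. A plain mean weights every week equally; that is the "approximately valid" shortcut, not a proper time-integrated estimate.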
Smooth models

Smooth models are interpolations and extrapolations from the observed data. For the purposes of visualization, these models complement raw data summaries in two ways:
The `smooth` models in this repo are faceted by factors about our participants, like `site_type` (where the sample was taken), `sex` (male or female), and `flu_shot` (self-reported "have you had flu vaccine in last year?" or records, 1=yes, 0=no). It would be nice to have a selector for factors to facet by, but tell me if you want the deployed models to have fewer facets. From my point of view, the only non-negotiable facet is `site_type`, as differences in the total counts and residence locations captured by our different collection modes are very important to our study partners for understanding what we collected this year.
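To make "faceted by" concrete, here is a small sketch of the raw summary a facet selector would sit on top of: counting participants in each combination of the chosen factors. The records are invented, and `clinic` is a hypothetical `site_type` value used only for contrast with `kiosk`; the smooth models themselves are fitted with R-INLA and are not shown here.

```python
from collections import Counter

# Hypothetical participant records; field names follow the factors named above.
participants = [
    {"site_type": "kiosk", "sex": "female", "flu_shot": 1},
    {"site_type": "kiosk", "sex": "male", "flu_shot": 0},
    {"site_type": "clinic", "sex": "female", "flu_shot": 1},
    {"site_type": "kiosk", "sex": "female", "flu_shot": 1},
]

def facet_counts(records, facets):
    """Count records in each combination of the chosen facet values."""
    return Counter(tuple(record[f] for f in facets) for record in records)

# Facet by site_type only, then by site_type and flu_shot together.
by_site = facet_counts(participants, ["site_type"])
by_site_and_shot = facet_counts(participants, ["site_type", "flu_shot"])
```

Dropping a facet from the list simply pools the counts across that factor, which is the trade-off behind deploying models with fewer facets.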
Latent field models

Latent field models represent an inference of the underlying force of infection in the total population over space and time, after adjusting for features of our sampling process and factors associated with our participants. The "latent field" is estimated during the estimation of the observed models above. It represents a model of the residual variation in the data that is not explained by observed factors like `site_type`, `sex`, or `flu_shot`. From the inferred latent field, we can produce an output I'm calling `modeled_intensity`, which is an un-normalized estimate of total population incidence. I'm not yet producing normalized incidence estimates because we (1) haven't linked to census population data yet and (2) haven't worked out what we can about "denominator data" for our sample (what fraction of people who could've participated were sick enough to participate and willing to enroll?). Regardless, the relative information in the `modeled_intensity`, both in time and space, can be interpreted similarly to true incidence, and through the modeled intensity, we get a picture of the dynamics of transmission.
The most important thing the latent field model adjusts for is an estimate of the "catchment" of each `site_type`. The catchment of each site estimates how likely people at each residence location are to have participated in our study, independent of whether they are sick with the pathogen of interest.
For example, to estimate the catchment of `kiosk` collection for `h1n1pdm`, we infer a `smooth` model for the expected number of participants in each residence location with all non-`h1n1pdm` infections who participate in kiosk sampling, aggregated over the entire study duration (no time-dependence). We take this as a measure of the rate at which people would interact with a kiosk, independent of being sick with `h1n1pdm`, the pathogen of interest, and averaged over the variations in space and time of the many other pathogens that also drive participation.
The key assumptions are that (1) averaging over the dynamics of many pathogens, which (2) are independent of the specific pathogen being modeled, reveals the underlying access to kiosks and willingness to participate of people in each mapped residence location.
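The data selection behind the catchment estimate can be sketched as a raw count: for one `site_type`, tally the samples at each residence location that are not the pathogen of interest, ignoring time. This is only the unsmoothed input, under assumed field names and invented records (`rsv` and `clinic` are hypothetical values); the actual catchment is a `smooth` model fitted to counts like these.

```python
from collections import Counter

# Hypothetical sample records; residence labels "A" and "B" are placeholders.
samples = [
    {"site_type": "kiosk", "residence": "A", "pathogen": "h1n1pdm"},
    {"site_type": "kiosk", "residence": "A", "pathogen": "h3n2"},
    {"site_type": "kiosk", "residence": "A", "pathogen": "rsv"},
    {"site_type": "kiosk", "residence": "A", "pathogen": "h3n2"},
    {"site_type": "kiosk", "residence": "B", "pathogen": "h1n1pdm"},
    {"site_type": "kiosk", "residence": "B", "pathogen": "rsv"},
    {"site_type": "clinic", "residence": "A", "pathogen": "rsv"},
]

def raw_catchment(samples, site_type, target_pathogen):
    """Count non-target-pathogen samples per residence location for one site type,
    aggregated over the whole study period (no time dimension)."""
    return Counter(
        s["residence"]
        for s in samples
        if s["site_type"] == site_type and s["pathogen"] != target_pathogen
    )

kiosk_catchment = raw_catchment(samples, "kiosk", "h1n1pdm")
```

Excluding the target pathogen is what keeps the catchment independent of the outcome being modeled: the `h1n1pdm` samples never inform their own denominator.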
Given the estimated catchment of each `site_type`, the observed number of `h1n1pdm` cases over space and time is modeled relative to the catchment. In places where we get a lot of samples for lots of reasons, a few `h1n1pdm` samples represent a low intensity of transmission. But in places where we get few samples other than `h1n1pdm`, we infer that flu intensity at that residence location and time is high. The latent field model collectively estimates this intensity across the whole map, thereby inferring the space-time properties of the total population epidemic.
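A crude way to see why the same number of `h1n1pdm` samples can mean different things is to divide the observed count by the catchment at each location. The ratio below is a simplification for intuition only, with invented numbers and placeholder location labels; the latent field model estimates intensity jointly over space and time rather than location by location.

```python
def relative_intensity(target_counts, catchment):
    """Target-pathogen count per unit of catchment, for locations with nonzero catchment."""
    return {
        location: target_counts.get(location, 0) / size
        for location, size in catchment.items()
        if size > 0
    }

# Hypothetical numbers: both locations yield 4 h1n1pdm samples,
# but location "A" yields ten times as many samples for other reasons.
catchment = {"A": 40, "B": 4}
h1n1pdm_counts = {"A": 4, "B": 4}

intensity = relative_intensity(h1n1pdm_counts, catchment)
```

Location "B" comes out with ten times the relative intensity of "A" despite identical raw counts. A per-location ratio like this is noisy where the catchment is small, which is one reason to borrow strength across neighboring locations and weeks.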
What are the models (in words, equations, and code) and what do they do?
Depending on the outcome of interest, age is either like a factor or like time, and so the models and viz will need to depend on the cognitive task.
These will be an example of a different cognitive task, where the inference of interest is a parameter and not a direct transformation of data. Line, bar, and map chart viz components will likely be similar, but context will need to be different. This and other intervention effectiveness summaries will be really important for Year 2.