Summary function / general workflow

alexgenin commented 9 years ago

Now that we have all the single-indicator functions, we need a user-friendly function that computes all the indicators and reports their values. I suggest following the usual pattern: create a new object class "shift_indicator" and add the usual methods.

A user workflow could look like this (basically similar to a glm workflow):

1. Convert the original data to one or more binary matrices (using the as.* function family).
2. Run the indicators function to create a result object containing all the relevant indicator values (naming to be discussed).
3. Use the summary function on the indicators object to see which indicators show an interesting value.
4. Plot diagnostics.
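Steps 1 and 2 could be sketched as below. This is a minimal sketch under assumptions: binarization is done by simple thresholding, only variance and skewness are computed, and `to_binary()` and `compute_indicators()` are hypothetical placeholder names, not existing package functions.

```r
# Hypothetical sketch of workflow steps 1-2; names are placeholders.

# Step 1 (assumed approach): threshold a continuous matrix into a
# logical (binary) matrix. The default threshold is an assumption.
to_binary <- function(mat, threshold = mean(mat)) {
  mat > threshold
}

# Moment-based sample skewness (no bias correction).
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^(3 / 2)
}

# Step 2: compute numerical indicators for each replicate matrix and
# wrap them in an object of class "shift_indicator".
compute_indicators <- function(mats) {
  if (is.matrix(mats)) mats <- list(mats)  # accept a single matrix too
  res <- list(
    variance = list(value = vapply(mats, function(m) var(as.numeric(m)),
                                   numeric(1))),
    skewness = list(value = vapply(mats, function(m) skewness(as.numeric(m)),
                                   numeric(1)))
  )
  class(res) <- c("shift_indicator", "list")
  res
}
```

For example, `compute_indicators(lapply(replicates, to_binary, threshold = 0.5))` would return one variance and one skewness value per replicate, ready for the summary step.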

What I have in mind for the summary function is an output similar to this:

 > summary(glm(Sepal.Length ~ Sepal.Width + Species, data = iris))

Call:
glm(formula = Sepal.Length ~ Sepal.Width + Species, data = iris)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.30711  -0.25713  -0.05325   0.19542   1.41253  

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)         2.2514     0.3698   6.089 9.57e-09 ***
Sepal.Width         0.8036     0.1063   7.557 4.19e-12 ***
Speciesversicolor   1.4587     0.1121  13.012  < 2e-16 ***
Speciesvirginica    1.9468     0.1000  19.465  < 2e-16 ***

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Thoughts? Ideas for the plot function?

alexgenin commented 9 years ago

I realize now this is a bit related to issue #5.

alexgenin commented 9 years ago

OK, so the indicators object should have the following structure:

list(variance = list(value = xx, ...), # OR variance = xx
     skewness = list(value = xx, ...), 
     psd = `psd object`, 
     ...) # Component names are taken from the list of functions above.
(class: c('spatial_indicators', 'list'))
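A possible constructor for this structure, as a sketch only: `new_spatial_indicators()` is a hypothetical name, and wrapping bare numbers is an assumption that resolves the "OR variance = xx" ambiguity, so that every numerical indicator ends up with a `value` component.

```r
# Hypothetical constructor for the "spatial_indicators" structure.
new_spatial_indicators <- function(...) {
  comps <- list(...)
  nms <- names(comps)
  # Component names identify the indicators, so all must be named.
  if (is.null(nms) || any(nms == "")) {
    stop("all indicator components must be named")
  }
  # Assumption: bare numeric values (the `variance = xx` form) are wrapped
  # as list(value = xx); classed objects such as a psd fit are left as-is.
  comps <- lapply(comps, function(x) {
    if (is.numeric(x) && !is.object(x)) list(value = x) else x
  })
  structure(comps, class = c("spatial_indicators", "list"))
}
```

Normalizing at construction time keeps the summary logic simple, since it then only needs to check for a `value` component.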

We want the summary function to produce an output that looks more or less like this:


Indicators computed on 6 replicates 

Numerical indicators: 
==============================
IC       | mean±sd
variance | 3.4±.153
skewness | .056±1.4
moranI   | 32±5

Non-numerical indicators: 
==============================
psd: 
<summary of PSD fit>

(... other non-numerical indicators)

The summary function should work as follows:

Loop through all the indicators of the object:
- elements that have a value= component are numerical indicators and get one line in the table (e.g. variance, skewness);
- elements that do NOT have a value= component are summarized by printing the result of summary() on them, if a summary() method exists for their class (e.g. psd);
- for all other elements, do nothing except issue a warning.
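The three rules could be sketched as an S3 summary method plus a matching print method. This is a minimal sketch, not the package's actual code: it assumes the list structure described earlier, uses base R `utils::getS3method()` to test whether a summary method exists, and prints "+/-" instead of "±" to stay ASCII-safe.

```r
# Hypothetical summary() method for class "spatial_indicators".
summary.spatial_indicators <- function(object, ...) {
  object <- unclass(object)
  # Rule 1: elements with a value= component are numerical indicators.
  is_num <- vapply(object, function(x) is.list(x) && !is.null(x[["value"]]),
                   logical(1))
  numerical <- data.frame(
    indicator = names(object)[is_num],
    mean = vapply(object[is_num], function(x) mean(x[["value"]]), numeric(1)),
    sd = vapply(object[is_num], function(x) sd(x[["value"]]), numeric(1)),
    row.names = NULL, stringsAsFactors = FALSE
  )
  # Rules 2 and 3: summarize other elements if a summary() method exists
  # for one of their classes, otherwise skip them with a warning.
  other <- list()
  for (nm in names(object)[!is_num]) {
    x <- object[[nm]]
    has_method <- any(vapply(class(x), function(cl) {
      !is.null(utils::getS3method("summary", cl, optional = TRUE))
    }, logical(1)))
    if (has_method) {
      other[[nm]] <- summary(x)
    } else {
      warning("no summary() method for indicator '", nm, "', skipped")
    }
  }
  structure(list(numerical = numerical, other = other),
            class = "summary.spatial_indicators")
}

# Print the mean +/- sd table, then each non-numerical summary.
print.summary.spatial_indicators <- function(x, ...) {
  cat("Numerical indicators:\n")
  cat(sprintf("%-10s | %.3g+/-%.3g\n", x$numerical$indicator,
              x$numerical$mean, x$numerical$sd), sep = "")
  cat("\nNon-numerical indicators:\n")
  for (nm in names(x$other)) {
    cat(nm, ":\n", sep = "")
    print(x$other[[nm]])
  }
  invisible(x)
}
```

Returning a separate "summary.spatial_indicators" object, rather than printing directly, follows the usual summary.lm / print.summary.lm pattern, so the table can also be reused programmatically.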

alexgenin commented 9 years ago

Added a print method in db5d93d8f3ffa3832925db57fa9ccb04bd1a1943.

ssumithra commented 9 years ago

We feel that your proposed output structure looks excellent. We should add the mean and the spectral density ratio to the list of numerical indicators. As for non-numerical indicators, these would include the PSD with a summary of its fit, and graphs of the correlation and spectral functions. For now, we don't want to include automated fitting procedures for the spectral density and correlation functions, because such fits are highly context-dependent.

If it helps, I can provide a representative table showing the structure of the PSD fitting summary, etc., a little later.

We should drop fractal geometry from this list of 'indicators' because its interpretation as an early-warning signal is unclear.

guttal commented 9 years ago

Again, I may be posting this in the wrong thread. I wanted to think from the perspective of a user of the package. A researcher may want to use this package in the following scenarios:

1(a) Analyze spatial early warnings for CA-like (cellular automaton) models, where data are discrete. In such cases, the user will have multiple replicates for each value of the stressor parameter.
1(b) Analyze spatial early warnings for models like reaction-diffusion equations, where data are continuous. In such cases, the user will also have multiple replicates for each value of the stressor parameter.
2(a) Real data with only one snapshot.
2(b) Real data along a stressor gradient (so each stressor level will typically have one or very few replicates).

In addition, for each scenario, the user may want to use only some categories of spatial warnings. I wonder whether the workflow you are thinking of captures all of these scenarios. The flowchart in Kéfi et al. (2014, PLoS ONE) provides some ideas on how to organize the analyses.

alexgenin commented 9 years ago

Hi all,

We spent some time yesterday rethinking the overall workflow we could provide for users of the spatialwarnings package. We ended up with this flowchart:

Basically, this means structuring the workflow into "tasks", i.e. families of similar indicators:

- Generic warning signals (lag-1 autocorrelation, variance, skewness, mean, heteroskedasticity [1])
- Spectrum-based signals (r- and θ-spectra, skewness of pixel distribution [1], power spectrum)
- Patch-based indicators (psd, fit of psd, early-warning signals computed on psd [1])
- Potential analysis [1]

For each of those tasks, the user would go through a compute/test/plot workflow, since the indicators within each family share most testing and diagnostic tools (we can split those families further later if that makes more sense). The plots at the end would require the user to provide data on the stressor gradient, so that each replicate can be associated with its stressor value.
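Associating replicates with stressor values for the final plots could look like the sketch below. It assumes one indicator value per replicate and one stressor value per replicate, and averages replicates within each stressor level; `indicator_vs_stressor()` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: pair each replicate's indicator value with its
# stressor value and average replicates within each stressor level.
indicator_vs_stressor <- function(values, stressor) {
  if (length(values) != length(stressor)) {
    stop("need exactly one stressor value per replicate")
  }
  df <- data.frame(stressor = stressor, value = values)
  agg <- aggregate(value ~ stressor, data = df, FUN = mean)
  agg[order(agg$stressor), ]
}
```

The result can be passed straight to base graphics, e.g. `plot(value ~ stressor, data = res, type = "b")`. For real data with a single snapshot per stressor level (scenario 2(b)), the averaging step simply returns the single value.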

Keep in mind that input data may also be quantitative (continuous) rather than binary, which is something we had clearly overlooked before.

This way we cover all the scenarios mentioned above while providing some guidance through the many possible indicators.

Does that make sense?

[1] Not implemented yet

spatial-ews/spatialwarnings, issue #19