Final model decision: Dual models for each parameter, with varied spatial parameters

SamPassmore commented 2 years ago

To summarize a discussion between @RustyGray, @QuentinAtkinson, @HedvigS and me, with input from @blasid:

We decided we only need to run the dual process model. The single process models, while statistically interesting, will not give us any results we would want to discuss, since it is generally agreed that space and phylogeny are both important processes in language diversity. Including the AUTOTYP areas was ruled out because they are not a process effect, like space and phylogeny, but instead identify typological areas of language diversity. They do this quite well, but that is not a question we are interested in.

The reason* AUTOTYP areas were initially included was that spatial effects might operate on multiple levels: localised effects from regular contact, which our covariance matrices are capturing, and long-range spatial diffusion, which might occur via processes such as population movement/expansion or serial borrowing.

*I am a bit hazy on whether this was definitely the reason so please correct me if this was wrong.

The simplest solution to test this is to run the dual model with a number of spatial parameters that capture covariance across larger distances. Testing multiple parameters will also give us more confidence in the presented results. There are other solutions (e.g. including multiple spatial effects), but we agreed this approach would be sufficient for what we want to know.

I propose these three sets of parameters:

- (k)appa = 2, (s)igma = 1.15 (current parameters) (RED)
- k = 2, s = 2 (BLUE)
- k = 2.5, s = 3 (GREEN)

They suggest that similarity via the process of geographic proximity will reach approximately zero at a distance of 1,000km (RED), 2,500km (BLUE), and 3,000km (GREEN) from a focal society (these distances are radii). The final purple line isn't necessary to run; it is just indicative of alternative decay functions.

These are graphed out below:

[Figure: spatial_parameter_fig]

To implement this with minimal code changes, I think the best approach would be to create and save the three spatial covariance matrices within the make_precisionmatrices.R file, then load and apply them to three models in the modelling script.
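A minimal sketch of the "create and save" step, to make the proposal concrete. Everything here is an assumption rather than what make_precisionmatrices.R actually does: the Matérn form of the correlation function, the use of sigma as its range parameter, the toy coordinates, the plain Euclidean distance in degrees, and the output file names are all placeholders for illustration.

```r
# Proposed parameter sets; the red/blue/green labels follow the figure colours.
spatial_params <- list(
  red   = list(kappa = 2,   sigma = 1.15),  # current parameters
  blue  = list(kappa = 2,   sigma = 2),
  green = list(kappa = 2.5, sigma = 3)
)

# ASSUMPTION: a Matern correlation with sigma acting as the range parameter.
# The function actually used in the project may differ.
matern_correlation <- function(d, kappa, sigma) {
  u <- d / sigma
  rho <- (u^kappa) * besselK(u, kappa) / (2^(kappa - 1) * gamma(kappa))
  rho[d == 0] <- 1  # the expression is 0 * Inf at zero distance; its limit is 1
  rho
}

# Hypothetical toy coordinates standing in for the real society locations.
coords <- cbind(longitude = c(0, 1, 5, 20), latitude = c(0, 1, 5, 20))

# Placeholder distance: Euclidean distance on raw degrees.
distances <- as.matrix(dist(coords))

# Build one matrix per parameter set and save each under a hypothetical name.
out_dir <- tempdir()
for (name in names(spatial_params)) {
  p <- spatial_params[[name]]
  covar <- matern_correlation(distances, p$kappa, p$sigma)
  saveRDS(covar, file.path(out_dir, paste0("spatial_covariance_", name, ".rds")))
}
```

The modelling script could then loop over the same three names, calling `readRDS()` on each file and fitting one dual model per matrix, so the only change between runs is which spatial matrix is loaded.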

Hedvig, I know you want to keep scripting all models, but it would save computational time and result in cleaner code if the scripts show what our current plan is (which seems to be relatively definitive). We can always get the old scripts from the git history if need be.

SamPassmore commented 2 years ago

*A note on why the lines are not smooth: I am fairly certain this is because the plotting function doesn't like it when there are lots of societies close together, rather than the covariance function producing strange results. The pattern is much smoother when plotting each line individually, for example.