grambank / grambank-analysed

Final model decision: Dual models for each parameter, with varied spatial parameters #55

Closed SamPassmore closed 2 years ago

SamPassmore commented 2 years ago

To summarize a discussion between @RustyGray, @QuentinAtkinson, @HedvigS and me, with input from @blasid:

We decided we only need to run the dual process model. The single process models, while statistically interesting, will not give us any results that we would want to discuss, since it is generally agreed that space and phylogeny are important processes in language diversity. Including the AUTOTYP categories was ruled out because they are not a process effect, like space and phylogeny, but instead identify typological areas of language diversity. They do this quite well, but that is not a question we are interested in.

The reason* AUTOTYP areas were initially included was that spatial effects might operate on multiple levels: localised effects from regular contact, which our covariance matrices are capturing, and long-range spatial diffusion that might occur via processes such as population movement/expansion or serial borrowing.

*I am a bit hazy on whether this was definitely the reason so please correct me if this was wrong.

The simplest solution to test this is to run the dual model with a number of spatial parameters that capture covariance across larger distances. Testing multiple parameters will also give us more confidence in the presented results. There are other solutions (e.g. including multiple spatial effects), but we agreed this approach would be sufficient for what we want to know.

I propose these three sets of parameters:

(k)appa = 2, (s)igma = 1.15 (current parameters) (RED)

k = 2, s = 2 (BLUE)

k = 2.5, s = 3 (GREEN)

They suggest that similarity via the process of geographic proximity will reach approximately zero at a distance of 1,000km (RED), 2,500km (BLUE), and 3,000km (GREEN) from a focal society (these distances are radii). The final purple line isn't necessary to run; it is just indicative of alternative decay functions.

These are graphed out below:

[Figure: spatial_parameter_fig]
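To see how kappa and sigma shape the decay, here is a minimal sketch in Python (not the repository's R code). It assumes a Matérn correlation function in the geoR-style parameterisation, with kappa as the smoothness and sigma as the range parameter. The thread does not state the functional form, so that choice, the function names, the 0.05 cut-off standing in for "approximately zero", and the distance units (the same arbitrary units as sigma, not km) are all assumptions.

```python
import math

# The three proposed (kappa, sigma) pairs, keyed by their colour in the plot.
PARAMETER_SETS = {
    "red": (2.0, 1.15),
    "blue": (2.0, 2.0),
    "green": (2.5, 3.0),
}


def bessel_k(nu, x, upper=20.0, steps=2000):
    """Modified Bessel function of the second kind, K_nu(x), for x > 0.

    Uses the trapezoid rule on the integral representation
    K_nu(x) = integral from 0 to infinity of exp(-x cosh t) cosh(nu t) dt,
    truncated at t = upper (the integrand is negligible well before that).
    """
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return total * h


def matern_correlation(dist, kappa, sigma):
    """ASSUMED decay function: Matern correlation with smoothness kappa, range sigma."""
    if dist == 0:
        return 1.0
    z = dist / sigma
    return (z ** kappa) * bessel_k(kappa, z) / (2 ** (kappa - 1) * math.gamma(kappa))


def effective_range(kappa, sigma, threshold=0.05, step=0.1, max_dist=100.0):
    """First distance on a coarse grid where the correlation drops below threshold."""
    d = step
    while d <= max_dist:
        if matern_correlation(d, kappa, sigma) < threshold:
            return d
        d += step
    return None  # threshold not reached within max_dist


for label, (kappa, sigma) in PARAMETER_SETS.items():
    reach = effective_range(kappa, sigma)
    print(f"{label}: kappa={kappa}, sigma={sigma}, correlation < 0.05 beyond ~{reach:.1f}")
```

Under these assumptions the ordering of the three ranges matches the proposal (RED shortest, GREEN longest), but the absolute numbers depend on how distances are scaled before they reach the covariance function, so they should not be read as km.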

To implement this with minimal code changes I think the best approach would be to create and save the three spatial covariance matrices within the make_precisionmatrices.R file. Then load and apply them to three models in the modelling script.
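A rough sketch of that save-then-load workflow, in Python rather than the R of make_precisionmatrices.R. Everything named here is hypothetical: the file names, the JSON format, the made-up coordinates, and in particular `standin_kernel`, which is a plain exponential decay that ignores kappa and treats sigma as being in units of 1,000 km. It only marks the place where the real covariance function would go.

```python
import json
import math
import tempfile
from pathlib import Path

# Parameter sets from the proposal; the labels are the plot colours.
PARAMETER_SETS = {
    "red": {"kappa": 2.0, "sigma": 1.15},
    "blue": {"kappa": 2.0, "sigma": 2.0},
    "green": {"kappa": 2.5, "sigma": 3.0},
}


def haversine_km(a, b):
    """Great-circle distance in km between two (longitude, latitude) pairs."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def standin_kernel(dist_km, kappa, sigma):
    """PLACEHOLDER decay: exponential, ignores kappa. Swap in the real function."""
    return math.exp(-(dist_km / 1000.0) / sigma)


def covariance_matrix(coords, kappa, sigma, kernel=standin_kernel):
    """Pairwise spatial covariance for a list of (longitude, latitude) pairs."""
    return [[kernel(haversine_km(a, b), kappa, sigma) for b in coords] for a in coords]


def save_matrices(coords, out_dir, kernel=standin_kernel):
    """Step 1: build one matrix per parameter set and write each to its own file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for label, pars in PARAMETER_SETS.items():
        path = out_dir / f"spatial_covariance_kappa_{pars['kappa']}_sigma_{pars['sigma']}.json"
        path.write_text(json.dumps(covariance_matrix(coords, kernel=kernel, **pars)))
        paths[label] = path
    return paths


def load_matrix(path):
    """Step 2 (modelling script): read a saved matrix back in."""
    return json.loads(Path(path).read_text())


coords = [(174.76, -36.85), (151.21, -33.87), (11.58, 48.14)]  # made-up locations
with tempfile.TemporaryDirectory() as tmp:
    saved = save_matrices(coords, tmp)
    matrices = {label: load_matrix(path) for label, path in saved.items()}
```

Keying the files by parameter values means the modelling script can loop over the three matrices without recomputing them, which is the computational saving the proposal is after.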

I know, Hedvig, that you want to keep scripting all models, but it would save computational time and result in cleaner code if the scripts show what our current plan is (which seems to be relatively definitive). We can always get the old scripts from the git history if need be.

SamPassmore commented 2 years ago

*A note on why the lines are not smooth. I am fairly certain this is because the plotting function doesn't like it when there are lots of societies close together, rather than the covariance function producing strange results. The pattern is much smoother when plotting each line individually, for example.

HedvigS commented 2 years ago

@SamPassmore thank you! That's well summarised and put, I really appreciate it.

What I'll do, just to scratch that itch to not remove things until absolutely necessary, is move that script to this folder: R_grambank/spatiophylogenetic_modelling/analysis/old/ . I know git history is great and I do like to use it sometimes, but I've also found that it can be difficult sometimes. Since we already store some other outdated scripts in this old dir, I'd like to move it there and when the time comes I can remove all of them if necessary :)

Did you do the rounding of long/lat decimals and the spatial jittering for the decay plot above?

HedvigS commented 2 years ago

I'm standing by to implement Sam's suggestion for the different spatial prec matrices settings. I'd like to clear up #53 first though, just so it doesn't get confusing.

HedvigS commented 2 years ago

Hey @blasid here's where you can talk to Sam about spatial decay if you want ^^!

HedvigS commented 2 years ago

This discussion is similar to the ones we had between Quentin and Sam last year, right, which I think inspired this script?

SamPassmore commented 2 years ago

> This discussion is similar to the ones we had between Quentin and Sam last year, right, which I think inspired this script?

Yes - this was to test which spatial decay parameters offered the best fit to the data - although it was run on the PCA variables, I believe.

HedvigS commented 2 years ago

@SamPassmore Yes, it was.

That script is still in the folder "old" in the sp dir if you want to poke at it at all

HedvigS commented 2 years ago

@blasid and @RustyGray and I had a discussion today which led to us being interested in the results of the trial model after all. @blasid is getting Sam, Quentin and Russell D together to talk more about how space is modelled in our models, both BRMS and INLA. In the meantime I'm grateful that we've saved the model outputs for all models :).

HedvigS commented 2 years ago

I'm hoping to just have to do one more re-run this week, and I'm hoping to do a couple of different spatial settings and the ASR at the same time. I'm changing the script now to accommodate Sam's suggestion with the spatial parameters.

HedvigS commented 2 years ago

Alright, I'm almost finished running all the different permutations (see #62). I'll be reporting on the kappa 2 sigma 1.15 in the main text for tmrw.

@SamPassmore do you want to make new scripts for the spatial decay plot and the variograms for the supplementary figures? Or should I just remod the scripts we have in https://github.com/grambank/grambank-analysed/tree/main/R_grambank/spatiophylogenetic_modelling/analysis/old?

SamPassmore commented 2 years ago

See #64, which recreates the spatial decay plots. I don't think the variograms need to be included anymore since it would be too confusing to produce variograms for each feature (previously it was for each PC). Plus, the variograms were to support the model, but the simulations offer that support now, I think.

HedvigS commented 2 years ago

@SamPassmore

I'll review #64 asap

let's ditch variograms, understood.

HedvigS commented 2 years ago

@SamPassmore @rdinnager I have run the INLA models (dual and trial) on all real features with 3 different settings for kappa and sigma. The results are in the usual place, #45.