grambank / grambank-analysed

3 stars 0 forks source link

Update extract posteriors #84

Closed HedvigS closed 1 year ago

HedvigS commented 1 year ago

hi @SamPassmore . I realised that our extract INLA posteriors script was still working on the assumption that are doing the three "single"-models still, so I changed that so that it only extracts for the dual and trial.

i could build in more ifs etc to make it more adaptable but for now this is where I left it.

HedvigS commented 1 year ago

I'm doing some things to the plots based on the posteriors, I'll check that in here too so you can review it all at once.

HedvigS commented 1 year ago

there's usually an option to make this PR a draft, but I don't see that now. I think that's because we don't have the business plan for this github organisation.

HedvigS commented 1 year ago

I'm having a problem here, the way the posteriors are calculated in grambank_dualplots (the script that makes the faceted scatterplot of spatial and phylo effect per domain), the spatial effect is teeny tiny which means the ellipses fail to render. I'm a bit confused.

We've got two scripts which extracts posteriors for the dual model:

and they're not doing the same thing.

In the grambank_dualplots.R script the mean spatial is like this:


> dual_summary$mean_spatial %>% round(6) %>% sort()
  [1] 0.000000 0.000000 0.000000 0.000001 0.000001 0.000001 0.000001 0.000001 0.000001 0.000001 0.000001 0.000001
 [13] 0.000002 0.000002 0.000002 0.000002 0.000002 0.000002 0.000002 0.000002 0.000002 0.000002 0.000003 0.000003
 [25] 0.000003 0.000003 0.000003 0.000004 0.000004 0.000004 0.000004 0.000004 0.000004 0.000005 0.000005 0.000005
 [37] 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000006 0.000006 0.000006 0.000007
 [49] 0.000007 0.000008 0.000008 0.000008 0.000008 0.000009 0.000010 0.000010 0.000010 0.000011 0.000011 0.000011
 [61] 0.000012 0.000013 0.000014 0.000017 0.000018 0.000018 0.000019 0.000020 0.000026 0.000026 0.000027 0.000027
 [73] 0.000031 0.000033 0.000034 0.000034 0.000039 0.000046 0.001655 0.012284 0.023143 0.027663 0.032728 0.035173
 [85] 0.044937 0.045537 0.046035 0.046602 0.048497 0.049318 0.058788 0.067351 0.071522 0.072729 0.073983 0.076320
 [97] 0.088199 0.089531 0.097960 0.099846 0.111814 0.115352 0.117746 0.139083 0.147275 0.156652 0.165129 0.165466
[109] 0.196490 0.207081 0.214913 0.251665 0.274775
rdinnager commented 1 year ago

I had a look. For me, they produce the same results. Using posteriors_df that was produced by extract_INLA_posteriors.R:

> dual_summary2 <-  posteriors_df %>% filter(grepl("_kappa_2_sigma_1.15_pcprior0.1", fn)) %>%
+   dplyr::select(phylogenetic = `Precision for phylo_id_in_dual`, 
+          spatial = `Precision for spatial_id_in_dual`,
+          fn) %>%
+   group_by(fn) %>%
+   summarise(mean_phylogenetic = mean(phylogenetic),
+             mean_spatial = mean(spatial),
+             error_phylogenetic = sd(phylogenetic),
+             error_spatial = sd(spatial),
+             )
> dual_summary2$mean_spatial %>% round(6) %>% sort()
  [1] 0.000000 0.000000 0.000000 0.000001 0.000001 0.000001 0.000001 0.000001 0.000001 0.000001
 [11] 0.000001 0.000001 0.000002 0.000002 0.000002 0.000002 0.000002 0.000002 0.000002 0.000002
 [21] 0.000002 0.000002 0.000003 0.000003 0.000003 0.000003 0.000003 0.000004 0.000004 0.000004
 [31] 0.000004 0.000004 0.000004 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005
 [41] 0.000005 0.000005 0.000005 0.000005 0.000006 0.000006 0.000006 0.000007 0.000007 0.000008
 [51] 0.000008 0.000008 0.000008 0.000009 0.000010 0.000010 0.000010 0.000011 0.000011 0.000011
 [61] 0.000012 0.000013 0.000014 0.000017 0.000018 0.000018 0.000019 0.000020 0.000026 0.000026
 [71] 0.000027 0.000027 0.000031 0.000033 0.000034 0.000034 0.000039 0.000046 0.001655 0.012284
 [81] 0.023143 0.027663 0.032728 0.035173 0.044937 0.045537 0.046035 0.046602 0.048497 0.049318
 [91] 0.058788 0.067351 0.071522 0.072729 0.073983 0.076320 0.088199 0.089531 0.097960 0.099846
[101] 0.111814 0.115352 0.117746 0.139083 0.147275 0.156652 0.165129 0.165466 0.196490 0.207081
[111] 0.214913 0.251665 0.274775

Is it possible you are looking at the wrong spatial parameters? The grampbank_dualplots.R is only looking at files with this suffix: filename_suffix = "_kappa_2_sigma_1.15_pcprior0.1.qs".

rdinnager commented 1 year ago

The code in the two scripts is different but it is doing the same thing, so they should produce the same result. Both appear to be doing the correct calculation, as far as I can tell..

HedvigS commented 1 year ago

Right okay. So.. hm. My problem right now is just to confirm that what is correct to do, and then i'll change the scripts to do the same thing. Then I'll re-render the figure with the facetet plot and elipises. Currently some ellipses won't plot, I thought it was because of the tiny spatial effect values. Then I'll get started on a statistical test Russell (G) wants to do.

HedvigS commented 1 year ago

Okay, what I've done now is sanity check how it's done in the extract_INLA_posteriors script and then just made the grambank_duaplots read that in directly and not read it in itself.

HedvigS commented 1 year ago

I've still got what I think are points that are not possible to plot in the graph, ggplot is complaining like this when i run grambank_dualplots.


Warning messages:                                                                                               
1: Removed 30 row(s) containing missing values (geom_path). 
2: Removed 30 row(s) containing missing values (geom_path). 
3: Removed 30 row(s) containing missing values (geom_path). 
4: Removed 30 row(s) containing missing values (geom_path). 
5: Removed 30 row(s) containing missing values (geom_path). 
6: Removed 30 row(s) containing missing values (geom_path). 

and the resulting plot seems to miss some points.

HedvigS commented 1 year ago

or wait... the red points in the bottom left facet are there they're just really low and they've got essentially no ellipses?

spatiophylogenetic_figure_panels_ellipses_domain__kappa_2_sigma_1 15_pcprior0 1

HedvigS commented 1 year ago

the results look different from what we've got in the plot that's in the actual ms :(

HedvigS commented 1 year ago

@SamPassmore @rdinnager I can't tell exactly how the old numbers and plots were generated, but as far as I can tell the scripts check out now. See my ranty emails for more details.

I'd like to merge in this branch now if you two approve :)

HedvigS commented 1 year ago

@SamPassmore @rdinnager if you want to review this PR but not read commit history, you can see a summary of all files changed here: https://github.com/grambank/grambank-analysed/pull/84/files