lizzieinvancouver / decsens


real data -- simple sensitivities #1

Closed lizzieinvancouver closed 4 years ago

lizzieinvancouver commented 4 years ago

Re-do the PEP work on BETPEN using a log-log regression (i.e., instead of days ~ C, do log(days) ~ log(C)).

cchambe12 commented 4 years ago

@lizzieinvancouver I forgot we can't take the log of negative numbers. I tried subsetting the dataset to start on March 1st, but I was still getting freezing temperatures, so I converted everything to Kelvin and then compared to what we had without the log...
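
For the record, the Kelvin conversion is just a constant shift; a one-line sketch, assuming the raw Celsius column is called temp (name hypothetical):

bptempandbb$tempk <- bptempandbb$temp + 273.15 # Celsius to Kelvin; 'temp' column name assumed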

Now calculate temperature sensitivities but with log

# log-log fits for the pre- and post-climate-change decades (temperatures in Kelvin)
estprecc.log <- lm(log(lo) ~ log(tempk), data = subset(bptempandbb, cc == "1950-1960"))
estpostcc.log <- lm(log(lo) ~ log(tempk), data = subset(bptempandbb, cc == "2000-2010"))

logdiffbefore.after <- coef(estprecc.log)[2]-coef(estpostcc.log)[2]

Negative means a decline in sensitivity AFTER climate change. logdiffbefore.after = -0.33; exponentiated we get 0.718 -- so an increase in sensitivity!
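
A quick check of the back-conversion arithmetic quoted above:

exp(-0.33) # 0.719, matching the ~0.718 above up to rounding of the slope difference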

Now calculate temperature sensitivities as we did previously

estprecc <- lm(lo ~ tempk, data = subset(bptempandbb, cc == "1950-1960"))
estpostcc <- lm(lo ~ tempk, data = subset(bptempandbb, cc == "2000-2010"))

diffbefore.after <- coef(estprecc)[2]-coef(estpostcc)[2]

diffbefore.after = -0.1529792

cchambe12 commented 4 years ago

@cchambe12 find credible intervals next
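
A minimal sketch of how that might look on the lm fits above, using confint() at the 0.78 level (i.e., the 11% and 89% bounds used later in this thread; note confint() on an lm gives confidence rather than credible intervals):

# 78% intervals on the log-log temperature slopes
confint(estprecc.log, "log(tempk)", level = 0.78)
confint(estpostcc.log, "log(tempk)", level = 0.78)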

lizzieinvancouver commented 4 years ago

@cchambe12 I totally forgot about that too! Do we actually have years with negative mean temperatures though?

cchambe12 commented 4 years ago

@lizzieinvancouver No! I was using the wrong column. All is well and we don't need to use Kelvin after all, which is a relief. I've made output for the PEP stuff and for sliding window.

decsens/analyses/pep_analyses/output/bpestimates_withlog.csv

I will make a README.txt with column descriptions now.

lizzieinvancouver commented 4 years ago

@cchambe12 This is great! Thank you so much for doing this! I am working to update the manuscript. I have a couple more requests (sorry) ... could we add a decade (1990-2000?) and could we add Fagus?

cchambe12 commented 4 years ago

@lizzieinvancouver I was thinking of adding Fagus too! And yeah, no problem with the added decade. Would you like the same for the SWA, or should we wait to see how this goes first?

lizzieinvancouver commented 4 years ago

@cchambe12 Great on Fagus and another decade! Let's start there and add SWA later if needed. Thank you!

cchambe12 commented 4 years ago

@lizzieinvancouver All set!! I added the new data to the output folder.

FAGSYL: fsylestimates_withlog_1950_1990_2000.csv
BETPEN: bpenestimates_withlog_1950_1990_2000.csv

lizzieinvancouver commented 4 years ago

@cchambe12 Wow! This was so fast. I just updated the plotting code -- thanks. One thing that is a little odd to me is that the warming is about the same (the 90s equal the 2000s), but this might make sense, in which case (with apologies) I might ask that we add another decade ... to discuss tomorrow.

cchambe12 commented 4 years ago

@lizzieinvancouver Yeah no problem at all! I was wondering if adding more species would be helpful too. Anything from PEP is easy to add!

lizzieinvancouver commented 4 years ago

@cchambe12 Thanks! I think we will try some other years -- I was just looking at German climate for when the warming trend began there (see the German Climate Atlas), and it looks like maybe the 70s ... so that's something to consider. Also, we should consider whether to go to a bigger window for safety (e.g., 20 years is more standard for 'climate'). Talk to you soon!

lizzieinvancouver commented 4 years ago

@cchambe12 Try with March to May temperatures and 20 year windows .... confidence intervals ... 1990-2010 versus 1950-1970.

lizzieinvancouver commented 4 years ago

@cchambe12 Sorry to just think of this now, but if we're doing 1950-1970 and 1990-2010, then let's add 1970-1990 (and I really assume that mean temp should fall between the two). As for confint, I suggest you pick what sounds good to you ... 84, 90, 80 ... anything other than 95 or something more extreme. Thank you again for all your help on this!

cchambe12 commented 4 years ago

@lizzieinvancouver I just pushed the real data output for 1950-1970 and 1990-2010 with 89 for the confint. I will add the new timeframe now!

cchambe12 commented 4 years ago

I think this is done!

lizzieinvancouver commented 4 years ago

@cchambe12 This is awesome! Check out the means ... but I am not sure how to interpret everything.... email to follow. Also @AileneKane @dbuona @MoralesCastilla

> mean.betpen <- aggregate(df[c("matslope", "matslopelog", "meanmat", "varmat", "meangdd",
+     "matslopeconfint11", "matslopeconfint89", "matslopelogconfint11", "matslopelogconfint89")],
+     df["cc"], FUN=mean)
> 
> tempdiff1 <- mean.betpen$meanmat[which(mean.betpen$cc=="1970-1990")]-
+     mean.betpen$meanmat[which(mean.betpen$cc=="1950-1970")]
> tempdiff2 <- mean.betpen$meanmat[which(mean.betpen$cc=="1990-2010")]-
+     mean.betpen$meanmat[which(mean.betpen$cc=="1950-1970")]
> mean.betpen
         cc  matslope matslopelog  meanmat    varmat  meangdd matslopeconfint11
1 1950-1970 -6.053967  -0.4157587 7.910215 1.2324478 72.47753         -8.045821
2 1970-1990 -6.843966  -0.4895361 8.120306 0.8399633 72.15105        -10.293221
3 1990-2010 -2.115403  -0.1773680 9.037770 0.8897746 60.03246         -4.456350
  matslopeconfint89 matslopelogconfint11 matslopelogconfint89
1        -4.0621117           -0.5577323          -0.27378504
2        -3.3947120           -0.7479982          -0.23107402
3         0.2255439           -0.3754453           0.02070938
> mean.fs <- aggregate(fs[c("matslope", "matslopelog", "meanmat", "varmat", "meangdd",
+     "matslopeconfint11", "matslopeconfint89", "matslopelogconfint11", "matslopelogconfint89")],
+     fs["cc"], FUN=mean)
> mean.fs 
         cc  matslope matslopelog  meanmat    varmat  meangdd matslopeconfint11
1 1950-1970 -4.522218  -0.2801439 7.657062 1.2448377 86.01584         -6.581286
2 1970-1990 -2.347885  -0.1508822 7.858652 0.8656629 81.31031         -5.250960
3 1990-2010 -2.731600  -0.2066306 8.866196 0.8969811 79.92338         -4.741830
  matslopeconfint89 matslopelogconfint11 matslopelogconfint89
1        -2.4631509           -0.4126363          -0.14765147
2         0.5551895           -0.3430452           0.04128080
3        -0.7213705           -0.3658382          -0.04742298

lizzieinvancouver commented 4 years ago

Wait, let's just do it here @cchambe12 @AileneKane @dbuona @MoralesCastilla

Thanks to Cat we already have updated results from our conversation Monday! The plot is here (the logged results are on every plot; the right side is a zoom). The dots are means, the lines are 89% confidence intervals (or 78% ... they may be 11% to 89%). I used exp() to convert the logged coefs, but I am not sure that's right. Here's what's clear:

(1) The 20 year windows are WAY better. The climate is more stable across species (phew).

(2) There are some similarities in trends with the raw and logged regressions, but I cannot see through them.

(3) Betula is doing something more exciting than FagSyl. It may actually be more sensitive recently ... which lines up with the GDD findings.

(4) Variance in MAT does go down, but not when the sensitivity does.

Here's what I need help with:

(a) How do we interpret the logged coefs? Should we back-convert them? If so, how? Can they ever be negative? I think not mathematically, which seems wrong ... https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faqhow-do-i-interpret-a-regression-model-when-some-variables-are-log-transformed/

This is sort of critical before we can figure what to say.
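
Per the UCLA FAQ linked above, the usual reading of a log-log slope is as an elasticity: a p% change in x multiplies expected y by (1 + p/100)^slope. A quick numeric illustration, using a value near our pre-cc logged slope:

b <- -0.42 # roughly the 1950-1970 matslopelog
1.01^b # ~0.9958: a 1% warmer spring -> ~0.42% earlier leafout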

lizzieinvancouver commented 4 years ago

@cchambe12 Can you confirm the intervals and also what window the GDD is calculated over (some months or until leafout for each species x site x year)? Thanks!

cchambe12 commented 4 years ago

@lizzieinvancouver Yes, sorry, I meant 78% credible intervals!

"GDD" is GDD over the 30 days before leafout for each species x year x site
"mat.lo" is mean temperature over the 30 days before leafout
"mat" is mean spring temperature from March 1 to May 31

I'd be happy to adjust as necessary!

As for back-converting... I also used exp(). The back-converted values should never be negative, as you cannot log a negative number, but the logged outcomes can be negative.
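
A toy illustration of that point (simulated data, not ours): the slope on the log-log scale can be negative while every back-transformed prediction stays positive.

set.seed(7)
x <- runif(100, 5, 15) # positive 'temperatures'
y <- exp(3 - 0.4 * log(x) + rnorm(100, 0, 0.1)) # positive 'leafout days'
toyfit <- lm(log(y) ~ log(x))
coef(toyfit)[2] # negative slope, about -0.4
all(exp(fitted(toyfit)) > 0) # TRUE: exp() of anything is positive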

lizzieinvancouver commented 4 years ago

@cchambe12 No, I think this is great. It's just interesting to me that they are requiring less GDD, which is sort of more sensitive ... I think.

lizzieinvancouver commented 4 years ago

@AileneKane My main questions are:

  1. Can we back-convert? If not (or not easily), what would be the interpretation of the change in log-log sensitivities (and perhaps I should not have exp()'d them ... see my code here, which does that)?
  2. Either way, how to interpret the change in sensitivity? Can we say they became more sensitive ...?

AileneKane commented 4 years ago

@lizzieinvancouver

  1. I hadn't realized this in our early discussions of these models, but there is not a great way to back-convert logged response variables with continuous predictors (especially when those predictors are also logged). It is not a simple "exp()", unfortunately (see the toy demonstration after this list). See http://www.biostat.jhsph.edu/~iruczins/teaching/jf/ch8.pdf

  2. I'm not sure yet what we can say about changes in sensitivity. I'm not sure we want to use the log-log approach, if our goal is to compare slopes on a natural scale...
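
On point 1, a toy demonstration (simulated, not our data) of why a bare exp() under-predicts: if log(y) = a + b*log(x) + e with e ~ N(0, s^2), then E[y|x] = exp(a + b*log(x)) * exp(s^2/2), not just exp(a + b*log(x)).

set.seed(42)
x <- runif(5000, 5, 15)
s <- 0.5
y <- exp(2 - 0.4 * log(x) + rnorm(5000, 0, s))
fit <- lm(log(y) ~ log(x))
mean(y) / mean(exp(fitted(fit))) # ~1.13: naive exp() back-conversion runs ~13% low here
exp(summary(fit)$sigma^2 / 2) # the lognormal correction factor, also ~1.13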

lizzieinvancouver commented 4 years ago

@AileneKane @cchambe12 I just confirmed (see figure here) that we need a log-log. So I will probably go forward without back-converting.

lizzieinvancouver commented 4 years ago

@cchambe12 The variance in leafout for betpen goes way down; can you double-check the code and that we are not dropping some crazy outlier site across the 20 year windows? (A quick per-site check is sketched after the output below.)


> mean.betpen 
         cc  matslope matslopelog  meanmat    varmat     varlo  meangdd matslopeconfint11
1 1950-1970 -6.053967  -0.4157587 7.910215 1.2324478  79.88591 72.47753         -8.045821
2 1970-1990 -6.843966  -0.4895361 8.120306 0.8399633 104.82786 72.15105        -10.293221
3 1990-2010 -2.115403  -0.1773680 9.037770 0.8897746  36.21610 60.03246         -4.456350

> mean.fs
         cc  matslope matslopelog  meanmat    varmat    varlo  meangdd matslopeconfint11
1 1950-1970 -4.522218  -0.2801439 7.657062 1.2448377 63.35417 86.01584         -6.581286
2 1970-1990 -2.347885  -0.1508822 7.858652 0.8656629 56.19254 81.31031         -5.250960
3 1990-2010 -2.731600  -0.2066306 8.866196 0.8969811 32.82193 79.92338         -4.741830
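
The per-site check mentioned above might look like this, assuming the per-site frame df carries the varlo column shown in these summaries:

head(df[order(-df$varlo), ], 10) # ten largest per-site var(lo); one wild site would stand out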

cchambe12 commented 4 years ago

@lizzieinvancouver I tried modifying the site selection and rerunning the code, but it returned the same answers... it does seem strange. I will continue to look into this, but at first glance there isn't an obvious issue with the code or site selection.

lizzieinvancouver commented 4 years ago

@cchambe12 It could be real. Perhaps you could make the type of ggplots you made for the budburst ms? Then we could get a look at the data better.

lizzieinvancouver commented 4 years ago

@cchambe12 I think something is up with our estimates (either our old or new ones)... I was just looking at the estimates we had in mid-January....

Below is code from pepplotting.R:

> df.old <- read.csv("pep_analyses/output/bpenestimates_withlog.csv", header=TRUE) # earlier 10-year estimates
> 
> mean.betpen.old <- aggregate(df.old[c("matslope", "matslopelog", "meanmat", "varmat", "varlo", "meangdd", "meanmatlo")],
+     df.old["cc"], FUN=mean)
> 
> mean.betpen.old
         cc  matslope matslopelog  meanmat   varmat     varlo  meangdd meanmatlo
1 1950-1960 -4.282768  -0.1693573 5.602167 3.377819 110.51111 71.70517  7.041625
2 2000-2010 -3.611683  -0.2204338 6.578362 1.214370  46.95728 64.62913  6.803512

And then I calculated new estimates (below code is from betpen_tempsensitivities.R):

> meanhere.10yr[which(names(meanhere.10yr) %in% c("cc", "matslope","matslopelog", "meanmat", "varmat", "varlo", "meangdd", "meanmatlo"))]
         cc  meanmat    varmat meanmatlo     varlo  meangdd   matslope matslopelog
1 1950-1960 8.027834 1.5063816  6.992420 103.14902 68.65143  -6.511172  -0.4549991
2 1960-1970 7.792597 1.0464629  7.183117  63.74706 76.30362  -5.575453  -0.3692203
3 1970-1980 7.924367 0.2715488  6.835710  92.05621 65.84752 -11.876632  -0.8398790
4 1980-1990 8.316245 1.4005000  7.492977 111.57255 78.45458  -5.907674  -0.4203201
5 1990-2000 9.015313 0.9692797  6.493816  37.64641 55.25233  -1.362697  -0.1167921
6 2000-2010 9.060227 0.8985988  6.855772  36.34706 64.81260  -2.915261  -0.2457476

lizzieinvancouver commented 4 years ago

@cchambe12 I'll keep checking the code in betpen_tempsensitivities.R that I wrote and see if there are errors there, but if you have any ideas, let me know.

cchambe12 commented 4 years ago

@lizzieinvancouver I'm not surprised they are different. Back in January, I selected sites that only needed to have observations for those 20 years (1950-1960 & 2000-2010). For the declining-sensitivities analysis, I am selecting sites that have observations for 60 years (1950-2010), so the number of sites goes down significantly (from 48 to 17). I've streamlined the code to be liberal with sites; I could rerun the code to have different sites for each 20 year period, but then I feel like we can't compare results...?
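
A sketch of the two selection rules, with hypothetical object and column names (dat, siteID, year), since the selection script isn't shown here:

# stricter rule: sites observed in every year 1950-2010 (-> 17 sites)
keep60 <- tapply(dat$year, dat$siteID, function(y) all(1950:2010 %in% y))
sites.60yr <- names(which(keep60))
# January rule: sites needing only the two 10-year windows (-> 48 sites)
winyrs <- c(1950:1960, 2000:2010)
keepwin <- tapply(dat$year, dat$siteID, function(y) all(winyrs %in% y))
sites.2win <- names(which(keepwin))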

lizzieinvancouver commented 4 years ago

@cchambe12 Ah -- that makes perfect sense! I should have thought of this. But we should think more about this -- we wanted to switch to 20 year windows to have better confidence in our results, but if we are trading it off with fewer sites I am not sure that's the best approach.

lizzieinvancouver commented 4 years ago

@cchambe12 Let me think on this some ... if we really want inference on what is going on with PEP we should probably switch to some sort of thoughtful hierarchical model where we can leverage more sites and more years. But! I think that is well outside the scope of this paper, where the PEP results are a snippet at most. So, let me think on it. My sense is that more sites is better than more years ... both because I expect really high site-to-site variability (observer differences, location, etc.) and because 17 x 20 = 340 observations while 48 x 10 = 480.

cchambe12 commented 4 years ago

@lizzieinvancouver Ahh yes you're right, a hierarchical model is the usual answer here...

We are losing a lot of data here by going down to only 17 sites. I'll play around with using the 10 year windows again and then just looking at 1950-1970 and 1990-2010 rather than all three time windows. Maybe one of those options will give us back enough sites but still be informative?

lizzieinvancouver commented 4 years ago

@cchambe12 I honestly feel like those 40 years will probably be fairly similar to the 60 year problem, but have a stab if you have time!

cchambe12 commented 4 years ago

@lizzieinvancouver You were right!

Here's the breakdown if it's helpful:

BETPEN:
10-yr x 2 periods -> 48 sites total (1950-1960 & 2000-2010)
20-yr x 3 periods -> 17 sites (1950-1970 & 1970-1990 & 1990-2010)
20-yr x 2 periods -> 26 sites (1950-1970 & 1990-2010)
10-yr x 3 periods -> 32 sites (1950-1960 & 1970-1980 & 2000-2010)

FAGSYL:
10-yr x 2 periods -> 47 sites
20-yr x 3 periods -> 24 sites
20-yr x 2 periods -> 29 sites
10-yr x 3 periods -> 37 sites

lizzieinvancouver commented 4 years ago

@cchambe12 Super helpful! Thanks! I think we should try to present a mix of results if we can do it without confusing folks, and I think we have most of what we need. But I have one more request (I know, I know, always 'one more'): could you estimate the per-site variance for both lo and mat after logging them? That is, in addition to providing var(mat) and var(lo), could you also provide var(log(mat)) and var(log(lo)) for each site x time period? I plan just to use the 20 year windows and the 1950-1960 and 2000-2010 year windows.
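
A minimal sketch of the added columns, assuming a long data frame dat with site, cc, mat, and lo columns (all names hypothetical):

# per-site, per-window variances on the log scale, alongside the raw ones
varlogs <- aggregate(cbind(varlogmat = log(mat), varloglo = log(lo)) ~ site + cc,
    data = dat, FUN = var)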

cchambe12 commented 4 years ago

@lizzieinvancouver Great! Haha these are easy requests! I'll rerun it to make sure to get all possible sites and add these columns. It should be done running by tonight or tomorrow morning at the latest.

lizzieinvancouver commented 4 years ago

@cchambe12 Wait! One more tweak -- can we make sure all the files have the 11 and 89 confint? Thank you again! Ideally if you can do betpen and fagsyl for 1950-1960/2000-2010 with all the output you've now been generating, that would be great!

cchambe12 commented 4 years ago

@lizzieinvancouver Sure thing!!

lizzieinvancouver commented 4 years ago

@cchambe12 Thank you!

cchambe12 commented 4 years ago

@lizzieinvancouver Those should be all set!

The new output is:
analyses/pep_analyses/output/betpen_decsens_1950_1990.csv
analyses/pep_analyses/output/fagsyl_decsens_1950_1990.csv

Let me know if you need anything else or it still looks funny.

lizzieinvancouver commented 4 years ago

@cchambe12 Thanks! Can you aggregate them by site and cc period and include these in the output: "matslope", "matslopelog", "meanmat", "varmat", "var(log(mat))", "varlo", "var(log(lo))", "meangdd", "meanmatlo", "matslopeconfint11", "matslopeconfint89", "matslopelogconfint11", "matslopelogconfint89"?

Can you also provide that output for the 10 yr windows (1950-1960 and 2000-2010)? Let me know if you already have and I am just missing it.

cchambe12 commented 4 years ago

@lizzieinvancouver Ahh sorry I pointed you towards the wrong output file!!

analyses/pep_analyses/output/bpenestimates_withlog_1950_1990.csv
analyses/pep_analyses/output/fsestimates_withlog_1950_1990.csv

And then I made new files with the 10 yr windows:
analyses/pep_analyses/output/bpestimates_tenyrwindows.csv
analyses/pep_analyses/output/fsestimates_tenyrwindows.csv

lizzieinvancouver commented 4 years ago

@cchambe12 Can you check these files? They all return 20 year windows in the cc name, and the output is identical across the files when I take their summaries....

> unique(fs.10yr$cc)
[1] "1950-1970" "1990-2010"

cchambe12 commented 4 years ago

@lizzieinvancouver Okay, I hope this is what you are looking for!!

I have 10 year windows for both species, estimates and then means across the windows:
analyses/pep_analyses/output/bpenestimates_withlog_1950_2000.csv (all data)
analyses/pep_analyses/output/bpestimates_tenyrwindows.csv (means across windows)
analyses/pep_analyses/output/fsestimates_withlog_1950_2000.csv
analyses/pep_analyses/output/fsestimates_tenyrwindows.csv

And also for the 20 year windows:
analyses/pep_analyses/output/bpenestimates_withlog_1950to2010.csv
analyses/pep_analyses/output/bpestimates_twentyyrwindows.csv
analyses/pep_analyses/output/fsestimates_withlog_1950to2010.csv
analyses/pep_analyses/output/fsestimates_twentyyrwindows.csv

lizzieinvancouver commented 4 years ago

@cchambe12 Thank you! This has what I want (yay)! But something is still funky with the slope estimates. I contrasted what you just uploaded with what you uploaded previously ... and while everything else is identical, the slope estimates have changed -- and the more recent ones seem less likely to me. Do you know why these estimates have changed?

> df.10yr <- read.csv("pep_analyses/output/bpenestimates_withlog_1950_2000.csv", header=TRUE) # 10-year estimates (1950-1960 and 2000-2010) 
> df.old <- read.csv("pep_analyses/output/zarchive/bpenestimates_withlog.csv", header=TRUE)
> mean.betpen.10yr <- aggregate(df.10yr[c("matslope", "matslopelog", "meanmat", "varmat", "varlo", "meangdd", "meanmatlo")],
+     df.10yr["cc"], FUN=mean)
> 
> mean.betpen.10yr.old <- aggregate(df.old[c("matslope", "matslopelog", "meanmat", "varmat", "varlo", "meangdd", "meanmatlo")],
+     df.old["cc"], FUN=mean)
> mean.betpen.10yr
         cc  matslope matslopelog  meanmat    varmat     varlo  meangdd meanmatlo
1 1950-1960 -6.335281  -0.4251768 7.733054 1.5397806 110.51111 71.70517  7.041625
2 2000-2010 -3.216047  -0.2636081 8.829786 0.9102867  46.95728 64.62913  6.803512
> mean.betpen.10yr.old
         cc  matslope matslopelog  meanmat   varmat     varlo  meangdd meanmatlo
1 1950-1960 -4.282768  -0.1693573 5.602167 3.377819 110.51111 71.70517  7.041625
2 2000-2010 -3.611683  -0.2204338 6.578362 1.214370  46.95728 64.62913  6.803512

lizzieinvancouver commented 4 years ago

@cchambe12 Spoke too soon.... it looks like the mat has changed, and that may be driving the change in slopes ... but what you previously uploaded (mean.betpen.10yr.old) looks more similar to what we have in Ailene's paper and what makes sense to me based on the literature.

cchambe12 commented 4 years ago

@lizzieinvancouver Okay, so I went through the old code from OSPREE bbculdesac and remembered that we changed the MST months from March-April to March-May. We are now looking at March-May, while df.old was still using March-April. I'd be happy to change back if we want to stay consistent! I don't remember why we decided to change it to include May...

lizzieinvancouver commented 4 years ago

@cchambe12 Ah, thanks! I totally forgot too. I imagine we had a good reason for it but I can't recall it. Now that we're using the same windows as in the budburst paper I think it's best to use the same months too -- do you mind switching back to March-April? Sorry for this -- yet another, another, another -- request!

cchambe12 commented 4 years ago

@lizzieinvancouver Easy fix!! Should be all set!

lizzieinvancouver commented 4 years ago

@cchambe12 Sorry I fell behind on this. Can you check that the 20 year estimates were updated to the March-April temperatures? They still look high ...

> mean.betpen.20yr
         cc  matslope matslopelog  meanmat    varmat     varlo  meangdd meanmatlo
1 1950-1970 -6.053967  -0.4157587 7.910215 1.2324478  79.88591 77.76520  7.296462
2 1970-1990 -6.843966  -0.4895361 8.120306 0.8399633 104.82786 72.15105  7.164344
3 1990-2010 -2.115403  -0.1773680 9.037770 0.8897746  36.21610 62.17102  6.771629
  matslopeconfint11 matslopeconfint89 matslopelogconfint11 matslopelogconfint89
1         -8.045821        -4.0621117           -0.5577323          -0.27378504
2        -10.293221        -3.3947120           -0.7479982          -0.23107402
3         -4.456350         0.2255439           -0.3754453           0.02070938
> mean.betpen.10yr
         cc  matslope matslopelog  meanmat   varmat     varlo  meangdd meanmatlo
1 1950-1960 -4.282768  -0.1693573 5.602167 3.377819 110.51111 71.70517  7.041625
2 2000-2010 -3.611683  -0.2204338 6.578362 1.214370  46.95728 64.62913  6.803512
  matslopeconfint11 matslopeconfint89 matslopelogconfint11 matslopelogconfint89
1         -6.412267        -2.1532684           -0.2680584          -0.07065612
2         -6.540140        -0.6832266           -0.4035728          -0.03729481
> mean.fs.20yr
         cc  matslope matslopelog  meanmat    varmat    varlo  meangdd meanmatlo
1 1950-1970 -4.522218  -0.2801439 7.657062 1.2448377 63.35417 86.01584  7.613772
2 1970-1990 -2.347885  -0.1508822 7.858652 0.8656629 56.19254 81.31031  7.520632
3 1990-2010 -2.731600  -0.2066306 8.866196 0.8969811 32.82193 79.92338  7.468091
  matslopeconfint11 matslopeconfint89 matslopelogconfint11 matslopelogconfint89
1         -6.581286        -2.4631509           -0.4126363          -0.14765147
2         -5.250960         0.5551895           -0.3430452           0.04128080
3         -4.741830        -0.7213705           -0.3658382          -0.04742298
> mean.fs.10yr
         cc  matslope matslopelog  meanmat   varmat    varlo  meangdd meanmatlo
1 1950-1960 -2.846544  -0.1070899 5.622433 3.334813 71.89031 83.80658  7.541881
2 2000-2010 -3.432720  -0.1970442 6.661322 1.231192 38.28747 86.69336  7.747309
  matslopeconfint11 matslopeconfint89 matslopelogconfint11 matslopelogconfint89
1         -4.870520        -0.8225685           -0.1924748          -0.02170493
2         -6.049773        -0.8156660           -0.3566779          -0.03741055