Closed lizzieinvancouver closed 4 years ago
@lizzieinvancouver I forgot we can't take the log of negative numbers. I tried subsetting the dataset to start March 1st but I was still getting freezing temperatures so I converted everything to Kelvin and then compared to what we had without the log...
estprecc.log <- lm(log(lo)~log(tempk), data=subset(bptempandbb, cc=="1950-1960"))
estpostcc.log <- lm(log(lo)~log(tempk), data=subset(bptempandbb, cc=="2000-2010"))
logdiffbefore.after <- coef(estprecc.log)[2]-coef(estpostcc.log)[2]
Negative means a decline in sensitivity AFTER climate change. logdiffbefore.after = -0.33; exponentiated we get exp(-0.33) = 0.718 -- so an increase in sensitivity!
estprecc <- lm(lo~tempk, data=subset(bptempandbb, cc=="1950-1960"))
estpostcc <- lm(lo~tempk, data=subset(bptempandbb, cc=="2000-2010"))
diffbefore.after <- coef(estprecc)[2]-coef(estpostcc)[2]
diffbefore.after = -0.1529792
@cchambe12 find credible intervals next
@cchambe12 I totally forgot about that too! Do we actually have years with negative mean temperatures though?
@lizzieinvancouver No! I was using the wrong column. All is well and we don't need to use Kelvin after all, which is a relief. I've made output for the PEP stuff and for sliding window.
decsens/analyses/pep_analyses/output/bpestimates_withlog.csv
I will make a README.txt with column descriptions now.
@cchambe12 This is great! Thank you so much for doing this! I am working to update the manuscript. I have a couple more requests (sorry) ... could we add a decade (1990-2000?) and could we add Fagus?
@lizzieinvancouver I was thinking of adding Fagus too! And yeah, no problem with the added decade. Would you like the same for the SWA or wait to see how this goes first?
@cchambe12 Great on Fagus and another decade! Let's start there and add SWA later if needed. Thank you!
@lizzieinvancouver All set!! I added the new data to the output folder
FAGSYL: fsylestimates_withlog_1950_1990_2000.csv
BETPEN: bpenestimates_withlog_1950_1990_2000.csv
@cchambe12 Wow! This was so fast. I just updated the plotting code -- thanks. One thing that is a little odd to me is that the warming is about the same (the 90s equal the 2000s). This might make sense, in which case -- with apologies -- I might ask that we add another decade ... to discuss tomorrow.
@lizzieinvancouver Yeah no problem at all! I was wondering if adding more species would be helpful too. Anything from PEP is easy to add!
@cchambe12 Thanks! I think we will try some other years -- I was just looking at German climate for when the warming trend began there (see the German Climate Atlas), and it looks like maybe the 70s ... so that's something to consider. Also, we should think about whether to go to a bigger window for safety (e.g., 20 years is more standard for 'climate'). Talk to you soon!
@cchambe12 Try with March to May temperatures and 20 year windows .... confidence intervals ... 1990-2010 versus 1950-1970.
@cchambe12 Sorry to just think of this now, but if we're doing 1950-1970 and 1990-2010, then let's add 1970-1990 (and I really assume that mean temp should fall between the two). As for confint, I suggest you pick what sounds good to you ... 84, 90, 80 ... whatever other than 95 or something more extreme. Thank you again for all your help on this!
@lizzieinvancouver I just pushed the real data output for 1950-1970 and 1990-2010 with 89 for the confint. I will add the new timeframe now!
I think this is done!
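For reference, here is a minimal sketch of how those 11%/89% bounds fall out of `confint()` -- a 78% interval has its endpoints at the 11th and 89th percentiles, matching the matslopeconfint11/matslopeconfint89 columns. Base R's `cars` data stands in for the real per-site fits:

```r
# 'cars' is a stand-in for a per-site regression, e.g. lm(lo ~ mat, data = sitedat)
fit <- lm(dist ~ speed, data = cars)
ci <- confint(fit, level = 0.78)   # columns are labeled "11 %" and "89 %"
ci["speed", ]                      # lower 11% and upper 89% bounds on the slope
```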
@cchambe12 This is awesome! Check out the means ... but I am not sure how to interpret everything.... email to follow. Also @AileneKane @dbuona @MoralesCastilla
> mean.betpen <- aggregate(df[c("matslope", "matslopelog", "meanmat", "varmat", "meangdd",
+ "matslopeconfint11", "matslopeconfint89", "matslopelogconfint11", "matslopelogconfint89")],
+ df["cc"], FUN=mean)
>
> tempdiff1 <- mean.betpen$meanmat[which(mean.betpen$cc=="1970-1990")]-
+ mean.betpen$meanmat[which(mean.betpen$cc=="1950-1970")]
> tempdiff2 <- mean.betpen$meanmat[which(mean.betpen$cc=="1990-2010")]-
+ mean.betpen$meanmat[which(mean.betpen$cc=="1950-1970")]
> mean.betpen
cc matslope matslopelog meanmat varmat meangdd matslopeconfint11
1 1950-1970 -6.053967 -0.4157587 7.910215 1.2324478 72.47753 -8.045821
2 1970-1990 -6.843966 -0.4895361 8.120306 0.8399633 72.15105 -10.293221
3 1990-2010 -2.115403 -0.1773680 9.037770 0.8897746 60.03246 -4.456350
matslopeconfint89 matslopelogconfint11 matslopelogconfint89
1 -4.0621117 -0.5577323 -0.27378504
2 -3.3947120 -0.7479982 -0.23107402
3 0.2255439 -0.3754453 0.02070938
> mean.fs <- aggregate(fs[c("matslope", "matslopelog", "meanmat", "varmat", "meangdd",
+ "matslopeconfint11", "matslopeconfint89", "matslopelogconfint11", "matslopelogconfint89")],
+ fs["cc"], FUN=mean)
> mean.fs
cc matslope matslopelog meanmat varmat meangdd matslopeconfint11
1 1950-1970 -4.522218 -0.2801439 7.657062 1.2448377 86.01584 -6.581286
2 1970-1990 -2.347885 -0.1508822 7.858652 0.8656629 81.31031 -5.250960
3 1990-2010 -2.731600 -0.2066306 8.866196 0.8969811 79.92338 -4.741830
matslopeconfint89 matslopelogconfint11 matslopelogconfint89
1 -2.4631509 -0.4126363 -0.14765147
2 0.5551895 -0.3430452 0.04128080
3 -0.7213705 -0.3658382 -0.04742298
Wait, let's just do it here @cchambe12 @AileneKane @dbuona @MoralesCastilla
Thanks to Cat we already have updated results from our conversation Monday! The plot is here (the logged results are on every plot; the right side is a zoom). The dots are means, the lines are 89% confidence intervals (or 78% ... they may be 11% to 89%). I used exp() to convert the logged coefs, but I am not sure that's right. Here's what's clear:
(1) The 20 year windows are WAY better. The climate is more stable across species (phew).
(2) There are some similarities in trends with the raw and logged regressions, but I cannot see through them.
(3) Betula is doing something more exciting than FagSyl. It may actually be more sensitive recently ... which lines up with the GDD findings.
(4) Variance in MAT does go down, but not when the sensitivity does.
Here's what I need help with:
(a) How do we interpret the logged coefs? Should we back-convert them? If so, how? Can they ever be negative? I think not mathematically, which seems wrong ... https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faqhow-do-i-interpret-a-regression-model-when-some-variables-are-log-transformed/
This is sort of critical before we can figure what to say.
@cchambe12 Can you confirm the intervals and also what window the GDD is calculated over (some months or until leafout for each species x site x year)? Thanks!
@lizzieinvancouver Yes, sorry -- I meant 78% confidence intervals! "GDD" is growing degree days over the 30 days before leafout for each species x year x site; "mat.lo" is mean temperature over the 30 days before leafout; "mat" is mean spring temperature from March 1 to May 31.
I'd be happy to adjust as necessary!
As for back-converting... I also used exp(). The back-converted values should never be negative as you cannot log a negative number but the logged outcomes can be negative.
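A sketch of that elasticity reading, using the BETPEN 1950-1970 matslopelog value from the table above: in log(lo) ~ log(mat), the slope b is an elasticity, so a 1% increase in mat multiplies expected lo by 1.01^b (roughly a b% change). The slope b itself can be negative; it is exp(b) that cannot be.

```r
b <- -0.4157587                     # BETPEN 1950-1970 logged slope from above
pct.change <- (1.01^b - 1) * 100    # % change in lo per +1% change in mat
pct.change                          # roughly -0.41%
exp(b)                              # multiplicative factor; always positive
```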
@cchambe12 No, I think this is great. It's just interesting to me that they are requiring less GDD, which is sort of more sensitive ... I think.
@AileneKane My main questions are:
@lizzieinvancouver
I hadn't realized this in our early discussion of these models, but there is not a great way to back-convert logged response variables with continuous predictors (especially when those predictors are also logged). It is not a simple "exp()" unfortunately. See http://www.biostat.jhsph.edu/~iruczins/teaching/jf/ch8.pdf
I'm not sure yet what we can say about changes in sensitivity. I'm not sure we want to use the log-log approach, if our goal is to compare slopes on a natural scale...
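One standard partial fix from that literature is Duan's smearing estimator: the naive exp() back-transform of a logged response is biased low on average, and multiplying by the mean of the exponentiated residuals corrects the predictions (though it does not back-convert the slope itself). A sketch, with base R's `cars` standing in for the PEP fits:

```r
fit <- lm(log(dist) ~ log(speed), data = cars)
smear <- mean(exp(residuals(fit)))   # Duan's smearing factor, always >= 1
pred.naive <- exp(predict(fit))      # biased low on average
pred.smear <- smear * pred.naive     # bias-corrected predictions on the raw scale
```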
@AileneKane @cchambe12 I just confirmed (see figure here) that we need a log-log. So I will probably go forward without back-converting.
@cchambe12 The variance in leafout for betpen goes way down, can you double-check the code and that we are not dropping some crazy outlier site across the 20 year windows?
> mean.betpen
cc matslope matslopelog meanmat varmat varlo meangdd matslopeconfint11
1 1950-1970 -6.053967 -0.4157587 7.910215 1.2324478 79.88591 72.47753 -8.045821
2 1970-1990 -6.843966 -0.4895361 8.120306 0.8399633 104.82786 72.15105 -10.293221
3 1990-2010 -2.115403 -0.1773680 9.037770 0.8897746 36.21610 60.03246 -4.456350
> mean.fs
cc matslope matslopelog meanmat varmat varlo meangdd matslopeconfint11
1 1950-1970 -4.522218 -0.2801439 7.657062 1.2448377 63.35417 86.01584 -6.581286
2 1970-1990 -2.347885 -0.1508822 7.858652 0.8656629 56.19254 81.31031 -5.250960
3 1990-2010 -2.731600 -0.2066306 8.866196 0.8969811 32.82193 79.92338 -4.741830
@lizzieinvancouver I tried modifying the site selection and rerunning the code but it returned the same answers... it does seem strange. I will continue to look into this but at first glance there isn't an obvious issue with the code or site selection
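A quick per-site check along these lines might look like the following (a hypothetical sketch; `d` and its columns `site`, `cc`, and `lo` stand in for the real BETPEN data frame):

```r
set.seed(7)
d <- data.frame(site = rep(1:3, each = 40),
                cc = rep(rep(c("1950-1970", "1990-2010"), each = 20), 3))
d$lo <- 120 + rnorm(120, 0, 8)       # toy leafout days of year
# per-site, per-period leafout variance: a single crazy outlier site
# would stand out at the top of this sorted vector
varlo.by.site <- tapply(d$lo, interaction(d$site, d$cc), var)
sort(varlo.by.site)
```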
@cchambe12 It could be real. Perhaps you could make the type of ggplots you made for the budburst ms? Then we could get a look at the data better.
@cchambe12 I think something is up with our estimates (either our old or new ones)... I was just looking at the estimates we had in mid-January....
Below code from pepplotting.R
> df.old <- read.csv("pep_analyses/output/bpenestimates_withlog.csv", header=TRUE) # earlier 10-year estimates
>
> mean.betpen.old <- aggregate(df.old[c("matslope", "matslopelog", "meanmat", "varmat", "varlo", "meangdd", "meanmatlo")],
+ df.old["cc"], FUN=mean)
>
> mean.betpen.old
cc matslope matslopelog meanmat varmat varlo meangdd meanmatlo
1 1950-1960 -4.282768 -0.1693573 5.602167 3.377819 110.51111 71.70517 7.041625
2 2000-2010 -3.611683 -0.2204338 6.578362 1.214370 46.95728 64.62913 6.803512
And then I calculated new estimates (below code is from betpen_tempsensitivities.R):
> meanhere.10yr[which(names(meanhere.10yr) %in% c("cc", "matslope","matslopelog", "meanmat", "varmat", "varlo", "meangdd", "meanmatlo"))]
cc meanmat varmat meanmatlo varlo meangdd matslope matslopelog
1 1950-1960 8.027834 1.5063816 6.992420 103.14902 68.65143 -6.511172 -0.4549991
2 1960-1970 7.792597 1.0464629 7.183117 63.74706 76.30362 -5.575453 -0.3692203
3 1970-1980 7.924367 0.2715488 6.835710 92.05621 65.84752 -11.876632 -0.8398790
4 1980-1990 8.316245 1.4005000 7.492977 111.57255 78.45458 -5.907674 -0.4203201
5 1990-2000 9.015313 0.9692797 6.493816 37.64641 55.25233 -1.362697 -0.1167921
6 2000-2010 9.060227 0.8985988 6.855772 36.34706 64.81260 -2.915261 -0.2457476
@cchambe12 I'll keep checking the code in betpen_tempsensitivities.R that I wrote and see if there are errors there, but if you have any ideas, let me know.
@lizzieinvancouver I'm not surprised they are different. Back in January, I selected sites that only needed to have observations for those 20 years (1950-1960 & 2000-2010). For the declining-sensitivities analysis, I am selecting sites that have observations for the full 60 years (1950-2010), so the number of sites goes down significantly (from 48 to 17). I've streamlined the code to be liberal with sites; I could rerun it to use different sites for each 20-year period, but then I feel like we can't compare results...?
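For the record, the filtering difference described above amounts to something like this (a sketch; `d`, `site`, and `year` are hypothetical stand-ins for the PEP observations):

```r
# Requiring the full 1950-2010 span drops sites with shorter records;
# site C below has data for the two 10-yr windows but not the 60-yr span.
d <- data.frame(site = rep(c("A", "B", "C"), times = c(61, 61, 22)),
                year = c(1950:2010, 1950:2010, 1950:1960, 2000:2010))
nyrs <- tapply(d$year, d$site, function(y) length(unique(y)))
keep.60yr <- names(nyrs)[nyrs == length(1950:2010)]  # needs every year
keep.60yr                                            # "A" "B" -- C is dropped
```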
@cchambe12 Ah -- that makes perfect sense! I should have thought of this. But we should think more about this -- we wanted to switch to 20 year windows to have better confidence in our results, but if we are trading it off with fewer sites I am not sure that's the best approach.
@cchambe12 Let me think on this some ... if we really want inference on what is going on with PEP we should probably switch to some sort of thoughtful hierarchical model where we can leverage more sites and more years. But! I think that is well outside the scope of this paper, where the PEP results are a snippet at most. So, let me think on it. My sense is that more sites is better than more years -- both because I expect really high site-to-site variability (observer differences, location, etc.) and because 17 x 20 = 340 observations while 48 x 10 = 480.
@lizzieinvancouver Ahh yes you're right, a hierarchical model is the usual answer here...
We are losing a lot of data here by going down to only 17 sites. I'll play around with using the 10 year windows again and then just looking at 1950-1970 and 1990-2010 rather than all three time windows. Maybe one of those options will give us back enough sites but still be informative?
@cchambe12 I honestly feel like those 40 years will probably be fairly similar to the 60 year problem, but have a stab if you have time!
@lizzieinvancouver You were right!
Here's the breakdown if it's helpful:
BETPEN:
10-yr x 2 periods (1950-1960 & 2000-2010) -> 48 sites
20-yr x 3 periods (1950-1970, 1970-1990, 1990-2010) -> 17 sites
20-yr x 2 periods (1950-1970 & 1990-2010) -> 26 sites
10-yr x 3 periods (1950-1960, 1970-1980, 2000-2010) -> 32 sites
FAGSYL:
10-yr x 2 -> 47 sites
20-yr x 3 -> 24 sites
20-yr x 2 -> 29 sites
10-yr x 3 -> 37 sites
@cchambe12 Super helpful! Thanks! I think we should try to present a mix of results if we can do it without confusing folks, and I think we have most of what we need. But I have one more request (I know, I know, always 'one more'): could you estimate the per-site variance for both lo and mat after logging the lo and mat? That means in addition to providing var(mat) and var(lo), could you also provide var(log(mat)) and var(log(lo)) for each site x time period? I plan just to use the 20 year windows and the 1950-1960 and 2000-2010 year windows.
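A sketch of the extra columns being requested here (the data frame `d` and its column names are hypothetical stand-ins for the per-site PEP data):

```r
set.seed(3)
d <- data.frame(siteslist = rep(1:2, each = 40),
                cc = rep(rep(c("1950-1970", "1990-2010"), each = 20), 2),
                mat = runif(80, 5, 10), lo = runif(80, 100, 140))
# per site x period: var(mat), var(log(mat)), var(lo), var(log(lo))
out <- aggregate(cbind(varmat = mat, varlogmat = log(mat),
                       varlo = lo, varloglo = log(lo)) ~ siteslist + cc,
                 data = d, FUN = var)
out
```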
@lizzieinvancouver Great! Haha these are easy requests! I'll rerun it to make sure to get all possible sites and add these columns. It should be done running by tonight or tomorrow morning at the latest.
@cchambe12 Wait! One more tweak -- can we make sure all the files have the 11 and 89 confint? Thank you again! Ideally if you can do betpen and fagsyl for 1950-1960/2000-2010 with all the output you've now been generating, that would be great!
@lizzieinvancouver Sure thing!!
@cchambe12 Thank you!
@lizzieinvancouver Those should be all set!
The new output is:
analyses/pep_analyses/output/betpen_decsens_1950_1990.csv
analyses/pep_analyses/output/fagsyl_decsens_1950_1990.csv
Let me know if you need anything else or it still looks funny.
@cchambe12 Thanks! Can you aggregate them by site and cc period and include these in the output: "matslope", "matslopelog", "meanmat", "varmat", "var(log(mat))", "varlo", "var(log(lo))", "meangdd", "meanmatlo", "matslopeconfint11", "matslopeconfint89", "matslopelogconfint11", "matslopelogconfint89."
Can you also provide that output for the 10 yr windows (1950-1960 and 2000-2010)? Let me know if you already have and I am just missing it.
@lizzieinvancouver Ahh sorry I pointed you towards the wrong output file!!
analyses/pep_analyses/output/bpenestimates_withlog_1950_1990.csv
analyses/pep_analyses/output/fsestimates_withlog_1950_1990.csv
And then I made new files with the 10 yr windows:
analyses/pep_analyses/output/bpestimates_tenyrwindows.csv
analyses/pep_analyses/output/fsestimates_tenyrwindows.csv
@cchambe12 Can you check these files? They all return 20 year windows in the cc column, and the output is identical across the files when I take their summaries....
> unique(fs.10yr$cc)
[1] "1950-1970" "1990-2010"
@lizzieinvancouver Okay, I hope this is what you are looking for!!
I have 10 year windows for both species, estimates and then means across the windows:
analyses/pep_analyses/output/bpenestimates_withlog_1950_2000.csv (all data)
analyses/pep_analyses/output/bpestimates_tenyrwindows.csv (means across windows)
analyses/pep_analyses/output/fsestimates_withlog_1950_2000.csv
analyses/pep_analyses/output/fsestimates_tenyrwindows.csv
And also for the 20 year windows:
analyses/pep_analyses/output/bpenestimates_withlog_1950to2010.csv
analyses/pep_analyses/output/bpestimates_twentyyrwindows.csv
analyses/pep_analyses/output/fsestimates_withlog_1950to2010.csv
analyses/pep_analyses/output/fsestimates_twentyyrwindows.csv
@cchambe12 Thank you! This has what I want (yay)! But something is still funky with the slope estimates. I contrasted what you just uploaded with what you uploaded previously ... and while everything else is identical, the slope estimates have changed -- and the more recent ones seem less likely to me. Do you know why these estimates have changed?
> df.10yr <- read.csv("pep_analyses/output/bpenestimates_withlog_1950_2000.csv", header=TRUE) # 10-year estimates (1950-1960 and 2000-2010)
> df.old <- read.csv("pep_analyses/output/zarchive/bpenestimates_withlog.csv", header=TRUE)
> mean.betpen.10yr <- aggregate(df.10yr[c("matslope", "matslopelog", "meanmat", "varmat", "varlo", "meangdd", "meanmatlo")],
+ df.10yr["cc"], FUN=mean)
>
> mean.betpen.10yr.old <- aggregate(df.old[c("matslope", "matslopelog", "meanmat", "varmat", "varlo", "meangdd", "meanmatlo")],
+ df.old["cc"], FUN=mean)
> mean.betpen.10yr
cc matslope matslopelog meanmat varmat varlo meangdd meanmatlo
1 1950-1960 -6.335281 -0.4251768 7.733054 1.5397806 110.51111 71.70517 7.041625
2 2000-2010 -3.216047 -0.2636081 8.829786 0.9102867 46.95728 64.62913 6.803512
> mean.betpen.10yr.old
cc matslope matslopelog meanmat varmat varlo meangdd meanmatlo
1 1950-1960 -4.282768 -0.1693573 5.602167 3.377819 110.51111 71.70517 7.041625
2 2000-2010 -3.611683 -0.2204338 6.578362 1.214370 46.95728 64.62913 6.803512
@cchambe12 Spoke too soon.... looks like the mat has changed and that may be driving the change in slopes ... but what you previously uploaded (mean.betpen.10yr.old) looks more similar to what we have in Ailene's paper and what makes sense to me based on the literature.
@lizzieinvancouver Okay, so I went through the old code from OSPREE bbculdesac and remembered that we changed the MST months from March-April to March-May. We are now looking at March-May, but df.old was still using March-April. I'd be happy to change back if we want to stay consistent! I don't remember why we decided to include May...
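The switch described here comes down to the month window used for the mean spring temperature. A toy sketch (`clim` with columns `year`, `month`, and `tmean` is a hypothetical stand-in for the climate data):

```r
clim <- expand.grid(year = 1950:1951, month = 1:12)
clim$tmean <- 2 + clim$month   # toy monthly mean temperatures
# March-April MST (as in the budburst ms) vs March-May MST (current)
mst.marapr <- aggregate(tmean ~ year, data = subset(clim, month %in% 3:4),
                        FUN = mean)
mst.marmay <- aggregate(tmean ~ year, data = subset(clim, month %in% 3:5),
                        FUN = mean)
```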
@cchambe12 Ah, thanks! I totally forgot too. I imagine we had a good reason for it but I can't recall it. Now that we're using the same windows as in the budburst paper I think it's best to use the same months too -- do you mind switching back to March-April? Sorry for this -- yet another, another, another -- request!
@lizzieinvancouver Easy fix!! Should be all set!
@cchambe12 Sorry I fell behind on this. Can you check that the 20 year estimates were updated to the March-April temperatures? They still look high ...
> mean.betpen.20yr
cc matslope matslopelog meanmat varmat varlo meangdd meanmatlo
1 1950-1970 -6.053967 -0.4157587 7.910215 1.2324478 79.88591 77.76520 7.296462
2 1970-1990 -6.843966 -0.4895361 8.120306 0.8399633 104.82786 72.15105 7.164344
3 1990-2010 -2.115403 -0.1773680 9.037770 0.8897746 36.21610 62.17102 6.771629
matslopeconfint11 matslopeconfint89 matslopelogconfint11 matslopelogconfint89
1 -8.045821 -4.0621117 -0.5577323 -0.27378504
2 -10.293221 -3.3947120 -0.7479982 -0.23107402
3 -4.456350 0.2255439 -0.3754453 0.02070938
> mean.betpen.10yr
cc matslope matslopelog meanmat varmat varlo meangdd meanmatlo
1 1950-1960 -4.282768 -0.1693573 5.602167 3.377819 110.51111 71.70517 7.041625
2 2000-2010 -3.611683 -0.2204338 6.578362 1.214370 46.95728 64.62913 6.803512
matslopeconfint11 matslopeconfint89 matslopelogconfint11 matslopelogconfint89
1 -6.412267 -2.1532684 -0.2680584 -0.07065612
2 -6.540140 -0.6832266 -0.4035728 -0.03729481
> mean.fs.20yr
cc matslope matslopelog meanmat varmat varlo meangdd meanmatlo
1 1950-1970 -4.522218 -0.2801439 7.657062 1.2448377 63.35417 86.01584 7.613772
2 1970-1990 -2.347885 -0.1508822 7.858652 0.8656629 56.19254 81.31031 7.520632
3 1990-2010 -2.731600 -0.2066306 8.866196 0.8969811 32.82193 79.92338 7.468091
matslopeconfint11 matslopeconfint89 matslopelogconfint11 matslopelogconfint89
1 -6.581286 -2.4631509 -0.4126363 -0.14765147
2 -5.250960 0.5551895 -0.3430452 0.04128080
3 -4.741830 -0.7213705 -0.3658382 -0.04742298
> mean.fs.10yr
cc matslope matslopelog meanmat varmat varlo meangdd meanmatlo
1 1950-1960 -2.846544 -0.1070899 5.622433 3.334813 71.89031 83.80658 7.541881
2 2000-2010 -3.432720 -0.1970442 6.661322 1.231192 38.28747 86.69336 7.747309
matslopeconfint11 matslopeconfint89 matslopelogconfint11 matslopelogconfint89
1 -4.870520 -0.8225685 -0.1924748 -0.02170493
2 -6.049773 -0.8156660 -0.3566779 -0.03741055
Re-do the PEP work on BETPEN using a log-log regression (i.e., instead of days ~ C, do log(days) ~ log(C)).
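The requested change, sketched with toy data (the names `days` and `C` follow the wording above; the real fits would be run per site on the PEP data):

```r
set.seed(1)
C <- runif(60, 5, 10)                   # toy mean spring temperature (C)
days <- 130 - 4 * C + rnorm(60, 0, 3)   # toy leafout day of year
fit.raw <- lm(days ~ C)                 # current model: days per degree C
fit.log <- lm(log(days) ~ log(C))       # requested log-log model (elasticity)
coef(fit.raw)[2]; coef(fit.log)[2]      # both slopes should be negative
```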