lizzieinvancouver / ospree

Budbreak review paper database

update ospree data for KNB for limiting cues #438

Open lizzieinvancouver opened 2 years ago

lizzieinvancouver commented 2 years ago

We need to upload the version of OSPREE that goes with limiting cues. Our plan is to add a NEW csv file in addition to the one that is up on KNB. In limiting cues we have a couple main files that seem to run off output/ospree_clean.csv which is a larger file than what we used in the budburst ms.

@AileneKane Will make a smaller (i.e., fewer columns) version of output/ospree_clean.csv and @lizzieinvancouver will check that these files (below) run on that new version:

studydesignplots.R and countintxns.R

@cchambe12 and @dbuona -- are there any other files you used to make figures for limiting cues that we should check?

After that, @AileneKane will post the new data file and edit the KNB entry a little, especially "Sampling Step 1" and maybe the methods to explain the two datafiles.

Thanks everyone!

AileneKane commented 2 years ago

@lizzieinvancouver I thought I had done this so easily by modifying the code in ospree_prep_for_knb.R in this commit but then realized that I had done it for ospree_clean_withchill_BB.csv, not ospree_clean.csv. :( This brought up a question for me- does it need to be ospree_clean.csv, (not ospree_clean_withchill.csv)? I thought that we needed chilling for limiting cues...

lizzieinvancouver commented 2 years ago

@AileneKane It is ospree_clean.csv in my code -- ospree_clean_withchill_BB.csv is much smaller than ospree_clean.csv and so loses a lot of studies. I tried using other formats and found it easiest to use the raw data and clean it some. It looks like ospree_clean_withchill.csv is similar enough in columns and length to ospree_clean.csv (see below and I scanned the column names) so we could try that if easier for you.

> d <- read.csv("~/Documents/git/projects/treegarden/budreview/ospree/analyses/output/ospree_clean.csv")
> bb <- read.csv("~/Documents/git/projects/treegarden/budreview/ospree/analyses/output/ospree_clean_withchill_BB.csv")
> dim(d)
[1] 12693    62
> dim(bb)
[1] 7643   88
> c <- read.csv("~/Documents/git/projects/treegarden/budreview/ospree/analyses/output/ospree_clean_withchill.csv")
> dim(c)
[1] 12693    81
AileneKane commented 2 years ago

@lizzieinvancouver ospree_clean_withchill.csv is much easier. The only columns that are a bit challenging to add are "force_type" and "photo_type". Do we want to include these? If so, I'll dig out the code from the original bb cleaning files and add them.

follow up: i'll just assume you want them and will include them.

AileneKane commented 2 years ago

@lizzieinvancouver I have added this draft file, with photo_type and force_type as "NAs" for now. I can add them if we need these columns for limiting cues; it will be a bit challenging, I think, for non-bb responses? But maybe not... You can find the csv in docs/limitingcues. Let me know if you'd like me to change anything, move the file elsewhere, or if you want to talk anything through!

AileneKane commented 2 years ago

@lizzieinvancouver I have another follow-up question: I used the as.numeric() code to convert chilling, forcing, etc to numeric so that any non-numeric entries show up as NAs- let me know if we do not want to do this!
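
For anyone checking what that conversion does, here is a minimal sketch of how `as.numeric()` treats non-numeric entries. The values are made up for illustration and are not taken from OSPREE.

```r
# Toy column of forcing temperatures as they might be entered (hypothetical values)
forcetemp <- c("20", "15.5", "ambient", "", "mean of 9, 12")

# as.numeric() returns NA for anything it cannot parse and issues a warning;
# suppressWarnings() just keeps the console quiet for this demo
force_num <- suppressWarnings(as.numeric(forcetemp))

# Entries that held some text but became NA are the ones at risk of being lost later
lost <- forcetemp[is.na(force_num) & forcetemp != ""]

print(force_num)
print(lost)
```

Listing the entries in `lost` before overwriting a column is one way to see which studies a numeric conversion would affect.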

lizzieinvancouver commented 2 years ago

@AileneKane Thanks for working on this! Unfortunately I checked today and I cannot use ospree_clean_withchill.csv. It gets different answers along the way, so I don't feel comfortable using it. For example, it has fewer studies in Africa, and when I clean dates in a specific way I lose 1000 rows with ospree_clean_withchill.csv, whereas with ospree_clean.csv I lose only 200.
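
A rough sketch of the kind of check described here, using toy data frames in place of the two files. The column names `datasetID` and `continent` are assumptions for illustration; the real files may name these differently.

```r
# Toy stand-ins for the two files (hypothetical rows and column names)
clean <- data.frame(datasetID = c("a01", "a01", "b02", "c03", "d04"),
                    continent = c("europe", "europe", "africa", "africa", "asia"))
withchill <- data.frame(datasetID = c("a01", "a01", "b02", "d04"),
                        continent = c("europe", "europe", "africa", "asia"))

# Studies present in the first file but missing from the second
missing_studies <- setdiff(unique(clean$datasetID), unique(withchill$datasetID))

# Number of distinct studies per continent in a data frame
count_studies <- function(d) {
  tapply(d$datasetID, d$continent, function(x) length(unique(x)))
}
n_clean <- count_studies(clean)
n_withchill <- count_studies(withchill)

print(missing_studies)
print(n_clean)
print(n_withchill)
```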

Further, my code does not work with the below code your file has:

  ## make a bunch of things numeric (taken from bbdataplease_knb.R)
  d$forceday <- as.numeric(d$forcetemp)
  d$forcenight <- as.numeric(d$forcetemp_night)
  d$photonight <- as.numeric(d$photoperiod_night)

  d$photo <- as.numeric(d$photoperiod_day)
  d$force <- d$forceday
  ## rows with night forcing and both photoperiods get the hour-weighted mean
  hasnight <- !is.na(d$forcenight) & !is.na(d$photo) & !is.na(d$photonight)
  d$force[hasnight] <- (d$forceday[hasnight] * d$photo[hasnight] +
                        d$forcenight[hasnight] * d$photonight[hasnight]) / 24

  d$chill <- as.numeric(d$Total_Utah_Model)
  d$chill.hrs <- as.numeric(d$Total_Chilling_Hours)
  d$chill.ports <- as.numeric(d$Total_Chill_portions)

  d$resp <- as.numeric(d$response.time)
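
To make the forcing calculation in the quoted code concrete, here is a small worked example on toy rows. The values are hypothetical, and the column names follow the derived columns in the quoted code.

```r
# Toy rows (hypothetical values): day/night forcing temperatures and
# day/night photoperiods in hours
d <- data.frame(forceday   = c(20, 20, 18),
                forcenight = c(10, NA, 12),
                photo      = c(12, 12, 16),
                photonight = c(12, 12, 8))

# Default to the day temperature, as in the quoted code
d$force <- d$forceday

# Rows with night forcing and both photoperiods get the hour-weighted daily mean
hasnight <- !is.na(d$forcenight) & !is.na(d$photo) & !is.na(d$photonight)
d$force[hasnight] <- (d$forceday[hasnight] * d$photo[hasnight] +
                      d$forcenight[hasnight] * d$photonight[hasnight]) / 24

print(d$force)  # 15 20 16
```

The second row has no night temperature, so it keeps its day value of 20; the other two rows are averaged over 24 hours.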

I looked through my main code (which is analyses/limitingcues/countintxns/countintxns.R, which uses analyses/limitingcues/countintxns/source/countintxns_cleanosp.R) and there's no easy way to add these. I did a lot of crazy manipulations to not lose studies (which is easy to do when dealing with dates, chilling temperatures, NA values etc.).

I edited your R script and checked the output (ospree_forknb_limcue.csv) gives me the same answers as ospree_clean.csv (also checked the heatmap comes out the same in studydesignplots.R). Is there a way we could submit this file to KNB?
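
One possible way to run this kind of check is to compare the columns the two files share. This is only a sketch on toy data; the data frames and column names below are hypothetical stand-ins, not the actual files.

```r
# Toy stand-ins (hypothetical): the full file and a slimmer export with fewer columns
full <- data.frame(datasetID = c("a01", "b02"),
                   forcetemp = c("20", "ambient"),
                   Entered.by = c("X", "Y"),
                   stringsAsFactors = FALSE)
slim <- full[, c("datasetID", "forcetemp")]

# Compare only the columns both files share, in the same order
shared <- intersect(names(full), names(slim))
same <- isTRUE(all.equal(full[, shared], slim[, shared]))

print(shared)
print(same)
```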

lizzieinvancouver commented 2 years ago

@AileneKane Just checking in on my question above:

I edited your R script and checked the output (ospree_forknb_limcue.csv) gives me the same answers as ospree_clean.csv (also checked the heatmap comes out the same in studydesignplots.R). Is there a way we could submit this file to KNB?

AileneKane commented 2 years ago

@lizzieinvancouver We can submit your new ospree_forknb_limcue.csv file to knb for sure. I've got a question about this. Do you want me to: A) add the new database file as an alternate version of OSPREE, so that there are two versions (one for the main bb ms and one for limcues), or B) replace the previous version of ospree that's on knb now with this one, and add code to get this new file to be the same as what's there now (from the bb ms)?

B will obviously take more time and be harder, so I may not get to it till next week.

lizzieinvancouver commented 2 years ago

@AileneKane Definitely A! I think that is easier for us and users and what I was thinking. If we're clear about what the two files are then I don't see a real drawback.

AileneKane commented 2 years ago

@lizzieinvancouver Ok, as I started to do this I realized that I wasn't sure if there is code that I should also add for limiting cues? Our original database entry has the model code (zscoring code, etc., too) but perhaps there is nothing to add in that regard since we didn't fit any models for this paper? I've now added it to knb. Please let me know if there is anything we want to add code-wise. Also, I noticed that "Entered by" is in this version of the database. Shall I remove it for the file on knb?

AileneKane commented 2 years ago

@lizzieinvancouver as you probably know, there are other columns that are in this version but not in the bb subset:

AileneKane commented 2 years ago

doi is doi:10.5063/F1DF6PNQ.

lizzieinvancouver commented 2 years ago

Please let me know if there is anything we want to add code-wise. Also, I noticed that "Entered by" is in this version of the database- shall I remove it for the file on knb?

@AileneKane Thanks for working on this! I don't think we need to add anything code-wise; we don't do any amazing models or such. I think we should include "Entered by" in the data for KNB. You can't really identify people from it (well, not exactly) and someone might want to model the effect of who entered it or such.

I would like to INCLUDE all the columns I sent, as these are ones we will include when posting the updated data someday. They're all columns we included when entering the data so may come in handy for someone else someday.