robwschlegel / heatwaveR

This GitHub repo contains all of the code for the heatwaveR package.
https://robwschlegel.github.io/heatwaveR/
Other
45 stars 16 forks source link

Help with exceedance() function #33

Closed hichew22 closed 8 months ago

hichew22 commented 10 months ago

Hello,

I was looking for a function that could help me with data wrangling and I believe the exceedance() function is exactly what I'm looking for. I have a dataframe with person IDs, date of treatment, day since treatment, and a lab value. Essentially, I am hoping to find the longest streak of consecutive days where the lab value is below a certain threshold, and extending the streak if the lab value does not rise above the threshold for >3 days.

I wrangled my dataframe into a dataframe with the IDs, date, and lab value as follows:

df <- df %>% 
  group_by(id) %>% 
  mutate(date = as.Date(treatment_date + time_since_treatment)) %>% 
  ungroup()

Now, I am hoping to use the exceedance() function, setting threshold as 500 as follows:

list_exceedance_500 <- heatwaveR::exceedance(
  data = df,
  x = date,
  y = lab,
  threshold = 500,
  below = TRUE,
  minDuration = 1,
  joinAcrossGaps = TRUE,
  maxGap = 3
)

I was wondering if you can help me with 2 questions: 1) When I run the above code, I get the error "Error in seq.int(0, to0 - from, by) : 'to' must be a finite number." When I use RmarineHeatWaves::exceedance as follows, I do not get the same error.

list_exceedance_500 <- heatwaveR::exceedance(
  data = df,
  x = date,
  y = lab,
  threshold = 500,
  below = TRUE,
  min_duration = 1,
  join_across_gaps = TRUE,
  max_gap = 3
)

How can I fix this error in heatwaveR?

2) Is there a way to apply exceedance in a way where I can find the maximum duration for each patient ID, for example combining exceedance() with group_by(id)?

Thank you!

hichew22 commented 10 months ago

Also, is there a way to address the error where if all values are above my threshold, it returns an exceedance of 0 rather than a message like "The given threshold value of 500 is less than the minimum temperature of 680 present in this time series." ? Thank you!

ajsmit commented 10 months ago

Good day,

An interesting application of our code. I will have a look at it today and come back to you.

Regards, AJ

On Wed, 22 Nov 2023 at 06:59, hichew22 @.***> wrote:

Also, is there a way to address the error where if all values are above my threshold, it returns an exceedance of 0 rather than a message like "The given threshold value of 500 is less than the minimum temperature of 680 present in this time series." ? Thank you!

— Reply to this email directly, view it on GitHub https://github.com/robwschlegel/heatwaveR/issues/33#issuecomment-1822117441, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALJSAPYSBHNAOZKRXBNV3TYFWBEZAVCNFSM6AAAAAA7U54XP6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRSGEYTONBUGE . You are receiving this because you are subscribed to this thread.Message ID: @.***>

--

Prof. AJ Smit

Department of Biodiversity & Conservation Biology

University of the Western Cape

Private Bag X17

Bellville 7535

South Africa

Work tel.: +27 (0)21 959 3783

Fax.: +27 (0)21 959 2312

Mobile: +27 (0)78 300 6005

--

Disclaimer - This e-mail is subject to UWC policies and e-mail disclaimer published on our website at: https://www.uwc.ac.za/disclaimer https://www.uwc.ac.za/disclaimer

ajsmit commented 10 months ago

Can you please tell me which version of heatwaveR you are using?

Thanks, AJ

On Wed, 22 Nov 2023 at 07:11, Albertus Smit @.***> wrote:

Good day,

An interesting application of our code. I will have a look at it today and come back to you.

Regards, AJ

On Wed, 22 Nov 2023 at 06:59, hichew22 @.***> wrote:

Also, is there a way to address the error where if all values are above my threshold, it returns an exceedance of 0 rather than a message like "The given threshold value of 500 is less than the minimum temperature of 680 present in this time series." ? Thank you!

— Reply to this email directly, view it on GitHub https://github.com/robwschlegel/heatwaveR/issues/33#issuecomment-1822117441, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALJSAPYSBHNAOZKRXBNV3TYFWBEZAVCNFSM6AAAAAA7U54XP6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRSGEYTONBUGE . You are receiving this because you are subscribed to this thread.Message ID: @.***>

--

Prof. AJ Smit

Department of Biodiversity & Conservation Biology

University of the Western Cape

Private Bag X17

Bellville 7535

South Africa

Work tel.: +27 (0)21 959 3783

Fax.: +27 (0)21 959 2312

Mobile: +27 (0)78 300 6005

--

Prof. AJ Smit

Department of Biodiversity & Conservation Biology

University of the Western Cape

Private Bag X17

Bellville 7535

South Africa

Work tel.: +27 (0)21 959 3783

Fax.: +27 (0)21 959 2312

Mobile: +27 (0)78 300 6005

--

Disclaimer - This e-mail is subject to UWC policies and e-mail disclaimer published on our website at: https://www.uwc.ac.za/disclaimer https://www.uwc.ac.za/disclaimer

ajsmit commented 10 months ago

Also, tell me about the time interval between in the 'Date' variable. Daily?

AJ

On Wed, 22 Nov 2023 at 07:15, Albertus Smit @.***> wrote:

Can you please tell me which version of heatwaveR you are using?

Thanks, AJ

On Wed, 22 Nov 2023 at 07:11, Albertus Smit @.***> wrote:

Good day,

An interesting application of our code. I will have a look at it today and come back to you.

Regards, AJ

On Wed, 22 Nov 2023 at 06:59, hichew22 @.***> wrote:

Also, is there a way to address the error where if all values are above my threshold, it returns an exceedance of 0 rather than a message like "The given threshold value of 500 is less than the minimum temperature of 680 present in this time series." ? Thank you!

— Reply to this email directly, view it on GitHub https://github.com/robwschlegel/heatwaveR/issues/33#issuecomment-1822117441, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALJSAPYSBHNAOZKRXBNV3TYFWBEZAVCNFSM6AAAAAA7U54XP6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRSGEYTONBUGE . You are receiving this because you are subscribed to this thread.Message ID: @.***>

--

Prof. AJ Smit

Department of Biodiversity & Conservation Biology

University of the Western Cape

Private Bag X17

Bellville 7535

South Africa

Work tel.: +27 (0)21 959 3783

Fax.: +27 (0)21 959 2312

Mobile: +27 (0)78 300 6005

--

Prof. AJ Smit

Department of Biodiversity & Conservation Biology

University of the Western Cape

Private Bag X17

Bellville 7535

South Africa

Work tel.: +27 (0)21 959 3783

Fax.: +27 (0)21 959 2312

Mobile: +27 (0)78 300 6005

--

Prof. AJ Smit

Department of Biodiversity & Conservation Biology

University of the Western Cape

Private Bag X17

Bellville 7535

South Africa

Work tel.: +27 (0)21 959 3783

Fax.: +27 (0)21 959 2312

Mobile: +27 (0)78 300 6005

--

Disclaimer - This e-mail is subject to UWC policies and e-mail disclaimer published on our website at: https://www.uwc.ac.za/disclaimer https://www.uwc.ac.za/disclaimer

hichew22 commented 10 months ago

Hi AJ, thank you for your prompt response! I think I have figured it out... I just need your help to confirm and possibly for the last step.

  1. I have a dataframe with subject IDs, dates (daily, i.e., 2022-06-01, 2022-06-02, 2022-06-03, and so forth), and lab values ('lab'). Each set of dates may be different for each subject ID but they are always sequential and daily.

  2. I created a function that will run the exceedance() function and extract the maximum duration of exceedance where lab is below a threshold (i.e., find the "longest streak" where the labi s below the threshold). If the exceedance tibble is blank (i.e., all values for a subject ID are above the threshold), then return the duration as 0:

    count_consecutive_days_below_threshold <-
    function(data, threshold, min_duration, max_gap) {
    # Run exceedance() function
    list_exceedance <- heatwaveR::exceedance(
      data = data,
      x = date,
      y = ancx,
      threshold = threshold,
      below = TRUE,
      minDuration = min_duration,
      joinAcrossGaps = TRUE,
      maxGap = max_gap
    )
    
    # Extract exceedance tibble
    exceedance <- list_exceedance$exceedance
    
    # Extract maximum duration of exceedance ("longest streak")
    # If exceedance tibble is blank (i.e., all values above threshold), then return duration as 0
    if (all(is.na(exceedance))) {
      consecutive_days_below_threshold <- 0
    } else{
      consecutive_days_below_threshold <- max(exceedance$duration)
    }
    
    return(consecutive_days_below_threshold)
    }
  3. I then was able to apply this function to my dataframe to create a new column that contains the 'consecutive_days_below_threshold' per subject ID:

new_df <- df %>% 
  group_by(id) %>% 
  do(data.frame(., streak_length = count_consecutive_days_below_threshold(., threshold = 500, min_duration = 0, max_gap = 5)))
  1. Now, I want to transform my new_df (which is a long dataframe) where I just have 1 row per subject ID and the corresponding streak_length as follows:
    short_df <- new_df %>%
    group_by(id) %>%
    summarize(
    streak_length = max(streak_length)
    )

Does this seem right to you? Is there another easier way to do this? Thank you!

ajsmit commented 10 months ago

Hi

Would you be able to send me a dataframe with your data (or some simulated data to replace your measurements in the 'lab' variable?

Thank you. AJS

On Wed, 22 Nov 2023 at 07:25, hichew22 @.***> wrote:

Hi AJ, thank you for your prompt response! I think I have figured it out... I just need your help to confirm and possibly for the last step.

  1. I have a dataframe with subject IDs, dates (daily, i.e., 2022-06-01, 2022-06-02, 2022-06-03, and so forth), and lab values ('lab'). Each set of dates may be different for each subject ID but they are always sequential and daily.
  2. I created a function that will run the exceedance() function and extract the maximum duration of exceedance where lab is below a threshold (i.e., find the "longest streak" where the labi s below the threshold). If the exceedance tibble is blank (i.e., all values for a subject ID are above the threshold), then return the duration as 0:

count_consecutive_days_below_threshold <- function(data, threshold, min_duration, max_gap) {

Run exceedance() function

list_exceedance <- heatwaveR::exceedance(
  data = data,
  x = date,
  y = ancx,
  threshold = threshold,
  below = TRUE,
  minDuration = min_duration,
  joinAcrossGaps = TRUE,
  maxGap = max_gap
)

# Extract exceedance tibble
exceedance <- list_exceedance$exceedance

# Extract maximum duration of exceedance ("longest streak")
# If exceedance tibble is blank (i.e., all values above threshold), then return duration as 0
if (all(is.na(exceedance))) {
  consecutive_days_below_threshold <- 0
} else{
  consecutive_days_below_threshold <- max(exceedance$duration)
}

return(consecutive_days_below_threshold)

}

  1. I then was able to apply this function to my dataframe to create a new column that contains the 'consecutive_days_below_threshold' per subject ID:

new_df <- df %>% group_by(id) %>% do(data.frame(., streak_length = count_consecutive_days_below_threshold(., threshold = 500, min_duration = 0, max_gap = 5)))

  1. Now, I want to transform my new_df (which is a long dataframe) where I just have 1 row per subject ID and the corresponding streak_length as follows:

short_df <- new_df %>% group_by(id) %>% summarize( streak_length = max(streak_length) )

Does this seem right to you? Is there another easier way to do this? Thank you!

— Reply to this email directly, view it on GitHub https://github.com/robwschlegel/heatwaveR/issues/33#issuecomment-1822134975, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALJSANTQX3JVEWPWSJ7XPDYFWECZAVCNFSM6AAAAAA7U54XP6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRSGEZTIOJXGU . You are receiving this because you commented.Message ID: @.***>

--

Prof. AJ Smit

Department of Biodiversity & Conservation Biology

University of the Western Cape

Private Bag X17

Bellville 7535

South Africa

Work tel.: +27 (0)21 959 3783

Fax.: +27 (0)21 959 2312

Mobile: +27 (0)78 300 6005

--

Disclaimer - This e-mail is subject to UWC policies and e-mail disclaimer published on our website at: https://www.uwc.ac.za/disclaimer https://www.uwc.ac.za/disclaimer

hichew22 commented 10 months ago

Yes, something like this:

# Set seed for reproducibility
set.seed(123)

# Number of subjects
num_subjects <- 5

# Number of days
num_days <- 10

# Create subject IDs
subject_ids <- rep(1:num_subjects, each = num_days)

# Create consecutive daily dates for each subject
dates <- rep(seq(as.Date("2023-01-01"), by = "1 day", length.out = num_days), times = num_subjects)

# Create lab values between 0 and 3000
lab_values <- runif(num_subjects * num_days, min = 0, max = 3000)

# Create the dataframe
df <- data.frame(
  SubjectID = factor(subject_ids),
  Date = as.Date(dates),
  Lab = lab_values
)
ajsmit commented 10 months ago

Hi,

Tell me if this works for you. Attached.

AJ

On Wed, 22 Nov 2023 at 07:53, hichew22 @.***> wrote:

Yes, something like this:``` Set seed for reproducibility

set.seed(123) Number of subjects

num_subjects <- 5 Number of days

num_days <- 10 Create subject IDs

subject_ids <- rep(1:num_subjects, each = num_days) Create consecutive daily dates for each subject

dates <- rep(seq(as.Date("2023-01-01"), by = "1 day", length.out = num_days), times = num_subjects) Create lab values between 0 and 3000

lab_values <- runif(num_subjects * num_days, min = 0, max = 3000) Create the dataframe

df <- data.frame( SubjectID = factor(subject_ids), Date = as.Date(dates), Lab = lab_values )

— Reply to this email directly, view it on GitHub https://github.com/robwschlegel/heatwaveR/issues/33#issuecomment-1822156655, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALJSAIAPCW4EINHKVPJ2JLYFWHOZAVCNFSM6AAAAAA7U54XP6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRSGE2TMNRVGU . You are receiving this because you commented.Message ID: @.***>

--

Prof. AJ Smit

Department of Biodiversity & Conservation Biology

University of the Western Cape

Private Bag X17

Bellville 7535

South Africa

Work tel.: +27 (0)21 959 3783

Fax.: +27 (0)21 959 2312

Mobile: +27 (0)78 300 6005

--

Disclaimer - This e-mail is subject to UWC policies and e-mail disclaimer published on our website at: https://www.uwc.ac.za/disclaimer https://www.uwc.ac.za/disclaimer

hichew22 commented 10 months ago

Hello AJ, I didn’t get an attachment. Can you try again?

ajsmit commented 10 months ago

There should be a .html file. See here.

On Wed, 22 Nov 2023 at 09:05, hichew22 @.***> wrote:

Hello AJ, I didn’t get an attachment. Can you try again?

— Reply to this email directly, view it on GitHub https://github.com/robwschlegel/heatwaveR/issues/33#issuecomment-1822218525, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALJSAKJWS5ZPTNVBQV6V6LYFWP2NAVCNFSM6AAAAAA7U54XP6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRSGIYTQNJSGU . You are receiving this because you commented.Message ID: @.***>

--

Prof. AJ Smit

Department of Biodiversity & Conservation Biology

University of the Western Cape

Private Bag X17

Bellville 7535

South Africa

Work tel.: +27 (0)21 959 3783

Fax.: +27 (0)21 959 2312

Mobile: +27 (0)78 300 6005

--

Disclaimer - This e-mail is subject to UWC policies and e-mail disclaimer published on our website at: https://www.uwc.ac.za/disclaimer https://www.uwc.ac.za/disclaimer

ajsmit commented 10 months ago

Try this one instead. I realised I did not include formatting.

On Wed, 22 Nov 2023 at 09:05, Albertus Smit @.***> wrote:

There should be a .html file. See here.

On Wed, 22 Nov 2023 at 09:05, hichew22 @.***> wrote:

Hello AJ, I didn’t get an attachment. Can you try again?

— Reply to this email directly, view it on GitHub https://github.com/robwschlegel/heatwaveR/issues/33#issuecomment-1822218525, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALJSAKJWS5ZPTNVBQV6V6LYFWP2NAVCNFSM6AAAAAA7U54XP6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRSGIYTQNJSGU . You are receiving this because you commented.Message ID: @.***>

--

Prof. AJ Smit

Department of Biodiversity & Conservation Biology

University of the Western Cape

Private Bag X17

Bellville 7535

South Africa

Work tel.: +27 (0)21 959 3783

Fax.: +27 (0)21 959 2312

Mobile: +27 (0)78 300 6005

--

Prof. AJ Smit

Department of Biodiversity & Conservation Biology

University of the Western Cape

Private Bag X17

Bellville 7535

South Africa

Work tel.: +27 (0)21 959 3783

Fax.: +27 (0)21 959 2312

Mobile: +27 (0)78 300 6005

--

Disclaimer - This e-mail is subject to UWC policies and e-mail disclaimer published on our website at: https://www.uwc.ac.za/disclaimer https://www.uwc.ac.za/disclaimer

hichew22 commented 10 months ago

Hi AJ, I am still not getting the attachment on my end (perhaps my email client is blocking it). Would you be able to post the code here?

ajsmit commented 10 months ago

Here is a PDF and the code is here too in a Quarto document. Open the Quarto doc in RStudio.

On Wed, 22 Nov 2023 at 09:09, hichew22 @.***> wrote:

Hi AJ, I am still not getting the attachment on my end (perhaps my email client is blocking it). Would you be able to post the code here?

— Reply to this email directly, view it on GitHub https://github.com/robwschlegel/heatwaveR/issues/33#issuecomment-1822222530, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALJSAKHOVKZL2DB7F6SFXTYFWQKRAVCNFSM6AAAAAA7U54XP6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRSGIZDENJTGA . You are receiving this because you commented.Message ID: @.***>

--

Prof. AJ Smit

Department of Biodiversity & Conservation Biology

University of the Western Cape

Private Bag X17

Bellville 7535

South Africa

Work tel.: +27 (0)21 959 3783

Fax.: +27 (0)21 959 2312

Mobile: +27 (0)78 300 6005

--

Disclaimer - This e-mail is subject to UWC policies and e-mail disclaimer published on our website at: https://www.uwc.ac.za/disclaimer https://www.uwc.ac.za/disclaimer

hichew22 commented 10 months ago

It still does not show up in my email. Very odd! Could you post it here on GitHub?

ajsmit commented 10 months ago

Have you checked spam? Or send me your direct email address?

Rob will be able to post it on GitHub and the software's website. In the next 30 minutes I'll load it onto my website where you'll be able to see it.

On Wed, 22 Nov 2023 at 09:15, hichew22 @.***> wrote:

It still does not show up in my email. Very odd! Could you post it here on GitHub?

— Reply to this email directly, view it on GitHub https://github.com/robwschlegel/heatwaveR/issues/33#issuecomment-1822228053, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALJSAJE3PSTTWCAEQT4WVDYFWQ7LAVCNFSM6AAAAAA7U54XP6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRSGIZDQMBVGM . You are receiving this because you commented.Message ID: @.***>

--

Prof. AJ Smit

Department of Biodiversity & Conservation Biology

University of the Western Cape

Private Bag X17

Bellville 7535

South Africa

Work tel.: +27 (0)21 959 3783

Fax.: +27 (0)21 959 2312

Mobile: +27 (0)78 300 6005

--

Disclaimer - This e-mail is subject to UWC policies and e-mail disclaimer published on our website at: https://www.uwc.ac.za/disclaimer https://www.uwc.ac.za/disclaimer

ajsmit commented 10 months ago
#| warnings: false
#| message: false
library(heatwaveR)
library(plyr) # because I like plyr
library(dplyr)

Prepare a dataframe with the IDs, date, and lab values

num_subjects <- 5
num_days <- 10
subject_ids <- rep(1:num_subjects, each = num_days)
dates <- rep(seq(as.Date("2023-01-01"), by = "1 day", length.out =
num_days),
             times = num_subjects)
lab_values <- runif(num_subjects * num_days, min = 0, max = 3000)
df <- data.frame(
  SubjectID = factor(subject_ids),
  Date = as.Date(dates),
  Lab = lab_values
)

Calculate streaks lengths based on linear (over time) thresholds

It turns out that one of the built-in functions can exactly do what you need. We simply need to make some adjustments to the dataframe fed to the detect_event() function, which is the function that will count the streak lengths.

Calculate a mean value and a threshold; you can calculate an overall mean and threshold, or a mean and threshold for each group--- this will depend on your experimental design and hypotheses.

I calculate a mean and threshold based on the pooled data (across A-C):

df2 <- df |>
  mutate(seas = mean(Lab),
         thresh = 600) # your threshold value here

Even though we calculated the mean value, this is not used; only the threshold is used.

results <- plyr::dlply(.data = df2, .variables = "SubjectID",
function(sub_df) {
  detect_event(sub_df,
               x = Date,
               y = Lab,
               seasClim = seas,
               threshClim = thresh,
               minDuration = 1,
               maxGap = 3,
               coldSpell = TRUE,
               protoEvents = FALSE)
})

results is a list of dataframes, one pair of dataframes for each subject. Let us look at the first list element, which is for SubjectID == 1:

results[[1]]

The first dataframe is called climatology and the other is called events. Don't worry about the names as the function was initially written for climate events. The climatology dataframe contains all the data for SubjectID that were initially supplied in df2 and a few new columns, threshCriterion, durationCriterion, event, and event_no are added at the end. When the Lab value dips below the thresh, threshCriterion will flag as TRUE regardless of how long it remains below the threshold. durationCriterion flags as TRUE if the number of times threshCriterion is equal to or greater than minDuration. event flags as TRUE if threshCriterion is TRUE AND durationCriterion is TRUE. A unique identifier is given for each event in event_no.

The events dataframe contains the event_no, the start and end dates of the event, and the event duration (your 'streaks'). Various other summary stats are also calculated, but these might not be relevant for your question. Or are they?

On Wed, 22 Nov 2023 at 09:25, Albertus Smit @.***> wrote:

Have you checked spam? Or send me your direct email address?

Rob will be able to post it on GitHub and the software's website. In the next 30 minutes I'll load it onto my website where you'll be able to see it.

On Wed, 22 Nov 2023 at 09:15, hichew22 @.***> wrote:

It still does not show up in my email. Very odd! Could you post it here on GitHub?

— Reply to this email directly, view it on GitHub https://github.com/robwschlegel/heatwaveR/issues/33#issuecomment-1822228053, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALJSAJE3PSTTWCAEQT4WVDYFWQ7LAVCNFSM6AAAAAA7U54XP6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRSGIZDQMBVGM . You are receiving this because you commented.Message ID: @.***>

--

Prof. AJ Smit

Department of Biodiversity & Conservation Biology

University of the Western Cape

Private Bag X17

Bellville 7535

South Africa

Work tel.: +27 (0)21 959 3783

Fax.: +27 (0)21 959 2312

Mobile: +27 (0)78 300 6005

--

Prof. AJ Smit

Department of Biodiversity & Conservation Biology

University of the Western Cape

Private Bag X17

Bellville 7535

South Africa

Work tel.: +27 (0)21 959 3783

Fax.: +27 (0)21 959 2312

Mobile: +27 (0)78 300 6005

--

Disclaimer - This e-mail is subject to UWC policies and e-mail disclaimer published on our website at: https://www.uwc.ac.za/disclaimer https://www.uwc.ac.za/disclaimer

ajsmit commented 10 months ago

Hi, see here for the full version of the analysis:

https://tangledbank.netlify.app/vignettes/Lab_streaks.html

Please let me know if the appraoch works for you.

AJ

On Wed, 22 Nov 2023 at 09:33, Albertus Smit @.***> wrote:

#| warnings: false
#| message: false
library(heatwaveR)
library(plyr) # because I like plyr
library(dplyr)

Prepare a dataframe with the IDs, date, and lab values

num_subjects <- 5
num_days <- 10
subject_ids <- rep(1:num_subjects, each = num_days)
dates <- rep(seq(as.Date("2023-01-01"), by = "1 day", length.out =
num_days),
             times = num_subjects)
lab_values <- runif(num_subjects * num_days, min = 0, max = 3000)
df <- data.frame(
  SubjectID = factor(subject_ids),
  Date = as.Date(dates),
  Lab = lab_values
)

Calculate streaks lengths based on linear (over time) thresholds

It turns out that one of the built-in functions can exactly do what you need. We simply need to make some adjustments to the dataframe fed to the detect_event() function, which is the function that will count the streak lengths.

Calculate a mean value and a threshold; you can calculate an overall mean and threshold, or a mean and threshold for each group--- this will depend on your experimental design and hypotheses.

I calculate a mean and threshold based on the pooled data (across A-C):

df2 <- df |>
  mutate(seas = mean(Lab),
         thresh = 600) # your threshold value here

Even though we calculated the mean value, this is not used; only the threshold is used.

results <- plyr::dlply(.data = df2, .variables = "SubjectID",
function(sub_df) {
  detect_event(sub_df,
               x = Date,
               y = Lab,
               seasClim = seas,
               threshClim = thresh,
               minDuration = 1,
               maxGap = 3,
               coldSpell = TRUE,
               protoEvents = FALSE)
})

results is a list of dataframes, one pair of dataframes for each subject. Let us look at the first list element, which is for SubjectID == 1:

results[[1]]

The first dataframe is called climatology and the other is called events. Don't worry about the names as the function was initially written for climate events. The climatology dataframe contains all the data for SubjectID that were initially supplied in df2 and a few new columns, threshCriterion, durationCriterion, event, and event_no are added at the end. When the Lab value dips below the thresh, threshCriterion will flag as TRUE regardless of how long it remains below the threshold. durationCriterion flags as TRUE if the number of times threshCriterion is equal to or greater than minDuration. event flags as TRUE if threshCriterion is TRUE AND durationCriterion is TRUE. A unique identifier is given for each event in event_no.

The events dataframe contains the event_no, the start and end dates of the event, and the event duration (your 'streaks'). Various other summary stats are also calculated, but these might not be relevant for your question. Or are they?

On Wed, 22 Nov 2023 at 09:25, Albertus Smit @.***> wrote:

Have you checked spam? Or send me your direct email address?

Rob will be able to post it on GitHub and the software's website. In the next 30 minutes I'll load it onto my website where you'll be able to see it.

On Wed, 22 Nov 2023 at 09:15, hichew22 @.***> wrote:

It still does not show up in my email. Very odd! Could you post it here on GitHub?

— Reply to this email directly, view it on GitHub https://github.com/robwschlegel/heatwaveR/issues/33#issuecomment-1822228053, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALJSAJE3PSTTWCAEQT4WVDYFWQ7LAVCNFSM6AAAAAA7U54XP6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRSGIZDQMBVGM . You are receiving this because you commented.Message ID: @.***>

--

Prof. AJ Smit

Department of Biodiversity & Conservation Biology

University of the Western Cape

Private Bag X17

Bellville 7535

South Africa

Work tel.: +27 (0)21 959 3783

Fax.: +27 (0)21 959 2312

Mobile: +27 (0)78 300 6005

--

Prof. AJ Smit

Department of Biodiversity & Conservation Biology

University of the Western Cape

Private Bag X17

Bellville 7535

South Africa

Work tel.: +27 (0)21 959 3783

Fax.: +27 (0)21 959 2312

Mobile: +27 (0)78 300 6005

--

Prof. AJ Smit

Department of Biodiversity & Conservation Biology

University of the Western Cape

Private Bag X17

Bellville 7535

South Africa

Work tel.: +27 (0)21 959 3783

Fax.: +27 (0)21 959 2312

Mobile: +27 (0)78 300 6005

--

Disclaimer - This e-mail is subject to UWC policies and e-mail disclaimer published on our website at: https://www.uwc.ac.za/disclaimer https://www.uwc.ac.za/disclaimer

hichew22 commented 10 months ago

This works, thank you so much for your help! I checked a couple subject ID durations against the output of the exceedance code I pasted in and they match! How would you recommend extracting the maximum durations for all subjects into 1 dataframe (or 0 if duration is NA)?

ajsmit commented 10 months ago

Hi, I’ll do that for you tomorrow morning. AJSent from my iPhoneOn 22 Nov 2023, at 17:03, hichew22 @.***> wrote: Thank you, AJ, this is great! How would you recommend extracting all the durations for all subjects into 1 dataframe?

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: @.***>

Disclaimer - This e-mail is subject to UWC policies and e-mail disclaimer published on our website at: https://www.uwc.ac.za/disclaimer

ajsmit commented 10 months ago

Hi,

How about this?

results <- plyr::ddply(.data = df2, .variables = "SubjectID", function(sub_df) { detect_event(sub_df, x = Date, y = Lab, seasClim = seas, threshClim = thresh, minDuration = 1, maxGap = 3, coldSpell = TRUE, protoEvents = FALSE)$event })

On Wed, 22 Nov 2023 at 17:03, hichew22 @.***> wrote:

Thank you, AJ, this is great! How would you recommend extracting all the durations for all subjects into 1 dataframe?

— Reply to this email directly, view it on GitHub https://github.com/robwschlegel/heatwaveR/issues/33#issuecomment-1822938435, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALJSAKVYY3ZCHKCPMKK5KLYFYH2TAVCNFSM6AAAAAA7U54XP6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRSHEZTQNBTGU . You are receiving this because you commented.Message ID: @.***>

--

Prof. AJ Smit

Department of Biodiversity & Conservation Biology

University of the Western Cape

Private Bag X17

Bellville 7535

South Africa

Work tel.: +27 (0)21 959 3783

Fax.: +27 (0)21 959 2312

Mobile: +27 (0)78 300 6005

--

Disclaimer - This e-mail is subject to UWC policies and e-mail disclaimer published on our website at: https://www.uwc.ac.za/disclaimer https://www.uwc.ac.za/disclaimer

hichew22 commented 10 months ago

Hi AJ, when I run that, it returns a list of tibbles (1 for each subject). Is there a way I can specifically just return the 'duration' values? I'm envisioning a dataframe containing the IDs and the durations per each ID. Then, I can use the summarize and max functions to obtain the maximum duration for each ID.

ajsmit commented 10 months ago

Hi,

It should return this: [image: CleanShot 2023-11-23 at 07.30.29.png]

Please see the webpage: https://tangledbank.netlify.app/vignettes/Lab_streaks.html

Let me know if that works for you.

AJ

On Thu, 23 Nov 2023 at 07:23, hichew22 @.***> wrote:

Hi AJ, when I run that, it returns a list of tibbles (1 for each subject). Is there a way I can specifically just return the 'duration' values? I'm envisioning a dataframe containing the IDs and the durations per each ID. Then, I can use the summarize and max functions to obtain the maximum duration for each ID.

— Reply to this email directly, view it on GitHub https://github.com/robwschlegel/heatwaveR/issues/33#issuecomment-1823834365, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALJSAKJSTR7U4LFMAU6L3DYF3MUHAVCNFSM6AAAAAA7U54XP6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRTHAZTIMZWGU . You are receiving this because you commented.Message ID: @.***>

--

Prof. AJ Smit

Department of Biodiversity & Conservation Biology

University of the Western Cape

Private Bag X17

Bellville 7535

South Africa

Work tel.: +27 (0)21 959 3783

Fax.: +27 (0)21 959 2312

Mobile: +27 (0)78 300 6005

--

Disclaimer - This e-mail is subject to UWC policies and e-mail disclaimer published on our website at: https://www.uwc.ac.za/disclaimer https://www.uwc.ac.za/disclaimer

hichew22 commented 10 months ago

Ah, thank you very much! I did not realize "ddply" was used instead of "dlply." Thank you so much for your help and quick responses! I really appreciate it!

ajsmit commented 10 months ago

Sure, no problem.

All the best AJS

On Thu, 23 Nov 2023 at 08:21, hichew22 @.***> wrote:

Ah, thank you very much! I did not realize "ddply" was used instead of "dlply." Thank you so much for your help and quick responses! I really appreciate it!

— Reply to this email directly, view it on GitHub https://github.com/robwschlegel/heatwaveR/issues/33#issuecomment-1823868117, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALJSALA4JCW7JTQUF2BLKDYF3TNVAVCNFSM6AAAAAA7U54XP6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRTHA3DQMJRG4 . You are receiving this because you commented.Message ID: @.***>

--

Prof. AJ Smit

Department of Biodiversity & Conservation Biology

University of the Western Cape

Private Bag X17

Bellville 7535

South Africa

Work tel.: +27 (0)21 959 3783

Fax.: +27 (0)21 959 2312

Mobile: +27 (0)78 300 6005

--

Disclaimer - This e-mail is subject to UWC policies and e-mail disclaimer published on our website at: https://www.uwc.ac.za/disclaimer https://www.uwc.ac.za/disclaimer

ajsmit commented 9 months ago

Hi again,

I can look into it.

So, to be clear you only want to join across gaps when the gap is exactly 2 days, not more, not less?

I am sure this is doable by fiddling with the function’s internals. Let me look a bit later today or tomorrow.

AJS

On 10 Dec 2023, at 16:10, hichew22 @.***> wrote:

Hi AJ,

Would you be able to help me modify the above function with an additional set of rules? Now, I want to set the gap to 2 and join across the gap. However, if the gap is 3, then I do not want to join the gap. Would this be possible? I wonder if we would have to create a new function for the second rule (for example, creating a data frame that includes the length of the gaps and finding when the gap is <3 or >=3), then join the outputs.

Thank you!

— Reply to this email directly, view it on GitHub https://github.com/robwschlegel/heatwaveR/issues/33#issuecomment-1848976435, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALJSAMMZIN5RHRFL23ZVO3YIW7GLAVCNFSM6AAAAAA7U54XP6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNBYHE3TMNBTGU. You are receiving this because you commented.

--

Disclaimer - This e-mail is subject to UWC policies and e-mail disclaimer published on our website at: https://www.uwc.ac.za/disclaimer https://www.uwc.ac.za/disclaimer

hichew22 commented 9 months ago

Hi AJ,

I think this can be done by just adjusting the maxGap = 2! Is this correct?

ajsmit commented 9 months ago

Yes, but maxGap will fill any gaps larger than and equal to 2 days if set as maxGap = 2. You want it to fill only 2 days as far as I understand. Come to think about it, it is actually the joinAcrossGaps argument that does the filling in of gaps.

The thing that needs to be changed is some logic within the proto_event internal function, somewhere within the maxGap and joinAcrossGaps arguments.

We can do it, but I don’t have time to do it immediately.

You’re welcome to poke around inside of proto_events function if you need it sooner than when I can get to it.

AJS

On 11 Dec 2023, at 09:13, hichew22 @.***> wrote:

Hi AJ,

I think this can be done by just adjusting the maxGap = 2! Is this correct?

— Reply to this email directly, view it on GitHub https://github.com/robwschlegel/heatwaveR/issues/33#issuecomment-1849449045, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALJSANSTWX635QOG5APA5LYI2XANAVCNFSM6AAAAAA7U54XP6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNBZGQ2DSMBUGU. You are receiving this because you commented.

--

Disclaimer - This e-mail is subject to UWC policies and e-mail disclaimer published on our website at: https://www.uwc.ac.za/disclaimer https://www.uwc.ac.za/disclaimer

hichew22 commented 9 months ago

I actually want to fill gaps that are 1 or 2, so I think maxGap = 2 actually works perfect! (just changing from 5 like in the prior example to 2) It would be interesting to see if there is a way to fill an exact number like you mentioned.

ajsmit commented 9 months ago

In that case, setting maxGap = 2 will do what you need.

AJS

On 11 Dec 2023, at 09:36, hichew22 @.***> wrote:

I actually want to fill gaps that are 1 or 2, so I think maxGap = 2 actually works perfect! (just changing from 5 like in the prior example to 2) It would be interesting to see if there is a way to fill an exact number like you mentioned.

— Reply to this email directly, view it on GitHub https://github.com/robwschlegel/heatwaveR/issues/33#issuecomment-1849474986, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALJSAOGUBRR7SDW7XCC47TYI2ZXHAVCNFSM6AAAAAA7U54XP6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNBZGQ3TIOJYGY. You are receiving this because you commented.

--

Disclaimer - This e-mail is subject to UWC policies and e-mail disclaimer published on our website at: https://www.uwc.ac.za/disclaimer https://www.uwc.ac.za/disclaimer