lindsayplatt / episodic-river-salinization-model

Modeling code in a targets pipeline for understanding characteristics of rivers that experience episodic salinization from winter road salting events.

Revisit episodic classifications #16

Open lindsayplatt opened 1 month ago

lindsayplatt commented 1 month ago

Review the time series of sites that are classified as episodic or not. Do they make sense? Do the criteria need to change at all? Below are some that I flagged as potentially wrong.

These are currently classified as Episodic. Should they be?

01104453
03098600
03099500
03254550
03339000

These are currently classified as Not episodic. Should they be?

01115278
01437500
01465880

Also, look at the model classification confusion matrix (how many did it incorrectly classify? which ones are they?).


library(dplyr)   # mutate() and %>%
library(tibble)  # tibble()

my_model <- targets::tar_read(p5_rf_model_optimized)
sites <- targets::tar_read(p5_site_attr)$site_no

my_model$confusion

             Episodic Not episodic class.error
Episodic           79           22   0.2178218
Not episodic       14          210   0.0625000

# Here's how you can see them: compare the original and modeled class per site
tibble(site_no = sites, 
       orig_class = my_model$y, 
       modeled_class = my_model$predicted) %>% 
    mutate(agree = orig_class == modeled_class) %>% 
    summary()

  site_no                 orig_class       modeled_class   agree        
 Length:325         Episodic    :101   Episodic    : 93   Mode :logical  
 Class :character   Not episodic:224   Not episodic:232   FALSE:36       
 Mode  :character                                         TRUE :289      
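
The summary above counts the disagreements but does not name the sites. A minimal sketch of how the counts relate to the confusion matrix, and how the misclassified sites could be listed, is below. The helper `list_misclassified()` and the toy site IDs are hypothetical and only for illustration; with the real objects, `sites`, `my_model$y`, and `my_model$predicted` would be passed in instead.

```r
# Rebuild the confusion matrix printed above (rows = original class,
# columns = modeled class) to check the error rates by hand.
classes <- c("Episodic", "Not episodic")
confusion <- matrix(c(79, 14, 22, 210), nrow = 2,
                    dimnames = list(orig = classes, modeled = classes))

n_total <- sum(confusion)                  # all sites
n_wrong <- n_total - sum(diag(confusion))  # off-diagonal = misclassified
class_error <- 1 - diag(confusion) / rowSums(confusion)
overall_error <- n_wrong / n_total

# Hypothetical helper: return only the rows where the classes disagree.
list_misclassified <- function(site_no, orig_class, modeled_class) {
  out <- data.frame(site_no = site_no,
                    orig_class = orig_class,
                    modeled_class = modeled_class,
                    stringsAsFactors = FALSE)
  out[out$orig_class != out$modeled_class, , drop = FALSE]
}

# Toy input (made-up site IDs, not sites from the pipeline)
toy <- list_misclassified(
  site_no       = c("site_A", "site_B", "site_C"),
  orig_class    = c("Episodic", "Not episodic", "Episodic"),
  modeled_class = c("Episodic", "Episodic", "Not episodic")
)
```

The off-diagonal cells (22 + 14) account for the 36 `FALSE` values in the `agree` column.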

lindsayplatt commented 2 weeks ago

Had to change the way that peaks were identified to 1) include flat areas by setting the forward slope criterion to <= 0 (if there are multiple days at the same value, the first day is counted as a peak), 2) stop requiring event codes (sometimes those were NA, which turned the peak flag NA and removed the peak), and 3) remove the code that limited peaks to "global peaks", or those above the 75th percentile.
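
The slope rule described above can be sketched roughly as follows. This is not the pipeline's actual code: `find_peaks()` is a hypothetical helper, and it assumes a peak is a day whose value rose from the previous day (backward slope > 0) and does not rise the next day (forward slope <= 0).

```r
# Hypothetical helper illustrating the "forward slope <= 0" criterion.
find_peaks <- function(x) {
  slope_bkwd <- c(NA, diff(x))  # x[i] - x[i - 1]
  slope_fwd  <- c(diff(x), NA)  # x[i + 1] - x[i]

  # Rising into the day, flat or falling out of it. Using <= 0 keeps the
  # first day of a plateau; a strict < 0 would drop plateaus entirely.
  is_peak <- slope_bkwd > 0 & slope_fwd <= 0

  # Endpoints have an undefined slope; treat them as non-peaks, not NA,
  # so a missing value never silently removes a flag downstream.
  is_peak[is.na(is_peak)] <- FALSE
  is_peak
}

# Made-up SpC series: a three-day plateau at 300, then two sharp peaks.
spc <- c(100, 150, 300, 300, 300, 200, 250, 180, 400, 120)
peak_days <- which(find_peaks(spc))
```

Here `peak_days` holds the first plateau day plus the two sharp peaks. One consequence of this rule as sketched is that a flat stretch followed by a further rise would also be flagged on its first day.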

Example of three different peaks that should be identified: image

Before/after of time series and identified peaks (the dots) for site 04219768:

image

image

lindsayplatt commented 2 weeks ago

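Results using the top 3 SpC values per year per season to calculate a median SpC per season. To be classified as episodic, a site's winter median SpC must then exceed its non-winter median SpC by more than 200 (winter SpC > 200 + non-winter SpC).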
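
One possible reading of this calculation, for a single site, is sketched below. The function names, the `season` labels, and the data are made up for illustration, and the sketch assumes the top 3 values from every year are pooled within a season before taking the median; the pipeline may do this differently.

```r
# Hypothetical helper: median of the pooled top-3 SpC values per year
# for one season.
seasonal_median <- function(df, which_season) {
  d <- df[df$season == which_season, ]
  # For each year, keep the 3 highest values (sort() drops NAs)
  top3 <- tapply(d$SpC, d$year,
                 function(v) head(sort(v, decreasing = TRUE), 3))
  median(unlist(top3))
}

# Hypothetical helper: apply the "winter > 200 + non-winter" criterion.
is_episodic <- function(df, threshold = 200) {
  seasonal_median(df, "winter") > threshold + seasonal_median(df, "non-winter")
}

# Made-up daily SpC values for one site
site_data <- data.frame(
  year   = c(rep(2020, 5), rep(2021, 4), rep(2020, 4), rep(2021, 3)),
  season = c(rep("winter", 9), rep("non-winter", 7)),
  SpC    = c(900, 850, 400, 300, 800,   # winter 2020
             1000, 950, 500, 700,       # winter 2021
             350, 300, 320, 280,        # non-winter 2020
             400, 310, 330)             # non-winter 2021
)

winter_med    <- seasonal_median(site_data, "winter")
nonwinter_med <- seasonal_median(site_data, "non-winter")
episodic      <- is_episodic(site_data)
```

In this toy example the winter median (875) exceeds the non-winter median (325) by more than 200, so the site would be flagged as episodic.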

new_episodic_calc.pdf