jakegross808 / pacn-veg-package

all pacn veg code
Creative Commons Zero v1.0 Universal

Decide on highest priority EDA functions #20

Open wright13 opened 2 years ago

wright13 commented 2 years ago

...or QA/QC, whatever is most useful to start with! When I break up tasks into package functions I think of what outputs I want - usually either a table resulting from a specific data summary/analysis, or a figure (map, graph, etc) - and I write one function per output. If I find myself copy-pasting code from one function into another, that's usually a sign that those tasks should be combined into a single function (or the copy-pasted code needs its own function). Often, one function will feed into another - e.g. the tabular output of a summary function can feed into a function that plots that output.

One-function tasks might look like:
- Map plot locations
- Calculate sample size
- Calculate understory density
- Calculate understory density % change
- Plot understory density % change

General function structure:

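SomeUnderstoryAnalysis <- function(park, sample_frame, certified) {
    # Pull the filtered understory data for the requested park and sample frame
    raw_data <- FilterPACNVeg("Understory", park, sample_frame, certified)

    # Do a single summary or analysis task here - this is often a chain of calls to dplyr functions
    summarized_data <- raw_data  # placeholder: replace with the actual summary

    return(summarized_data)
}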
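
To make the skeleton above more concrete, here is a minimal sketch of one function feeding another. It runs on a small made-up data frame rather than real FilterPACNVeg() output, and it assumes dplyr is installed. The column names (plot_id, cycle, stems, area_m2) and both function names are hypothetical illustrations, not names from the package.

```r
library(dplyr)

# Hypothetical stand-in for the data FilterPACNVeg("Understory", ...) might return.
understory <- data.frame(
  plot_id = c("P1", "P1", "P2", "P2"),
  cycle   = c(1, 2, 1, 2),
  stems   = c(10, 15, 20, 10),
  area_m2 = c(100, 100, 100, 100)
)

# One output per function: a table of density (stems per square meter)
# for each plot and sampling cycle.
UnderstoryDensity <- function(data) {
  data %>%
    group_by(plot_id, cycle) %>%
    summarize(density = sum(stems) / sum(area_m2), .groups = "drop")
}

# A second function consumes the table produced by the first and adds
# the percent change in density from the previous cycle within each plot.
UnderstoryDensityChange <- function(density_data) {
  density_data %>%
    arrange(plot_id, cycle) %>%
    group_by(plot_id) %>%
    mutate(pct_change = (density - lag(density)) / lag(density) * 100) %>%
    ungroup()
}

density <- UnderstoryDensity(understory)
change  <- UnderstoryDensityChange(density)
```

Keeping the summary and the percent-change step in separate functions means a plotting function could take either table as input without repeating the calculation.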

wright13 commented 2 years ago

Feel free to look through the MOJN spring veg code for some examples. https://github.com/nationalparkservice/mojn-sv-rpackage/tree/analysis/R

jakegross808 commented 2 years ago

This is a pretty steep learning curve for me since I rarely write functions. But I'm making slow progress =) I have a file called EDA_understory with a couple functions -

After some time spent on this I think it would help me to discuss a few more things with you while screen sharing:

Topics to discuss:
1) how to set the default in FilterPACNVeg so that QA_plots are not included by default (and remove or hide that column)
2) how to update cached data tables when running the LoadPACNVeg function
3) how to use the "Understory" variable within the FilterPACNVeg function inside other functions
4) best way to handle NA records when performing a dplyr::summarize
5) basic error handling that I should be including in each function, and best practices
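
As one possible starting point for topics 4 and 5, the sketch below shows a summary function that validates its input with stop() and handles NA values explicitly inside dplyr::summarize. It assumes dplyr is installed. The data frame, the column names (park, pct_cover), and the function name are hypothetical, and whether dropping NAs is appropriate depends on what a missing value means in the real data.

```r
library(dplyr)

# Hypothetical example data; the column names are illustrative only.
cover_data <- data.frame(
  park      = c("A", "A", "B", "B"),
  pct_cover = c(10, NA, 30, 50)
)

MeanCoverByPark <- function(data) {
  # Basic input checks: fail early with a clear message.
  if (!is.data.frame(data)) {
    stop("`data` must be a data frame.")
  }
  missing_cols <- setdiff(c("park", "pct_cover"), names(data))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }

  data %>%
    group_by(park) %>%
    summarize(
      n_missing  = sum(is.na(pct_cover)),          # count the NAs that get dropped
      mean_cover = mean(pct_cover, na.rm = TRUE),  # ignore NAs instead of returning NA
      .groups = "drop"
    )
}

result <- MeanCoverByPark(cover_data)
```

Reporting n_missing alongside the mean keeps the dropped records visible in the output rather than silently discarding them.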