Code Review DataWrangle.Rmd

openplantpathology / Mungbean_PM

A meta-analysis of mungbean powdery mildew control fungicide efficacy trials

https://openplantpathology.github.io/Mungbean_PM/

Creative Commons Zero v1.0 Universal

1 stars 0 forks source link

Code Review DataWrangle.Rmd #16

Closed adamhsparks closed 2 years ago

adamhsparks commented 4 years ago

Adam to provide code review of data wrangling function to help ensure data integrity.

adamhsparks commented 4 years ago

@PaulMelloy, why is the data file that should be produced at the end of this Rmd imported at the head? That should not be happening.

adamhsparks commented 4 years ago

@PaulMelloy, can you please provide some documentation for the fold_data()? I'm still unclear as to why it's useful instead of using dplyr::*_join(). Just looking at the code I'm really not understanding what it's purpose is.

PaulMelloy commented 4 years ago

The differences between the fold_data() and _join functions are (in my mind) subtle and difficult to explain. Or I don't understand the _join functions well enough.

The Join functions seem to add columns or rows which are not present in x that are present in y (and vice versa), i want to replace old data or NA data with the new values without changing the dimensions of my data.

The fold_data function matches rows from x and y based on specified (match) columns and replaces specific values in the column specified from y into x, without adding columns or rows. it also checks if the value is the same as the replacement value and does not replace if they are the same. Then reports how many values were replaced in all the matched rows.

adamhsparks commented 4 years ago

I think you want an outer join, https://www.dofactory.com/sql/left-outer-join, which is a semi_join() in dplyr?

adamhsparks commented 4 years ago

BTW, we don't have to change it as long as what you have works, but could you document the code a bit better using ROxygen syntax?

adamhsparks commented 4 years ago

I've lost track and now I can't find this file. @PaulMelloy can you link to it? Is it this file, ExcludeBook_191115_PMMB_DataWrangling_PM.Rmd? Do we need it? Do I still need to complete a code review of it?

PaulMelloy commented 4 years ago

Ok, I made a mistake earlier on this thread. You were asking about the fold_data() function and I answered with an explanation on the data_mesh() function. Sorry for the confusion.

The fold data function just re-arranges the columns so both data.frame have their columns in the same order. Then appends any unmatched columns of the non-template data_frame to the right side of the data frame and returns it. The new data.frame will also contain columns present in the template data.frame which were not originally present in new_DataFrame. I have updated the description. however not in the Roxygen file. I am still learning about how to do this with packages.

You have already provided a code review of the file. However I think there are still some things that I need to check, and these have been listed at the top of the file