Closed adamhsparks closed 2 years ago
@PaulMelloy, why is the data file that should be produced at the end of this Rmd imported at the head? That should not be happening.
@PaulMelloy, can you please provide some documentation for fold_data()? I'm still unclear as to why it's useful instead of using dplyr::*_join(). Just looking at the code, I really don't understand what its purpose is.
The differences between the fold_data() and _join functions are (in my mind) subtle and difficult to explain. Or I don't understand the _join functions well enough.
The join functions seem to add columns or rows that are present in y but not in x (and vice versa). I want to replace old data or NA data with the new values without changing the dimensions of my data.
The fold_data function matches rows from x and y based on the specified (match) columns and copies values from the specified column of y into x, without adding columns or rows. It also checks whether the existing value is the same as the replacement value and does not replace it if they are the same. It then reports how many values were replaced across all the matched rows.
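The behaviour described above can be sketched in base R. This is only an illustration under stated assumptions, not the project's actual code: the function name replace_matched() and its arguments are hypothetical, and it assumes that an NA in y should never overwrite a value in x.

```r
# Hypothetical helper (not the project's function): update one column of x
# from y for rows matched on key columns, without changing x's dimensions.
replace_matched <- function(x, y, match_cols, value_col) {
  # Build one key per row from the match columns
  key_x <- do.call(paste, c(x[match_cols], sep = "\r"))
  key_y <- do.call(paste, c(y[match_cols], sep = "\r"))

  # Row of y that matches each row of x (NA where there is no match)
  idx <- match(key_x, key_y)
  new_vals <- y[[value_col]][idx]
  old_vals <- x[[value_col]]

  # Replace only where a match exists, the new value is not NA (assumption),
  # and the old value is NA or differs from the new value
  change <- !is.na(idx) & !is.na(new_vals) &
    (is.na(old_vals) | old_vals != new_vals)
  x[[value_col]][change] <- new_vals[change]

  message(sum(change), " value(s) replaced")
  x
}

x <- data.frame(id = c(1, 2, 3), yield = c(10, NA, 30))
y <- data.frame(id = c(2, 3, 4), yield = c(20, 30, 40))
out <- replace_matched(x, y, match_cols = "id", value_col = "yield")
```

In this example only the NA for id 2 is filled in. The row for id 3 is left alone because the values already agree, and id 4 is ignored because it is not in x, so out keeps the same number of rows and columns as x.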
I think you want an outer join, https://www.dofactory.com/sql/left-outer-join, which is a semi_join() in dplyr?
BTW, we don't have to change it as long as what you have works, but could you document the code a bit better using roxygen syntax?
I've lost track and now I can't find this file. @PaulMelloy can you link to it? Is it this file, ExcludeBook_191115_PMMB_DataWrangling_PM.Rmd? Do we need it? Do I still need to complete a code review of it?
Ok, I made a mistake earlier in this thread. You were asking about the fold_data() function and I answered with an explanation of the data_mesh() function. Sorry for the confusion.
The fold_data function just rearranges the columns so that both data frames have their columns in the same order. It then appends any unmatched columns of the non-template data frame to the right side of the data frame and returns it. The new data frame will also contain the columns present in the template data frame that were not originally present in new_DataFrame. I have updated the description, but not yet in the Roxygen file. I am still learning how to do this with packages.
You have already provided a code review of the file. However, I think there are still some things that I need to check, and these have been listed at the top of the file.
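For reference, the column alignment described above could look roughly like the following, with a roxygen-style header of the kind requested earlier in the thread. This is a sketch only: the name align_columns() and its arguments are hypothetical, and filling the missing template columns with NA is an assumption, since the thread does not say what values they receive.

```r
#' Align the columns of a data frame to a template (hypothetical sketch)
#'
#' @param template data.frame whose column order is the reference.
#' @param new_df data.frame to be rearranged.
#' @return new_df with the template's columns first, in template order,
#'   followed by any columns found only in new_df.
align_columns <- function(template, new_df) {
  # Template columns absent from new_df are added and filled with NA (assumption)
  missing_cols <- setdiff(names(template), names(new_df))
  for (col in missing_cols) {
    new_df[[col]] <- rep(NA, nrow(new_df))
  }

  # Columns that exist only in new_df are appended on the right
  extra_cols <- setdiff(names(new_df), names(template))
  new_df[, c(names(template), extra_cols), drop = FALSE]
}

template <- data.frame(site = "A", year = 2019, yield = 1.2)
new_df <- data.frame(yield = 2.5, notes = "late sowing", site = "B")
out <- align_columns(template, new_df)
names(out)
```

Here names(out) follows the template order (site, year, yield) with notes appended on the right, and year is NA because new_df did not have that column.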
Adam to provide code review of data wrangling function to help ensure data integrity.