openplantpathology / Mungbean_PM

A meta-analysis of mungbean powdery mildew control fungicide efficacy trials
https://openplantpathology.github.io/Mungbean_PM/
Creative Commons Zero v1.0 Universal

PM_MB_updated.csv provenance #13

Closed: adamhsparks closed this issue 4 years ago

adamhsparks commented 4 years ago

@PaulMelloy, I'm going through the code in the book and can't determine the actual provenance of this file. It should be a read-only file, but it looks like at some point it's overwritten while the book is being knitted?

How are there AUDPC and AUDPS values in this file? Those calculations should be done on the fly and, if necessary, the results should be stored in the "cache" directory, which I created this morning in the branch I'm working on.
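A minimal sketch of the "compute on the fly, store in cache" pattern, assuming a `cache/` directory at the project root. The helpers `audpc()` and `cached()` and the file name `audpc_example.rds` are hypothetical, not functions from this project; AUDPC is computed with the standard trapezoidal rule, and AUDPS is not shown.

```r
# Hypothetical helper (not from the project): AUDPC by the trapezoidal rule,
# i.e. the sum over intervals of mean severity times interval length
audpc <- function(time, severity) {
  stopifnot(length(time) == length(severity), length(time) >= 2)
  o <- order(time)                      # make sure assessments are in time order
  time <- time[o]
  severity <- severity[o]
  n <- length(time)
  sum((severity[-1] + severity[-n]) / 2 * diff(time))
}

# Hypothetical helper: return the cached result if it exists, otherwise
# compute it, save it to the cache directory and return it.
# Nothing is ever written to the read-only data directory.
cached <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  result <- compute()
  saveRDS(result, path)
  result
}

# Usage: a temporary directory stands in for the project's cache/ folder
cache_dir <- file.path(tempdir(), "cache")
cache_file <- file.path(cache_dir, "audpc_example.rds")
res <- cached(cache_file, function() audpc(c(0, 7, 14), c(0, 10, 30)))
print(res)  # 175
```

On a second call, `cached()` reads the `.rds` file instead of recomputing, so the wrangling cost is paid once while the raw data stay untouched.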

The data folder should be read-only; we should not be writing back into it or into any of the files it contains.

PaulMelloy commented 4 years ago

The file should not be created by knitting the book. It is created by the R Markdown file in the DataWrangle folder (ExcludeBook_191115_PMMB_DataWrangling_PM.Rmd). That file has had some changes recently, so perhaps it looked as though changes were made when knitting. I'll double-check that the file is not altered when the book is knitted.

PaulMelloy commented 4 years ago

After checking, the 'updated' data file was last changed when the DataWrangle file was updated; both files have the same commit message. I created this file because many changes were being made across multiple raw data files, and reformatting the data every time it needs to be read in takes a lot of code and time. To get around that, I created a new file, PM_MB_updated.csv. It is created after the data are wrangled from the numerous raw data files, which is also when the AUDPC is calculated. We can store this data file in the cache directory if that is the appropriate place for it. I am still learning GitHub repository/project etiquette. :)

adamhsparks commented 4 years ago

OK, so I didn't understand its provenance. However, the raw data should always be treated as read-only in the data directory. We shouldn't overwrite any files in that directory, no matter which script or Rmd file is doing it.

So if it's based on the raw data but is not raw data, then we need to move it over to cache.

Not a huge deal, but it can save a lot of headaches.

adamhsparks commented 4 years ago

Also, if you're using write.csv() rather than write_csv(), you'll probably want to set row.names = FALSE so the row names aren't written to the file. Otherwise you get an extra column of numeric values.
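A small illustration of the row-name issue, using base R and a hypothetical data frame (the column names and values are made up, not taken from the trials). The readr part only runs if that package is installed.

```r
# Hypothetical example data (not values from the trials)
df <- data.frame(trial = c("T1", "T2"), audpc = c(120.5, 98.0))

# Default write.csv(): row names are written as an extra, unnamed first column
f_default <- tempfile(fileext = ".csv")
write.csv(df, f_default)

# row.names = FALSE writes only the data columns
f_clean <- tempfile(fileext = ".csv")
write.csv(df, f_clean, row.names = FALSE)

# Reading back: the unnamed column comes back as "X"
print(names(read.csv(f_default)))  # "X" "trial" "audpc"
print(names(read.csv(f_clean)))    # "trial" "audpc"

# readr::write_csv() never writes row names, so no extra argument is needed
if (requireNamespace("readr", quietly = TRUE)) {
  f_readr <- tempfile(fileext = ".csv")
  readr::write_csv(df, f_readr)
}
```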

adamhsparks commented 4 years ago

Are we done with this? Should it be closed?

PaulMelloy commented 4 years ago

Yeah, I think it can be closed. I have amended it as per your advice. It's much better.