massbays-tech / MassWateR

R package for working with Massachusetts surface water quality data
https://massbays-tech.github.io/MassWateR
Creative Commons Zero v1.0 Universal
11 stars 3 forks source link

Potential package updates for ideas from 2023 training sessions #48

Closed fawda123 closed 1 year ago

fawda123 commented 1 year ago

For anlzMWRmap:

  1. Instead of mapping the average of results by site, report the median or min or max.
  2. Change the scale from km to miles.
ben-wetherill commented 1 year ago

For anlyzMWRmap:

  1. Can the user reverse the order of the RColorBrewer pallettes?
fawda123 commented 1 year ago

@ben-wetherill yes, but I'd have to add another argument to the function. Easy to do, something like rev = T/F.

ben-wetherill commented 1 year ago

We might need to add a an extra field to the input results and the wqx output for UserRecordID. Some groups seem to have a strong need to reference their own local record IDs with the WQX records.

ben-wetherill commented 1 year ago

Also for the list, we should update the waterbody map layers to include the upper and lower Connecticut River watersheds. Also check the Merrimack and Blackstone watersheds.

fawda123 commented 1 year ago

Need to update ggplot2 arguments from size to linewidth following v3.4.0 deprecation.

fawda123 commented 1 year ago

Mean summaries for all analysis functions should default to geomean (e.g., anlzMWRmap, anlzMWRseasoon, etc.)

fawda123 commented 1 year ago

Add check to checkMWRresults to verify if depth data are included, non-informative error will be returned if depth data are missing (i.e., correct columns present, but no data).

ben-wetherill commented 1 year ago

Regarding geomean, this would only apply to parameters that are defined as logarithmic.

Regarding depth verification, either depth or relative depth is required. If both are entered, then relative depth takes precedent. I'm surprised this isn't already in the checks. I remember testing the logic.

fawda123 commented 1 year ago

@ben-wetherill yes, sorry, meant to make a note about geomean and log variables. We can default to mean or geomean based on the user input to the yscl arguments (default is auto detect from the accuracy DQO file).

The checks we have for the depth columns are:

Turns out that the checks pass if all the depth columns are completely empty because they only evaluate entries that are not NA. All NA values will not trigger any of the checks.

fawda123 commented 1 year ago

@ben-wetherill @carrjill here's my list from today, condensed info from above and added a few of the items we discussed. Let me know if this sounds right.

ben-wetherill commented 1 year ago

The list looks good. I think you've touched on everything, but since the mean/geomean/min/max changes are complicated I will paraphrase them in my own words to make sure we are all on the same page. Also, I am changing my mind about whether the min/max/median options should be available for the other graphs. For any bar plots, I can envision that people would often want to report medians instead of means. So, for consistency I suggest we apply the same options to all graphs. Do you both agree?

  1. Fix anlzMWRmap so that it defaults to geomean for parameters with logarithmic DQOs. Also confirm that all graphs do this when data is averaged (multiple sites, multiple dates, bar graphs, etc.).
  2. Add option to specify summary function (default, mean, geomean, median, min, max) to anlzMWRmap. Also add this option to all graphing functions (anlzMWRdate, anlzMWRseason, anlzMWRsite) to be valid when grouping sites or when plotting bar graphs - in other words valid any time that data is summarized with an average. The default would be "default", which defaults to geomean for parameters with logarithmic DQOs and mean for all others.
  3. Add the type of summarization (mean, geomean, median, min, max) to the title of all maps and graphs whenever data is summarized and not boxplotted. Also change current title wording from "average" to "mean".
  4. Whenever user changes yscl, if default summarization is being used, change summarization to appropriate log form. In other words, if user changes to yscl="log" then default summarization should change from mean to geomean, and vice versa. If user has specified summary function (per 2 above), then changing yscl has no effect on summarization.

I will also create a CoP post about how to handle multiple units of measure in a single year.

And, I will create updated results and WQX output templates for the UserRecordID.

fawda123 commented 1 year ago

Local Record ID pass through added https://github.com/massbays-tech/MassWateR/commit/04ce74b4b61ebe63fccacd26f51e4c1d3010500e

fawda123 commented 1 year ago

Adding a note here to upload new templates to package once all above fixes/enhancements are addressed. WQX_Phys-Chem_Template_for_MassWateR_4-7-23.xlsx MassWateR_Results_Template_4-7-23.xlsx

fawda123 commented 1 year ago

@ben-wetherill @carrjill all of the fixes and enhancements that we discussed from the workshops should be addressed now. Please have a look at the package results and let me know if there's anything incorrect or missed. I will upload the new templates and results file, then push a new version to CRAN when we're ready.

ben-wetherill commented 1 year ago

I'm still working through the testing, but here are some initial items I noticed:

  1. The depth data check does not seem to work. If remove depth data from one Field Msr record, there is no error returned when I run readMWRresults().
  2. In tabMWRwqx() output, the Local Record ID needs to be moved to column D and the title of the column should be Activity ID User Supplied. It needs to match the WQX template and WQX Output Logic exactly.
  3. The tabMWRwqx() confirmation message repeats itself twice. Here is what I received...

Excel workbook created successfully! File located at D:/Documents/~$wqx output.xlsxExcel workbook created successfully! File located at D:/Documents/wqx output.xlsx Warning message: In tabMWRwqx(fset = fsetls, output_dir = "D:/Documents/", output_file = "wqx output.xlsx") : No spatial information for sites in the Locations tab: ABT-162

  1. For the palcol reverse, would it be possible to make the argument a little more descriptive. "rev" seems very cryptic. Maybe palcolrev or colrev or revcol or something?

At first glance, the min/max/median additions are looking good, but I want to do some more testing.

ben-wetherill commented 1 year ago

I did more testing on the min/max/median/geomean/mean additions and everything seems to work perfectly! Very nice!

fawda123 commented 1 year ago

@ben-wetherill most of the issues in your last comments should be addressed now. For your first comment about the depth column checks, that was added to check if both depth columns (Activity Depth/Height Measure, Activity Relative Depth Name) were completely empty, as in Becky's case. She had no data in either of these columns and was getting a cryptic error. The check simply returns an error if both columns are empty, which I think is different from the check you tried. Are we expecting the check to do something else?

Also, I see in the commits that you've made some minor edits for the geo mean documentation in the analysis functions. That's fine, but the .Rd files should not be edited by hand. They're created automatically from text in the .R files - edit the doucmentation in the .R files instead (e.g., here instead of here). Any edits in the .Rd files will be overwritten if they're not in the .R files. I've fixed the issue already, just an FYI.

ben-wetherill commented 1 year ago

Items 2, 3, 4 are looking good.

For item 1, row 5 in the attached results file has no depth information. This should return an error, correct? It doesn't return any error or warning. Sample_Results_missing_depth.xlsx

Thanks for fixing my changes. What is the difference between .R and .Rd files?

fawda123 commented 1 year ago

@ben-wetherill, the .Rd files are used for the package help files and are written in LaTeX generated automatically using the comments and tags in the .R files.

ben-wetherill commented 1 year ago

As far as I can tell, item 1 is now fixed. I now receive an error if no depth information is entered for field measurements and samples.

fawda123 commented 1 year ago

Okay, thanks @ben-wetherill. sounds like everything is done on my end except uploading the new templates. I believe I also need an updated results file from you, can you please send that when able?

ben-wetherill commented 1 year ago

Here are the latest versions of the templates MassWateR_WQXMeta_Template_1-17-23.xlsx MassWateR_DQOFreqComp_Template_12-8-22.xlsx MassWateR_DQOAccuracy_Template_12-7-22.xlsx MassWateR_Sites_Template_5-19-23.xlsx WQX_Phys-Chem_Template_for_MassWateR_4-7-23.xlsx MassWateR_Results_Template_5-4-23.xlsx

And here are updated versions of the sample files: Sample_Results_4-21-23.xlsx Sample_Sites_5-19-23.xlsx Sample_WQXMeta_5-19-23.xlsx Sample_DQOAccuracy_5-19-23.xlsx Sample_DQOFreqComp_5-19-23.xlsx

fawda123 commented 1 year ago

@ben-wetherill I've uploaded the updated templates and sample files, I can push a new version to CRAN if all looks good.

ben-wetherill commented 1 year ago

Everything is looking good from my standpoint. I will work on a description of the update for users. We probably should wait until our meeting next Thursday to plan the go Live.