massbays-tech / MassWateR

R package for working with Massachusetts surface water quality data
https://massbays-tech.github.io/MassWateR
Creative Commons Zero v1.0 Universal

Questions to address #3

Closed fawda123 closed 2 years ago

fawda123 commented 2 years ago

This is a collection of miscellaneous questions we need to address for the package:

ben-wetherill commented 2 years ago

List of characteristic names - Good question. There is a full list that WQX allows with 5800 entries in it. We should probably select the short list of characteristics that is included in our solution. However, the names must match the WQX list.

Result unit values - Similarly WQX has 360 options in its list. We will need to define our short list, but they must match WQX.

Relative Depth Category - We need to think about this.

I don't think the DQO file parameter names must match the Results file. The Results file is designed for WQX. The DQO file needs to be as user friendly as possible. But we will need a mapping of DQO names to WQX names.

I think our full set of allowed activity types is in our results file.

ben-wetherill commented 2 years ago

Here is the logic for the Results file (as of 3/9/22). The activity type mapping table at the bottom is critical.

The Results file now includes a few columns that we hadn’t discussed before – some clarifying info is below:

Activity Type mapping:

Input Activity Type -------------------------- WQX output new row Activity Type
Field Msr/Obs ------------------------------- Quality Control Field Replicate Msr/Obs
Sample-Routine ----------------------------- Quality Control Sample-Field Replicate
Quality Control Sample-Field Blank ---------- NA
Quality Control Sample-Lab Duplicate ------- Quality Control Sample-Lab Duplicate
Quality Control Sample-Lab Blank ----------- NA
Quality Control Sample-Lab Spike ----------- Quality Control Sample-Reference Sample
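For illustration only, the activity type mapping can be read as a simple lookup. The sketch below is in Python rather than the package's R code. The names `WQX_NEW_ROW_ACTIVITY_TYPE` and `wqx_new_row_activity_type` are hypothetical, and treating "NA" as "no new row is written" is my assumption about what the table means.

```python
from typing import Dict, Optional

# Input activity type -> activity type of the new row in the WQX output.
# None stands in for the "NA" entries (assumed to mean: no new row).
WQX_NEW_ROW_ACTIVITY_TYPE: Dict[str, Optional[str]] = {
    "Field Msr/Obs": "Quality Control Field Replicate Msr/Obs",
    "Sample-Routine": "Quality Control Sample-Field Replicate",
    "Quality Control Sample-Field Blank": None,
    "Quality Control Sample-Lab Duplicate": "Quality Control Sample-Lab Duplicate",
    "Quality Control Sample-Lab Blank": None,
    "Quality Control Sample-Lab Spike": "Quality Control Sample-Reference Sample",
}


def wqx_new_row_activity_type(input_type: str) -> Optional[str]:
    """Look up the WQX new-row activity type for an input activity type.

    Raises ValueError for activity types that are not in the mapping table.
    """
    if input_type not in WQX_NEW_ROW_ACTIVITY_TYPE:
        raise ValueError(f"Unrecognized activity type: {input_type!r}")
    return WQX_NEW_ROW_ACTIVITY_TYPE[input_type]
```

Rejecting unknown activity types up front, instead of passing them through, keeps typos in the Results file from silently reaching the WQX output.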

ben-wetherill commented 2 years ago

Regarding the activity types in the completeness check... One lab spike activity type is missing ("Quality Control Sample-Reference Sample"), but this will only be in the WQX output file. Also note that the field duplicate activity types will not show up in the Results file. They will only be in the WQX output file. In the code where you are doing sum(), I assume you are getting the count of records not the sum of values. The completeness check is a count of records.
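To make the count-versus-sum distinction concrete, here is a small Python sketch (not the package's R code). The records, field names, and the helper `count_records` are made up for illustration.

```python
# Made-up records for illustration; field names are assumptions.
records = [
    {"parameter": "DO", "activity_type": "Field Msr/Obs", "value": 8.0},
    {"parameter": "DO", "activity_type": "Field Msr/Obs", "value": 7.5},
    {"parameter": "DO", "activity_type": "Quality Control Sample-Lab Blank", "value": 0.0},
]


def count_records(rows, parameter, activity_type):
    """Count the rows matching a parameter and activity type (a count, not a sum of values)."""
    return sum(
        1
        for row in rows
        if row["parameter"] == parameter and row["activity_type"] == activity_type
    )


# What the completeness check needs: the number of matching records.
n_field = count_records(records, "DO", "Field Msr/Obs")

# What it should NOT use: the sum of the measured values for the same rows.
value_total = sum(
    row["value"]
    for row in records
    if row["parameter"] == "DO" and row["activity_type"] == "Field Msr/Obs"
)
```

Here `n_field` is 2 while `value_total` is 15.5. Summing a logical condition (as `sum()` often does in R) gives the first number, which is the one the completeness check is after.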

I tried to make some edits directly in the code, but I got a message saying I must be on a branch to make or propose changes. I'm not sure what that means.

ben-wetherill commented 2 years ago

Regarding the Relative Depth Category column... I think the actual name for this column in WQX is "Activity Relative Depth Name", so we should probably change to that. The allowable values are "Surface", "Midwater", "Bottom", and many others. But we are only planning to analyze "Surface". "< 1m / 3.3ft" is not an option for Relative Depth. That was referring to values in the Activity Depth/Height Measure Column. If we see values that are less than 1m or 3.3 ft, then they will be considered the same as "Surface" and included in our analysis. Though, I'm thinking we might need to change this to <=0.5m/2ft, because 1m/3ft is sometimes a non-surface measurement.

ben-wetherill commented 2 years ago

Here is a file of parameter names and mappings, but it may not be the full list yet: Parameter Mapping.xlsx

carrjill commented 2 years ago

@fawda123 stand by for an updated Parameter Mapping file.

Re: Relative Depth, users' data can include Surface, Bottom, Near Bottom and Midwater. As Ben said, the tool will most likely use data that are either (a) logged as Surface in the Relative Depth column, or (b) recorded as < 1m / 3.3ft in the Activity Depth column.

Re: DQO file parameter names, I do think these should match the results, especially now that we're providing simplified options. I can't see how Marcus would map every possible user-provided format to the Results.

@ben-wetherill We should keep the 1m / 3.3ft depth limit because it is the standard for EPA and DPH bacteria sampling programs. Renaming the Relative Depth column now to match WQX is fine, or we can do it in the WQX template package step which will involve a lot of additional column naming.
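The surface rule discussed in this thread (Relative Depth is "Surface", or the activity depth is under 1m / 3.3ft) could be sketched as follows. This is a Python illustration, not the package's R code. The function name, argument names, and unit codes are hypothetical. The strict less-than comparison and the case-insensitive match on "Surface" are assumptions. The limits are passed in so the alternative 0.5m / 2ft threshold could be tried.

```python
from typing import Dict, Optional

# Per-unit limits taken from the thread ("< 1m / 3.3ft"); unit codes are assumed.
SURFACE_DEPTH_LIMITS: Dict[str, float] = {"m": 1.0, "ft": 3.3}


def is_surface(
    relative_depth: Optional[str],
    depth: Optional[float],
    unit: Optional[str],
    limits: Dict[str, float] = SURFACE_DEPTH_LIMITS,
) -> bool:
    """Return True if a record counts as a surface measurement.

    (a) Relative depth is logged as "Surface", or
    (b) the recorded depth is below the limit for its unit.
    """
    if relative_depth is not None and relative_depth.strip().lower() == "surface":
        return True
    if depth is None or unit is None:
        return False
    if unit not in limits:
        raise ValueError(f"No surface limit defined for unit {unit!r}")
    return depth < limits[unit]
```

Note that under rule (b) as written, a shallow record labeled "Bottom" would still count as surface. Whether that is desired is a question for the thread, not something this sketch decides.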

ben-wetherill commented 2 years ago

Here is an updated Parameter Mapping file. You can probably ignore the Parameter Group column for now. That is primarily just to help us organize our thinking, but it might come in handy when we are thinking about analysis graphs. Parameter Mapping_4-29-22.xlsx

fawda123 commented 2 years ago

Thanks @ben-wetherill and @carrjill, I'm trying to digest all of the comments in this thread. In hindsight, one question/comment should match to one issue so we're not tracking multiple problems in one thread. That's my fault for adding multiple questions in the original issue.

For the list of characteristic names, I've added a new dataset to the package that uses the Parameter Mapping file from 4/29 that Ben posted in his last message. The parameters in the "Simple Parameter" column are now the common set that is checked when the results file is imported. I also went through and changed the results and DQO files (using the new ones Ben sent via email on 4/26) to have consistent characteristic names that match the parameter mapping file. If I'm understanding correctly, we are now allowing our template to use these simple names in the hope that users will follow suit when they're importing their own data. I'll have to map the names to the WQX format for any data intended for upload after going through the package workflow. Will do that later...

I also added "Gage" and "Air Temp" to the parameter mapping file because these variables were included in the original results file. So, now these are "acceptable" entries from our master list.

fawda123 commented 2 years ago

I also added some additional checks in check_results() for the result unit values. Importing the results file will also now check for:

ben-wetherill commented 2 years ago

Clarification on parameter mapping - Jill and I were thinking that the mapping between Simple and WQX parameters should be just like synonyms. In other words, in any file, the users should be able to use either form and R should convert to the preferred form for each output. So, users' input files (Results and DQOs) would have either Simple or WQX names (no other names allowed), depending on user preference. This could even be a mixture, meaning users might use Simple names in the DQO files and WQX names in the results file. R should think of them as synonyms. Our QC Review output and analytical outputs should probably use the Simple names, and our WQX output must use the WQX names. Is that reasonable?
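A minimal Python sketch of the synonym idea (not the package's R code): either form is accepted on input and converted to the preferred form for each output. The two name pairs below are placeholders I chose for illustration and are not taken from the Parameter Mapping file. The helper names are hypothetical.

```python
# Placeholder pairs for illustration only (Simple name -> WQX name).
SIMPLE_TO_WQX = {
    "DO": "Dissolved oxygen (DO)",
    "Water Temp": "Temperature, water",
}
WQX_TO_SIMPLE = {wqx: simple for simple, wqx in SIMPLE_TO_WQX.items()}


def to_simple(name: str) -> str:
    """Convert a Simple or WQX parameter name to its Simple form."""
    if name in SIMPLE_TO_WQX:
        return name
    if name in WQX_TO_SIMPLE:
        return WQX_TO_SIMPLE[name]
    raise ValueError(f"Parameter name not in mapping: {name!r}")


def to_wqx(name: str) -> str:
    """Convert a Simple or WQX parameter name to its WQX form."""
    return SIMPLE_TO_WQX[to_simple(name)]
```

With this shape, a DQO file using Simple names and a Results file using WQX names can both be normalized with `to_simple()` before QC review, and the WQX export can call `to_wqx()` on the same data.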

Regarding Gage and Air Temp - I sort of left these out on purpose, because these are in that gray area of parameters that we might load into WQX but we don't track them for data quality. They are low priority and probably useless to DEP, but OARS includes them in the WQX file just because they always have. My guess is that many groups may have a parameter or two like this. It wouldn't be hard to argue to remove them completely from the WQX upload, but I'm thinking it might be simplest for us to just allow them to pass through. Maybe it's okay not to have them in the DQO file and just have a warning. What do you think?

Regarding Result Units - I think your checks are good. I'm thinking it is reasonable to expect users to stick to a single unit of measure for each parameter. For example, if their DO DQO is in mg/L, then their results file can't be in ug/L. If we didn't require this then we would need to maintain all of the conversion factors. Some are easy, but some are not.
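The single-unit rule could be checked along these lines. This is a hedged Python sketch, not the package's `check_results()` implementation. The function name and the input shapes are assumptions, and parameters without a DQO entry are skipped here, which is one possible reading of the pass-through idea above.

```python
from typing import Dict, Iterable, List, Tuple


def find_unit_mismatches(
    dqo_units: Dict[str, str],
    results: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (parameter, dqo_unit, result_unit) for each result unit that differs from the DQO unit.

    dqo_units maps parameter -> unit used in the DQO file.
    results is an iterable of (parameter, unit) pairs from the results file.
    Parameters that are absent from dqo_units are ignored.
    """
    mismatches = set()
    for parameter, unit in results:
        expected = dqo_units.get(parameter)
        if expected is not None and unit != expected:
            mismatches.add((parameter, expected, unit))
    return sorted(mismatches)


# Example with made-up rows: DO is declared in mg/L but one result is in ug/L.
example = find_unit_mismatches(
    {"DO": "mg/L"},
    [("DO", "mg/L"), ("DO", "ug/L"), ("Gage", "ft")],
)
```

Reporting mismatches, with no attempt at conversion, matches the point above that maintaining conversion factors would be a burden.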

ben-wetherill commented 2 years ago

I guess this issue can be closed, right?