Ye JOSS Review Comments

Hi,

This seems like a very handy tool for resolving a tedious aspect of computing fluxes from raw sensor data.

Below are the detailed comments for my JOSS review:

Functionality:

Load_LGR()
- The description needs some more details about the data processing that is occurring - is there cleaning, transformation, etc. that is occurring?
- the Time_format argument only takes a few values. Preferred style for handling this is using match.arg(), but in this case, I'd recommend re-using the orders argument from lubridate::parse_date_time() and passing the argument through.
Load_Other
- I was a bit confused about the Date argument, in that it represents a constant date on which the measurements were made, if such data were not present in the file. Mainly because most of the other arguments refer to columns of the input data file. Maybe you could use some other argument naming scheme to indicate such (and then add a similar additional argument for temperature, instead of reusing Ta in multiple ways)
- same comment re: Time_format as for Load_LGR()
FluxCal
- In interactive mode, there are no instructions about advancing past the manual selection. I had to look up the help on identify to figure out that I needed to click a different mouse button. Unfortunately this didn't work on my laptop, and I had to plug in a mouse. There needs to be instruction for how to proceed, as well as a more accessible option.
- Depending on how I selected peaks and valleys in interactive mode for the LGR example data, I got an error and the next figure failed to draw:
```
Error in if (Slm$r.squared > R2.CO2[a, 1]) { : 
missing value where TRUE/FALSE needed
In addition: Warning message:
‘mode(display)’ differs between new and previous
==> NOT changing ‘display’ 
```
- In terms of output format, I wasn't sure what the index and time values referred to? The middle of the range that had the best linear R^2 or the start? I'd suggest using start and end values that bracket the range (this makes it easier to reproduce the linear regressions or tweak if necessary). I think the output would be easier to manipulate if CO2 and CH4 results were not in wide format, but long (since the columns are identical.
- For the option to take in timing cues, it'd be more convenient to supply this as a data argument rather than a file, as this enables processing or generating the timing cues in R. Also, in the example, it looks like there is additional metadata (the Plot and Light.Dark columns) associated with the timing cues that is ignored. It would be great to include this as part of the output, for easier analysis purposes (than having to later match the output with the metadata in the timing file).
- Is the precision sufficient on the molar mass constants? Maybe the output needs to be rounded to the corresponding significant digits. (Oh, these values are not later used?)
- There's some duplicated code that could be avoided:
- calculate ylim manually if not provided, then use the same plotting code
- calculations for CO2 and CH4 are basically identical
- The function is long, and subtasks can be handled by separate functions (e.g. generating the initial timing cues, the calculations, and then re-plotting or generating the output). A single function to do the flux calculations could be re-used for both CO2 and CH4, for example.
- Are there default values for vol and Area that would work? It's a bit awkward to have them be required arguments, but not at the front of the list of arguments. Maybe a unitless calculation could work?
The software paper mention that the analysis is "reproducible" -- but if the peaks and valleys are manually chosen, this seems to be non-reproducible. Perhaps you can add an option for selecting peaks and valleys to be turned into a timing cues data structure and output file, to make the calculations reproducible, with the manual interaction only needing to happen once.
You might try running goodpractice::gp() to look for additional issues. Not all of them are relevant for JOSS, but it can identify some general ways to improve the package.

Code Style:

the naming style for arguments and functions is a bit inconsistent: e.g. the data-loading functions use Snake_case, while the main computation function is CamelCase. Function arguments sometimes begin with a capital letter, and sometimes do not.
There are several "magic numbers" - constants used in calculation (frequency as 60 / sampling interval, celsius to kelvin conversion using 273.2, and column indices).

Error checking:

Since several of the function arguments refer to column names, some error checking with informative messages would be good to add, when a user makes a typo.

Documentation:

roxygen has functionality to create link to other topics (e.g. \link{\code{}}). There are a few spots where this would be useful in the docs.
I agree that a simple vignette showing usage of the functions, perhaps with screenshots for the interactive portion, would help understand the workflow, as well as the input and output data structures.

Software paper:

editing wise, the only additional note I have to @jhollist's list is: Paragraph 1, line 4: "for" -> "because of".
I'm not sure what about the process of computing fluxes is considered "subjective" and "objective". Isn't the selection of peaks and valleys by hand (or inputting timing cues) also subjective? I think you may want to focus on how your tool makes it easier to select the windows for calculation, and that optimizing for R^2 gets the most consistent outputs.
To add to @jhollist's comment about the Statement of need, the description contained details that were difficult to distinguish as being relevant or not to the core problem. It would be better to have a simple statement about the problem of selecting measurement windows from the raw data; and then to include additional details about how the package is able to make use of air temperature for computing fluxes. I would save the description of efficiency and benefits of the package for the third paragraph, so that you have a clear description of the functionality first.
Improved precision in wording is needed in some places. What do you mean by "processing the manually measured fluxes"? From the examples and description, it seems like the sensor is automated, and the final output are flux calculations.
Is the method of selecting the window with the highest R^2 a commonly accepted method, or compared against other methods? A reference to a methods paper feels warranted here.

Thank you so much for your suggestions! Please see below my response to each of the points.

Hi,

This seems like a very handy tool for resolving a tedious aspect of computing fluxes from raw sensor data.

Below are the detailed comments for my JOSS review:

Functionality:

Load_LGR()

The description needs some more details about the data processing that is occurring - is there cleaning, transformation, etc. that is occurring?

Response: the function mainly converts the timestamps into the one R can read and also removes unused lines in the raw data file. This description has been added into the help file.

the Time_format argument only takes a few values. Preferred style for handling this is using match.arg(), but in this case, I'd recommend re-using the orders argument from lubridate::parse_date_time() and passing the argument through.

Response: I revised the code with match.arg() and parse_date_time() added as suggested. Thank you!

Load_Other

I was a bit confused about the Date argument, in that it represents a constant date on which the measurements were made, if such data were not present in the file. Mainly because most of the other arguments refer to columns of the input data file. Maybe you could use some other argument naming scheme to indicate such (and then add a similar additional argument for temperature, instead of reusing Ta in multiple ways)

Response: I changed the date argument to “date_ms” standing for date of measurement to distinguish it from other argument that takes column names. “Ta” is kept as it was here, since it takes the column name of air temperature which is in line with other argument (e.g., time, CO2 and CH4). For the “Ta” in the function FluxCal(), I have changed it to “df_Ta”, indicating it takes a data frame instead of number or column name.

same comment re: Time_format as for Load_LGR()

Response: revised accordingly.

FluxCal

In interactive mode, there are no instructions about advancing past the manual selection. I had to look up the help on identify to figure out that I needed to click a different mouse button. Unfortunately this didn't work on my laptop, and I had to plug in a mouse. There needs to be instruction for how to proceed, as well as a more accessible option.

Response: I have added a more detailed instruction in the interactive graph and included two steps and clarified that the user should end the operation by clicking the “Stop” button on the top-left corner of the window.

Depending on how I selected peaks and valleys in interactive mode for the LGR example data, I got an error and the next figure failed to draw:
Error in if (Slm$r.squared > R2.CO2[a, 1]) { : 
missing value where TRUE/FALSE needed
In addition: Warning message:
‘mode(display)’ differs between new and previous
 ==> NOT changing ‘display’ 

Response: the error is caused by selecting a point that results in a failure in the regression. This error message has been added to Line 217 in the FluxCal() function and tested to be working.

In terms of output format, I wasn't sure what the index and time values referred to? The middle of the range that had the best linear R^2 or the start? I'd suggest using start and end values that bracket the range (this makes it easier to reproduce the linear regressions or tweak if necessary). I think the output would be easier to manipulate if CO2 and CH4 results were not in wide format, but long (since the columns are identical.

Response: the output format has been modified to include both the start and end time that each flux is calculated and used the long format. The index is removed to avoid confusion.

For the option to take in timing cues, it'd be more convenient to supply this as a data argument rather than a file, as this enables processing or generating the timing cues in R. Also, in the example, it looks like there is additional metadata (the Plot and Light.Dark columns) associated with the timing cues that is ignored. It would be great to include this as part of the output, for easier analysis purposes (than having to later match the output with the metadata in the timing file).

Response: Agree! I have modified the input of the time cues to be data frame instead of a file. In addition, I added an argument “other” to take the column names in the time cue data that can passed to the output.

Is the precision sufficient on the molar mass constants? Maybe the output needs to be rounded to the corresponding significant digits. (Oh, these values are not later used?)

Response: all the unused constants are now removed. An argument “digits” has been added so that the users can define by themselves.

There's some duplicated code that could be avoided:

calculate ylim manually if not provided, then use the same plotting code

calculations for CO2 and CH4 are basically identical

Response: these duplicated codes have been built into functions and called afterwards.

The function is long, and subtasks can be handled by separate functions (e.g. generating the initial timing cues, the calculations, and then re-plotting or generating the output). A single function to do the flux calculations could be re-used for both CO2 and CH4, for example.

Response: the time cue selecting procedure has now been separated from the original FluxCal() function and incorporated into an independent function SelCue(). The new FluxCal() function is now specified to calculate the fluxes. In the FluxCal() function, I added an argument cal for the users to choose which gas flux (or both) to calculate.

Are there default values for vol and Area that would work? It's a bit awkward to have them be required arguments, but not at the front of the list of arguments. Maybe a unitless calculation could work?

Response: these arguments are the unique dimensions of the chambers used for the measurements. Therefore, they are different from case to case and should be entered by the users themselves. I have moved these two arguments up to 3rd and 4th place.

The software paper mention that the analysis is "reproducible" -- but if the peaks and valleys are manually chosen, this seems to be non-reproducible. Perhaps you can add an option for selecting peaks and valleys to be turned into a timing cues data structure and output file, to make the calculations reproducible, with the manual interaction only needing to happen once.

Response: agree! I have added an argument save to automatically export the selected cues as a file that can be re-used later.

You might try running goodpractice::gp() to look for additional issues. Not all of them are relevant for JOSS, but it can identify some general ways to improve the package.

Response: codes have been revised according to the suggestions returned from goodpractice::gp().

Code Style:

the naming style for arguments and functions is a bit inconsistent: e.g. the data-loading functions use Snake_case, while the main computation function is CamelCase. Function arguments sometimes begin with a capital letter, and sometimes do not.

Response: I have carefully revised the name style and used small case letter for argument (except for CO2, CH4 and Ta, which are conventions) and the CamelCase for function names.

There are several "magic numbers" - constants used in calculation (frequency as 60 / sampling interval, celsius to kelvin conversion using 273.2, and column indices).

Response: column indices are replaced with the column names. Other constants are defined at the beginning of the code and called by the names afterwards.

Error checking:

Since several of the function arguments refer to column names, some error checking with informative messages would be good to add, when a user makes a typo.

Response: error messages are added which are triggered when wrong column names are detected.

Documentation:

roxygen has functionality to create link to other topics (e.g. \link{\code{}}). There are a few spots where this would be useful in the docs.

Response: links have been added to the help whenever other functions are mentioned.

I agree that a simple vignette showing usage of the functions, perhaps with screenshots for the interactive portion, would help understand the workflow, as well as the input and output data structures.

Response: a vignette file is added with examples of using the functions under different scenarios. Screenshots are included also to demonstrate the interactive cue selecting graph and the check-up graph.

Software paper:

editing wise, the only additional note I have to @jhollist's list is: Paragraph 1, line 4: "for" -> "because of".

Response: changed accordingly.

I'm not sure what about the process of computing fluxes is considered "subjective" and "objective". Isn't the selection of peaks and valleys by hand (or inputting timing cues) also subjective? I think you may want to focus on how your tool makes it easier to select the windows for calculation, and that optimizing for R^2 gets the most consistent outputs.

Response: agree! I have added more background information in the first three paragraphs and clarified the “gap” that this package fills and also highlighted the features this package has that facilitate the flux calculation.

To add to @jhollist's comment about the Statement of need, the description contained details that were difficult to distinguish as being relevant or not to the core problem. It would be better to have a simple statement about the problem of selecting measurement windows from the raw data; and then to include additional details about how the package is able to make use of air temperature for computing fluxes. I would save the description of efficiency and benefits of the package for the third paragraph, so that you have a clear description of the functionality first.

Response: according the suggestion, I explicitly explained the challenge of selecting the valid data and added the formula of flux calculation. The features of the packages have been moved down to the 5th paragraph.

Improved precision in wording is needed in some places. What do you mean by "processing the manually measured fluxes"? From the examples and description, it seems like the sensor is automated, and the final output are flux calculations.

Response: here I removed the word “manually” to avoid confusion.

Is the method of selecting the window with the highest R^2 a commonly accepted method, or compared against other methods? A reference to a methods paper feels warranted here.

Response: The flux calculation is based on the assumption that the gas concentration change follows a straight line (i.e., the flux rate is constant during the measurement period). Therefore, R2 of the linear regression is usually used to identify the quality of the measurement. Two references are added to support the idea.

junbinzhao / FluxCalR