verbal-autopsy-software / InterVA4

R package for InterVA-4 software

Question/Enhancement: variable checking #5

Open mboyas-mitre opened 5 years ago

mboyas-mitre commented 5 years ago

https://github.com/verbal-autopsy-software/InterVA4/blob/792fd94fcad1f143f55865e580a4ea0bc233e5bb/InterVA4_1.7/R/InterVA.R#L195-L215

The code appears to validate that the last column has the expected name, scosts. If any other variable name differs from what is expected, the code issues a warning (which is good) and then replaces the variable names with the expected ones. This replacement seems risky, because it assumes the columns are merely named differently but are still in the correct order. Maybe an input that lets the user specify how their columns map to the standard ones could be a useful addition?
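
To make the suggestion concrete, here is a minimal sketch of a user-supplied column mapping. The function `apply_column_map`, its arguments, and the three-column toy data are all hypothetical; nothing here is part of the InterVA4 API, and the real input has many more columns.

```r
# Hypothetical helper (not part of InterVA4): rename columns through an
# explicit user-supplied mapping instead of assuming positional order.
# column_map: named character vector, names = user's column names,
#             values = the standard names they correspond to.
apply_column_map <- function(input, column_map, expected) {
  unknown <- setdiff(names(column_map), colnames(input))
  if (length(unknown) > 0) {
    stop("Columns in the mapping not found in the input: ",
         paste(unknown, collapse = ", "))
  }
  idx <- match(names(column_map), colnames(input))
  colnames(input)[idx] <- unname(column_map)

  missing <- setdiff(expected, colnames(input))
  if (length(missing) > 0) {
    stop("Expected columns still missing after mapping: ",
         paste(missing, collapse = ", "))
  }
  # Return the columns in the standard order.
  input[, expected, drop = FALSE]
}

# Toy data: made-up column names standing in for the real input format.
expected <- c("id", "symptom_a", "scosts")
input <- data.frame(scosts = c("", "y"),
                    ID = c("d1", "d2"),
                    symptom_a = c("y", ""),
                    stringsAsFactors = FALSE)
fixed <- apply_column_map(input, c(ID = "id"), expected)
```

With a mapping like this, a file whose columns are out of order or named differently would be handled by name, and anything that cannot be resolved would stop with an error rather than being renamed silently.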

Regardless, the warning is good, but it might be useful to provide a bit more detail in the function documentation/warning message itself about what is happening there. I can see situations where users would ignore the warning message and then have invalid results.
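
As an illustration of the kind of detail I mean, a warning could list each mismatched position along with the name found and the name expected. This is only a sketch; `describe_name_mismatch` and the example names are hypothetical, not the package's current code.

```r
# Hypothetical helper (not part of InterVA4): report every position where
# the column name differs from the expected one, ignoring case.
describe_name_mismatch <- function(found, expected) {
  stopifnot(length(found) == length(expected))
  bad <- which(tolower(found) != tolower(expected))
  if (length(bad) == 0) {
    return(invisible(integer(0)))
  }
  detail <- paste0("column ", bad, ": found '", found[bad],
                   "', expected '", expected[bad], "'")
  warning("Column names do not match the expected input format and will be ",
          "replaced by position. Check that the columns are in the ",
          "standard order.\n", paste(detail, collapse = "\n"),
          call. = FALSE)
  invisible(bad)
}

# Capture the warning text for a toy set of names.
msg <- tryCatch(
  describe_name_mismatch(c("id", "symptomA", "scosts"),
                         c("id", "symptom_a", "scosts")),
  warning = function(w) conditionMessage(w)
)
```

A message like this tells the user exactly which columns were renamed, which makes it harder to dismiss the warning without checking the file.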

richardli commented 5 years ago

Good point. In the openVA package, we do have another pre-check that orders the columns by name and makes sure they match.
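
For readers following along, a name-based pre-check of this general kind could look roughly like the sketch below. This is not the openVA code; `reorder_by_name` and the toy column names are hypothetical and only illustrate matching by name instead of by position.

```r
# Hypothetical pre-check (not the openVA implementation): find each expected
# column by name, ignoring case, and stop if any expected name is absent.
reorder_by_name <- function(input, expected) {
  pos <- match(tolower(expected), tolower(colnames(input)))
  if (anyNA(pos)) {
    stop("Input is missing expected columns: ",
         paste(expected[is.na(pos)], collapse = ", "))
  }
  out <- input[, pos, drop = FALSE]
  colnames(out) <- expected
  out
}

# Toy data: columns in the wrong order and with different capitalization.
expected <- c("id", "symptom_a", "scosts")
shuffled <- data.frame(SCOSTS = c("", "y"),
                       Symptom_A = c("y", ""),
                       ID = c("d1", "d2"),
                       stringsAsFactors = FALSE)
ordered <- reorder_by_name(shuffled, expected)
```

Unlike positional renaming, this never relabels a column whose name does not match; a misspelled name produces an error that names the missing column.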

In the InterVA4 package, this logic is a direct replication of the source code of the InterVA-4 software, except for the warning and replacement step. Since the original software accepts input with different column names without checking them, there may be users with slightly mis-formatted files (e.g., misspelled column names) that they use in the InterVA-4 software and expect to use with the R package too.

So this warning step is a compromise that still allows such files to run but alerts users to potential problems.