mkao006 / sws_imputation

Repository for implementing imputation for the Statistical Working System
0 stars 0 forks source link

Safe errors #79

Open joshbrowning2358 opened 9 years ago

joshbrowning2358 commented 9 years ago

Several functions modify the data object which is passed to them by reference. We should ensure that if the function generates an error, no change is made to the passed object. Otherwise, this can result in column name changes or other unexpected behavior.

mkao006 commented 9 years ago

They were designed that way to reduce the number of assigment and also to avoid over usage of memory on the server.

However, we are changing it because we are transiting to piping with the magrittr package.

joshbrowning2358 commented 9 years ago

Ok, I can understand how passing by reference avoids overhead and prevents the need to copy data.tables a bunch of times. But, isn't it problematic if the function modifies the column names and does not return them to the original names if the function errors out? For example:

library(faoswsProductionImputation) library(faoswsUtil) library(data.table)

data = copy(okrapd) imputeProduction( productionValue = "productionValue", productionObservationFlag = "productionFlag", productionMethodFlag = "productionFlag2", yieldValue = "yieldValue", yieldObservationFlag = "yieldFlag", yieldMethodFlag = "yieldFlag2", areaHarvestedValue = "areaHarvestedValue", areaHarvestedObservationFlag = "areaHarvestedFlag", areaHarvestedMethodFlag = "areaHarvestedFlag2", ensembleModel = defaultLogistics, data = data, byKey = "areaCode" ) colnames(okrapd) == colnames(data)

But, like you say, if we're moving to a different approach then maybe that makes this issue irrelevant.

mkao006 commented 9 years ago

Well it is bad practice. But dont have a solution for it. Piping may help, but have to think about it.

joshbrowning2358 commented 9 years ago

I think we could fix this by using tryCatch(). First, we set the column names of the data.table within the function. Then, the remainder of the function can be placed inside a tryCatch call, and we add the error argument to set the names back to the original and then return the error. So, something like this:

imputeProduction = function(){ balanceProduction() setnames(x = data, old = c(productionValue, productionObservationFlag, productionMethodFlag), new = c("productionValue", "productionObservationFlag", "productionMethodFlag")) tryCatch({

}, error = function(e){ setnames(x = data, old = c("productionValue", "productionObservationFlag", "productionMethodFlag"), new = c(productionValue, productionObservationFlag, productionMethodFlag)) stop(e) }) } I'll wait on attempting to implement anything like this until after we've started using magrittr...
mkao006 commented 9 years ago

Typically, you should know what exception you are handling or actually fix the error rather than catch what ever fails.

In this case, the issue is not a bug or error. It is just an undesirable side effect when the function fails, the function should be corrected rather than catching the error.

One solution to avoid this side effect is to copy the table initially, then let all proceeding function modifies the copied table. Here is an example using piping, the arguments has been omitted.

library(magrittr) library(faoswsProductionImputation)

okraImputed = okrapd %>% copy %>% processProductionDomain %>% imputeProduction

mkao006 commented 9 years ago

However, I do agree that we should try to avoid modifying the original object.