statistikat / simPop

Simulation of Synthetic Populations for Survey Data Considering Auxiliary Information

ctree error 'dgesdd' #22

Closed Kyoshido closed 1 year ago

Kyoshido commented 2 years ago

Hello,

I was modelling some data with ctree when I suddenly got this error:

Error in { : task 5 failed - "error code 1 from Lapack routine 'dgesdd'"

That seemed very strange, so I dug a little into what could cause it.

In this Stack Overflow question, for example, the error was solved by removing collinear variables from the model: https://stackoverflow.com/questions/18192050/error-in-la-svdx-nu-nv-error-code-1-from-lapack-routine-dgesdd-when-usin Sadly, that does not help me, because I want to model the data as they are, despite this inconvenience.
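For reference, here is a minimal base R sketch of how collinear predictors could be spotted before fitting. The matrix X and its columns are made up for illustration, and a rank check like this is only one possible diagnostic, not necessarily what the linked answer did.

```r
# Hypothetical predictors; x3 is an exact linear combination of x1 and x2.
set.seed(42)
x1 <- rnorm(20)
x2 <- rnorm(20)
x3 <- 2 * x1 - x2
X <- cbind(x1, x2, x3)

# QR decomposition with R's default (LINPACK) routine, which moves
# near-collinear columns to the end of the pivot order.
q <- qr(X)
rank_X <- q$rank

# A numerical rank below the number of columns signals collinearity.
has_collinearity <- rank_X < ncol(X)

# Columns pivoted past the rank are candidates for removal.
redundant <- colnames(X)[q$pivot[seq_len(ncol(X)) > rank_X]]
print(redundant)
```

Dropping the columns listed in redundant would leave a full-rank matrix, but whether that is acceptable depends on the model.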

According to this R-help post, the error code indicates that the SVD algorithm failed to converge. The singular values and vectors are computed by an iterative procedure, which on some occasions fails to converge. https://stat.ethz.ch/pipermail/r-help/2010-May/237778.html The error message is so cryptic because the underlying code is Fortran, which does not leave much room for informative error trapping.

I even found a paper whose authors used the party package for ctree models but, because of this bug, switched to the partykit package, which does not have this error. https://arxiv.org/pdf/2007.01027.pdf

So I wanted to share my findings with you.

Kyoshido commented 2 years ago

So I implemented and tried the partykit package in simCategorical().

It is not that great either.

I ran partykit::ctree(Y ~ age + sex, data = dataSample) and got the following error:

Error in doTest(lev, teststat = teststat, pvalue = pvalue, lower = TRUE, : 
cannot search for unordered splits in >= 31 levels

This is mentioned in the NEWS section of the partykit package: https://cran.r-project.org/web/packages/partykit/news.html

The sex variable only produces a binary split, but my age variable has 108 levels and was unordered, so I converted it to an ordered factor.
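A minimal sketch of that conversion in base R, assuming age is stored as a factor built from text values (the vector age_chr below is made up). In that case the levels are sorted as text, so they need to be put into numeric order before the factor is marked as ordered.

```r
# Hypothetical age data: every age from 0 to 107 appears three times, shuffled.
set.seed(1)
age_chr <- as.character(sample(rep(0:107, times = 3)))

# A plain factor is unordered and sorts its levels as text: "0", "1", "10", ...
age_fac <- factor(age_chr)
print(nlevels(age_fac))
print(is.ordered(age_fac))

# Put the levels into numeric order, then declare the factor as ordered.
lv <- levels(age_fac)
age_ord <- factor(age_fac,
                  levels = lv[order(as.numeric(lv))],
                  ordered = TRUE)
print(head(levels(age_ord)))
```

Calling as.ordered() directly would keep the text-sorted level order, which is probably not what is wanted for ages.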

Then I got this error

Error in do.call("rbind", probs) : second argument must be a list

This comes from simCategorical(), specifically line 67:

probs <- predict(mod, newdata=data.table(newdata), type="prob")

The reason for the error is that the party package returns a list here, whereas the partykit package returns a single table of probabilities (a data frame or matrix) instead of a list.

So I added

probs <- split(probs, seq(nrow(probs)))

which turns the output into a list, and then ctree finally worked.
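To show why the split() call helps, here is a small base R sketch. A hand-made matrix stands in for the prediction output; the values and class names are invented, and the real object returned by predict() may differ in detail.

```r
# Stand-in for the probability output: 3 observations, 2 classes.
probs <- matrix(c(0.2, 0.8,
                  0.5, 0.5,
                  0.9, 0.1),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("a", "b")))

# do.call() needs a list as its second argument, so a matrix is rejected.
res <- try(do.call("rbind", probs), silent = TRUE)
failed <- inherits(res, "try-error")

# Splitting by row index gives a list with one element per observation.
probs_list <- split(probs, seq(nrow(probs)))

# Now rbind works again and restores the same values row by row.
restored <- do.call("rbind", probs_list)
print(restored)
```

Note that the column names are lost after splitting a matrix this way, so any later code that relies on them would need them reattached.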

So the limitations of the party package can be worked around with partykit, but it is not ideal either.