gsucarrat / gets

R Package for General-to-Specific (GETS) modelling and Indicator Saturation (ISAT) methods

isat might add each indicator individually (number of blocks = number of indicators) #38

Open jkurle opened 3 years ago

jkurle commented 3 years ago

Hi all,

I found an issue that is a knife-edge case and won't occur often, but it might still be good to record it in case others encounter it and wonder whether the behaviour is intentional.

isat() creates its own blocks if they are not provided. This is done in the following: https://github.com/gsucarrat/gets/blob/7c7ac53aab229cd5f9ad174bcb5d2b0d14cf7661/gets/R/gets-isat-source.R#L315-L323

In line 316, the denominator can become 0, in which case blockratio.value becomes infinite. Subsequently, no.of.blocks is set to ncol.adj, which is the number of indicators in the current element of ISmatrices.
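For concreteness, here is a sketch of the degenerate arithmetic with the values used in the example below (only the denominator ratio.threshold*ncol.adj - mXncol discussed here is shown; the rest of line 316 is not reproduced):

ncol.adj <- 5           # indicators in the current element of ISmatrices
ratio.threshold <- 0.2  # user-supplied threshold
mXncol <- 1             # number of regressors
ratio.threshold * ncol.adj - mXncol  # 0.2*5 - 1 = 0, so blockratio.value is Inf
# no.of.blocks then falls back to ncol.adj, i.e. one block per indicator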

Reproducible example with ncol.adj = 5, ratio.threshold = 0.2, mXncol = 1:

library(gets)
set.seed(123)  # any seed; the block behaviour does not depend on the draw
y <- rnorm(20)
x <- rnorm(20)
uis <- diag(20)[, 1:5]  # 5 user-specified indicators
isat(y = y, mxreg = x, mc = FALSE, uis = uis, sis = FALSE, ratio.threshold = 0.2)

Note that I originally encountered this issue with the default value ratio.threshold = 0.8 and ncol.adj = 15, so 15 indicators were added individually and hence 15 blocks were searched. The issue can arise for any sample size (in my case 100), so it would have been feasible to simply add all 15 indicators at once.

I know that this is a knife-edge case and that I can simply govern the blocks myself via the blocks argument; that should probably be the recommended solution for users. So no immediate action is needed, but we could also adjust the algorithm that determines the block size.

moritzpschwarz commented 3 years ago

Thanks Jonas - agreed that this is not something super pressing.

Just thinking of a possible solution, what do you think would do the trick? What really is our objective here? Do we care if each block has length one? Or is this actually desired, given that we only have one x-variable in this case?

jkurle commented 3 years ago

Do we care if each block has length one?

Yes, I think we care if the effective block size is only one: 1) it can be computationally inefficient, and 2) potency might suffer because some phenomena (e.g. structural breaks or several outliers) might only be detected jointly, i.e. by a combination of indicators.

Or is this actually desired, given that we only have one x-variable in this case?

This is not specific to having one regressor; what matters is the combination ratio.threshold*ncol.adj - mXncol. It can occur with any number of regressors (in my application, I had 12).
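The same degenerate denominator appears with the default threshold and the numbers mentioned above (15 indicators, 12 regressors):

0.8 * 15 - 12  # = 0, so blockratio.value is again infinite and each indicator gets its own block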

what do you think would do the trick?

I'm not sure yet how to modify the block-size algorithm without unintended consequences, or whether that is even needed (it really is a knife-edge case). But this issue can serve as a guide for users who encounter the same problem. We could also issue a warning when blockratio.value becomes infinite and link to this issue; the user then needs to make some explicit choices about their blocks. For example, the following works:

y <- rnorm(20)
x <- rnorm(20)
uis <- diag(20)[, 1:5]
# explicitly requesting a single block avoids the degenerate block split:
isat(y = y, mxreg = x, mc = FALSE, uis = uis, sis = FALSE, ratio.threshold = 0.2, blocks = 1)
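On the warning idea above, a rough sketch of what such a guard could look like (purely illustrative; the variable names follow the discussion, not an actual patch):

if (!is.finite(blockratio.value)) {
  warning("ratio.threshold*ncol.adj - mXncol is not positive, so every ",
          "indicator is placed in its own block; consider supplying the ",
          "'blocks' argument explicitly (see issue #38)")
}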