Closed jhal324 closed 3 years ago
Hmm. I have not seen or heard this kind of issues before. How did you find about the issue, i.e. what happens in this case and/or what got you thinking that the issue is related to the different dimensions?
The reason for discrepancy in method is due to the fact that it is more efficient to travel yt
etc column by column, so at some point I changed gloglik
so that I take transpose of yt
before supplying it to Fortran. But it looks like I never did it for other functions. Perhaps because the effect is marginal in most cases or maybe I just forgot...
Ok. If you would like, give me some time I will try to write something to reproduce the problem I've encountered. This may not be possible due to privacy issues, but I'll give it a go. I'm convinced the problem is in the way one of the arguments in R is specified, but I haven't found it yet. In any case, the different approach in gloglik
prevents from happening accidentally.
Originally, I found the issue from having many 0 predictions in the output of predict.SSModel() where there should not have been. From tracking through the first row of data (which were mainly missing) I discovered that there were more non-zero residual terms (vt
input/output) than there should have been. From here I was able to ascertain that ymiss
didn't match row-by-row, but did using stacked columns.
I don't think it's a big issue, as I'm sure someone else would have highlighted this by now if it was happening commonly. The discrepancy in the method that suggested to me you'd made a change to gloglik
(and perhaps others) for a reason and had simply missed the case in kfilter
. Easy to do with such a large package (and I wouldn't like to think how many I would have missed if it were my personal package!)
Okay this sounds like it is worth investigating more, even if I could just fix the issue by changing dimensions (which I think I should do anyway just for consistency etc). If you can come up with some reproducible case that would be great, you can send the codes to my email if that is better. The problem sounds like it is not data specific so you can probably just change yt
with random numbers if it helps.
Attached is a small example. I hope this is helpful. If you need a little more, let me know! issueKFAS.zip
Thanks. Running your example, I immediately get warnings:
kFilt <- KFAS::KFS(model = mod, filtering = c("state", "mean"),
smoothing = "none", simplify = F, nsim = 0)
Warning messages:
1: In KFAS::KFS(model = mod, filtering = c("state", "mean"), smoothing = "none", :
Model is degenerate, diffuse phase did not end.
2: In KFAS::KFS(model = mod, filtering = c("state", "mean"), smoothing = "none", :
Possible error in diffuse filtering: Number of nonzero elements in Finf is not equal to the number of diffuse states.
Either model is degenerate or numerical errors occured. Check the model or try changing the tolerance parameter tol or P1/P1inf of the model.
So your model is not well defined and/or there are numerical issues.
Further
#*** 1). check predictions
head(kFilt$m, 10)
series_1 series_2 series_3 series_4 series_5 series_6 series_7 series_8 series_9 series_10
[1,] 0.000000 0.000000 2.002134 2.220284 2.220284 0.000000 0.000000 0.000000 2.002134 0.000000
[2,] 2.220284 0.000000 0.000000 0.000000 0.000000 0.000000 1.889904 0.000000 0.000000 0.000000
[3,] 0.000000 0.000000 2.217555 0.000000 2.011241 1.534373 2.172247 1.951368 0.000000 2.220284
[4,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[5,] 0.000000 2.083097 1.662742 0.000000 1.844284 0.000000 0.000000 2.053766 0.000000 0.000000
[6,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[7,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[8,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[9,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[10,] 0.000000 0.000000 2.599103 0.000000 0.000000 0.000000 1.874171 0.000000 0.000000 0.000000
# a lot of zeroes!
There is of course lot's of zeroes because matrix Z contains rows with only zeroes, so Z_t * alpha_t will contain zeros by design:
ok <- numeric(79); for(i in 1:79) ok[i] <- identical(rowSums(mod$Z[,,i]> 0) == 0, kFilt$m[i,] == 0); sum(ok)==79
[1] TRUE
#*** 2). check residuals
head(kFilt$v, 20)
# also a lot of zeroes - we aren't fitting that well!
And in prediction error term I don't see any zeroes, but lot's of NA, which are there because corresponding y_t is NA (so y-Z_t*alpha = NA):
identical(is.na(as.numeric(kFilt$v)), is.na(as.numeric(mod$y)))
[1] TRUE
The data frame looks correct:
residuals ymiss_true ymiss_used
1 0.00000000 1 1
2 0.00000000 1 0
3 -0.05622435 0 1
4 -0.83399005 0 1
5 -0.42852495 0 1
6 0.00000000 1 1
7 0.00000000 1 1
8 0.00000000 1 1
9 -0.37395514 0 1
10 0.00000000 1 1
I assume your results look different? If so, what compiler are you using?
Ahh, you do not have the same problems as me. Interesting. I guess the easiest way to show this is via the final table:
from which it is easy to tell that something different has happened!
Regarding other comments:
kFilt$m
and kFilt$v
I have even more zeroes than appear for you - I would suggest this is because NA values are replaced by zeroes somewhere.To answer your question, I think my compiler is clang4 (from reading the .R/Makevars file). I'm not sure it's relevant, because I believe was able to reproduce this error on a linux system, but my main OS is OS X (Mac).
Ok this is really weird, I'll check with my Linux system as well (I ran the example in Windows before).
I don't see any issues with CentOS Linux either. I also built KFAS again with your example as test case, and ran that using address sanitizers with rhub
, no errors. Unfortunately I have no access to OS X (rhub does not support gfortran at the moment it seems).
I have checked on my linux system and my answers agree with yours: So perhaps it only on OS X that I have these issues.
Have you installed KFAS from source or from OSX binary package? If from source, could you find out what version your Fortran compiler is?
To install KFAS I've been using:
devtools::install_github("Helske/KFAS")
and the result of gfortran --version in Terminal is:
GNU Fortran (Homebrew GCC 8.2.0) 8.2.0
What if you install the package from CRAN, can you still see the issue?
Interestingly, no. My residual table matches the black/green figure above
So it looks like a compiler issue. The gfortran in CRAN for OSX is 6.1, while for Linux it is actually 8+ so I'm not sure whether it is certain version of gfortran or combination of the compiler and OS X. I'll probably different gfortran versions on Linux later, but anyway I'll reverse the dimensions in other functions so let's see if the problem persists after that.
Ok, that makes sense. Thanks!
The version in GitHub now has dimensions flipped everywhere, could you please check if you still get the same error?
Surprisingly, that hasn't fixed the problem. I think that fix had worked for me locally though... I guess the compiler issue goes deeper than that simple tweak.
Hello again. If it is not clear, I am enjoying working with your package! Unfortunately I do not have a working example of this suggestion.
This problem regards the reading of variables in the subroutine
kfilter2
(and oldkfilter
). During the routines,ymiss
andyt
are declared to bedimension(n,p)
and in calls todfilter1step
andfilter1step
, the following line is used:dfilter1step(ymiss(d,:),yt(d,:),...)
. Sometimes I believe there are issues in the the way Fortran reads variables (perhaps related to an incorrect storage mode) and it is not always the case that rows will be taken usingymiss(d,:)
. Problems can occur when, instead of behaving as expected, Fortran stacks columns ofymiss
andyt
on top of each other and takes, in this case, the nextp
values to feeddfilter1step
(not matching up with the dth row values).I wonder if you have come across this behaviour before, due to the different declarations in the subroutine
gloglik
.ymiss
andyt
are declared to bedimension(p,n)
here and later columns are used instead:dfilter1step(ymiss(:,d),yt(:,d),...)
. This will not have the same problem as default behaviour will result in correct values taken at each step.My question is this: Is there a reason for one routine to use
dimension(n,p)
and the other to usedimension(p,n)
? I believe it is only with incorrectly declared variables that this issue will appear, but it may be worth fixing nevertheless.Many thanks!