helske / KFAS

KFAS: R Package for Exponential Family State Space Models

Wrong convergence without warnings or errors #65

Closed zjph602xtc closed 1 year ago

zjph602xtc commented 3 years ago

Dear Dr. Helske,

You added a parameter called H_tol for logLik in the latest version, which avoids many false-convergence cases. However, I have found other cases where both approxSSM(model)$H and approxSSM(model)$y look fine, yet the logLik is still wrong.

The case is not very simple, so please load my R image from the Dropbox link.

load("new result.RData")
> logLik(model) # there are no errors or warnings, but the log-likelihood is obviously wrong; it cannot be this large.
# There are only 411 points, so the mean log-likelihood per point is 101060/411 = 245, which is impossible (the density would be exp(245) > 10^106).
[1] 101060.8
> max(approxSSM(model)$H) # but approximated H looks fine
[1] 133.4956

I tried to simplify the model, but whenever I delete any latent state or change some parameters, the error no longer occurs: the model either converges to the right likelihood or reports non-convergence.

The parameters in this model were optimized with the optim function. I want to use MLE to estimate several parameters, but because this kind of false convergence occurs, optim gets stuck at these parameter values. Thank you for your help.
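For context, the estimation loop is essentially of the following form. This is a toy sketch with a Poisson local-level model and a single variance parameter, not the actual model from the RData image; a silent false convergence at some parameter value returns a huge positive log-likelihood and derails the optimizer.

```r
library(KFAS)

set.seed(1)
# Toy Poisson observations driven by a slowly varying level.
lev <- cumsum(rnorm(100, sd = 0.1))
y <- rpois(100, exp(0.5 + lev))
model <- SSModel(y ~ SSMtrend(1, Q = list(matrix(NA))), distribution = "poisson")

# Negative approximate log-likelihood as a function of log(Q).
objf <- function(par, model) {
  model$Q[1, 1, 1] <- exp(par)
  -logLik(model)
}

fit <- optim(log(0.1), objf, model = model, method = "BFGS")
exp(fit$par)  # estimated state variance Q
```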

Regards Peter

helske commented 3 years ago

Are you using the latest CRAN version or the GitHub version? There are some memory issues in the current CRAN version related to these recent changes (I have already resubmitted a fix); the latest GitHub version should work, though. Testing your model on the current version, I get

> logLik(model)
[1] -1.552518e+231
Warning message:
In logLik.SSModel(model) :
  Gaussian approximation failed due to non-finite value in linear predictor.
Returning -1.55251809230071e+231.
zjph602xtc commented 3 years ago

That's strange. I am using the GitHub version. I just did the following a few minutes ago and got the same result.

> remove.packages('KFAS')  # remove any previous KFAS
Removing package from ‘C:/Users/Peter/Documents/R/win-library/3.5’
(as ‘lib’ is unspecified)
> library(devtools)
Loading required package: usethis
> install_github('helske/KFAS')
Downloading GitHub repo helske/KFAS@master
√  checking for file 'C:\Users\Peter Xu\AppData\Local\Temp\RtmpQdmZ8T\remotes55c36dabf1\helske-KFAS-e04e6b7/DESCRIPTION' ...
-  preparing 'KFAS': (808ms)
√  checking DESCRIPTION meta-information ... 
-  cleaning src
-  checking for LF line-endings in source and make files and shell scripts
-  checking for empty or unneeded directories
-  looking to see if a 'data/datalist' file should be added
-  building 'KFAS_1.4.4.tar.gz'

Installing package into ‘C:/Users/Peter/Documents/R/win-library/3.5’
(as ‘lib’ is unspecified)
* installing *source* package 'KFAS' ...
** libs
c:/Rtools/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  approx.f95 -o approx.o
(skip many lines)
installing to C:/Users/Peter Xu/Documents/R/win-library/3.5/KFAS/libs/x64
** R
** data
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (KFAS)
In R CMD INSTALL
> load("C:/Users/Peter/Dropbox/new result.RData")
> require('KFAS')
Loading required package: KFAS
> logLik(model)
[1] 101060.8

Is there anything wrong with my steps? Here is my R version:

> version
               _                           
platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          3                           
minor          5.3                         
year           2019                        
month          03                          
day            11                          
svn rev        76217                       
language       R                           
version.string R version 3.5.3 (2019-03-11)
nickname       Great Truth 
zjph602xtc commented 3 years ago

I just tried R 4.0.2 with a clean install and got the same result.

> logLik(model)
[1] 101060.8
> version
               _                           
platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          4                           
minor          0.2                         
year           2020                        
month          06                          
day            22                          
svn rev        78730                       
language       R                           
version.string R version 4.0.2 (2020-06-22)
nickname       Taking Off Again            
helske commented 3 years ago

Strange, everything looks OK above. Are you sure you are using the same model you uploaded to Dropbox?

Changing the prior variance of the states from diffuse to something large (still effectively diffuse) succeeds:

> diag(model$P1inf)[1:4] <- 0
> diag(model$P1)[1:4]<-1e12
> logLik(model) 
[1] -4641.616
> diag(model$P1)[1:4]<-1e15
> logLik(model)
[1] -1.552518e+231
Warning message:
In logLik.SSModel(model) :
  Gaussian approximation converged to a degenerate case with max(H) = 4.70880006415384e+258.
Returning -1.55251809230071e+231.
zjph602xtc commented 3 years ago

Yes, I have checked many times that I am using the same image as in Dropbox.

I just ran the code from your latest reply:

> diag(model$P1inf)[1:4] <- 0
> diag(model$P1)[1:4]<-1e12
> logLik(model)  # not exactly the same but very close
[1] -4602.775
> diag(model$P1)[1:4]<-1e15
> logLik(model)  # wrong results
[1] 3087.383

I also tried the same code on two laptops and on an HPC cluster. The results on one laptop and on the HPC cluster match yours, but the results on the other laptop and on my desktop differ.

Not sure why this happens, but I will try to find a simpler and more stable example (if one exists) to demonstrate the problem.

helske commented 3 years ago

Are there differences in operating systems? What if you change H_tol to a very large value like 1e300, or to a very small value, just to see what happens? With a very large value you should get the same results as with older KFAS versions from before this check was added.

zjph602xtc commented 3 years ago

They are all on Win 10 x64.

This is what I get:

> logLik(model)
[1] 101060.8
> logLik(model, H_tol = 1e300)
[1] 101060.8
> logLik(model, H_tol = 1e500)
[1] 101060.8
> logLik(model, H_tol = 300)
[1] 101060.8
> logLik(model, H_tol = 30)
[1] -1.552518e+231
Warning message:
In logLik.SSModel(model, H_tol = 30) :
  Gaussian approximation converged to a degenerate case with max(H) = 133.49562534431.
Returning -1.55251809230071e+231.
helske commented 3 years ago

Very strange, I am also running Win10. I reran all the checks with valgrind etc. to look for memory errors, but they didn't find any issues. What does this return:

diag(model$P1inf)[1:4] <- 0
diag(model$P1)[1:4]<-1e12
approxSSM(model)$iter # I get 13
approxSSM(model)$diff # 2.108071e-14
logLik(approxSSM(model)) # -5564.653

Have you modified your Makevars in ~/.R/Makevars.win? I wonder if this is related to some compiler flags or something similar. When installing KFAS from GitHub, do lines like c:/Rtools/mingw_64/bin/gfortran -m64 -O2 -mtune=core2 -c approx.f95 -o approx.o differ in anything other than the path to gfortran?

I guess your Rtools is also up to date (you can check with pkgbuild::check_rtools(TRUE)).

zjph602xtc commented 3 years ago

I didn't modify Makevars. When I run install_github('helske/KFAS'), this is the complete log:

> install_github('helske/KFAS')
Downloading GitHub repo helske/KFAS@master
√  checking for file 'C:\Users\Peter Xu\AppData\Local\Temp\RtmpuuNMQz\remotes3318641250dc\helske-KFAS-e04e6b7/DESCRIPTION' ...
-  preparing 'KFAS': (516ms)
√  checking DESCRIPTION meta-information ... 
-  cleaning src
-  checking for LF line-endings in source and make files and shell scripts
-  checking for empty or unneeded directories
-  looking to see if a 'data/datalist' file should be added
-  building 'KFAS_1.4.4.tar.gz'

Installing package into ‘C:/Users/Peter Xu/Documents/R/win-library/3.5’
(as ‘lib’ is unspecified)
* installing *source* package 'KFAS' ...
** libs
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  approx.f95 -o approx.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  approxloop.f95 -o approxloop.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  artransform.f95 -o artransform.o
C:/RBuildTools/3.5/mingw_64/bin/gcc -m64 -I"C:/PROGRA~1/MICROS~3/ROPEN~1/R-35~1.3/include" -DNDEBUG     -I"C:/a/w/1/s/vendor/extsoft/include"     -O2 -Wall  -std=gnu99 -mtune=core2 -c cdistwrap.c -o cdistwrap.o
sh: warning: setlocale: LC_ALL: cannot change locale (English): No such file or directory
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  covmeanw.f95 -o covmeanw.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  filter1step.f95 -o filter1step.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  filter1stepnovar.f95 -o filter1stepnovar.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  filtersimfast.f95 -o filtersimfast.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  gloglik.f95 -o gloglik.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  gsmoothall.f95 -o gsmoothall.o
C:/RBuildTools/3.5/mingw_64/bin/gcc -m64 -I"C:/PROGRA~1/MICROS~3/ROPEN~1/R-35~1.3/include" -DNDEBUG     -I"C:/a/w/1/s/vendor/extsoft/include"     -O2 -Wall  -std=gnu99 -mtune=core2 -c init.c -o init.o
sh: warning: setlocale: LC_ALL: cannot change locale (English): No such file or directory
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  isample.f95 -o isample.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  isamplefilter.f95 -o isamplefilter.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  kfilter.f95 -o kfilter.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  kfilter2.f95 -o kfilter2.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  kfstheta.f95 -o kfstheta.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  ldl.f95 -o ldl.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  ldlssm.f95 -o ldlssm.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  marginalxx.f95 -o marginalxx.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  mvfilter.f95 -o mvfilter.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  ngfilter.f95 -o ngfilter.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  ngloglik.f95 -o ngloglik.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  ngsmooth.f95 -o ngsmooth.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  predict.f95 -o predict.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  ptheta.f95 -o ptheta.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  pytheta.f95 -o pytheta.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  simfilter.f95 -o simfilter.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  simgaussian.f95 -o simgaussian.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  simgaussianuncond.f95 -o simgaussianuncond.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  smoothonestep.f95 -o smoothonestep.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  smoothsim.f95 -o smoothsim.o
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64   -O2  -mtune=core2 -c  smoothsimfast.f95 -o smoothsimfast.o
sh: warning: setlocale: LC_ALL: cannot change locale (English): No such file or directory
C:/RBuildTools/3.5/mingw_64/bin/gfortran -m64 -shared -s -static-libgcc -o KFAS.dll tmp.def approx.o approxloop.o artransform.o cdistwrap.o covmeanw.o filter1step.o filter1stepnovar.o filtersimfast.o gloglik.o gsmoothall.o init.o isample.o isamplefilter.o kfilter.o kfilter2.o kfstheta.o ldl.o ldlssm.o marginalxx.o mvfilter.o ngfilter.o ngloglik.o ngsmooth.o predict.o ptheta.o pytheta.o simfilter.o simgaussian.o simgaussianuncond.o smoothonestep.o smoothsim.o smoothsimfast.o -LC:/PROGRA~1/MICROS~3/ROPEN~1/R-35~1.3/bin/x64 -lRlapack -LC:/PROGRA~1/MICROS~3/ROPEN~1/R-35~1.3/bin/x64 -lRblas -lgfortran -lm -lquadmath -LC:/a/w/1/s/vendor/extsoft/lib/x64 -LC:/a/w/1/s/vendor/extsoft/lib -LC:/PROGRA~1/MICROS~3/ROPEN~1/R-35~1.3/bin/x64 -lR
installing to C:/Users/Peter Xu/Documents/R/win-library/3.5/KFAS/libs/x64
** R
** data
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (KFAS)
In R CMD INSTALL
> pkgbuild::check_rtools(TRUE)
Scanning R CMD config CC...
cc_path:  
'' does not exist
Scanning path...
Scanning registry...
Found C:/RBuildTools/3.5 for 3.5 
Found G:/rtools40 for 4.0 
VERSION.txt
Rtools version 3.5.0.4 
[1] TRUE

I am currently using R 3.5 with the compiler in C:/RBuildTools/

zjph602xtc commented 3 years ago

By the way, I found something. After I load the R image:

> logLik(model) # wrong
[1] 101060.8
> logLik(approxSSM(model))  # this is reasonable
[1] -5357.131
> approxSSM(model)$iter
[1] 13
> approxSSM(model)$diff
[1] 3.479823e-16
> logLik(model) # still wrong
[1] 101060.8

Are logLik(model) and logLik(approxSSM(model)) supposed to be very similar? They are not.

Then I reload the image:

> diag(model$P1inf)[1:4] <- 0
> diag(model$P1)[1:4]<-1e12
> approxSSM(model)$iter
[1] 9
> approxSSM(model)$diff
[1] 6.036801e-15
> logLik(approxSSM(model))
[1] -5515.708
> logLik(model)
[1] -4602.775
helske commented 3 years ago

Thanks. logLik(model) does not need to equal logLik(approxSSM(model)): the former is the approximate log-likelihood of the non-Gaussian model, whereas the latter is the log-likelihood of the approximating Gaussian model (the former is the latter plus a correction term).
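In code, the relationship can be sketched like this (assuming `model` is a non-Gaussian SSModel; the correction term is computed internally by KFAS and is not exposed directly):

```r
app <- approxSSM(model)   # approximating Gaussian model

logLik(app)    # exact log-likelihood of the Gaussian approximation
logLik(model)  # approximate log-likelihood of the non-Gaussian model:
               # roughly logLik(app) plus a correction term, so the two
               # need not be close when the correction term is large
```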

I was now able to reproduce the issue on Linux. It seems the issue is not due to these latest changes; switching to KFAS version 1.4.1 shows the same problem.

zjph602xtc commented 3 years ago

Wow, that's great! You can reproduce the issue. Thank you!!

helske commented 3 years ago

Ok, this was tricky. There is a check in the diffuse filtering recursions for whether the prediction error variance F is zero, but to cope with numerical issues the check is actually F > tolerance. The tolerance is defined as tol * max(abs(Z[Z > 0]))^2, where tol = .Machine$double.eps^0.5 by default (this can be changed via the tol argument when building the model). For your model min(abs(model$Z[abs(model$Z) > 0])) is 6.145215e-05, so this becomes 3.776367e-17, which is essentially too small, as can be seen from the fact that different compilers/platforms give different results.
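To illustrate, the effective tolerance can be computed directly. This is a sketch; `model` stands for the SSModel object from the RData image, and the numeric value quoted in the comment is taken from the discussion above.

```r
# Default tolerance factor used in the diffuse filtering F > 0 check.
tol <- .Machine$double.eps^0.5            # ~1.49e-08

# Smallest nonzero element of the system matrix Z.
z_small <- min(abs(model$Z[abs(model$Z) > 0]))  # 6.145215e-05 for this model

# Effective tolerance: on the order of 1e-17, so small that floating-point
# differences across compilers/platforms can flip the comparison.
tol * z_small^2
```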

Unfortunately, changing the tolerance parameter still doesn't help: some numerical issues are present with every tolerance value. Also, without exact diffuse initialization, the results change drastically depending on the initial variance:

> x <- numeric(8)
> for(i in 1:8) {
+   diag(model$P1)[1:4] <- 10^(i+1) # change prior variance from 100,..., 10^9
+   x[i] <- KFS(model)$alphahat[1,1]
+ }
> x
[1] 4.047995e-06 4.047320e-05 4.047251e-04 4.047244e-03 4.047241e-02 4.047218e-01 4.047297e+00 4.044735e+01

So it is likely that your model is simply poorly identifiable, which results in numerical issues during the filtering/smoothing. You might want to check whether scaling model components such as Z helps, use a more informative initial prior (a1 & P1) for all states, or try simplifying the model to see where it becomes overly complex.

zjph602xtc commented 3 years ago

It is likely that your model is just poorly identifiable which results numerical issues during the filtering/smoothing.

Yes, it is possible that the model is poor with these parameters, which leads to numerical problems. However, since I am using optim to search for the optimal parameters, it is inevitable that the model will sometimes be evaluated at bad parameter values during the optimization procedure.

In the figure above, I vary one parameter and plot the likelihood. When this parameter (in the Z matrix) approaches 0, the model achieves the maximum likelihood. But due to the false-convergence problem, there are many spikes in the figure, even when the parameter is not far from 0. It is then impossible to find the true optimal parameter with optim in this case (there are actually more spikes than shown; I only plotted some points). And it gets even worse when the dimension is larger than one, where the final result usually converges to the wrong value (because the wrong likelihood is very large).

So I am wondering whether there is a way to identify such wrong convergence, or maybe to avoid it?
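One pragmatic workaround (a sketch, not a feature of KFAS itself) is to guard the objective function so that implausibly large likelihoods are rejected during optimization. Here `update_model` is a hypothetical helper that writes the parameter vector into the SSModel object.

```r
# Wrap the approximate log-likelihood with a plausibility cap: any value
# above `cap` is treated as a failed evaluation and heavily penalized.
safe_negloglik <- function(par, model, update_model, cap = 0) {
  model <- update_model(par, model)
  ll <- logLik(model)
  if (!is.finite(ll) || ll > cap) return(1e10)  # reject suspect evaluations
  -ll
}
```

The cap has to come from problem knowledge (here, with 411 observations, any log-likelihood above 0 is clearly impossible); this filters out the spikes but cannot distinguish a genuine sharp local optimum from a numerical artifact.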

You might want to check if scaling model components such as Z helps

I already scaled all the components that can be scaled.

or use more informative initial prior (a1 & P1) for all states

a1 is fixed by the study. I can change P1. But in your previous reply, the results change drastically depending on the initial variance.

or try to simplify the model to see where it becomes overly complex.

I tried some simulations and found that similar problems can occur even in some simple cases, so I suspect simplifying may not be a final solution.

Thank you!!

Regards Peter

helske commented 3 years ago

There are often multiple local optima in the likelihood surfaces of state space models, so in principle some of those spikes could be real as well; it is therefore quite difficult to assess automatically when an issue is due to numerical problems.

Yes, there are also problems with very large P1 values (which is typical, and is why exact diffuse initialization was developed, although that has its own numerical issues here), because the Z for the first states contains mostly zeroes: it takes time to gain any information about these states, which leads to an accumulation of small numerical errors. However, I would guess that you actually do have some prior information about the prior distribution N(a1, P1) of the initial state alpha_1 also for the first four elements, which you currently give to the model as diffuse. For example, just looking at the values of the observations in your model, I doubt alpha_1 can be of the order of hundreds? So setting diag(P1)[1:4] to something like 100^2 is quite weakly informative but likely helps with the model identification problems. Of course, this brings you to the reason why diffuse initialization is often preferred: how to choose the "correct" prior variance... Often it doesn't really matter whether P1 is 1e3 or 1e5, but here it does seem to have an effect, which can be seen from the fact that the variance of the smoothed estimate is equal to the prior variance (here 10000):

 KFS(model)$V[1:4,1:4,1]
              [,1]          [,2]          [,3]          [,4]
[1,]  1.000000e+04  3.119451e-06 -3.681512e-07  4.870584e-06
[2,]  3.119451e-06  1.000000e+04  6.253805e-06 -8.274729e-05
[3,] -3.681512e-07  6.253805e-06  2.411788e-01 -7.623950e-02
[4,]  4.870584e-06 -8.274729e-05 -7.623950e-02  1.127876e+00

This again suggests that the model is unidentifiable and that the changes in alphahat[1,1] with respect to P1 in the post above are just "noise".