kaskr / adcomp

AD computation with Template Model Builder (TMB)
Other
178 stars 81 forks source link

more detailed explanation of Matrix/TMB compatibility requirements? #387

Closed bbolker closed 8 months ago

bbolker commented 1 year ago

Hi,

I'm well aware of the need for Matrix/TMB compatibility, as added in this commit.

This is a continuing source of friction in packages/stacks that use TMB (e.g. wham, macpan2, glmmTMB. It's very easy to make the warning go away if you have development tools installed and you know what to do, but despite our attempts to document this in glmmTMB (e.g. here, here), it continues to be a source of concern and confusion for end-users. When this comes up on the r-devel list, we've tried to suggest that there be a way to ask CRAN to re-build binary packages when a new version of a lower-level package (Matrix > TMB > glmmTMB) is posted, but we haven't had much response. The other alternative is to make sure that we bump the versions ourselves and re-post to CRAN as necessary, but so far this also seems to be a source of friction (i.e. in order to do this we have to at least make sure that we are up to date on all CRAN checks, run tests on r-hub etc., check downstream packages for breakage ...)

We could try to reduce developer friction, e.g. by making it easier for ourselves to push new glmmTMB versions (e.g. we could maintain a tag/branch with the HEAD equal to the last CRAN submission, or even keep the last-submitted tarball around, so that we could resubmit it without doing any more checks).

However, it would be really useful to me at least to understand the exact reason why this incompatibility arises, to be able to do a better job explaining and resolving the issue ...

If this is more suited for the glmmTMB issues list, or the TMB mailing list, please let me know.

kaskr commented 1 year ago

I don't remember the exact details, but the motivation for making the Matrix package version test was definitely a binary incompatibility issue (perhaps in CHM_FR or AS_CHM_FR). What I do remember is, that the problem could be triggered by installing TMB using one Matrix version and then switching to another Matrix version. After that, everything would fail, even TMB::runExample("simple").

Good news is that the problem is hard to reproduce today. If I install TMB using Matrix 1.6-1, I can switch to any other Matrix version without breaking TMB! (I tested down to Matrix 1.2-3 from 2015-11-19, the earliest I can install).

Based on this, I think the version test isn't justified anymore and can be omitted by default.

kaskr commented 1 year ago

@bbolker I was wrong - we really need to keep the version test. Here's an example:

I start out with consistent build from source of the 3 packages:

install.packages(c("Matrix","TMB","glmmTMB"))
## trying URL 'https://cloud.r-project.org/src/contrib/Matrix_1.6-1.1.tar.gz'
## trying URL 'https://cloud.r-project.org/src/contrib/TMB_1.9.6.tar.gz'
## trying URL 'https://cloud.r-project.org/src/contrib/glmmTMB_1.1.8.tar.gz'

I can run example(glmmTMB) without problems. But then I upgrade Matrix to the current development version (presumably next release):

install.packages("Matrix", repos="http://R-Forge.R-project.org")
## trying URL 'http://R-Forge.R-project.org/src/contrib/Matrix_1.6-2.tar.gz'

Now TMB is broken:

> library(glmmTMB)
Warning message:
In checkMatrixPackageVersion() : Package version inconsistency detected.
TMB was built with Matrix version 1.6.1.1
Current Matrix version is 1.6.2
Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package
> example(glmmTMB)

glmTMB> ## No test: 
glmTMB> (m1 <- glmmTMB(count ~ mined + (1|site),
glmTMB+   zi=~mined,
glmTMB+   family=poisson, data=Salamanders))
Error in fitTMB(TMBStruc) : 
  negative log-likelihood is NaN at starting parameter values

I follow the advice of the warning:

install.packages('TMB', type = 'source')

and it fixes the problem! (i.e. example(glmmTMB) works again).

bbolker commented 1 year ago

Thanks. It would still be nice to know why it fails, as I don't ... although at least this nice reproducible example will make it a little easier to investigate? (There's nothing obvious in the Matrix devel-version NEWS file that indicates API breakage ...)

kaskr commented 1 year ago

There's no API change. That is demonstrated by the fact that I can recompile TMB and make things work again.

In case of the Matrix package, the 'API' contains a number of macros e.g. AS_CHM_SP (x) := as_cholmod_sparse(...). It seems the new Matrix package no longer has as_cholmod_sparse. Perhaps it has been renamed... In any case, the API AS_CHM_SP(x) hasn't changed. However, the binary 'TMB.so' compiled with old Matrix is not compatible with new 'Matrix.so'. This analysis might not be entirely accurate, but the important point is that a binary incompatibility can arise without there being an API change.

jaganmn commented 1 year ago

Yes, Matrix 1.6-2 fixes some bugs in its headers, resulting in ABI changes. Here are the NEWS entries concerning the ABI or API:

      \item The prototype of API function \code{M_cholmod_band_inplace}
      was wrongly copied from \code{cholmod_band},
      instead of from \code{cholmod_band_inplace}.

      \item Many API function prototypes wrongly used \code{const}
      qualifiers where the registered routines do not.

      \item Some API declarations and macros not used by \emph{any}
      reverse \code{LinkingTo} are removed or remapped.

      \item API headers are now nested under \file{inst/include/Matrix/}
      for better namespacing.  Where possible, packages should start to
      use \code{LinkingTo: Matrix (>= 1.6-2)} and include files from the
      new subdirectory, e.g., with \code{#include <Matrix/Matrix.h>}.

      \item Users including API headers can define macro
      \code{R_MATRIX_INLINE},
      typically with \code{#define R_MATRIX_INLINE inline},
      to allow the compiler to inline stubs for registered routines.

So, indeed, 1.6-2 breaks a long run of Matrix releases without ABI changes. But that does not imply that TMB should continue to warn about any version mismatch by default.

When we release new versions of Matrix, it is our responsibility to notify the CRAN team about ABI changes and to ask for rev. dep. binaries to be rebuilt. And we will do that when we submit 1.6-2.

But rebuilds do not affect package versions: currently, users must "know" to use install.packages and not update.packages to get the latest build. That is not a reasonable expectation, so it is also our responsibility to ask rev. dep. maintainers affected by ABI changes to submit an update coinciding with the next Matrix release.

The updates serve only to ensure that update.packages installs the latest builds, so they should be minimal, typically only changing the Date and Version fields in DESCRIPTION and ideally also changing to LinkingTo: Matrix (>= x.y-z), if x.y-z is the new version. Such updates do not require extensive rev. dep. checking.

This whole situation should be considered exceptional, happening only if we update SuiteSparse or discover bugs and only until CRAN starts tagging builds. For as long as I am/have been a co-author, we do not change the ABI without determining and consulting affected rev. dep. maintainers. (And had I not seen this thread I'd have notified you about the 1.6-2 changes. Well, you might still receive an e-mail.)

I am really not convinced that TMB and others should continue defending against problems that arise only exceptionally and which should be handled as much as possible by the maintainers introducing ABI changes (by asking CRAN for rebuilds, rev. dep. maintainers for version bumps) and as little as possible by end users.

jaganmn commented 1 year ago

Perhaps TMB could change to warning about version mismatch only when the mismatch involves Matrix versions known to introduce an ABI change. For now, the list would contain only 1.6-2, unless you recall the much older (and now probably irrelevant) instances. When we update SuiteSparse in 1.7-0, we may ask you to add 1.7-0, and so on ...

kaskr commented 1 year ago

@jaganmn Thanks for dealing so thoroughly with this issue! I'm going to follow your advice to only check the few versions that might cause trouble.

jaganmn commented 1 year ago

Eventually (actually TODO for 1.6-2), Matrix will associate with three "versions" detectable from C and R code: a package version, a SuiteSparse version, and an ABI version. The ABI version will obviate the need for a hard-coded list of ABI-breaking package versions.

jaganmn commented 1 year ago

Done now, with NEWS:

      \item New \R{} function \code{Matrix.Version}, taking no
      arguments and returning \code{list(Package, ABI, SuiteSparse)}
      giving the numeric versions of each.  ABI versioning is new:
      the ABI version is 1 in this release and will be incremented
      by 1 in each future release that changes the ABI.
      Versions and their components are defined for packages with
      \code{LinkingTo: Matrix}.  See \file{inst/include/Matrix/Matrix.h}.

So you can start comparing ABI versions defined backwards compatibly as:

abi <-
    if (packageVersion("Matrix") >= "1.6-2")
        Matrix.Version()[["ABI"]]
    else numeric_version("0")

Edit: I've just changed to list(package, abi, suitesparse), i.e., to using lower case, in order to match R.Version. So you'll want Matrix.Version()[["abi"]].

kaskr commented 1 year ago

As a reminder for myself, the new Matrix::Matrix.Version() can give the ABI for the current installation of Matrix, but it doesn't provide a way to compare ABIs across different Matrix versions.

> Matrix:::Matrix.Version()
$package
[1] ‘1.6.2’

$abi
[1] ‘1’

$suitesparse
[1] ‘5.10.1’

In the future we should store and test on Matrix 'abi' rather than 'package version' (there is no way to lookup ABI from package version).

tillea commented 1 year ago

Just to add a further data point in how far this strict versioned dependency is unfortunate for consumers of this code. In Debian there is a bug report about this where we are discussing the consequences for the Debian distribution. It would be great if some well designed tests inside TMB the strict compatibility requirements could be dropped. Kind regards, Andreas.

jaganmn commented 1 year ago

And ideally TMB would maintain its own ABI version for use by its reverse dependencies. Then they can stop depending so strictly on package versions, too, assuming that TMB is a good citizen and does not change its ABI with every release.

tillea commented 1 year ago

And ideally TMB would maintain its own ABI version for use by its reverse dependencies. Then they can stop depending so strictly on package versions, too, assuming that TMB is a good citizen and does not change its ABI with every release.

This would be extremely helpful.

kaskr commented 1 year ago

Point taken - there should ideally be an identifier (let's call it ABI version) to signify when reverse LinkingTo: TMB must recompile to avoid unnecessary rebuilds. It just bothers me that such version tags are prone to human errors. Learning from past mistakes, ABI breakage can be tricky to predict. I would also have appreciated some official guidelines from the 'Writing R Extensions' manual on how to access the ABI version (Why not put it in the DESCRIPTION file? Why not use it in install.packages etc. to prevent incompatible updates?). Anyway, I'll probably follow the approach of Matrix and add a TMB.Version() which can then be used by glmmTMB. However, @tillea note that it won't let the problem go away completely - just makes it happen less frequently.

tillea commented 1 year ago

However, @tillea note that it won't let the problem go away completely - just makes it happen less frequently.

Perfectly understandable - thanks for working on making this less frequently. Andreas.

bbolker commented 8 months ago

Just wanted to bump this - would be nice to have a TMB.Version() modeled after Matrix::Matrix.Version() ...

kaskr commented 8 months ago

@bbolker Just added TMB.Version() so closing this...

However, it seems to me like a lot of machinery to solve an almost non-existing problem: The list of ABI breaking versions TMB:::abi.break is tiny, and AFAICT only one of these is 'forward breaking' (1.8.0).