Open GoogleCodeExporter opened 9 years ago
line 595 in inla.R that is.
Original comment by jonas.wa...@gmail.com
on 9 Dec 2011 at 11:13
Issue 15 has been merged into this issue.
Original comment by finn.lin...@gmail.com
on 10 Dec 2011 at 7:49
I think the correct solution should involve looking specifically at all the
named variables in the formula, noting if they are on the LHS or RHS of the
formula. Also, it needs to ignore any variables it does not know about. E.g.
the variable for the Cmatrix should be supplied in the data list, and as well
as all other arguments to the f-functions.
Original comment by finn.lin...@gmail.com
on 10 Dec 2011 at 7:55
Have added test files:
sandbox/R/issue-14.R
sandbox/R/issue-15.R
Original comment by finn.lin...@gmail.com
on 10 Dec 2011 at 8:19
An alternative solution if I understand thing correctly might be just to look
at the first variable on the RHS:
if it is not an f-function it must be the length of Npredict ?
if it is an f-function the first specified term must have length of
Npredict?
Original comment by jonas.wa...@gmail.com
on 10 Dec 2011 at 9:31
The problem is that NPredictor is determined by the length of the data vectors,
so it's a chicken and egg problem. We're trying to make the formula parsing
code in R do things it was never designed to do, as it assumes that all vectors
have the same length, and doesn't properly distinguish between LHS and RHS
variables. The code now handles the normal case where there is only a single
variable on the LHS, but it seems it doesn't handle at least some of the more
general cases.
Original comment by finn.lin...@gmail.com
on 10 Dec 2011 at 9:42
> The problem is that NPredictor is determined by the length of the data
vectors, so it's a chicken and egg problem.
Yes but NPredictor and MN[2] if everything is correct should be the same value?
>as it assumes that all vectors have the same length, and doesn't properly
distinguish >between LHS and RHS variables
But for inla everything on the RHS must be of equal length right?
If so one could just find the first thing on the RHS and take it's length.
---------------------------------------------------------------------------
about the issue-15.R:
For the test-file issue-15.R
the problem will be removed if changing line 552 in inla.r:
data = inla.remove(as.character(formula[2]), data)
to something like:
if(have.surv){
LHS_names = strsplit(as.character(formula[2]), split="inla.surv\\(")[[1]][2]
LHS_names = strsplit(LHS_names, split=")")[[1]][1]
LHS_names = inla.trim(strsplit(LHS_names,split=",")[[1]]) #or some other trim
}
else{ LHS_names = as.character(formula[2])}
data = inla.remove(LHS_names, data)
Comment: inla.remove(as.character(formula[2]), data) is used on line 417 so
should also be changed.
But if you allow for variables of Cmatrix,f-functions to be in the supplied
data list
then
gp = inla.interpret.formula(formula, data=data, data.model = data.model)
will fail since it changes since it does not allow varying length of objects in
data.
----------------------------------------------------------------------------
ps
in inla.R line 621
the code does the same thing in both
if and else:
y...fake = c(rep(Inf, ny))
if (debug)
print(paste("y...fake has length", length(y...fake)))
Original comment by jonas.wa...@gmail.com
on 10 Dec 2011 at 12:26
Thanks for the suggested change. Håvard will have to take a look.
When you say "the problem will be removed", does that mean "the code will do
the right thing i all situations", or just "the error message will go away in
these test cases"? ;-)
But the issue is more fundamental than just the survival models, and interacts
with mis-features in the R language, that allows it to find variables that it
shouldn't see, etc. The formula parser is fundamentally broken by design with
regards to what inla has evolved into.
Original comment by finn.lin...@gmail.com
on 10 Dec 2011 at 12:39
>Thanks for the suggested change. Håvard will have to take a look.
>When you say "the problem will be removed", does that mean "the code will do
the >right thing i all situations", or just "the error message will go away in
these test cases"? ;-)
More in the test cases :)
But I think it will solve the inla.surv case (I assume that you actually don't
use event since the observations now is y...fake (which all -inf)).
But in the line:
LHS_names = inla.trim(strsplit(LHS_names,split=",")[[1]]) #or some other trim
is probably better to use an other trim instance str_trim() (from stringr) than
inla.trim() to remove the whitespaces before and after.
But if one creates very evil variable names like " event" one runs into problem.
inla.surv(time, event) ~ with data$event from (works)
inla.surv(time, event) ~ with data$" event" (causes error)
Original comment by jonas.wa...@gmail.com
on 10 Dec 2011 at 1:51
Yes, I think your fix would handle identifying "time" and "event" from the data
vector (which would then both need to be put back again, like y...fake), so it
would be a partial fix.
Original comment by finn.lin...@gmail.com
on 11 Dec 2011 at 8:41
These issues arise only when an "A matrix" is used. The parsing in inla() for
the A-matrix models would be much easier if LHS and RHS variables were already
separated. Since there is a new interface "inla.stack" to handle constructing
data vectors for such models, the easiest and cleanest solution is likely to
supply a data object generated by this system which already separates LHS from
RHS, and only joins them together because inla() wants them in one list, to
then only try to separate them again. inla.stack can also handle separating
out other data variables, such as spde model objects, etc.
The existing data parsing could remain largely unaffected.
Original comment by finn.lin...@gmail.com
on 11 Dec 2011 at 9:02
The easy way out is to group the surv-object into one object in the data, like
this
y.surv = inla.surv(time, event)
data = list(y.surv = y.surv, x = i_x)
formula <- y.surv ~ f(x, model="generic", Cmatrix = Q) - 1
then
model <- inla(formula, family = "exponential" ,data = data,
control.predictor= list(A=A), debug =FALSE, verbose=TRUE )
runs fine.
Original comment by havard.rue
on 11 Dec 2011 at 10:13
Using the previous trick where I define y.surv, then Finn's 'stack' fails, so
its possible that I just pushed the problem to inla.stack().....
stack=
inla.stack(data=list(y.surv = y.surv),
A=A,
effects=list(x=i_x),compress=FALSE)
inla.data = inla.stack.data(stack, Q=Q)
inla.A = inla.stack.data(stack)
model <- inla(formula, family = "exponential" ,data = inla.data,
control.predictor= list(A=inla.A), debug =FALSE, verbose=TRUE )
gives:
stack=
+ inla.stack(data=list(y.surv = y.surv),
+ A=A,
+ effects=list(x=i_x),compress=FALSE)
Error in parse.input.list(inla.ifelse(is.data.frame(data), list(data), :
Effect block 1:
Mismatching row sizes: 5, n.A=200
Original comment by havard.rue
on 11 Dec 2011 at 10:21
I must add to comment 12, that grouping the data into one object in the
data-list, is the intended usage. But, its not written down anywhere... my
fault.
Original comment by havard.rue
on 11 Dec 2011 at 10:22
inla.stack needs all data-related vectors, including "E" for poisson
likelihoods, so it needs to know about all inputs to inla.surv. inla.stack
needs to manipulate all vectors related to the model, not only those used by
the formula.
For the inla.surv models as in Jonas example, at a minimum, it needs
inla.stack(data=list(time=time,event=event),...)
Hiding the vectors in an object is not a solution.
Original comment by finn.lin...@gmail.com
on 11 Dec 2011 at 11:10
I agree that comment 12 provides a temporary workaround for models not using
inla.stack.
Original comment by finn.lin...@gmail.com
on 12 Dec 2011 at 9:52
Original issue reported on code.google.com by
jonas.wa...@gmail.com
on 9 Dec 2011 at 11:12