Open pckroon opened 5 years ago
This is not possible by definition because, as you say, what does it mean to provide data to an intermediate variable? They are defined by the fact that they are obtained by calculation, not from data directly.
If this is not done already, it should perhaps be stated clearly in the documentation, along with a test based on your example to make sure the behaviour does not change in the future. Possibly this exception could be replaced by a custom one that helps the user understand the problem better.
I think there are experiments fitting the model: {y1: f(x; a), y2: g(y1, z; b)} where both y1 and y2 can be measured.
I don't have explicit examples of course, since that would be convenient.
I think I have even done fits like that myself, but then the role of y1 in those equations is not exactly the same. I take it that here you want to fit y1: f(x; a) using data, while simultaneously using the calculated y1 in y2: g(y1, z; b)?
It is always possible to introduce an extra variable to get around this:
{y1_calc: f(x; a), y1: y1_calc, y2: g(y1_calc, z; b)}
This model will have both y1 and y2 as dependent variables again.
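To make the effect of the extra variable concrete, here is a minimal plain-Python sketch (not symfit code). It assumes each model is described only by which names appear on the right-hand side of each expression; the helper `classify` is hypothetical and exists just for this illustration.

```python
# Plain-Python illustration; this is not symfit code.
# Each model maps a variable name to the names its expression depends on.

def classify(model):
    """Split the left-hand-side variables into end nodes and interdependent ones."""
    used_on_rhs = set().union(*model.values())
    interdependent = {v for v in model if v in used_on_rhs}
    dependent = set(model) - interdependent
    return dependent, interdependent

# {y1: f(x; a), y2: g(y1, z; b)}
original = {
    "y1": {"x", "a"},
    "y2": {"y1", "z", "b"},
}

# {y1_calc: f(x; a), y1: y1_calc, y2: g(y1_calc, z; b)}
workaround = {
    "y1_calc": {"x", "a"},
    "y1": {"y1_calc"},
    "y2": {"y1_calc", "z", "b"},
}

print(classify(original))
print(classify(workaround))
```

In the original model y1 is used by y2 and so counts as interdependent; in the workaround only y1_calc is, and both y1 and y2 are end nodes that can take data.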
Yes. I'm looking for parameters a and b. Maybe the example is more interesting when b = a: {y1: f(x; a), y2: g(y1, z; a)}.
Having to create a dummy variable to appease the software grates at me though, especially since I managed to measure both y1 and y2 in the lab.
I personally don't see that as a problem, because explicit is better than implicit. The problem with allowing data for interdependent variables would be that in your last example,
model_dict = {y1: f(x; a), y2: g(y1, z; a)}
fit = Fit(model_dict, y1=y1data, y2=y2data)
Would do something quite different from
model_dict = {y1: f(x; a), y2: g(y1, z; a)}
fit = Fit(model_dict, y2=y2data)
but both would run without error. I think that could cause some very frustrating bugs. I'd much rather see that the first just throws an error, so people can debug more effectively.
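As a rough illustration of how far apart the two calls could land, here is a plain-Python sketch (not symfit code). It assumes the hypothetical concrete forms f(x; a) = a*x and g(y1, z; a) = y1 + a*z, chosen only because the least-squares solution then has a closed form. It also assumes that supplying y1 data simply adds a residual term for y1, while g still uses the calculated y1. The data are made up.

```python
# Hypothetical concrete model: y1 = a*x and y2 = y1 + a*z = a*(x + z).

def fit_y2_only(x, z, y2):
    """Minimise sum((y2 - a*(x + z))**2) over a."""
    num = sum(y2i * (xi + zi) for xi, zi, y2i in zip(x, z, y2))
    den = sum((xi + zi) ** 2 for xi, zi in zip(x, z))
    return num / den

def fit_y1_and_y2(x, z, y1, y2):
    """Minimise sum((y1 - a*x)**2) + sum((y2 - a*(x + z))**2) over a."""
    num = sum(y1i * xi for xi, y1i in zip(x, y1))
    num += sum(y2i * (xi + zi) for xi, zi, y2i in zip(x, z, y2))
    den = sum(xi ** 2 for xi in x)
    den += sum((xi + zi) ** 2 for xi, zi in zip(x, z))
    return num / den

# Made-up, deliberately inconsistent data:
# y1_data follows a = 3, y2_data follows a = 2.
x = [1.0, 2.0, 3.0]
z = [1.0, 1.0, 1.0]
y1_data = [3.0, 6.0, 9.0]
y2_data = [4.0, 6.0, 8.0]

a_y2 = fit_y2_only(x, z, y2_data)
a_both = fit_y1_and_y2(x, z, y1_data, y2_data)
print(a_y2, a_both)
```

With these numbers the y2-only fit recovers a = 2, while the combined fit gives a = 100/43, roughly 2.33. Neither call fails, yet they answer different questions.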
Put differently, the interdependent variables are not end nodes of the graph, so you can't touch them.
Whatever decision we go for, this needs to be in the documentation with big flashy letters (like the rest). I don't even really mind your examples doing different things; the input is different as well, after all. Garbage in == garbage out. More interesting: if we take the following examples, what would the objectives be?
model = Model({y1: f(x1; a), y2: g(x2, y1; b)})
fit1 = Fit(model, x1=x1_data, x2=x2_data, y1=y1_data, y2=y2_data) # 1
fit2 = Fit(model, x1=x1_data, x2=x2_data, y1=y1_data) # 2
fit3 = Fit(model, x1=x1_data, x2=x2_data, y2=y2_data) # 3
fit4 = Fit(model, x1=x1_data, x2=x2_data) # 4
fit1 would be a least squares, and fit4 a minimization of y2 (?).
I feel like fit3 should also be a least squares. I'm not sure what fit2 should be; probably an error since it's neither a minimization, nor an actual fit.
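The four cases can be written down as a small decision function. This is only a plain-Python restatement of the guesses above, not symfit behaviour; `proposed_objective` and its return strings are hypothetical names for this sketch.

```python
# Hypothetical helper, not part of symfit. It encodes the four cases for
# the model {y1: f(x1; a), y2: g(x2, y1; b)} as guessed above.

def proposed_objective(provided):
    """Return the suggested objective for a given set of data names."""
    has_y1 = "y1" in provided
    has_y2 = "y2" in provided
    if has_y2:
        # fit1 (y1 and y2) and fit3 (y2 only): least squares
        return "least squares"
    if has_y1:
        # fit2: data for y1 only, neither a minimization nor a fit
        raise ValueError("data given for y1 but not for y2")
    # fit4: no data for any dependent variable
    return "minimize y2"

print(proposed_objective({"x1", "x2", "y1", "y2"}))  # fit1
print(proposed_objective({"x1", "x2", "y2"}))        # fit3
print(proposed_objective({"x1", "x2"}))              # fit4
```

Calling it with the fit2 inputs, `{"x1", "x2", "y1"}`, raises the `ValueError`.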
I'm working on creating some sadistic test cases for symfit, and I came across the following. Obviously this is a toy example, but that's beside the point.
In this case Y1 is an interdependent variable, and I cannot provide data for it. What does it mean to provide data for interdependent variables? Is there a fundamental reason it's not possible? The only one I can come up with is that the errors become weird, and that because of that the least squares can no longer be evaluated.