Issue warning for single column np arrays.

austindowney commented 7 years ago

Would it be possible to have symfit issue a warning if a user tries to pass it a 2D numpy array instead of a flat array?

For instance, if I run fit = sf.Fit(model, cycles, data_1) where cycles holds the values from 0 to 250 in a numpy array of shape (250,1), I get a line that does not even start to fit my data. However, if I use the same code where cycles is a numpy array of shape (250,), I get a line that fits my data.

Maybe I am missing something, but this seems like it would be an improvement to me.

tBuLi commented 7 years ago

Can you comment on the shape of data_1 as well? My guess is that it is not a matching shape.

I designed symfit to be agnostic about the shapes of arrays, so that it can be fed anything one likes. However, it is up to the user to make sure e.g. a chi-squared can be calculated with those shapes/objects.

I agree that maybe some sanity checking should be included, to at least give a helpful message or even an error.

But it would have to be done in a way that preferably doesn't break duck-typing (you don't have to use standard numpy arrays per se, so .shape might not exist). But I'm definitely open to suggestions.

austindowney commented 7 years ago

In my example, data_1 had shape (250,) while cycles had shape (250,1). This caused me lots of problems and about an hour of troubleshooting, as symfit solved the equation but solved it excessively poorly. In this situation, I would rather symfit had refused to run the fit, as that would have helped me in troubleshooting. I understand that you may want to keep it agnostic about the shape of arrays, but maybe a simple warning for this case (a flat array paired with a single-column array) would help, as they are practically the same thing.

Maybe I am missing something, but I do not see how an (n,1) array (really a matrix) could be treated differently from an (n,) array?
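
For reference, NumPy broadcasting does treat these two shapes differently. The snippet below is a minimal sketch, not symfit code: it assumes the residual is formed by plain elementwise subtraction (an assumption about symfit's internals), and the names y, x_col and residual are purely illustrative.

```python
import numpy as np

# Illustrative stand-ins for the data from this issue (names are hypothetical).
y = np.arange(250, dtype=float)                      # shape (250,)
x_col = np.arange(250, dtype=float).reshape(-1, 1)   # shape (250, 1)

# Broadcasting pairs every element of y with every row of x_col,
# so the "residual" silently becomes a full 2D grid instead of a vector.
residual = y - x_col
print(residual.shape)

# With matching flat shapes the subtraction stays elementwise.
residual_flat = y - x_col.ravel()
print(residual_flat.shape)
```

A sum of squares over the 2D result is still a valid number, which would explain why a fit can run to completion and return parameters instead of raising an error.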

tBuLi commented 7 years ago

Can you provide a Minimal Working Example of how symfit still produces a bad fit in this case? I would've thought it doesn't fit at all, which can be confusing but not harmful. If it does give back a result, then the status of this should be increased from a nuisance to a harmful bug ;).

What I mean by not breaking duck-typing is that if a user provides their own array class where the result will still be consistent, I don't want to break that. For example, I could imagine you have a class where you overload __mul__ such that the result goes from (n, 1) to (n,). (For example, replace __mul__ by np.tensordot.) I don't want to stop users from doing that and make them use only untampered numpy arrays.

But since this project is by now already far more numpy dependent than I originally intended, just looking at .shape if it exists and giving a warning if it doesn't match does seem like a good idea.
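
One way such a check could look is sketched below. This is only an illustration of the idea, not symfit's implementation: warn_on_shape_mismatch is a hypothetical helper, and it deliberately skips any object without a .shape attribute so that duck-typed array classes are left alone.

```python
import warnings

import numpy as np


def warn_on_shape_mismatch(**named_data):
    """Hypothetical helper (not part of symfit): warn if .shape values differ.

    Objects without a .shape attribute are ignored, so custom array-like
    classes are not rejected.
    """
    known_shapes = {}
    for name, values in named_data.items():
        shape = getattr(values, "shape", None)
        if shape is not None:
            known_shapes[name] = tuple(shape)

    if len(set(known_shapes.values())) > 1:
        warnings.warn(
            f"Data arrays have differing shapes: {known_shapes}",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False


# Usage: a (250, 1) column paired with a flat (250,) array triggers the warning.
mismatch = warn_on_shape_mismatch(x=np.zeros((250, 1)), y=np.zeros(250))
```

Because it only warns, a user whose custom class legitimately mixes shapes can still proceed, or silence the message through the standard warnings filters.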

austindowney commented 7 years ago

symfit

The following code should plot out the data sets, along with a working and non-working Symfit solution.

Also, thanks for the explanation. I think I see a little more clearly what you are trying to do with Symfit now. Thanks for all your work.

#%% import modules
import IPython as IP
# clear the interactive namespace (only meaningful inside an IPython session)
IP.get_ipython().run_line_magic('reset', '-sf')
import symfit as sf
import numpy as np
import matplotlib.pyplot as plt
plt.close('all')

# Load the data
data = np.array([192.90,191.60,190.90,190.50,190.20,189.80,189.50,189.30,189.10,188.70,188.60,188.30,188.20,188.00,187.80,187.60,187.50,187.10,187.10,186.90,186.90,186.60,186.60,186.30,186.30,186.20,185.90,185.80,185.70,185.60,185.50,185.50,185.10,185.10,185.00,184.90,184.70,184.50,184.40,184.50,184.30,184.00,184.00,184.10,183.90,183.80,183.50,183.50,183.60,183.40,183.30,183.20,183.00,183.00,182.80,182.60,182.40,182.40,182.40,182.30,182.10,181.90,181.90,181.70,181.60,181.50,181.30,181.10,181.10,180.80,180.80,180.70,180.40,180.40,180.10,180.10,180.00,179.80,179.60,179.40,179.20,179.00,178.90,178.70,178.80,178.40,178.30,178.10,177.90,177.90,177.60,177.40,177.40,177.00,176.80,176.70,176.40,176.30,176.20,175.90,175.80,175.50,175.50,175.10,174.80,174.90,174.60,174.20,174.10,174.00,173.90,173.60,173.50,173.00,172.90,172.80,172.40,172.20,172.00,171.90,171.70,171.40,171.20,170.90,170.60,170.70,170.40,170.00,170.10,170.00,169.60,169.30,169.10,168.90,168.60,168.40,168.20,168.10,167.70,167.50,167.30,167.00,166.80,166.60,166.50,166.20,166.00,165.80,165.50,165.20,165.00,164.90,164.50,164.30,164.20,163.80,163.60,163.30,163.30,163.00,162.70,162.50,162.30,162.00,161.70,161.50,161.30,161.20,160.80,160.70,160.60,160.30,160.20,159.80,159.50,159.50,159.00,159.00,158.80,158.40,158.10,158.10,157.90,157.70,157.20,157.10,157.10,156.80,156.40,156.30,155.90,155.80,155.60,155.30,155.10,154.90,154.70,154.80,154.40,154.10,153.90,153.80,153.60,153.20,153.10,152.80,152.50,152.40,152.10,151.90,151.70,151.50,151.20,151.10,151.00,150.60,150.30,150.20,150.00,149.90,149.80,149.30,149.20,149.00,148.80,148.60,148.50,148.10,148.00,147.80,147.50,147.20,147.00,146.80,146.70,146.60,146.20,146.10,145.90,145.60,145.50,145.50,145.00,144.80,144.60,144.50,144.30,144.10,144.00,143.90])

cycles_1D = np.arange(0,data.shape[0])
cycles_2D = np.zeros((data.shape[0],1))
cycles_2D[:,0] = cycles_1D
#%% Solve for the parameters from the training sets for each case.

# define the parameters
a = sf.Parameter(value=100)
b = sf.Parameter()
c = sf.Parameter()
d = sf.Parameter()
x = sf.Variable()

# build the model equation
model = a*sf.exp(b/1000*x)+c*sf.exp(d/1000*x)

# working fit with matching dimension data
fit = sf.Fit(model, cycles_1D, data)
fit_result = fit.execute()
model_working = model(x=cycles_1D, a=fit_result.value(a), b=fit_result.value(b),c=fit_result.value(c), d=fit_result.value(d))

# non-working fit with mismatched dimensions: cycles_2D is (250,1), data is (250,)
fit = sf.Fit(model, cycles_2D, data)
fit_result = fit.execute()
model_nonworking = model(x=cycles_2D, a=fit_result.value(a), b=fit_result.value(b),c=fit_result.value(c), d=fit_result.value(d))

plt.figure()
plt.plot(model_working,linewidth=0.75,label='working symfit')
plt.plot(model_nonworking,linewidth=0.75,label='nonworking symfit')
plt.plot(data,'ko',markersize=0.75)
plt.xlabel('charge cycles')
plt.ylabel('capacity (mA)')
plt.title('fit of cell capacity training data sets')
plt.legend()
plt.tight_layout()
plt.savefig('symfit',dpi=300)

tBuLi / symfit

Issue warning for single column np arrays. #134