natekupp / ffx

Fast Function Extraction
http://trent.st/ffx
Other
80 stars 97 forks source link

How to troubleshoot a bad model fit? #51

Open Javier-Acuna opened 3 years ago

Javier-Acuna commented 3 years ago

Hello,

I tried to use your library with some measurement data but the model obtained is not a good fit.

x and y values are already normalized, I already tried diverse partitioning of the test/train sets and changing the max_iter value to 50.000 but after that I don't know what else to do. Is the number of data points too small? Does ffx not handle non-deterministic (with some random noise) models? Any help would be very much appreciated

Here is a sample code:

import ffx
import numpy as np
import matplotlib.pyplot as plt

# Data to model, two measurements of 'y' for each 'x' value
x = np.array([[-1.392], [-0.985], [-0.308], [0.293], [1.046], [1.347], [-1.392], [-0.985], [-0.308], [ 0.293], [ 1.046], [ 1.347]])
y = np.array([[-1.691], [-0.925], [ 0.109], [0.768], [0.826], [0.829], [-1.673], [-1.049], [ 0.123], [ 0.833], [ 0.947], [ 0.903]])

# Plot y vs x
fig, ax = plt.subplots(1)
ax.scatter(x, y, facecolor='b', marker='o')

# Separate in train and test sets, two possibilities:
if( False ): # Alternate values of 'x' in train set
    x_train = x[1:12:2].reshape( (6,1) )
    y_train = y[1:12:2].reshape( (6,1) )

    x_test = x[0:12:2].reshape( (6,1) )
    y_test = y[0:12:2].reshape( (6,1) )
else: # Each 'x' value in train set
    x_train = x[0:6].reshape( (6,1) )
    y_train = y[0:6].reshape( (6,1) )

    x_test = x[6:12].reshape( (6,1) )
    y_test = y[6:12].reshape( (6,1) )

#Plot train/tests sets
fig, ax = plt.subplots(1)
ax.scatter(x_train, y_train, facecolor='b', marker='o', label='train')
ax.scatter(x_test,  y_test,  facecolor='b', marker='x', label='test')
ax.legend()

# max_iter changed to 50000  in model_factories.py
models = ffx.run(x_train, y_train, x_test, y_test, varnames=['x'])

for model in models:
    yhat = model.simulate(x_test)
    print(model)

    fig, ax = plt.subplots(1)
    ax.scatter(x, y, facecolor='b', marker='o', label='measurement')
    ax.scatter(x_test, yhat, facecolor='r', marker='x', label='model')
    ax.legend()

The models I obtain with the first partition are: 0.227 0.187 + 0.179*x Figure_Github_ffx_Partition_1

and with the second partition are: -0.0140 0.0116 / (1.0 - 0.150*abs(x)) Figure_Github_ffx_Partition_2

Do you have any ideas what should I try? Any help would be much appreciated