Closed asifzubair closed 9 years ago
Please include the full trace. I'm guessing that naive Bayes or LDA has a slightly different interface than the other models.
sure, here's the full trace.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-a1b8bc97b6b3> in <module>()
----> 1 models = explore.read_models('modelling/output_models_ELiNbLd')
/home/azubair/drain/explore.pyc in read_models(dirname, estimator)
45 def read_models(dirname, estimator=True):
46 df = pd.concat((read_model(subdir, estimator) for subdir in get_subdirs(dirname)), ignore_index=True)
---> 47 calculate_metrics(df)
48
49 return df
/home/azubair/drain/explore.pyc in calculate_metrics(df)
57 df['baseline']=df.y.apply(lambda y: y.true.sum()*1.0/len(y.true))
58
---> 59 df['coef'] = [get_coef(row) for i,row in df.iterrows()]
60
61 return df
/home/azubair/drain/explore.pyc in get_coef(row)
63 def get_coef(row):
64 if hasattr(row['estimator'], 'coef_'):
---> 65 return pd.DataFrame({'name':row['columns'], 'c':row['estimator'].coef_[0]}).sort('c')
66 else:
67 return pd.DataFrame()
/opt/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in __init__(self, data, index, columns, dtype, copy)
212 dtype=dtype, copy=copy)
213 elif isinstance(data, dict):
--> 214 mgr = self._init_dict(data, index, columns, dtype=dtype)
215 elif isinstance(data, ma.MaskedArray):
216 import numpy.ma.mrecords as mrecords
/opt/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in _init_dict(self, data, index, columns, dtype)
339
340 return _arrays_to_mgr(arrays, data_names, index, columns,
--> 341 dtype=dtype)
342
343 def _init_ndarray(self, values, index, columns, dtype=None,
/opt/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
4796 # figure out the index, if necessary
4797 if index is None:
-> 4798 index = extract_index(arrays)
4799 else:
4800 index = _ensure_index(index)
/opt/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in extract_index(data)
4844 lengths = list(set(raw_lengths))
4845 if len(lengths) > 1:
-> 4846 raise ValueError('arrays must all be same length')
4847
4848 if have_dicts:
ValueError: arrays must all be same length
Yeah, must be an inconsistency in sklearn's coef_ attributes between some of the models. I'm at the airport, but as a temporary workaround, comment out line 59 of explore.py (the call to get_coef).
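A less drastic workaround than commenting out the call might be to guard against the length mismatch itself. The following is only a rough sketch, not the actual fix in explore.py: `coef_pairs` is a hypothetical helper, and it assumes the error comes from `coef_[0]` having a different length than `row['columns']` for some estimators. It uses plain Python so it can be tried without the rest of the pipeline.

```python
def coef_pairs(estimator, columns):
    """Return (name, coefficient) pairs sorted by coefficient.

    Returns an empty list when the estimator has no coef_ or when the
    number of coefficients does not match the number of columns.
    Hypothetical helper, not part of explore.py.
    """
    coef = getattr(estimator, 'coef_', None)
    if coef is None or len(coef) == 0:
        return []

    first = coef[0]
    # coef_ may be 2-D (one row per class) or 1-D; use the first row if 2-D.
    row = first if hasattr(first, '__len__') else coef

    if len(row) != len(columns):
        # This is the mismatch that makes pd.DataFrame raise
        # "arrays must all be same length"; skip the model instead.
        return []

    pairs = zip(columns, (float(c) for c in row))
    return sorted(pairs, key=lambda pair: pair[1])
```

Inside get_coef, the result could then be passed to `pd.DataFrame(pairs, columns=['name', 'c'])`, which yields an empty frame for models that are skipped.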
I ran 8 models and tried to look at the output using the read_models method, but got an error. The models run were random forest, LDA, Gaussian naive Bayes, and logistic regression. Perhaps the error occurs because the models have different parameters.
The truncated error stack is attached below.