lacava / few

a feature engineering wrapper for sklearn
https://lacava.github.io/few
GNU General Public License v3.0
50 stars 22 forks source link

Revert "standard scaler pipeline" #28

Closed lacava closed 7 years ago

lacava commented 7 years ago

Reverts lacava/few#26

this passes the tests but is throwing an error on the dataset i'm currently applying to. the error comes in the fit method, line 314 in few.py:

$python -m few.few ../../data/maize/d_maize-dent-tass.csv -p 100 -max_depth 3 -ms 25 --weight_parents 

warning: ValueError in ml fit. X.shape: (100, 100) y_t shape: (146,)
First ten entries X: [[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
   1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]]
First ten entries y_t: [ 826.47  884.33  904.55  848.71  879.46  885.12  905.36  886.69  821.05
  912.51]
equations: ['x_55', 'x_55', 'x_55', 'x_16313', 'x_16313', 'x_16313', 'x_16313', 'x_16313', 'x_16313', 'x_55', 'x_16313', 'x_55', 'x_16313', 'x_16313', 'x_16313', 'x_16313', 'x_16313', 'x_55', 'x_55', 'x_55', 'x_16313', 'x_55', 'x_16313', 'x_55', 'x_16313', 'x_55', 'x_55', 'x_16313', 'x_55', 'x_55', 'x_55', 'x_16313', 'x_16313', 'x_55', 'x_55', 'x_55', 'x_55', 'x_55', 'x_16313', 'x_55', 'x_55', 'x_55', 'x_55', 'x_16313', 'x_16313', 'x_55', 'x_16313', 'x_16313', 'x_55', 'x_55', 'x_55', 'x_55', 'x_55', 'x_55', 'x_55', 'x_16313', 'x_16313', 'x_55', 'x_55', 'x_55', 'x_55', 'x_16313', 'x_55', 'x_16313', 'x_55', 'x_16313', 'x_55', 'x_16313', 'x_16313', 'x_16313', 'x_16313', 'x_55', 'x_55', 'x_55', 'x_55', 'x_16313', 'x_55', 'x_16313', 'x_55', 'x_55', 'x_55', 'x_16313', 'x_16313', 'x_16313', 'x_16313', 'x_55', 'x_55', 'x_16313', 'x_55', 'x_55', 'x_55', 'x_55', 'x_55', 'x_55', 'x_16313', 'x_55', 'x_55', 'x_55', 'x_55', 'x_55']
FEW parameters: {'min_depth': 1, 'ml': None, 'elitism': True, 'clean': False, 'erc': False, 'classification': False, 'crossover_rate': 0.5, 'c': True, 'op_weight': 1, 'scoring_function': None, 'population_size': '100', 'mdr': False, 'weight_parents': True, 'max_depth': 3, 'tourn_size': 2, 'mutation_rate': 0.5, 'max_depth_init': 2, 'random_state': None, 'max_stall': 25, 'otype': 'f', 'generations': 100, 'sel': 'epsilon_lexicase', 'verbosity': 1, 'track_diversity': False, 'seed_with_ml': True, 'disable_update_check': False, 'fit_choice': None, 'boolean': False}
Traceback (most recent call last):
  File "/media/bill/Drive/Dropbox/PostDoc/code/few/few/few.py", line 314, in fit
    self.ml.fit(self.X[self.valid_loc(),:].transpose(),y_t)
  File "/home/bill/anaconda3/lib/python3.5/site-packages/sklearn/pipeline.py", line 270, in fit
    self._final_estimator.fit(Xt, y, **fit_params)
  File "/home/bill/anaconda3/lib/python3.5/site-packages/sklearn/linear_model/least_angle.py", line 1141, in fit
    axis=0)(all_alphas)
  File "/home/bill/anaconda3/lib/python3.5/site-packages/scipy/interpolate/interpolate.py", line 483, in __init__
    "least %d entries" % minval)
ValueError: x and y arrays must have at least 2 entries

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/bill/anaconda3/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/bill/anaconda3/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/media/bill/Drive/Dropbox/PostDoc/code/few/few/few.py", line 906, in <module>
    main()
  File "/media/bill/Drive/Dropbox/PostDoc/code/few/few/few.py", line 893, in main
    learner.fit(training_features, training_labels)
  File "/media/bill/Drive/Dropbox/PostDoc/code/few/few/few.py", line 330, in fit
    raise(ValueError)
ValueError
coveralls commented 7 years ago

Coverage Status

Coverage increased (+0.3%) to 73.387% when pulling 3b6038100b40aa5f18962deae01d0f8528e4b642 on revert-26-standard_scaler into aa214bbbd15517dc3663f559449c1bd0774efd7d on master.

lacava commented 7 years ago

it seems like the problem is related to an open issue in sklearn where some estimators (lasso in this case) err when features have std=0. in this dataset, some of the features are constant.

for other datasets, the pipeline works, so i'm going to remerge it.