mapbox / gabbar

Guarding OpenStreetMap from harmful edits using machine learning
MIT License

Use model_selection instead of deprecated cross_validation #6

Closed bkowshik closed 7 years ago

bkowshik commented 7 years ago

Presently, when training a model, we see the following deprecation message.

$ python training/datatrain.py
/Users/demo/.virtualenvs/gabbar/lib/python2.7/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
training samples: 12364
[testing] good samples: 5299
[testing] problematic samples: 671
precision = 0.915625
recall = 0.442348
f1_score = 0.596514
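
For reference, a minimal sketch of the import change the warning asks for. The imports in training/datatrain.py are not shown in this issue, so the use of `train_test_split` and the toy data below are assumptions for illustration only.

```python
# Hypothetical sketch: the real imports in training/datatrain.py are not shown here.
try:
    # scikit-learn >= 0.18: splitters and helpers live in model_selection.
    from sklearn.model_selection import train_test_split
except ImportError:
    # Older scikit-learn: fall back to the deprecated module.
    from sklearn.cross_validation import train_test_split

# Toy data, not the gabbar training set.
features = [[0, 1], [1, 0], [2, 1], [3, 0], [4, 1], [5, 0], [6, 1], [7, 0]]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

# Hold out a quarter of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0
)
```

Helpers such as `train_test_split` keep the same call signature after the move. The CV iterators do not: for example, `KFold(n, n_folds=3)` becomes `KFold(n_splits=3).split(X)`, so any code using them needs more than an import change.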

bkowshik commented 7 years ago

This deprecation message is being logged in osmcha-django, where gabbar is currently used, which is making the logs larger and cluttered to look at. We might need a fix for this sooner rather than later.


cc: @rodowi
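
Until the underlying code is fixed, one possible stopgap for the log noise is to filter these warnings with Python's standard `warnings` module. This is only a sketch: the helper name `silence_sklearn_deprecations` is hypothetical, and where osmcha-django would call it is an assumption, not something confirmed in this thread.

```python
import warnings


def silence_sklearn_deprecations():
    """Ignore DeprecationWarning issued from modules whose name starts with 'sklearn'.

    Hypothetical helper: it hides the message only, it does not fix the
    deprecated usage.
    """
    warnings.filterwarnings(
        "ignore",
        category=DeprecationWarning,
        module=r"sklearn",  # regex matched against the start of the module name
    )
```

Calling this once at startup, before gabbar is used, would keep other warnings visible. It would also hide future scikit-learn deprecations, so it is best removed once the real fix lands.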

defvol commented 7 years ago

which is making the logs larger and cluttered to look at

🤔 osmcha shouldn't be running datatrain.py

bkowshik commented 7 years ago

🤔 osmcha shouldn't be running datatrain.py

Yeah. I assumed that the log message I saw was the one above, but in fact, when I looked at this again today, it was different. The following is the deprecation message that we are getting.

DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
gabbar prediction: -1.0

Ex:

>>> import numpy as np
>>> x = np.asarray(range(3))
>>> x
array([0, 1, 2])
>>> x.reshape(-1, 1)
array([[0],
       [1],
       [2]])

I will work on a fix soon.
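
A prediction is made for one changeset at a time, so the single-sample form `reshape(1, -1)` is probably the relevant one here. A minimal sketch, assuming the features arrive as a flat 1-D array; the variable names and values are made up for illustration.

```python
import numpy as np

# Hypothetical feature vector for one changeset; the values are made up.
sample = np.asarray([3.0, 1.0, 0.0])

# One sample with three features: reshape to (1, n_features), as the warning suggests.
sample_2d = sample.reshape(1, -1)
```

The reshaped array, rather than the flat one, is what would then be passed to the model's `predict`.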

bkowshik commented 7 years ago

I am not seeing this warning at present. Will revisit later if required.