src-d / identity-matching

source{d} extension to match Git signatures to real people.
GNU General Public License v3.0

Save and load the bot detection model from modelforge #78

Closed: warenlg closed this issue 4 years ago

warenlg commented 4 years ago

Once the bot detection model has been trained and performs well, we have to save it in asdf format and upload it to modelforge. The corresponding script has been submitted as a PR here: https://github.com/src-d/identity-matching/pull/73. The tree includes:

warenlg commented 4 years ago

Issue 1.

First, pushing the model to modelforge failed with the following error message:

pkg_resources.VersionConflict: (semantic-version 2.8.2 (/home/waren/.local/lib/python3.6/site-packages), Requirement.parse('semantic-version<=2.6.0,>=2.3.1'))

The issue comes from asdf, where it is labelled high-priority: https://github.com/spacetelescope/asdf/issues/702. A quick workaround is to downgrade semantic-version to version 2.6.0.
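For anyone wondering why 2.8.2 is rejected, here is a minimal sketch of the range check behind that error message. It uses plain tuple comparison rather than pkg_resources itself, and parse_version and satisfies are hypothetical helper names, not part of asdf or setuptools.

```python
def parse_version(text):
    """Turn a dotted version like '2.8.2' into a tuple of ints: (2, 8, 2)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, lower="2.3.1", upper="2.6.0"):
    """Return True if lower <= installed <= upper (both bounds inclusive).

    The default bounds mirror the requirement in the error message:
    semantic-version<=2.6.0,>=2.3.1
    """
    return parse_version(lower) <= parse_version(installed) <= parse_version(upper)


print(satisfies("2.8.2"))  # False: 2.8.2 is above the upper bound
print(satisfies("2.6.0"))  # True: the downgraded version fits the range
```

The downgrade itself would presumably be something like pip install semantic-version==2.6.0 in the affected environment.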

warenlg commented 4 years ago

Issue 2.

After saving the model through the sklearn API of XGBClassifier, loading it with xgboost's native load_model function and then calling predict() raises the following error: AttributeError: 'XGBClassifier' object has no attribute '_le'

This issue has already been raised, and the xgboost maintainers recommend saving sklearn model objects such as XGBClassifier with pickle: https://github.com/dmlc/xgboost/pull/3829
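A rough sketch of the pickle approach, for illustration only. To keep it runnable without xgboost installed, it uses a hypothetical StandInClassifier that mimics a sklearn-style wrapper holding a _le attribute; with the real model the same pattern would apply to the XGBClassifier object, assuming it pickles cleanly.

```python
import pickle


class StandInClassifier:
    """Hypothetical stand-in for a sklearn-style wrapper such as XGBClassifier."""

    def __init__(self):
        self._le = None  # label-encoder-like state, only set by fit()

    def fit(self, labels):
        # Remember the sorted distinct labels, as a label encoder would.
        self._le = sorted(set(labels))
        return self

    def predict(self, indices):
        # Map integer class indices back to the original labels.
        return [self._le[i] for i in indices]


def roundtrip(model):
    """Serialize the whole Python object with pickle and load it back."""
    return pickle.loads(pickle.dumps(model))


model = StandInClassifier().fit([False, True, True])
restored = roundtrip(model)

# Pickle keeps every Python-level attribute, including _le,
# so predict() still works on the restored object.
print(restored.predict([0, 1]))  # [False, True]
```

The point of the sketch is that pickle stores the wrapper object as a whole, whereas a native model file only has to describe the booster, which is presumably why _le is missing after load_model.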

warenlg commented 4 years ago

Both issues above are overcome:

  1. by bumping asdf to 2.4.2 in modelforge https://github.com/src-d/modelforge/pull/107
  2. by adding the line xgb_cls._le = LabelEncoder().fit([False, True]) to the example usage code snippet in https://github.com/src-d/models/pull/28

vmarkovtsev commented 4 years ago

Shall we close then?

warenlg commented 4 years ago

It is waiting for https://github.com/src-d/models/pull/28 to be merged.

warenlg commented 4 years ago

https://github.com/src-d/models/pull/28 is merged so we can close this.