iamDecode / sklearn-pmml-model

A library to parse and convert PMML models into Scikit-learn estimators.
BSD 2-Clause "Simplified" License

Validate input array shape, and attempt to subscript dataframes #29

Closed iamDecode closed 3 years ago

iamDecode commented 3 years ago

This PR adds validation checks to ensure that the data passed for prediction matches the data fields expected by the PMML model. When a pandas DataFrame is provided, the PR corrects for two cases: more columns are provided than strictly necessary, and the column order differs from the order the model expects. Explicit test cases were added for both. For NumPy arrays, an exception is raised if the number of columns does not match the number of fields described in the PMML. Without any further information about which column is which, we cannot subscript an array automatically.
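To illustrate the behaviour described above, here is a minimal sketch, not the code from this PR. The helper name `prepare_input` and the `field_names` argument are hypothetical, and raising a `ValueError` when a DataFrame lacks a required column is an assumption that the description does not state.

```python
import numpy as np
import pandas as pd


def prepare_input(X, field_names):
    """Hypothetical helper: align X with the fields the PMML model expects."""
    if isinstance(X, pd.DataFrame):
        # Assumption: a DataFrame missing a required column is an error.
        missing = [name for name in field_names if name not in X.columns]
        if missing:
            raise ValueError(f"Missing columns required by the model: {missing}")
        # Subscripting by name drops extra columns and restores the expected order.
        return X[field_names].to_numpy()

    # Plain arrays carry no column names, so only the shape can be checked.
    X = np.asarray(X)
    if X.ndim != 2 or X.shape[1] != len(field_names):
        raise ValueError(
            f"Expected an array with {len(field_names)} columns, got shape {X.shape}"
        )
    return X
```

With this sketch, a DataFrame with columns `c, b, a` passed alongside `field_names=["a", "b"]` is reduced to columns `a, b` in that order, while a three-column NumPy array is rejected.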

This commit also includes some internal refactoring that makes it more explicit which estimators use one-hot encoding and which use integer encoding to support categorical variables.