This PR adds validation checks to ensure that the data passed for prediction matches the data fields expected by the PMML model. When a pandas DataFrame is provided, the PR handles two cases: more columns are provided than strictly necessary, or the column order differs from the one the model expects. In both cases the expected columns are picked out by name. Explicit test cases for this were added. For numpy arrays, an exception is raised if the number of columns does not match the number of data fields described in the PMML. Without column names there is no information about which column is which, so we cannot subset the array automatically.
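As an illustration of the behavior described above, here is a minimal sketch and not the code in this PR. The function name `align_to_fields` and the `field_names` argument are hypothetical, and raising `ValueError` for missing DataFrame columns is an assumption that the description does not cover.

```python
import numpy as np
import pandas as pd


def align_to_fields(X, field_names):
    """Return X restricted to, and ordered by, the expected field names.

    Hypothetical helper: `field_names` stands in for the data fields
    read from the PMML file.
    """
    field_names = list(field_names)

    if isinstance(X, pd.DataFrame):
        # Assumption: a missing column is an error (not stated in the PR text).
        missing = [name for name in field_names if name not in X.columns]
        if missing:
            raise ValueError(f"Missing expected columns: {missing}")
        # Selecting by name drops extra columns and fixes the order.
        return X[field_names]

    # Plain arrays carry no column names, so only the count can be checked.
    X = np.asarray(X)
    if X.ndim != 2 or X.shape[1] != len(field_names):
        raise ValueError(
            f"Expected {len(field_names)} columns, got array of shape {X.shape}"
        )
    return X
```

With a DataFrame holding columns `c`, `b`, `a`, `extra` and expected fields `["a", "b"]`, the sketch returns only `a` and `b`, in that order. A numpy array with three columns and the same two expected fields raises an exception instead.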
This commit also includes some internal refactoring to make it more explicit which estimators use one-hot encoding and which use integer encoding to support categorical variables.
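For readers unfamiliar with the two encodings, the difference can be sketched with plain pandas. This is only an illustration of the concepts, not the refactored code, and the `color` column and its categories are made up for the example.

```python
import pandas as pd

# Made-up categorical feature with a fixed category order.
categories = ["blue", "green", "red"]
colors = pd.Series(
    pd.Categorical(["red", "green", "red", "blue"], categories=categories),
    name="color",
)

# Integer encoding: one column, each category mapped to its index.
integer_encoded = colors.cat.codes

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(colors)
```

Integer encoding keeps a single column per feature, while one-hot encoding widens the input by one column per category, so the two groups of estimators need differently shaped inputs.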