sassoftware / python-sasctl

Python package and CLI for user-friendly integration with SAS Viya
https://sassoftware.github.io/python-sasctl
Apache License 2.0

PZMM Generated python score wrapper injects extra column failing python pickle predict call #81

Closed: jameskochubasas closed this issue 3 years ago.

jameskochubasas commented 3 years ago

Describe the issue

Using the Fleet example in PZMM, I took the decision tree model and found that it fails to score. I debugged it by manually testing the generated Python score code and found this line:

```python
inputArray = pd.DataFrame([[1.0, Speed_sensor, Vibration, Engine_Load, Coolant_Temp,
                            Intake_Pressure, Engine_RPM, Speed_OBD, Intake_Air, Flow_Rate,
                            Throttle_Pos, Voltage, Ambient, Accel, Engine_Oil_Temp,
                            Speed_GPS, GPS_Longitude, GPS_Latitude, GPS_Bearing,
                            GPS_Altitude, Turbo_Boost, Trip_Distance, Litres_Per_km,
                            Accel_Ssor_Total, CO2, Trip_Time, CO_emission, HC_emission,
                            PM_emission, NOx_emission, CO2_emission, Fuel_level, Oil_life,
                            Vibration_alert, VibrationAlert_Total, Vibration_Recent,
                            Turbo_alert, Emission_alert, Fog_control, Engine_control]],
                          columns = ['const', 'Speed_sensor', 'Vibration', 'Engine_Load',
                                     'Coolant_Temp', 'Intake_Pressure', 'Engine_RPM',
                                     'Speed_OBD', 'Intake_Air', 'Flow_Rate', 'Throttle_Pos',
                                     'Voltage', 'Ambient', 'Accel', 'Engine_Oil_Temp',
                                     'Speed_GPS', 'GPS_Longitude', 'GPS_Latitude',
                                     'GPS_Bearing', 'GPS_Altitude', 'Turbo_Boost',
                                     'Trip_Distance', 'Litres_Per_km', 'Accel_Ssor_Total',
                                     'CO2', 'Trip_Time', 'CO_emission', 'HC_emission',
                                     'PM_emission', 'NOx_emission', 'CO2_emission',
                                     'Fuel_level', 'Oil_life', 'Vibration_alert',
                                     'VibrationAlert_Total', 'Vibration_Recent',
                                     'Turbo_alert', 'Emission_alert', 'Fog_control',
                                     'Engine_control'],
                          dtype = float)
```

The `const` column (with the value `1.0`) is an extra column that was never an input to the pickled model, and it causes the `predict` call to fail.
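To spot this kind of mismatch without reading through the whole generated wrapper, one option is to diff the wrapper's column list against the features the model was trained on. The sketch below is plain Python and assumes you already have the model's training feature names as a list; `diff_columns` is a hypothetical helper, not part of sasctl or PZMM, and the lists are shortened for readability.

```python
from typing import List, Sequence, Tuple


def diff_columns(
    model_features: Sequence[str], frame_columns: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: compare the score wrapper's columns to the model's features.

    Returns (extra, missing):
      extra   - columns the wrapper builds that the model was never trained on
      missing - model features the wrapper does not supply
    """
    model_set = set(model_features)
    frame_set = set(frame_columns)
    extra = [c for c in frame_columns if c not in model_set]
    missing = [c for c in model_features if c not in frame_set]
    return extra, missing


# Shortened example: the generated wrapper prepends "const" to the real features.
model_features = ["Speed_sensor", "Vibration", "Engine_Load"]
wrapper_columns = ["const", "Speed_sensor", "Vibration", "Engine_Load"]

extra, missing = diff_columns(model_features, wrapper_columns)
print("extra:", extra)      # extra: ['const']
print("missing:", missing)  # missing: []
```

A non-empty `extra` list points at columns like `const` that should not reach `predict`; a non-empty `missing` list would indicate the opposite problem.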


Expected behavior

The generated score code should run as is. As a workaround, I had to manually remove the extra column, after which Python scoring ran.
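The manual workaround amounts to keeping only the model's own features, in training order, before calling `predict`. In pandas that is a column selection such as `inputArray.drop(columns=['const'])` or `inputArray[model_features]`. The plain-Python sketch below shows the same idea on a single row, assuming the model's feature order is known; `align_row` is a hypothetical helper, and the values are the first three arguments from the call in the stack trace.

```python
from typing import List, Sequence


def align_row(
    columns: Sequence[str], row: Sequence[float], model_features: Sequence[str]
) -> List[float]:
    """Hypothetical helper: return the row's values in the model's feature order.

    Columns the model does not know (such as "const") are dropped; a KeyError
    is raised if a model feature is missing from the row.
    """
    if len(columns) != len(row):
        raise ValueError("columns and row must have the same length")
    by_name = dict(zip(columns, row))
    return [by_name[name] for name in model_features]


model_features = ["Speed_sensor", "Vibration", "Engine_Load"]
columns = ["const", "Speed_sensor", "Vibration", "Engine_Load"]
row = [1.0, 22.0, 249.912052, 20.784313]

print(align_row(columns, row, model_features))  # [22.0, 249.912052, 20.784313]
```

Selecting by name rather than by position also guards against the wrapper emitting the right columns in a different order than the model was trained on.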

Stack Trace

```
Traceback (most recent call last):
  File "j.py", line 267, in <module>
    scoreDecisionTreeClassifier(22,249.912052,20.784313,86,112, 1274.0,22,14,22.66,67.45098,14.16,8,23.529411,82,77.525595,8.350262,48.132993,317.3,322,2.17556,171.38008,7567968,-0.041867,200.17262,4748,0,0,0,0,0,0,0,1,123,12,1,1,1,1)
  File "j.py", line 255, in scoreDecisionTreeClassifier
    prediction = _thisModelFit.predict(inputArray)
  File "/python/lib/python3.6/site-packages/sklearn/tree/_classes.py", line 427, in predict
    X = self._validate_X_predict(X, check_input)
  File "/python/lib/python3.6/site-packages/sklearn/tree/_classes.py", line 399, in _validate_X_predict
    % (self.n_features_, n_features))
ValueError: Number of features of the model must match the input. Model n_features is 39 and input n_features is 40
```
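The numbers in the error line up with the extra column: the call passes 39 positional values, and the generated wrapper prepends the constant `1.0`, giving 40 inputs to a model fitted on 39 features. The sketch below reproduces that arithmetic with the argument values from the traceback and raises an error worded like scikit-learn's. `build_wrapper_row` and `check_n_features` are hypothetical stand-ins for the generated code and scikit-learn's internal check, not real APIs.

```python
from typing import List, Sequence

# The 39 positional arguments passed to scoreDecisionTreeClassifier in the traceback.
CALL_ARGS = [22, 249.912052, 20.784313, 86, 112, 1274.0, 22, 14, 22.66, 67.45098,
             14.16, 8, 23.529411, 82, 77.525595, 8.350262, 48.132993, 317.3, 322,
             2.17556, 171.38008, 7567968, -0.041867, 200.17262, 4748,
             0, 0, 0, 0, 0, 0, 0, 1, 123, 12, 1, 1, 1, 1]


def build_wrapper_row(values: Sequence[float]) -> List[float]:
    """Hypothetical: mimic the generated wrapper by prepending a constant 1.0."""
    return [1.0] + [float(v) for v in values]


def check_n_features(row: Sequence[float], n_features_model: int) -> None:
    """Hypothetical: raise if the row width differs from the model's feature count."""
    if len(row) != n_features_model:
        raise ValueError(
            "Number of features of the model must match the input. "
            f"Model n_features is {n_features_model} and input n_features is {len(row)}"
        )


row = build_wrapper_row(CALL_ARGS)
print(len(CALL_ARGS), len(row))  # 39 40
try:
    # The model was fitted on the 39 features the caller supplies.
    check_n_features(row, n_features_model=len(CALL_ARGS))
except ValueError as err:
    print(err)
```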

Version

Installed sasctl 1.5.4.

smlindauer commented 3 years ago

Hey @jameskochubasas,

A fix for this issue is still being tested locally and should ship in the next release. See my comments on issue #71 for further details.

smlindauer commented 3 years ago

Duplicate of #71.

jameskochubasas commented 3 years ago

Note: this same issue happened on Viya 4.0 versus 3.5. But we will see whether the fix for that other issue resolves this one.