sassoftware / python-sasctl

Python package and CLI for user-friendly integration with SAS Viya
https://sassoftware.github.io/python-sasctl
Apache License 2.0

dtypes for the scoring function #174

Open ryanma9629 opened 1 year ago

ryanma9629 commented 1 year ago

When generating the Python scoring function in MM, the default dtype is set to 'object', as below:

    input_array = pd.DataFrame([[LOAN, MORTDUE, VALUE, REASON, JOB, YOJ, DEROG, DELINQ, CLAGE, NINQ, CLNO, DEBTINC]], columns=["LOAN", "MORTDUE", "VALUE", "REASON", "JOB", "YOJ", "DEROG", "DELINQ", "CLAGE", "NINQ", "CLNO", "DEBTINC"], dtype=object)

However, classifiers such as lightgbm don't accept object dtypes, so we may get an error when scoring with lightgbm models in MM:

    ValueError: DataFrame.dtypes for data must be int, float or bool. Did not expect the data types in the following fields: LOAN, MORTDUE, VALUE, REASON, JOB, YOJ, DEROG, DELINQ, CLAGE, NINQ, CLNO, DEBTINC

I don't know whether it is safe to set all dtypes to float or None when generating the scoring function.
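For illustration, here is a minimal sketch of the dtype difference described above, using pandas only. The sample values and the reduced column list are made up for the example, and this is not the code that sasctl generates. It only shows that `dtype=object` forces every column to object, while omitting `dtype` lets pandas infer numeric types per column.

```python
import pandas as pd

# Hypothetical sample values for a few of the numeric inputs (assumed, not from the issue)
LOAN, MORTDUE, VALUE, YOJ = 1100, 25860.0, 39025.0, 10.5
columns = ["LOAN", "MORTDUE", "VALUE", "YOJ"]

# Same construction as the generated scoring function: every column becomes object
as_object = pd.DataFrame([[LOAN, MORTDUE, VALUE, YOJ]], columns=columns, dtype=object)

# Without dtype, pandas infers a numeric dtype for each numeric column
inferred = pd.DataFrame([[LOAN, MORTDUE, VALUE, YOJ]], columns=columns)

print(as_object.dtypes)
print(inferred.dtypes)
```

Note that string inputs such as REASON and JOB would presumably still not come out as int, float or bool under inference, so they would likely need encoding or a categorical conversion before reaching a model like lightgbm.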

smlindauer commented 1 year ago

@ryanma9629: I was running into a deprecation error from pandas when setting all the values to float, but it may be better to just let pandas infer the types. My worry was that MM can't accept NumPy values, but we can check for that in the output of the prediction function.

I will run through some other model types to see how they handle not setting the dtype in the input_array.
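One way the output check mentioned above could look, as a rough sketch only: the helper name `to_native` and the stand-in prediction array are hypothetical and not part of sasctl. It converts NumPy scalars and arrays to built-in Python types and leaves everything else unchanged.

```python
import numpy as np


def to_native(value):
    """Convert NumPy scalars/arrays to built-in Python types; pass other values through."""
    if isinstance(value, np.generic):
        # NumPy scalar (e.g. np.float64, np.int64) -> Python float/int/bool/str
        return value.item()
    if isinstance(value, np.ndarray):
        # NumPy array -> (nested) Python list of built-in values
        return value.tolist()
    return value


# Stand-in for a model's prediction output (assumed shape and values)
prediction = np.array([0.12, 0.88])
result = to_native(prediction[1])
print(type(result))
```

With a check like this applied to the return values, the input DataFrame could keep pandas-inferred dtypes while the scoring function still hands back plain Python values.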

ryanma9629 commented 1 year ago

Is that due to the deprecation of np.int and np.float since NumPy 1.20? https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations