Import the preprocessing process

We do not currently have a built-in way in pzmm to include a preprocessing step, either inside the pickle file or as additional code in the generated score script. (Simple imputation is supported through the missing_values argument of the pzmm.ImportModel.import_model() and pzmm.ScoreCode.write_score_code() functions.)

Currently, implementing additional preprocessing requires modifying the score code that is generated by sasctl and uploaded to SAS Model Manager. On SAS Viya 4, you can use a few different sasctl functions to do this (example below), but it requires a bit more work on SAS Viya 3.5. This is due to differences in how the DS2 wrapper code is created:

In SAS Viya 4, the wrapper code is regenerated by SAS Model Manager each time the model is published or a score test is attempted. This allows for modification of the Python model assets without worrying about modifying DS2 code.
In SAS Viya 3.5, the wrapper code is generated by an API call and then further modified by sasctl to allow for scoring in either MAS or CAS. Modifications to the score code or other Python assets would require additional calls to this API as well as rerunning some of the logic found in pzmm/write_score_code.py. (See the SAS Viya 3.5 example further below.)

Assuming you are providing an additional pickle file that encapsulates the data preprocessing, you will need to upload the new pickle object and adjust the score code. For SAS Viya 4, after running the pzmm.ImportModel.import_model() function to register the model in SAS Model Manager, this would look like:

# Assuming the preprocessing pickle file and original score code are on disk

from pathlib import Path

from sasctl import Session
from sasctl._services.model_repository import ModelRepository as mr

# Create a session to a SAS Viya server
sess = Session("demo.sas.com", "username", "password", protocol="http")

# Visualize API calls
sess.add_stderr_logger(level=20)

# Collect the model to be modified
model_name = "preprocess_model"
project_name = "preprocess_project"
model = mr.list_models(filter=f"and(eq(projectName,'{project_name}'),"
                              f"eq(name,'{model_name}'))")[0]

# Read in the score code and modify in Python (or modify the score code manually)
with open(Path.cwd() / "path/to/score_preprocess_model.py") as score_file:
    score_code = score_file.readlines()

# Modify the score code to preprocess the input_array inside the score function
for index, line in enumerate(score_code):
    if f"{'':8}with open(" in line:
        score_code[index] = f"{'':8}with open(Path(settings.pickle_path) / " \
                            f"\"preprocess.pickle\", \"rb\") as preprocess_file:\n" \
                            f"{'':12}preprocess = pickle.load(preprocess_file)\n" \
                            + score_code[index]
    elif "with open(" in line:
        score_code[index] = "with open(Path(settings.pickle_path) / " \
                            "\"preprocess.pickle\", \"rb\") as preprocess_file:\n" \
                            f"{'':4}preprocess = pickle.load(preprocess_file)\n" \
                            + score_code[index]
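    elif "prediction = " in line:
        # Assumes the unpickled object is callable; if it exposes a method such
        # as transform() instead, call that method here
        score_code[index] = f"{'':4}input_array = preprocess(input_array)\n" \
                            + score_code[index]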

# Return score code file to a single string form for uploading
score_code = "".join(score_code)

with open(Path.cwd() / "path/to/preprocess.pickle", "rb") as preprocess_file:
    files = [
        {
            "name": "preprocess.pickle",
            "file": preprocess_file,
            "role": "scoreResource"
        },
        {
            # Same file name as the score code read above, so the upload
            # targets the existing score file
            "name": "score_preprocess_model.py",
            "file": score_code,
            "role": "score"
        }
    ]
    for file in files:
        mr.add_model_content(model, **file)
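
To check the line-rewriting logic before uploading anything, the same idea can be run against a small in-memory score script. The following is a minimal sketch under stated assumptions: the sample script is hypothetical and only imitates the general shape of generated score code (a `with open(` line that loads the model and a `prediction = ` line), and `inject_preprocessing` is an illustrative helper name, not part of sasctl. It also reuses the indentation of each matched line rather than hard-coding the number of spaces.

```python
# Hypothetical score script; real pzmm-generated code will differ in detail
SAMPLE_SCORE_CODE = '''\
import pickle
from pathlib import Path

import settings

with open(Path(settings.pickle_path) / "model.pickle", "rb") as pickle_model:
    model = pickle.load(pickle_model)

def score(a, b):
    input_array = [[a, b]]
    prediction = model.predict(input_array)
    return prediction
'''


def inject_preprocessing(lines, pickle_name="preprocess.pickle"):
    """Return a copy of lines that loads and applies a preprocessing pickle."""
    result = []
    for line in lines:
        stripped = line.lstrip()
        indent = line[: len(line) - len(stripped)]
        if stripped.startswith("with open("):
            # Load the preprocessing object just before the model is loaded,
            # at the same indentation level as the matched line
            result.append(
                f"{indent}with open(Path(settings.pickle_path) / "
                f'"{pickle_name}", "rb") as preprocess_file:\n'
            )
            result.append(f"{indent}    preprocess = pickle.load(preprocess_file)\n")
        elif stripped.startswith("prediction = "):
            # Apply the preprocessing step right before the prediction is made
            result.append(f"{indent}input_array = preprocess(input_array)\n")
        result.append(line)
    return result


modified = "".join(inject_preprocessing(SAMPLE_SCORE_CODE.splitlines(keepends=True)))
print(modified)
```

Printing the result makes it easy to confirm that the preprocessing pickle is loaded before the model and that `input_array` is transformed immediately before the prediction line. If the real score code nests the `with open(` line inside a function, the load is inserted at that nesting level as well.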

For SAS Viya 3.5, you would need to upload the new files as above, then delete the *.sas files present in the model assets on SAS Model Manager, and then convert the model and score code to the appropriate formats. This would look like the following, assuming the model variable is the RestObj representation of the model and the new score code and preprocessing pickle file have already been uploaded:

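from sasctl.core import delete
from sasctl._services.model_repository import ModelRepository as mr
from sasctl.pzmm.write_score_code import ScoreCode as sc
# MAS_CODE_NAME and CAS_CODE_NAME are used below; this assumes they are the
# module-level constants defined in write_score_code.py
from sasctl.pzmm.write_score_code import CAS_CODE_NAME, MAS_CODE_NAME

# Get the file list and delete all *.sas files
file_list = mr.get_model_contents(mr.get_model(model_name))
file_uris = [mr.get_link(file, "delete")["uri"] for file in file_list if file.name.endswith(".sas")]
for uri in file_uris:
    delete(uri)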

# Convert the model score code to CAS and MAS focused scripts and convert the model type as needed
model["scoreCodeType"] = "Python"
model = mr.update_model(model)
mr.convert_python_to_ds2(model)
model_contents = mr.get_model_contents(model)
for file in model_contents:
    if file.name == "score.sas":
        mas_code = mr.get(f"models/{file.modelId}/contents/{file.id}/content")
        sc.upload_and_copy_score_resources(model, [{"name": MAS_CODE_NAME, "file": mas_code, "role": "score"}])
        cas_code = sc.convert_mas_to_cas(mas_code, model)
        sc.upload_and_copy_score_resources(model, [{"name": CAS_CODE_NAME, "file": cas_code, "role": "score"}])
        model["scoreCodeType"] = "ds2MultiType"
        mr.update_model(model)
        break
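
When picking which assets to delete, matching on the file suffix is stricter than a substring test, because a substring test for `.sas` also matches names that merely contain it. A minimal sketch of the difference, using made-up asset names rather than a real model's contents (`is_sas_code` is an illustrative helper, not a sasctl function):

```python
from pathlib import PurePosixPath

# Hypothetical asset names; a real list would come from the model's contents
asset_names = [
    "score_preprocess_model.py",
    "score.sas",
    "wrapper_code.sas",
    "training_data.sas7bdat",
    "preprocess.pickle",
]


def is_sas_code(name):
    """True only when the file extension is exactly .sas (case-insensitive)."""
    return PurePosixPath(name).suffix.lower() == ".sas"


substring_matches = [name for name in asset_names if ".sas" in name]
suffix_matches = [name for name in asset_names if is_sas_code(name)]

print("substring test:", substring_matches)
print("suffix test:   ", suffix_matches)
```

The substring test would also select the `.sas7bdat` table for deletion, while the suffix test keeps it.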

Feel free to submit code to implement this method in a more defined manner. Otherwise, we can add this as an enhancement request for future releases.
