[Integration Test] PySpark Redundancy

ucbrise / clipper

A low-latency prediction-serving system

Apache License 2.0

The PySpark integration test contains the following lines:

            version = 1
            lr_model = train_logistic_regression(trainRDD)
            deploy_and_test_model(
                sc,
                clipper_conn,
                lr_model,
                version,
                link_model=True,
                predict_fn=predict_with_local_modules)

            version += 1
            svm_model = train_svm(trainRDD)
            deploy_and_test_model(sc, clipper_conn, svm_model, version)

            version += 1
            rf_model = train_random_forest(trainRDD, 20, 16)
            deploy_and_test_model(sc, clipper_conn, svm_model, version)

There are two issues:

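1. Three models are trained and each one is deployed and tested. Do we need to train and test all three, or would one be enough?
2. The random forest model (rf_model) is trained but never deployed. The third deploy_and_test_model call passes svm_model again, so the SVM is deployed twice.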
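
A possible way to rule out the copy-paste mistake behind the second issue is to drive the deployments from a single list, so that the model passed to deploy_and_test_model is always the one that was just trained. This is only a sketch, not the project's actual fix: the helper bodies below are hypothetical stand-ins that record calls instead of using Spark or Clipper, and run_deployments is a name introduced here for illustration.

```python
# Hypothetical stand-ins for the test's helpers, so the sketch runs
# without Spark or Clipper. The real functions live in the integration test.
def train_logistic_regression(train_rdd):
    return "lr_model"


def train_svm(train_rdd):
    return "svm_model"


def train_random_forest(train_rdd, num_trees, max_depth):
    return "rf_model"


def predict_with_local_modules(spark, model, xs):
    return [str(x) for x in xs]


deployed = []  # records (model, version) for every deployment


def deploy_and_test_model(sc, clipper_conn, model, version, **kwargs):
    # Stand-in: the real helper deploys the model and queries it.
    deployed.append((model, version))


def run_deployments(sc, clipper_conn, train_rdd):
    # Each case pairs a trainer with the extra keyword arguments
    # that its deployment needs.
    cases = [
        (lambda: train_logistic_regression(train_rdd),
         {"link_model": True, "predict_fn": predict_with_local_modules}),
        (lambda: train_svm(train_rdd), {}),
        (lambda: train_random_forest(train_rdd, 20, 16), {}),
    ]
    # enumerate supplies the version, so it cannot drift from the model.
    for version, (train, kwargs) in enumerate(cases, start=1):
        model = train()
        deploy_and_test_model(sc, clipper_conn, model, version, **kwargs)


run_deployments(None, None, None)
print(deployed)
```

If the answer to the first question is that one model is enough, the same structure reduces to a single entry in cases.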

[Integration Test] PySpark Redundancy #445