ucbrise / clipper

A low-latency prediction-serving system
http://clipper.ai
Apache License 2.0

[Integration Test] PySpark Redundancy #445

Open simon-mo opened 6 years ago

simon-mo commented 6 years ago

In the PySpark integration test we have the following lines:

            version = 1
            lr_model = train_logistic_regression(trainRDD)
            deploy_and_test_model(
                sc,
                clipper_conn,
                lr_model,
                version,
                link_model=True,
                predict_fn=predict_with_local_modules)

            version += 1
            svm_model = train_svm(trainRDD)
            deploy_and_test_model(sc, clipper_conn, svm_model, version)

            version += 1
            rf_model = train_random_forest(trainRDD, 20, 16)
            deploy_and_test_model(sc, clipper_conn, svm_model, version)

There are two issues:

1. `rf_model` is trained, but the third call deploys `svm_model` again, so the random forest model is never actually deployed or tested.
2. Deploying and testing three separate models looks redundant and adds to the test's runtime (see the consolidation sketch below).
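If the coverage of all three models is worth keeping, one option would be to drive the deployments from a single list, which removes the duplicated blocks and makes this kind of copy-paste slip impossible. This is only a sketch, assuming the same `train_*`, `deploy_and_test_model`, and `predict_with_local_modules` helpers already used in the test:

            # Sketch: collect (model, extra kwargs) pairs once, then deploy in a loop
            # so the deployed model can never drift from the trained one.
            models = [
                (train_logistic_regression(trainRDD),
                 dict(link_model=True, predict_fn=predict_with_local_modules)),
                (train_svm(trainRDD), {}),
                (train_random_forest(trainRDD, 20, 16), {}),
            ]
            for version, (model, extra) in enumerate(models, start=1):
                deploy_and_test_model(sc, clipper_conn, model, version, **extra)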

dcrankshaw commented 6 years ago

That is definitely a bug. The intent was to deploy the random forest model, not to deploy the SVM model a second time. The reason for testing all 3 models is that they exercise different sub-components of Spark's ML interface, so this gives us slightly better test coverage. Do the Spark tests take a long time to run?
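If so, the fix is presumably just the last deployment call. A minimal sketch, assuming the helpers already defined in the test:

            version += 1
            rf_model = train_random_forest(trainRDD, 20, 16)
            # deploy the random forest model rather than svm_model a second time
            deploy_and_test_model(sc, clipper_conn, rf_model, version)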