rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[TRACKER] Improve CI build/test times #3026

Closed: JohnZed closed this issue 1 year ago

JohnZed commented 4 years ago

Short term:

Longer term:

dantegd commented 4 years ago

To aid the time-reduction effort, here are statistics I gathered by analyzing the logs of the past couple of weeks of Python tests.

Counting only tests that take more than 1 ms to execute, here are the 50 slowest test files, with their total time and number of tests:

#     package    test_file                     total_time (s)  number_of_tests
0     cuml.dask  test_kneighbors_classifier    998             217
1     cuml       test_make_blobs               773             1407
2     cuml.dask  test_kneighbors_regressor     579             109
3     cuml.dask  test_datasets                 565             3477
4     cuml       test_linear_model             536             329
5     cuml       test_kmeans                   388             309
6     cuml.dask  test_umap                     372             258
7     cuml       test_array                    334             1414
8     cuml       test_umap                     331             50
9     cuml       test_preprocessing            329             717
10    cuml       test_fil                      206             102
11    cuml       test_nearest_neighbors        193             740
12    cuml       test_trustworthiness          191             36
13    cuml       test_incremental_pca          184             134
14    cuml       test_pickle                   158             111
15    cuml       test_kneighbors_classifier    156             295
16    cuml.dask  test_coordinate_descent       132             109
17    cuml.dask  test_nearest_neighbors        121             105
18    cuml       test_random_forest            118             178
19    cuml       test_metrics                  93              247
20    cuml       test_qn                       73              130
21    cuml       test_input_utils              71              914
22    cuml       test_sgd                      71              174
23    cuml       test_benchmark                67              25
24    cuml.dask  test_random_forest            65              37
25    cuml       test_pca                      61              31
26    cuml       test_one_hot_encoder          59              62
27    cuml       test_mbsgd_classifier         56              22
28    cuml       test_arima                    56              87
29    cuml       test_tsne                     55              12
30    cuml       test_mbsgd_regressor          53              21
31    cuml       test_kneighbors_regressor     42              99
32    cuml       test_allocator                37              3
33    cuml       test_coordinate_descent       37              40
34    cuml.dask  test_tfidf                    36              359
35    cuml       test_naive_bayes              32              26
36    cuml       test_text_feature_extraction  31              74
37    cuml       test_svm                      27              133
38    cuml.dask  test_linear_regression        23              48
39    cuml.dask  test_one_hot_encoder          22              88
40    cuml       test_holtwinters              21              41
41    cuml.dask  test_input_utils              20              119
42    cuml.dask  test_base                     20              39
43    cuml       test_dbscan                   19              80
44    cuml       test_bench                    18              10
45    cuml.dask  test_ridge_regression         18              48
46    cuml       test_target_encoder           15              17
47    cuml.dask  test_pca                      15              12
48    cuml.dask  test_label_encoder            13              75
49    cuml.dask  test_tsvd                     13              13
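
The comment does not say how the logs were parsed. As a rough sketch, assuming the CI runs write pytest JUnit XML reports (`--junitxml`) and that Dask tests live in a `dask` sub-package of the test directory (both assumptions), per-file totals like the ones above could be computed by averaging each test's time across runs (also an assumption), dropping tests at or under 1 ms, and summing per file. The helper names `per_test_times`, `split_classname`, and `slowest_files` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from statistics import mean


def per_test_times(junit_reports):
    """Average each test's duration (in seconds) across several JUnit XML reports.

    Assumes pytest-style <testcase classname="..." name="..." time="..."> elements.
    """
    samples = defaultdict(list)
    for xml_text in junit_reports:
        root = ET.fromstring(xml_text)
        for case in root.iter("testcase"):
            key = (case.get("classname", ""), case.get("name", ""))
            samples[key].append(float(case.get("time", "0")))
    return {key: mean(times) for key, times in samples.items()}


def split_classname(classname):
    """Map a dotted classname to (package, test_file).

    Hypothetical layout: Dask tests sit in a "dask" sub-package,
    e.g. "cuml.test.dask.test_umap" -> ("cuml.dask", "test_umap").
    """
    parts = classname.split(".")
    test_file = next((p for p in reversed(parts) if p.startswith("test_")), parts[-1])
    package = "cuml.dask" if "dask" in parts else "cuml"
    return package, test_file


def slowest_files(junit_reports, min_time=0.001, top=50):
    """Rank test files by total time, counting only tests slower than min_time."""
    totals = defaultdict(lambda: [0.0, 0])  # (package, file) -> [seconds, n_tests]
    for (classname, _name), seconds in per_test_times(junit_reports).items():
        if seconds <= min_time:
            continue  # ignore tests at or below the 1 ms cutoff
        entry = totals[split_classname(classname)]
        entry[0] += seconds
        entry[1] += 1
    ranked = sorted(totals.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(pkg, name, secs, n) for (pkg, name), (secs, n) in ranked[:top]]
```

With a single report, `per_test_times` simply returns each test's recorded time; the averaging only matters when the same test appears in several runs.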

The tests that take more than 10 seconds to run are listed below (a blank package column means the same package as the row above):

time (s)   package        test_name
70.163333  cuml           test_umap.py::test_umap_fit_transform_trust[blobs-categorical]                                            
53.593333  cuml.dask      test_kneighbors_classifier.py::test_predict_and_score[dataset0-128-12-8-dask_array]                       
53.350000                 test_kneighbors_classifier.py::test_predict_and_score[dataset0-1024-12-3-dask_array]                      
53.340000                 test_kneighbors_classifier.py::test_predict_and_score[dataset0-128-12-3-dask_array]                       
53.100000                 test_kneighbors_classifier.py::test_predict_and_score[dataset0-128-12-1-dask_array]                       
52.980000                 test_kneighbors_classifier.py::test_predict_and_score[dataset0-1024-12-1-dask_array]                      
52.303333                 test_kneighbors_classifier.py::test_predict_and_score[dataset0-1024-12-8-dask_array]                      
51.936667                 test_kneighbors_regressor.py::test_predict_and_score[dataset0-128-12-3-dask_array]                        
51.243333                 test_kneighbors_regressor.py::test_predict_and_score[dataset0-1024-12-3-dask_array]                       
51.123333                 test_kneighbors_regressor.py::test_predict_and_score[dataset0-128-12-1-dask_array]                        
50.896667                 test_kneighbors_regressor.py::test_predict_and_score[dataset0-1024-12-8-dask_array]                       
50.870000                 test_kneighbors_regressor.py::test_predict_and_score[dataset0-128-12-8-dask_array]                        
50.256667                 test_kneighbors_regressor.py::test_predict_and_score[dataset0-1024-12-1-dask_array]                       
44.103333  cuml           test_umap.py::test_umap_fit_transform_trust[blobs-euclidean]                                              
41.613333  cuml.dask      test_random_forest.py::test_rf_regression_dask_cpu[5]                                                     
39.673333  cuml           test_fil.py::test_lightgbm[2]                                                                             
36.416667                 test_allocator.py::test_naive_bayes                                                                       
35.383333                 test_benchmark.py::test_real_algos_runner[FIL]                                                            
32.156667  cuml.dask      test_kneighbors_classifier.py::test_predict_and_score[dataset0-128-12-1-dask_cudf]                        
31.473333                 test_kneighbors_classifier.py::test_predict_and_score[dataset0-128-12-3-dask_cudf]                        
28.650000                 test_kneighbors_classifier.py::test_predict_and_score[dataset0-128-12-8-dask_cudf]                        
28.543333  cuml           test_fil.py::test_fil_skl_classification[GradientBoostingClassifier-True-25-20-10-30-1000]                
28.133333  cuml.dask      test_kneighbors_classifier.py::test_predict_proba[dataset0-128-12-8-dask_cudf]                            
28.026667                 test_kneighbors_classifier.py::test_predict_and_score[dataset0-1024-12-3-dask_cudf]                       
27.846667                 test_kneighbors_classifier.py::test_predict_and_score[dataset0-1024-12-1-dask_cudf]                       
27.843333                 test_kneighbors_classifier.py::test_predict_and_score[dataset0-1024-12-8-dask_cudf]                       
27.786667  cuml           test_arima.py::test_integration[float64-key6-data6]                                                       
27.610000  cuml.dask      test_kneighbors_classifier.py::test_predict_proba[dataset0-1024-12-1-dask_cudf]                           
27.483333                 test_kneighbors_classifier.py::test_predict_proba[dataset0-128-12-1-dask_cudf]                            
27.440000                 test_kneighbors_classifier.py::test_predict_proba[dataset0-128-12-3-dask_cudf]                            
27.236667                 test_kneighbors_regressor.py::test_predict_and_score[dataset0-128-12-3-dask_cudf]                         
27.206667                 test_kneighbors_regressor.py::test_predict_and_score[dataset0-128-12-8-dask_cudf]                         
27.146667                 test_kneighbors_regressor.py::test_predict_and_score[dataset0-1024-12-3-dask_cudf]                        
27.136667                 test_kneighbors_regressor.py::test_predict_and_score[dataset0-1024-12-1-dask_cudf]                        
26.923333                 test_kneighbors_classifier.py::test_predict_proba[dataset0-1024-12-3-dask_cudf]                           
26.916667                 test_kneighbors_classifier.py::test_predict_proba[dataset0-1024-12-8-dask_cudf]                           
26.780000                 test_kneighbors_regressor.py::test_predict_and_score[dataset0-128-12-1-dask_cudf]                         
26.663333                 test_kneighbors_classifier.py::test_predict_proba[dataset0-128-12-3-dask_array]                           
26.573333                 test_kneighbors_classifier.py::test_predict_proba[dataset0-128-12-1-dask_array]                           
26.566667                 test_kneighbors_regressor.py::test_predict_and_score[dataset0-1024-12-8-dask_cudf]                        
26.100000                 test_kneighbors_classifier.py::test_predict_proba[dataset0-1024-12-3-dask_array]                          
26.086667                 test_kneighbors_classifier.py::test_predict_proba[dataset0-128-12-8-dask_array]                           
25.840000                 test_kneighbors_classifier.py::test_predict_proba[dataset0-1024-12-1-dask_array]                          
25.646667                 test_kneighbors_classifier.py::test_predict_proba[dataset0-1024-12-8-dask_array]                          
23.453333  cuml           test_make_blobs.py::test_make_blobs_ary_parameters[None-False-center_box1-0.1-centers1-100-1000-single]   
22.760000                 test_incremental_pca.py::test_fit[5-csr-0.07-True-2-15-500]                                               
22.283333                 test_make_blobs.py::test_make_blobs_ary_parameters[9-True-center_box0-0.1-centers1-100-1000-single]       
22.243333                 test_make_blobs.py::test_make_blobs_ary_parameters[None-False-center_box0-0.01-centers1-100-1000-single]  
22.243333                 test_make_blobs.py::test_make_blobs_ary_parameters[None-False-center_box1-0.01-centers1-100-1000-single]  
22.156667                 test_make_blobs.py::test_make_blobs_ary_parameters[9-True-center_box0-0.01-centers1-100-1000-single]      
21.910000                 test_make_blobs.py::test_make_blobs_ary_parameters[None-True-center_box1-0.1-centers1-100-1000-single]    
21.876667                 test_make_blobs.py::test_make_blobs_ary_parameters[None-False-center_box0-0.1-centers1-100-1000-single]   
21.680000                 test_kmeans.py::test_traditional_kmeans_plus_plus_init[6-5-25-10000]                                      
21.453333                 test_make_blobs.py::test_make_blobs_ary_parameters[9-False-center_box1-0.1-centers1-100-1000-single]      
21.396667                 test_kmeans.py::test_traditional_kmeans_plus_plus_init[4-5-25-10000]                                      
21.300000                 test_umap.py::test_umap_knn_parameters[15]                                                                
21.300000                 test_umap.py::test_umap_knn_parameters[5]                                                                 
21.183333                 test_kmeans.py::test_traditional_kmeans_plus_plus_init[0-5-25-10000]                                      
21.176667                 test_umap.py::test_umap_fit_transform_trust[digits-euclidean]                                             
21.110000                 test_kmeans.py::test_traditional_kmeans_plus_plus_init[2-5-25-10000]                                      
21.093333                 test_make_blobs.py::test_make_blobs_ary_parameters[None-True-center_box1-0.01-centers1-100-1000-single]   
20.920000                 test_kmeans.py::test_traditional_kmeans_plus_plus_init[8-5-25-10000]                                      
20.853333                 test_make_blobs.py::test_make_blobs_ary_parameters[9-False-center_box0-0.1-centers1-100-1000-single]      
20.493333                 test_make_blobs.py::test_make_blobs_ary_parameters[9-True-center_box1-0.01-centers1-100-1000-single]      
20.340000                 test_benchmark.py::test_real_algos_runner[UMAP-Supervised]                                                
20.306667                 test_make_blobs.py::test_make_blobs_ary_parameters[None-True-center_box0-0.01-centers1-100-1000-single]   
20.080000                 test_make_blobs.py::test_make_blobs_ary_parameters[9-False-center_box0-0.01-centers1-100-1000-single]     
19.530000                 test_umap.py::test_umap_fit_transform_trust[digits-categorical]                                           
18.983333                 test_make_blobs.py::test_make_blobs_ary_parameters[9-False-center_box1-0.01-centers1-100-1000-single]     
18.973333                 test_make_blobs.py::test_make_blobs_ary_parameters[9-True-center_box1-0.1-centers1-100-1000-single]       
18.293333                 test_make_blobs.py::test_make_blobs_ary_parameters[None-True-center_box1-0.1-centers1-100-1000-double]    
18.140000                 test_pickle.py::test_regressor_pickle[False-data_size0-MBSGDClassifier-float64]                           
18.020000                 test_pickle.py::test_regressor_pickle[False-data_size0-MBSGDClassifier-float32]                           
17.796667  cuml.dask      test_nearest_neighbors.py::test_batch_size[100-10-10-1000]                                                
17.553333  cuml           test_pickle.py::test_regressor_pickle[True-data_size0-MBSGDClassifier-float64]                            
17.520000                 test_pickle.py::test_regressor_pickle[True-data_size0-MBSGDClassifier-float32]                            
17.330000                 test_make_blobs.py::test_make_blobs_ary_parameters[None-False-center_box1-0.01-centers1-100-1000-double]  
17.316667                 test_make_blobs.py::test_make_blobs_ary_parameters[None-False-center_box0-0.1-centers1-100-1000-double]   
16.946667                 test_make_blobs.py::test_make_blobs_ary_parameters[9-True-center_box0-0.1-centers1-100-1000-double]       
16.800000                 test_make_blobs.py::test_make_blobs_ary_parameters[9-True-center_box1-0.01-centers1-100-1000-double]      
16.736667                 test_make_blobs.py::test_make_blobs_ary_parameters[9-True-center_box1-0.1-centers1-100-1000-double]       
16.663333                 test_make_blobs.py::test_make_blobs_ary_parameters[None-False-center_box1-0.1-centers1-100-1000-double]   
16.603333                 test_make_blobs.py::test_make_blobs_ary_parameters[None-True-center_box1-0.01-centers1-100-1000-double]   
16.466667                 test_make_blobs.py::test_make_blobs_ary_parameters[None-True-center_box0-0.1-centers1-100-1000-single]    
16.356667                 test_make_blobs.py::test_make_blobs_ary_parameters[9-False-center_box1-0.01-centers1-100-1000-double]     
16.240000                 test_linear_model.py::test_logistic_regression_unscaled[l2-float32]                                       
16.210000                 test_make_blobs.py::test_make_blobs_ary_parameters[9-False-center_box0-0.1-centers1-100-1000-double]      
16.173333                 test_make_blobs.py::test_make_blobs_ary_parameters[None-False-center_box0-0.01-centers1-100-1000-double]  
16.153333                 test_make_blobs.py::test_make_blobs_ary_parameters[9-True-center_box0-0.01-centers1-100-1000-double]      
15.846667                 test_linear_model.py::test_logistic_regression_unscaled[none-float32]                                     
15.563333                 test_make_blobs.py::test_make_blobs_ary_parameters[9-False-center_box0-0.01-centers1-100-1000-double]     
15.490000                 test_make_blobs.py::test_make_blobs_ary_parameters[9-False-center_box1-0.1-centers1-100-1000-double]      
15.403333                 test_make_blobs.py::test_make_blobs_ary_parameters[None-True-center_box0-0.1-centers1-100-1000-double]    
14.810000                 test_make_blobs.py::test_make_blobs_ary_parameters[None-True-center_box0-0.01-centers1-100-1000-double]   
14.630000                 test_tsne.py::test_tsne[digits]                                                                           
14.613333                 test_random_forest.py::test_rf_regression_float64[large_reg0-datatype0]                                   
14.466667                 test_fil.py::test_fil_skl_classification[GradientBoostingClassifier-True-25-10-10-30-1000]                
14.423333                 test_kmeans.py::test_weighted_kmeans[0-10-10-25-500]                                                      
14.356667                 test_kmeans.py::test_weighted_kmeans[1-10-10-25-500]                                                      
14.183333                 test_kmeans.py::test_weighted_kmeans[4-10-10-25-500]                                                      
14.120000                 test_kmeans.py::test_weighted_kmeans[2-10-10-25-500]                                                      
14.066667                 test_kmeans.py::test_weighted_kmeans[3-10-10-25-500]                                                      
13.963333                 test_kmeans.py::test_traditional_kmeans_plus_plus_init[8-2-25-10000]                                      
13.753333                 test_kmeans.py::test_traditional_kmeans_plus_plus_init[6-2-25-10000]                                      
13.603333                 test_fil.py::test_fil_skl_classification[GradientBoostingClassifier-False-25-10-10-30-1000]               
13.310000                 test_arima.py::test_integration[float64-key7-data7]                                                       
13.273333                 test_kmeans.py::test_traditional_kmeans_plus_plus_init[2-2-25-10000]                                      
13.156667                 test_kmeans.py::test_traditional_kmeans_plus_plus_init[0-2-25-10000]                                      
12.673333                 test_kmeans.py::test_traditional_kmeans_plus_plus_init[4-2-25-10000]                                      
12.596667                 stemmer_tests/test_stemmer.py::test_same_results                                                          
12.563333                 test_umap.py::test_umap_fit_transform_score[10-500]                                                       
12.163333                 test_umap.py::test_umap_fit_transform_reproducibility[random_state2-50]                                   
12.086667  cuml.dask      test_random_forest.py::test_rf_classification_multi_class[3]                                              
12.036667  cuml           test_trustworthiness.py::test_trustworthiness[8-100-512-500-dataframe]                                    
11.820000  cuml.dask      test_kneighbors_regressor.py::test_predict_and_score[dataset0-128-2-1-dask_array]                         
11.633333  cuml           test_umap.py::test_umap_fit_transform_reproducibility[8-50]                                               
10.816667                 test_trustworthiness.py::test_trustworthiness[8-100-2-500-dataframe]                                      
10.803333                 test_trustworthiness.py::test_trustworthiness[8-10-2-500-dataframe]                                       
10.733333                 test_trustworthiness.py::test_trustworthiness[8-100-512-500-ndarray]                                      
10.630000  cuml.dask      test_coordinate_descent.py::test_elastic_net[True-16-column_info0-500-cyclic-0.2-float32]                 
10.603333  cuml           test_pca.py::test_sparse_pca_inputs[False-True-10000-8000]                                                
10.600000                 test_pca.py::test_sparse_pca_inputs[False-False-10000-8000]                                               
10.546667                 test_linear_model.py::test_logistic_regression_predict_proba[True-10-column_info0-10-float32]             
10.303333  cuml.dask      test_kneighbors_classifier.py::test_predict_and_score[dataset0-128-2-1-dask_array]                        
10.216667  cuml           test_trustworthiness.py::test_trustworthiness[8-10-512-500-dataframe]                                     
10.200000                 test_naive_bayes.py::test_basic_fit_predict_sparse[int32-float32]                                         
10.136667                 test_kmeans.py::test_traditional_kmeans_plus_plus_init[4-5-25-1000]                                       
10.110000                 test_random_forest.py::test_rf_regression_float64[large_reg0-datatype1]                                   
10.053333                 test_trustworthiness.py::test_trustworthiness[8-10-2-500-ndarray]                                         
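
Many of the entries in this list are parametrizations of the same test function (for example, the `test_predict_and_score` variants in the Dask k-nearest-neighbors tests), so summing per base name shows where parametrization multiplies the cost. A minimal sketch, assuming the durations are already available as a mapping from pytest node IDs to seconds, that applies the same 10 s cutoff and then groups by the node ID with its `[...]` parameter suffix removed (`slow_tests`, `base_name`, and `group_by_base` are hypothetical helpers):

```python
from collections import defaultdict


def slow_tests(durations, threshold=10.0):
    """Return (seconds, nodeid) pairs strictly slower than threshold, slowest first.

    `durations` maps a pytest node ID such as
    "test_umap.py::test_umap_fit_transform_trust[blobs-euclidean]" to seconds.
    """
    slow = [(seconds, node) for node, seconds in durations.items() if seconds > threshold]
    return sorted(slow, reverse=True)


def base_name(nodeid):
    """Strip the parametrization suffix "[...]" from a pytest node ID."""
    return nodeid.split("[", 1)[0]


def group_by_base(slow):
    """Sum time and count variants per test function, most expensive first."""
    groups = defaultdict(lambda: [0.0, 0])  # base name -> [seconds, n_variants]
    for seconds, node in slow:
        group = groups[base_name(node)]
        group[0] += seconds
        group[1] += 1
    return sorted(((total, n, name) for name, (total, n) in groups.items()), reverse=True)
```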

dantegd commented 4 years ago

@JohnZed, why did you link PR #2988 in the issue description? That's a FIL change that seems totally unrelated to build times.

JohnZed commented 4 years ago

@dantegd Fixed! It should be #2998, not #2988.

github-actions[bot] commented 3 years ago

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

github-actions[bot] commented 3 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.