h2oai / h2o4gpu

H2Oai GPU Edition
Apache License 2.0

hanging on some tests, most often on ppc. #773

Closed. pseudotensor closed this issue 5 years ago.

pseudotensor commented 5 years ago

http://mr-0xc1:8080/job/h2o4gpu-ppc64le-cuda9/job/master/148/consoleFull

[jenkins@mr-0xp3 2o4gpu-ppc64le-cuda9_master-ESA3O5KYICOMNJZ2XIJAVONAQEUPFXSUWARE7V4KDS6YGKSOMSQA]$ top
top - 21:34:03 up 10 days,  2:17,  2 users,  load average: 4.05, 6.02, 8.34
Tasks: 1187 total,   5 running, 623 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us,  2.9 sy,  0.0 ni, 96.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32985715+total, 14927987+free, 15080960 used, 16549632+buff/cache
KiB Swap:  1540032 total,  1540032 free,        0 used. 30604032+avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                          
 93779 root      20   0   20.6g 974080 323904 R 102.7  0.3 742:15.78 /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))                                           
 93769 root      20   0   20.9g   1.2g 318208 R 101.4  0.4  91:30.81 /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))                                           
 93775 root      20   0   20.6g   1.0g 324032 R 101.4  0.3 515:42.42 /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))                                           
 93783 root      20   0   11.6g 882880 323392 R 101.4  0.3  91:03.78 /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))                                           
  6927 jenkins   20   0  120128   9856   4608 R   1.4  0.0   0:00.05 top                                                                                                                              
     1 root      20   0  168832  20416   5248 S   0.0  0.0   1:51.46 /usr/lib/systemd/systemd --switched-root --system --deserialize 22                                                               
     2 root      20   0       0      0      0 S   0.0  0.0   0:00.68 [kthreadd]                                                                                                                       
     4 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 [kworker/0:0H]                                                                                                                   
     6 root      20   0       0      0      0 I   0.0  0.0   0:18.38 [kworker/u256:0]                                                                                                                 
     7 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 [mm_percpu_wq]                                                                                                                   
     8 root      20   0       0      0      0 S   0.0  0.0   0:12.92 [ksoftirqd/0]                                                                                                                    
     9 root      20   0       0      0      0 I   0.0  0.0   4:29.73 [rcu_sched]                                                                                                                      
    10 root      20   0       0      0      0 I   0.0  0.0   0:00.00 [rcu_bh]                                                                                                                         
    11 root      rt   0       0      0      0 S   0.0  0.0   0:00.27 [migration/0]                                                                                                                    
    12 root      rt   0       0      0      0 S   0.0  0.0   0:00.69 [watchdog/0]                                                                                                                     
    13 root      20   0       0      0      0 S   0.0  0.0   0:00.00 [cpuhp/0]                                                                                                                        
    14 root      20   0       0      0      0 S   0.0  0.0   0:00.00 [cpuhp/1]                                                                                                                        
    15 root      rt   0       0      0      0 S   0.0  0.0   0:00.68 [watchdog/1]                                                                                                                     
    16 root      rt   0       0      0      0 S   0.0  0.0   0:00.24 [migration/1]                                                                                                                    
    17 root      20   0       0      0      0 S   0.0  0.0   0:07.11 [ksoftirqd/1]                                                                                                                    
    19 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 [kworker/1:0H]                                                                                                                   
[jenkins@mr-0xp3 2o4gpu-ppc64le-cuda9_master-ESA3O5KYICOMNJZ2XIJAVONAQEUPFXSUWARE7V4KDS6YGKSOMSQA]$ pwdx 93779
93779: Permission denied
[jenkins@mr-0xp3 2o4gpu-ppc64le-cuda9_master-ESA3O5KYICOMNJZ2XIJAVONAQEUPFXSUWARE7V4KDS6YGKSOMSQA]$ nvidia-smi
Thu May 30 21:34:33 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000004:04:00.0 Off |                    0 |
| N/A   43C    P0    55W / 300W |   2223MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000035:03:00.0 Off |                    0 |
| N/A   44C    P0    54W / 300W |   5685MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     93769      C   /opt/h2oai/h2o4gpu/python/bin/python         453MiB |
|    0     93772      C   /opt/h2oai/h2o4gpu/python/bin/python         413MiB |
|    0     93775      C   /opt/h2oai/h2o4gpu/python/bin/python         449MiB |
|    0     93779      C   /opt/h2oai/h2o4gpu/python/bin/python         449MiB |
|    0     93783      C   /opt/h2oai/h2o4gpu/python/bin/python         449MiB |
|    1     93769      C   /opt/h2oai/h2o4gpu/python/bin/python         405MiB |
|    1     93772      C   /opt/h2oai/h2o4gpu/python/bin/python        4067MiB |
|    1     93775      C   /opt/h2oai/h2o4gpu/python/bin/python         401MiB |
|    1     93779      C   /opt/h2oai/h2o4gpu/python/bin/python         401MiB |
|    1     93783      C   /opt/h2oai/h2o4gpu/python/bin/python         401MiB |
+-----------------------------------------------------------------------------+
[jenkins@mr-0xp3 2o4gpu-ppc64le-cuda9_master-ESA3O5KYICOMNJZ2XIJAVONAQEUPFXSUWARE7V4KDS6YGKSOMSQA]$ top

top - 21:34:42 up 10 days,  2:18,  2 users,  load average: 4.08, 5.78, 8.16
Tasks: 1180 total,   5 running, 622 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us,  2.9 sy,  0.0 ni, 96.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32985715+total, 14928896+free, 15071168 used, 16549702+buff/cache
KiB Swap:  1540032 total,  1540032 free,        0 used. 30604953+avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
93775 root 20 0 20.6g 1.0g 324032 R 100.8 0.3 516:21.98 /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
93779 root 20 0 20.6g 974080 323904 R 100.8 0.3 742:55.33 /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
93783 root 20 0 11.6g 882880 323392 R 100.8 0.3 91:43.33 /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
93769 root 20 0 20.9g 1.2g 318208 R 100.0 0.4 92:10.36 /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
6987 jenkins 20 0 120128 9856 4608 R 1.7 0.0 0:00.05 top
1 root 20 0 168832 20416 5248 S 0.0 0.0 1:51.46 /usr/lib/systemd/systemd --switched-root --system --deserialize 22
2 root 20 0 0 0 0 S 0.0 0.0 0:00.68 [kthreadd]
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 [kworker/0:0H]
6 root 20 0 0 0 0 I 0.0 0.0 0:18.38 [kworker/u256:0]
7 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 [mm_percpu_wq]
8 root 20 0 0 0 0 S 0.0 0.0 0:12.92 [ksoftirqd/0]
9 root 20 0 0 0 0 I 0.0 0.0 4:29.74 [rcu_sched]
10 root 20 0 0 0 0 I 0.0 0.0 0:00.00 [rcu_bh]
11 root rt 0 0 0 0 S 0.0 0.0 0:00.27 [migration/0]
12 root rt 0 0 0 0 S 0.0 0.0 0:00.69 [watchdog/0]
13 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [cpuhp/0]
14 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [cpuhp/1]
15 root rt 0 0 0 0 S 0.0 0.0 0:00.68 [watchdog/1]
16 root rt 0 0 0 0 S 0.0 0.0 0:00.24 [migration/1]
17 root 20 0 0 0 0 S 0.0 0.0 0:07.11 [ksoftirqd/1]
19 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 [kworker/1:0H]
20 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [cpuhp/2]
21 root rt 0 0 0 0 S 0.0 0.0 0:00.68 [watchdog/2]
22 root rt 0 0 0 0 S 0.0 0.0 0:00.21 [migration/2]
23 root 20 0 0 0 0 S 0.0 0.0 0:06.48 [ksoftirqd/2]
25 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 [kworker/2:0H]
26 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [cpuhp/3]
27 root rt 0 0 0 0 S 0.0 0.0 0:00.68 [watchdog/3]
28 root rt 0 0 0 0 S 0.0 0.0 0:00.19 [migration/3]
[jenkins@mr-0xp3 2o4gpu-ppc64le-cuda9_master-ESA3O5KYICOMNJZ2XIJAVONAQEUPFXSUWARE7V4KDS6YGKSOMSQA]$ nvidia-smi
Thu May 30 21:34:45 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000004:04:00.0 Off |                    0 |
| N/A   43C    P0    55W / 300W |   2223MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000035:03:00.0 Off |                    0 |
| N/A   44C    P0    54W / 300W |   5685MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     93769      C   /opt/h2oai/h2o4gpu/python/bin/python         453MiB |
|    0     93772      C   /opt/h2oai/h2o4gpu/python/bin/python         413MiB |
|    0     93775      C   /opt/h2oai/h2o4gpu/python/bin/python         449MiB |
|    0     93779      C   /opt/h2oai/h2o4gpu/python/bin/python         449MiB |
|    0     93783      C   /opt/h2oai/h2o4gpu/python/bin/python         449MiB |
|    1     93769      C   /opt/h2oai/h2o4gpu/python/bin/python         405MiB |
|    1     93772      C   /opt/h2oai/h2o4gpu/python/bin/python        4067MiB |
|    1     93775      C   /opt/h2oai/h2o4gpu/python/bin/python         401MiB |
|    1     93779      C   /opt/h2oai/h2o4gpu/python/bin/python         401MiB |
|    1     93783      C   /opt/h2oai/h2o4gpu/python/bin/python         401MiB |
+-----------------------------------------------------------------------------+
[jenkins@mr-0xp3 2o4gpu-ppc64le-cuda9_master-ESA3O5KYICOMNJZ2XIJAVONAQEUPFXSUWARE7V4KDS6YGKSOMSQA]$

pseudotensor commented 5 years ago

Mostly single-threaded CPU activity, even though the data sits on the GPU.

It also happened here: http://mr-0xc1:8080/job/h2o4gpu-ppc64le-cuda9/job/master/146/console
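
One quick way to confirm that each worker's spin is confined to a single thread (a sketch, assuming psutil is available in the test image; the PIDs are the busy workers from the `top` output above):

```python
# Sketch: check whether each busy worker burns CPU in a single thread.
# Assumes psutil is installed; PIDs come from the `top` output above.
import psutil

for pid in (93769, 93775, 93779, 93783):
    p = psutil.Process(pid)
    print(pid, p.cpu_percent(interval=1.0), "% CPU over 1s")
    for t in p.threads():  # (id, user_time, system_time) per thread
        print(f"  tid={t.id} user={t.user_time:.1f}s sys={t.system_time:.1f}s")
```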

pseudotensor commented 5 years ago

The point is that it never resolves. The suite should only take about 40 minutes, but the overall limit is 4 hours, and it just hangs there for that long.
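
A per-test timeout would at least turn the hang into a failure instead of eating the whole 4-hour window. A minimal sketch, assuming the pytest-timeout plugin is installed (the 10-minute budget is arbitrary):

```python
# Sketch, assuming the pytest-timeout plugin is installed: abort a test
# that exceeds its budget instead of letting it hang to the job limit.
import pytest

# method="thread" dumps all stacks and force-exits the process, which
# also catches hangs inside a C extension that a signal cannot interrupt.
@pytest.mark.timeout(600, method="thread")
def test_sklearn_gbm_classification():
    ...
```

With that plugin present, the same budget can also be applied suite-wide via `--timeout=600` on the pytest command line rather than per-test markers.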

pseudotensor commented 5 years ago
[root@mr-0xp3 2o4gpu-ppc64le-cuda9_master-ESA3O5KYICOMNJZ2XIJAVONAQEUPFXSUWARE7V4KDS6YGKSOMSQA]# ps -auxwf | grep python
root       7261  0.0  0.0 445184 34944 ?        Ssl  May20   0:29 /usr/bin/python2 -Es /usr/sbin/tuned -l -P
root      10563  0.0  0.0 110848  2816 pts/3    S+   22:02   0:00  |                   \_ grep --color=auto python
root      93759  0.0  0.0   6464  4160 ?        S    20:02   0:00                  \_ /bin/sh -c pytest -s --verbose --durations=10 --numprocesses 5 --fulltrace --full-trace --junit-xml=build/test-reports/h2o4gpu-test.xml tests/python/open_data 2> ./tmp/h2o4gpu-test.22796_2019.05.30-20:02:46.log
root      93760  0.0  0.0 412736 40704 ?        Sl   20:02   0:01                      \_ /opt/h2oai/h2o4gpu/python/bin/python /opt/h2oai/h2o4gpu/python/bin/pytest -s --verbose --durations=10 --numprocesses 5 --fulltrace --full-trace --junit-xml=build/test-reports/h2o4gpu-test.xml tests/python/open_data
root      93769 98.2  0.3 21900672 1235264 ?    Rl   20:02 117:48                          \_ /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
root      93815  0.0  0.0  16640 14016 ?        S    20:02   0:00                          |   \_ /opt/h2oai/h2o4gpu/python/bin/python -c from multiprocessing.semaphore_tracker import main;main(6)
root      93772 13.8  0.3 24183744 1104000 ?    Sl   20:02  16:38                          \_ /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
root      93820  0.0  0.0  16640 14016 ?        S    20:02   0:00                          |   \_ /opt/h2oai/h2o4gpu/python/bin/python -c from multiprocessing.semaphore_tracker import main;main(6)
root      93775  452  0.3 21647744 1083264 ?    Rl   20:02 542:46                          \_ /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
root      93821  0.0  0.0  16640 14080 ?        S    20:02   0:00                          |   \_ /opt/h2oai/h2o4gpu/python/bin/python -c from multiprocessing.semaphore_tracker import main;main(6)
root      93779  642  0.2 21602752 974080 ?     Rl   20:02 770:37                          \_ /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
root      93822  0.0  0.0  16640 14016 ?        S    20:02   0:00                          |   \_ /opt/h2oai/h2o4gpu/python/bin/python -c from multiprocessing.semaphore_tracker import main;main(6)
root      93783 99.8  0.2 12191680 882880 ?     Rl   20:02 119:42                          \_ /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
root      93823  0.0  0.0  16640 14080 ?        S    20:02   0:00                              \_ /opt/h2oai/h2o4gpu/python/bin/python -c from multiprocessing.semaphore_tracker import main;main(6)
pseudotensor commented 5 years ago
[root@mr-0xp3 2o4gpu-ppc64le-cuda9_master-ESA3O5KYICOMNJZ2XIJAVONAQEUPFXSUWARE7V4KDS6YGKSOMSQA]# ps -auxwf | grep python
root       7261  0.0  0.0 445184 34944 ?        Ssl  May20   0:29 /usr/bin/python2 -Es /usr/sbin/tuned -l -P
root      11047  0.0  0.0 110848  2880 pts/3    S+   22:06   0:00  |                   \_ grep --color=auto python
root      93759  0.0  0.0   6464  4160 ?        S    20:02   0:00                  \_ /bin/sh -c pytest -s --verbose --durations=10 --numprocesses 5 --fulltrace --full-trace --junit-xml=build/test-reports/h2o4gpu-test.xml tests/python/open_data 2> ./tmp/h2o4gpu-test.22796_2019.05.30-20:02:46.log
root      93760  0.0  0.0 412736 40704 ?        Sl   20:02   0:01                      \_ /opt/h2oai/h2o4gpu/python/bin/python /opt/h2oai/h2o4gpu/python/bin/pytest -s --verbose --durations=10 --numprocesses 5 --fulltrace --full-trace --junit-xml=build/test-reports/h2o4gpu-test.xml tests/python/open_data
root      93769 98.3  0.3 21900672 1235264 ?    Rl   20:02 121:37                          \_ /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
root      93815  0.0  0.0      0     0 ?        Z    20:02   0:00                          |   \_ [python] <defunct>
root      93772 13.4  0.3 24183744 1104000 ?    Sl   20:02  16:38                          \_ /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
root      93820  0.0  0.0  16640 14016 ?        S    20:02   0:00                          |   \_ /opt/h2oai/h2o4gpu/python/bin/python -c from multiprocessing.semaphore_tracker import main;main(6)
root      93775  441  0.3 21647744 1083264 ?    Rl   20:02 546:35                          \_ /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
root      93821  0.0  0.0  16640 14080 ?        S    20:02   0:00                          |   \_ /opt/h2oai/h2o4gpu/python/bin/python -c from multiprocessing.semaphore_tracker import main;main(6)
root      93779  625  0.2 21602752 974080 ?     Rl   20:02 774:26                          \_ /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
root      93822  0.0  0.0  16640 14016 ?        S    20:02   0:00                          |   \_ /opt/h2oai/h2o4gpu/python/bin/python -c from multiprocessing.semaphore_tracker import main;main(6)
root      93783 99.8  0.2 12191680 882880 ?     Rl   20:02 123:31                          \_ /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))
root      93823  0.0  0.0  16640 14080 ?        S    20:02   0:00                              \_ /opt/h2oai/h2o4gpu/python/bin/python -c from multiprocessing.semaphore_tracker import main;main(6)
jorge     25489  0.0  0.1 20978112 374592 ?     Sl   May24   0:00 __python /home/jorge/scoring-pipeline/example.pystacked_transform-make_holdout_predsmake_holdout_preds_subprocess-running_XGBoostModel
jorge     50348  0.0  0.1 20978112 374592 ?     Sl   May24   0:00 __python /home/jorge/scoring-pipeline/example.pystacked_transform-make_holdout_predsmake_holdout_preds_subprocess-running_XGBoostModel
[root@mr-0xp3 2o4gpu-ppc64le-cuda9_master-ESA3O5KYICOMNJZ2XIJAVONAQEUPFXSUWARE7V4KDS6YGKSOMSQA]# kill -s 9 93769
[root@mr-0xp3 2o4gpu-ppc64le-cuda9_master-ESA3O5KYICOMNJZ2XIJAVONAQEUPFXSUWARE7V4KDS6YGKSOMSQA]# kill -s 9 93772
[root@mr-0xp3 2o4gpu-ppc64le-cuda9_master-ESA3O5KYICOMNJZ2XIJAVONAQEUPFXSUWARE7V4KDS6YGKSOMSQA]# kill -s 9 93775
[root@mr-0xp3 2o4gpu-ppc64le-cuda9_master-ESA3O5KYICOMNJZ2XIJAVONAQEUPFXSUWARE7V4KDS6YGKSOMSQA]# kill -s 9 93783
[root@mr-0xp3 2o4gpu-ppc64le-cuda9_master-ESA3O5KYICOMNJZ2XIJAVONAQEUPFXSUWARE7V4KDS6YGKSOMSQA]# kill -s 9 93779
[root@mr-0xp3 2o4gpu-ppc64le-cuda9_master-ESA3O5KYICOMNJZ2XIJAVONAQEUPFXSUWARE7V4KDS6YGKSOMSQA]# 
pseudotensor commented 5 years ago

Actually, I figured out a way:

3:07 PM  I can kill the pytest worker and see who died in the console jenkins logs
3:07 PM  15:06:49 [gw0] FAILED tests/python/open_data/gbm/test_gpu_prediction_pickledmodel.py::TestGPUPredict::test_predict_sklearn_pickle
3:07 PM  that's one hanging guy
3:07 PM  http://mr-0xc1:8080/job/h2o4gpu-ppc64le-cuda9/job/master/148/console
3:07 PM  will kill the next one
3:08 PM  [gw1] FAILED tests/python/open_data/gbm/test_xgb_sklearn_wrapper.py::test_sklearn_gbm_classification
3:08 PM  the other
3:08 PM  FAILED tests/python/open_data/gbm/test_xgb_sklearn_wrapper.py::test_sklearn_gbm_regression
3:08 PM  FAILED tests/python/open_data/gbm/test_xgb_sklearn_wrapper.py::test_gbm_classifier_backupsklearn
3:09 PM  FAILED tests/python/open_data/gbm/test_xgb_sklearn_wrapper.py::test_gbm_regressor_backupsklearn
3:09 PM  Those were the procs of the pytest workers that were hanging
3:09 PM  I killed each one, one at a time
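
To double-check these outside Jenkins, each suspect can be run in isolation under a hard timeout, so a hang surfaces as `TimeoutExpired` instead of stalling the whole job. A sketch (the test IDs are the ones identified above; the 10-minute budget is an assumption):

```python
# Sketch: run each suspect test in isolation with a hard timeout, so a
# hang shows up as TimeoutExpired instead of stalling the whole run.
import subprocess

SUSPECTS = [
    "tests/python/open_data/gbm/test_gpu_prediction_pickledmodel.py::TestGPUPredict::test_predict_sklearn_pickle",
    "tests/python/open_data/gbm/test_xgb_sklearn_wrapper.py::test_sklearn_gbm_classification",
    "tests/python/open_data/gbm/test_xgb_sklearn_wrapper.py::test_sklearn_gbm_regression",
    "tests/python/open_data/gbm/test_xgb_sklearn_wrapper.py::test_gbm_classifier_backupsklearn",
    "tests/python/open_data/gbm/test_xgb_sklearn_wrapper.py::test_gbm_regressor_backupsklearn",
]

for test_id in SUSPECTS:
    try:
        subprocess.run(["pytest", "-x", test_id], timeout=600, check=False)
    except subprocess.TimeoutExpired:
        print(f"HANG: {test_id}")
```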

pseudotensor commented 5 years ago

(from observing reaction at: http://mr-0xc1:8080/job/h2o4gpu-ppc64le-cuda9/job/master/148/console)

pseudotensor commented 5 years ago

still hung soon after:

jon@pseudotensor:~/h2oai$ ssh jenkins@mr-0xp3
jenkins@mr-0xp3's password: 
Last login: Thu May 30 21:32:20 2019 from 172.17.0.219
======================================================
  IBM Service and Productivity Tools for Linux on Power

   IBM value-added software for Linux on Power servers
   is available to be installed.
   You can set up IBM remote repositories at any time
   by running as root:

   # /opt/ibm/lop/configure

======================================================

[jenkins@mr-0xp3 ~]$ top
top - 00:14:50 up 10 days,  4:58,  4 users,  load average: 10.17, 11.98, 29.47
Tasks: 1241 total,  11 running, 672 sleeping,   0 stopped,   5 zombie
%Cpu(s):  0.6 us,  7.2 sy,  0.0 ni, 92.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32985715+total, 12272166+free, 34489216 used, 17264627+buff/cache
KiB Swap:  1540032 total,  1540032 free,        0 used. 28462368+avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                          
 89948 root      20   0   20.7g 966080 336960 R 100.0  0.3 707:28.18 /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))                                           
 11385 root      20   0   11.7g 876096 314944 R  99.7  0.3 126:02.88 /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))                                           
 11549 root      20   0   11.4g 598336 269952 R  99.7  0.2 125:24.76 /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))                                           
 47446 root      20   0   21.0g   1.2g 331264 R  99.7  0.4  70:18.64 /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))                                           
 47452 root      20   0   20.8g   1.0g 337088 R  99.7  0.3 462:46.87 /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))                                           
 47456 root      20   0   20.7g 965632 336960 R  99.7  0.3 723:17.35 /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))                                           
 47460 root      20   0   11.8g 917568 336448 R  99.7  0.3  69:50.42 /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))                                           
 89938 root      20   0   20.7g 935040 333056 R  99.7  0.3  32:36.87 /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))                                           
 89944 root      20   0   20.8g   1.0g 337088 R  99.7  0.3 445:13.05 /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))                                           
 89952 root      20   0   11.8g 917888 336448 R  99.7  0.3  32:03.30 /opt/h2oai/h2o4gpu/python/bin/python -u -c import sys;exec(eval(sys.stdin.readline()))                                           
 96201 jenkins   20   0  120320   9984   4608 R   1.3  0.0   0:00.12 top                                                                                                                              
 57853 jenkins   20   0   22.3g 654784  36288 S   0.3  0.2 398:27.78 /usr/bin/java -jar remoting.jar -workDir /home/jenkins/slave_dir_from_mr-0xc1                                                    
     1 root      20   0  168832  20416   5248 S   0.0  0.0   1:53.25 /usr/lib/systemd/systemd --switched-root --system --deserialize 22                                                               
     2 root      20   0       0      0      0 S   0.0  0.0   0:00.69 [kthreadd]                                                                                                                       
     4 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 [kworker/0:0H]                                                                                                                   
     6 root      20   0       0      0      0 I   0.0  0.0   0:18.47 [kworker/u256:0]                                                                                                                 
     7 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 [mm_percpu_wq]                                                                                                                   
     8 root      20   0       0      0      0 S   0.0  0.0   0:13.02 [ksoftirqd/0]                                                                                                                    
     9 root      20   0       0      0      0 I   0.0  0.0   4:33.28 [rcu_sched]                                                                                                                      
    10 root      20   0       0      0      0 I   0.0  0.0   0:00.00 [rcu_bh]                                                                                                                         
    11 root      rt   0       0      0      0 S   0.0  0.0   0:00.27 [migration/0]                                                                                                                    
[jenkins@mr-0xp3 ~]$ nvidia-smi
Fri May 31 00:14:54 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000004:04:00.0 Off |                    0 |
| N/A   44C    P0    55W / 300W |   6641MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000035:03:00.0 Off |                    0 |
| N/A   45C    P0    54W / 300W |  13355MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     11063      C   /opt/h2oai/h2o4gpu/python/bin/python         365MiB |
|    0     11187      C   /opt/h2oai/h2o4gpu/python/bin/python         411MiB |
|    0     11251      C   /opt/h2oai/h2o4gpu/python/bin/python         359MiB |
|    0     11385      C   /opt/h2oai/h2o4gpu/python/bin/python         449MiB |
|    0     11549      C   /opt/h2oai/h2o4gpu/python/bin/python         401MiB |
|    0     47446      C   /opt/h2oai/h2o4gpu/python/bin/python         475MiB |
|    0     47449      C   /opt/h2oai/h2o4gpu/python/bin/python         437MiB |
|    0     47452      C   /opt/h2oai/h2o4gpu/python/bin/python         471MiB |
|    0     47456      C   /opt/h2oai/h2o4gpu/python/bin/python         471MiB |
|    0     47460      C   /opt/h2oai/h2o4gpu/python/bin/python         471MiB |
|    0     89938      C   /opt/h2oai/h2o4gpu/python/bin/python         471MiB |
|    0     89941      C   /opt/h2oai/h2o4gpu/python/bin/python         435MiB |
|    0     89944      C   /opt/h2oai/h2o4gpu/python/bin/python         471MiB |
|    0     89948      C   /opt/h2oai/h2o4gpu/python/bin/python         471MiB |
|    0     89952      C   /opt/h2oai/h2o4gpu/python/bin/python         471MiB |
|    1     11063      C   /opt/h2oai/h2o4gpu/python/bin/python         365MiB |
|    1     11187      C   /opt/h2oai/h2o4gpu/python/bin/python         359MiB |
|    1     11251      C   /opt/h2oai/h2o4gpu/python/bin/python         359MiB |
|    1     11385      C   /opt/h2oai/h2o4gpu/python/bin/python         401MiB |
|    1     11549      C   /opt/h2oai/h2o4gpu/python/bin/python         401MiB |
|    1     47446      C   /opt/h2oai/h2o4gpu/python/bin/python         405MiB |
|    1     47449      C   /opt/h2oai/h2o4gpu/python/bin/python        4123MiB |
|    1     47452      C   /opt/h2oai/h2o4gpu/python/bin/python         401MiB |
|    1     47456      C   /opt/h2oai/h2o4gpu/python/bin/python         401MiB |
|    1     47460      C   /opt/h2oai/h2o4gpu/python/bin/python         401MiB |
|    1     89938      C   /opt/h2oai/h2o4gpu/python/bin/python         401MiB |
|    1     89941      C   /opt/h2oai/h2o4gpu/python/bin/python        4123MiB |
|    1     89944      C   /opt/h2oai/h2o4gpu/python/bin/python         401MiB |
|    1     89948      C   /opt/h2oai/h2o4gpu/python/bin/python         401MiB |
|    1     89952      C   /opt/h2oai/h2o4gpu/python/bin/python         401MiB |
+-----------------------------------------------------------------------------+
[jenkins@mr-0xp3 ~]$ 
sh1ng commented 5 years ago
ps --ppid 65
   PID TTY          TIME CMD
    69 ?        00:32:23 python
    72 ?        00:13:53 python
    75 ?        06:50:17 python
    79 ?        10:52:01 python
    83 ?        00:31:59 python
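
For reference, backtraces like the ones below can be gathered non-interactively by attaching gdb in batch mode to each worker. A sketch, assuming gdb is installed in the container and ptrace is permitted:

```python
# Sketch: dump a backtrace from each pytest worker with gdb in batch
# mode. PIDs are the children of the pytest master from `ps --ppid 65`.
import subprocess

for pid in (69, 72, 75, 79, 83):
    print(f"=== pid {pid} ===")
    subprocess.run(["gdb", "-p", str(pid), "-batch", "-ex", "bt"])
```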

pid 69

#0  0x00007fffa5d24588 in sched_yield () from /usr/lib64/libc.so.6
#1  0x00007fffa5e1a5a8 in pthread_yield () from /usr/lib64/libpthread.so.0
#2  0x00007fff79774964 in ncclCpuBarrierOut (comm=0x7ffb64000e50)
    at enqueue.cu:143
#3  ncclBarrierEnqueueWait (comm=0x7ffb64000e50) at enqueue.cu:193
#4  0x00007fff79775054 in ncclEnqueueCheck (info=0x7fffccc2fa30) at enqueue.cu:438
#5  0x00007fff7978fa50 in ncclAllReduce (sendbuff=0x7ffef602a300, 
    recvbuff=<optimized out>, count=<optimized out>, datatype=<optimized out>, 
    op=<optimized out>, comm=<optimized out>, stream=<optimized out>)
    at collectives/all_reduce.cu:17
#6  0x00007fff796b0248 in dh::AllReducer::AllReduceSum (this=0x152a84d80, 
    communication_group_idx=<optimized out>, sendbuff=0x7ffef602a300, 
    recvbuff=0x7ffef602a300, count=<optimized out>)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:941
#7  0x00007fff796ce734 in xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >::InitRoot (this=0x152b45970, p_tree=0x153b0ed40, 
    gpair_all=<optimized out>, reducer=0x152a84d80, num_columns=500)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1128
#8  0x00007fff796cef5c in xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >::UpdateTree (this=0x152b45970, gpair_all=0x1386115b0, p_fmat=
    0x1386158a0, p_tree=0x153b0ed40, reducer=0x152a84d80)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1162
#9  0x00007fff796add20 in operator() (shard=..., idx=<optimized out>, 
    __closure=0x7fffccc302a0)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1523
#10 void dh::ExecuteIndexShards<std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >, xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*
, xgboost::RegTree*)::{lambda(int, std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >&)#1}>(std::vector<std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >, std::allocator<std::vector> >*, xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*)::{lambda(int, std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >&)#1})::{lambda()#1}::operator() () at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:1049
#11 0x00007fff796d060c in operator() (__closure=0x7fffccc302d8)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:1047
#12 SaveCudaContext<dh::ExecuteIndexShards(std::vector<T>*, FunctionT) [with T = std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >; FunctionT = xgboost::tree::GPUHistMakerSpecialised<GradientSumT>::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*) [with GradientSumT = xgboost::detail::GradientPairInternal<double>]::__lambda9]::__lambda3> (func=..., 
    this=0x7fffccc302f0)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:766
#13 ExecuteIndexShards<std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >, xgboost::tree::GPUHistMakerSpecialised<GradientSumT>::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*) [with GradientSumT = xgboost::detail::GradientPairInternal<double>]::__lambda9> (f=..., 
    shards=0x152a84d48)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:1042
#14 xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::UpdateTree (this=0x152a84c10, gpair=0x1386115b0, p_fmat=0x1386158a0, 
    p_tree=0x152a85150) at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1520
#15 0x00007fff796d0cf4 in xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::Update (this=0x152a84c10, gpair=0x1386115b0, 
    dmat=0x1386158a0, trees=...)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1408
#16 0x00007fff796d0ed8 in xgboost::tree::GPUHistMaker::Update (
    this=<optimized out>, gpair=<optimized out>, dmat=<optimized out>, trees=...)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1599
#17 0x00007fff79528b98 in xgboost::gbm::GBTree::BoostNewTrees (this=0x14ef0c960, 
    gpair=0x1386115b0, p_fmat=0x1386158a0, bst_group=<optimized out>, 
    ret=<optimized out>) at /root/repo/xgboost/src/gbm/gbtree.cc:293
#18 0x00007fff79529d78 in xgboost::gbm::GBTree::DoBoost (this=0x14ef0c960, 
    p_fmat=0x1386158a0, in_gpair=0x1386115b0, obj=<optimized out>)
    at /root/repo/xgboost/src/gbm/gbtree.cc:180
#19 0x00007fff7953ba18 in xgboost::LearnerImpl::UpdateOneIter (this=0x138611430, 
    iter=<optimized out>, train=0x1386158a0)
    at /root/repo/xgboost/src/learner.cc:474
#20 0x00007fff794b5a60 in XGBoosterUpdateOneIter (handle=0x1374af760, 
    iter=<optimized out>, dtrain=0x138516770)
    at /root/repo/xgboost/src/c_api/c_api.cc:896
#21 0x00007fffa482928c in ffi_call_LINUX64 ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6
#22 0x00007fffa4826df4 in ffi_call ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6
#23 0x00007fffa4864150 in _ctypes_callproc ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/_ctypes.cpython-36m-powerpc64le-linux-gnu.so
#24 0x00007fffa4864e80 in PyCFuncPtr_call ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/_ctypes.cpython-36m-powerpc64le-linux-gnu.so
#25 0x00000001054918ac in _PyObject_FastCallDict ()
#26 0x000000010556ffe0 in _PyObject_FastCallKeywords ()
#27 0x0000000105579ee0 in call_function ()
#28 0x00000001055b6bb4 in _PyEval_EvalFrameDefault ()
#29 0x000000010546dc34 in PyEval_EvalFrameEx ()
#30 0x000000010556dcf8 in _PyEval_EvalCodeWithName ()
#31 0x000000010556f9b4 in fast_function ()
Backtrace stopped: frame did not save the PC

pid 72: nothing, just waiting for the others to finish.

pid 75

#0  0x00007fff8deb4588 in sched_yield () from /usr/lib64/libc.so.6
#1  0x00007fff8dfaa5a8 in pthread_yield () from /usr/lib64/libpthread.so.0
#2  0x00007fff618e4964 in ncclCpuBarrierOut (comm=0x7ffc10000dc0)
    at enqueue.cu:143
#3  ncclBarrierEnqueueWait (comm=0x7ffc10000dc0) at enqueue.cu:193
#4  0x00007fff618e5054 in ncclEnqueueCheck (info=0x7fffd31c1890) at enqueue.cu:438
#5  0x00007fff618ffa50 in ncclAllReduce (sendbuff=0x7ffe9320aa00, 
    recvbuff=<optimized out>, count=<optimized out>, datatype=<optimized out>, 
    op=<optimized out>, comm=<optimized out>, stream=<optimized out>)
    at collectives/all_reduce.cu:17
#6  0x00007fff61820248 in dh::AllReducer::AllReduceSum (this=0x158756910, 
    communication_group_idx=<optimized out>, sendbuff=0x7ffe9320aa00, 
    recvbuff=0x7ffe9320aa00, count=<optimized out>)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:941
#7  0x00007fff6183e734 in xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >::InitRoot (this=0x1658b6210, p_tree=0x16481dda0, 
    gpair_all=<optimized out>, reducer=0x158756910, num_columns=10)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1128
#8  0x00007fff6183ef5c in xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >::UpdateTree (this=0x1658b6210, gpair_all=0x16093ab90, p_fmat=
    0x160c7e060, p_tree=0x16481dda0, reducer=0x158756910)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1162
#9  0x00007fff6181dd20 in operator() (shard=..., idx=<optimized out>, 
    __closure=0x7fffd31c2100)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1523
#10 void dh::ExecuteIndexShards<std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >, xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*
, xgboost::RegTree*)::{lambda(int, std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >&)#1}>(std::vector<std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >, std::allocator<std::vector> >*, xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*)::{lambda(int, std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >&)#1})::{lambda()#1}::operator() () at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:1049
#11 0x00007fff6184060c in operator() (__closure=0x7fffd31c2138)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:1047
#12 SaveCudaContext<dh::ExecuteIndexShards(std::vector<T>*, FunctionT) [with T = std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >; FunctionT = xgboost::tree::GPUHistMakerSpecialised<GradientSumT>::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*) [with GradientSumT = xgboost::detail::GradientPairInternal<double>]::__lambda9]::__lambda3> (func=..., 
    this=0x7fffd31c2150)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:766
#13 ExecuteIndexShards<std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >, xgboost::tree::GPUHistMakerSpecialised<GradientSumT>::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*) [with GradientSumT = xgboost::detail::GradientPairInternal<double>]::__lambda9> (f=..., 
    shards=0x1587568d8)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:1042
#14 xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::UpdateTree (this=0x1587567a0, gpair=0x16093ab90, p_fmat=0x160c7e060, 
    p_tree=0x1587696b0) at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1520
#15 0x00007fff61840cf4 in xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::Update (this=0x1587567a0, gpair=0x16093ab90, 
    dmat=0x160c7e060, trees=...)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1408
#16 0x00007fff61840ed8 in xgboost::tree::GPUHistMaker::Update (
    this=<optimized out>, gpair=<optimized out>, dmat=<optimized out>, trees=...)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1599
#17 0x00007fff61698b98 in xgboost::gbm::GBTree::BoostNewTrees (this=0x164b5c5e0, 
    gpair=0x16093ab90, p_fmat=0x160c7e060, bst_group=<optimized out>, 
    ret=<optimized out>) at /root/repo/xgboost/src/gbm/gbtree.cc:293
#18 0x00007fff61699d78 in xgboost::gbm::GBTree::DoBoost (this=0x164b5c5e0, 
    p_fmat=0x160c7e060, in_gpair=0x16093ab90, obj=<optimized out>)
    at /root/repo/xgboost/src/gbm/gbtree.cc:180
#19 0x00007fff616aba18 in xgboost::LearnerImpl::UpdateOneIter (this=0x16093aa10, 
    iter=<optimized out>, train=0x160c7e060)
    at /root/repo/xgboost/src/learner.cc:474
#20 0x00007fff61625a60 in XGBoosterUpdateOneIter (handle=0x163425220, 
    iter=<optimized out>, dtrain=0x15e9cc500)
    at /root/repo/xgboost/src/c_api/c_api.cc:896
#21 0x00007fff8c9b928c in ffi_call_LINUX64 ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6
#22 0x00007fff8c9b6df4 in ffi_call ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6
#23 0x00007fff8c9f4150 in _ctypes_callproc ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/_ctypes.cpython-36m-powerpc64le-linux-gnu.so
#24 0x00007fff8c9f4e80 in PyCFuncPtr_call ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/_ctypes.cpython-36m-powerpc64le-linux-gnu.so
#25 0x000000010db918ac in _PyObject_FastCallDict ()
#26 0x000000010dc6ffe0 in _PyObject_FastCallKeywords ()
#27 0x000000010dc79ee0 in call_function ()
#28 0x000000010dcb6bb4 in _PyEval_EvalFrameDefault ()
#29 0x000000010db6dc34 in PyEval_EvalFrameEx ()
#30 0x000000010dc6dcf8 in _PyEval_EvalCodeWithName ()
#31 0x000000010dc6f9b4 in fast_function ()

pid 79

#0  0x00007fff83034588 in sched_yield () from /usr/lib64/libc.so.6
#1  0x00007fff8312a5a8 in pthread_yield () from /usr/lib64/libpthread.so.0
#2  0x00007fff56a84964 in ncclCpuBarrierOut (comm=0x7ffc7c000e50)
    at enqueue.cu:143
#3  ncclBarrierEnqueueWait (comm=0x7ffc7c000e50) at enqueue.cu:193
#4  0x00007fff56a85054 in ncclEnqueueCheck (info=0x7ffff1828a50) at enqueue.cu:438
#5  0x00007fff56a9fa50 in ncclAllReduce (sendbuff=0x7ffed700aa00, 
    recvbuff=<optimized out>, count=<optimized out>, datatype=<optimized out>, 
    op=<optimized out>, comm=<optimized out>, stream=<optimized out>)
    at collectives/all_reduce.cu:17
#6  0x00007fff569c0248 in dh::AllReducer::AllReduceSum (this=0x15a5171d0, 
    communication_group_idx=<optimized out>, sendbuff=0x7ffed700aa00, 
    recvbuff=0x7ffed700aa00, count=<optimized out>)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:941
#7  0x00007fff569de734 in xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >::InitRoot (this=0x15a51a010, p_tree=0x15a5199f0, 
    gpair_all=<optimized out>, reducer=0x15a5171d0, num_columns=10)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1128
#8  0x00007fff569def5c in xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >::UpdateTree (this=0x15a51a010, gpair_all=0x153338be0, p_fmat=
    0x153302950, p_tree=0x15a5199f0, reducer=0x15a5171d0)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1162
#9  0x00007fff569bdd20 in operator() (shard=..., idx=<optimized out>, 
    __closure=0x7ffff18292c0)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1523
#10 void dh::ExecuteIndexShards<std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >, xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*
, xgboost::RegTree*)::{lambda(int, std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >&)#1}>(std::vector<std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >, std::allocator<std::vector> >*, xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*)::{lambda(int, std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >&)#1})::{lambda()#1}::operator() () at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:1049
#11 0x00007fff569e060c in operator() (__closure=0x7ffff18292f8)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:1047
#12 SaveCudaContext<dh::ExecuteIndexShards(std::vector<T>*, FunctionT) [with T = std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >; FunctionT = xgboost::tree::GPUHistMakerSpecialised<GradientSumT>::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*) [with GradientSumT = xgboost::detail::GradientPairInternal<double>]::__lambda9]::__lambda3> (func=..., 
    this=0x7ffff1829310)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:766
#13 ExecuteIndexShards<std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >, xgboost::tree::GPUHistMakerSpecialised<GradientSumT>::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*) [with GradientSumT = xgboost::detail::GradientPairInternal<double>]::__lambda9> (f=..., 
    shards=0x15a517198)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:1042
#14 xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::UpdateTree (this=0x15a517060, gpair=0x153338be0, p_fmat=0x153302950, 
    p_tree=0x15a517380) at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1520
#15 0x00007fff569e0cf4 in xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::Update (this=0x15a517060, gpair=0x153338be0, 
    dmat=0x153302950, trees=...)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1408
#16 0x00007fff569e0ed8 in xgboost::tree::GPUHistMaker::Update (
    this=<optimized out>, gpair=<optimized out>, dmat=<optimized out>, trees=...)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1599
#17 0x00007fff56838b98 in xgboost::gbm::GBTree::BoostNewTrees (this=0x15ca2f050, 
    gpair=0x153338be0, p_fmat=0x153302950, bst_group=<optimized out>, 
    ret=<optimized out>) at /root/repo/xgboost/src/gbm/gbtree.cc:293
#18 0x00007fff56839d78 in xgboost::gbm::GBTree::DoBoost (this=0x15ca2f050, 
    p_fmat=0x153302950, in_gpair=0x153338be0, obj=<optimized out>)
    at /root/repo/xgboost/src/gbm/gbtree.cc:180
#19 0x00007fff5684ba18 in xgboost::LearnerImpl::UpdateOneIter (this=0x153338a60, 
    iter=<optimized out>, train=0x153302950)
    at /root/repo/xgboost/src/learner.cc:474
#20 0x00007fff567c5a60 in XGBoosterUpdateOneIter (handle=0x15d9f8080, 
    iter=<optimized out>, dtrain=0x144988490)
    at /root/repo/xgboost/src/c_api/c_api.cc:896
#21 0x00007fff81b3928c in ffi_call_LINUX64 ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6
#22 0x00007fff81b36df4 in ffi_call ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6
#23 0x00007fff81b74150 in _ctypes_callproc ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/_ctypes.cpython-36m-powerpc64le-linux-gnu.so
#24 0x00007fff81b74e80 in PyCFuncPtr_call ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/_ctypes.cpython-36m-powerpc64le-linux-gnu.so
#25 0x000000012e3e18ac in _PyObject_FastCallDict ()
#26 0x000000012e4bffe0 in _PyObject_FastCallKeywords ()
#27 0x000000012e4c9ee0 in call_function ()
#28 0x000000012e506bb4 in _PyEval_EvalFrameDefault ()
#29 0x000000012e3bdc34 in PyEval_EvalFrameEx ()
#30 0x000000012e4bdcf8 in _PyEval_EvalCodeWithName ()
#31 0x000000012e4bf9b4 in fast_function ()

pid 83

#0  0x00007fff9d1c4588 in sched_yield () from /usr/lib64/libc.so.6
#1  0x00007fff9d2ba5a8 in pthread_yield () from /usr/lib64/libpthread.so.0
#2  0x00007fff70c14964 in ncclCpuBarrierOut (comm=0x7fff20000e60)
    at enqueue.cu:143
#3  ncclBarrierEnqueueWait (comm=0x7fff20000e60) at enqueue.cu:193
#4  0x00007fff70c15054 in ncclEnqueueCheck (info=0x7fffd139f2a0) at enqueue.cu:438
#5  0x00007fff70c2fa50 in ncclAllReduce (sendbuff=0x7ffef74a2700, 
    recvbuff=<optimized out>, count=<optimized out>, datatype=<optimized out>, 
    op=<optimized out>, comm=<optimized out>, stream=<optimized out>)
    at collectives/all_reduce.cu:17
#6  0x00007fff70b50248 in dh::AllReducer::AllReduceSum (this=0x16455ef80, 
    communication_group_idx=<optimized out>, sendbuff=0x7ffef74a2700, 
    recvbuff=0x7ffef74a2700, count=<optimized out>)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:941
#7  0x00007fff70b6e734 in xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >::InitRoot (this=0x164560ea0, p_tree=0x164f2cc60, 
    gpair_all=<optimized out>, reducer=0x16455ef80, num_columns=24)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1128
#8  0x00007fff70b6ef5c in xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >::UpdateTree (this=0x164560ea0, gpair_all=0x15fa8c0b0, p_fmat=
    0x15f387d30, p_tree=0x164f2cc60, reducer=0x16455ef80)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1162
#9  0x00007fff70b4dd20 in operator() (shard=..., idx=<optimized out>, 
    __closure=0x7fffd139fb10)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1523
#10 void dh::ExecuteIndexShards<std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >, xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*
, xgboost::RegTree*)::{lambda(int, std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >&)#1}>(std::vector<std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >, std::allocator<std::vector> >*, xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*)::{lambda(int, std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >&)#1})::{lambda()#1}::operator() () at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:1049
#11 0x00007fff70b7060c in operator() (__closure=0x7fffd139fb48)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:1047
#12 SaveCudaContext<dh::ExecuteIndexShards(std::vector<T>*, FunctionT) [with T = std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >; FunctionT = xgboost::tree::GPUHistMakerSpecialised<GradientSumT>::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*) [with GradientSumT = xgboost::detail::GradientPairInternal<double>]::__lambda9]::__lambda3> (func=..., 
    this=0x7fffd139fb60)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:766
#13 ExecuteIndexShards<std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >, xgboost::tree::GPUHistMakerSpecialised<GradientSumT>::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*) [with GradientSumT = xgboost::detail::GradientPairInternal<double>]::__lambda9> (f=..., 
    shards=0x16455ef48)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:1042
#14 xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::UpdateTree (this=0x16455ee10, gpair=0x15fa8c0b0, p_fmat=0x15f387d30, 
    p_tree=0x16455f2e0) at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1520
#15 0x00007fff70b70cf4 in xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::Update (this=0x16455ee10, gpair=0x15fa8c0b0, 
    dmat=0x15f387d30, trees=...)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1408
#16 0x00007fff70b70ed8 in xgboost::tree::GPUHistMaker::Update (
    this=<optimized out>, gpair=<optimized out>, dmat=<optimized out>, trees=...)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1599
#17 0x00007fff709c8b98 in xgboost::gbm::GBTree::BoostNewTrees (this=0x161671310, 
    gpair=0x15fa8c0b0, p_fmat=0x15f387d30, bst_group=<optimized out>, 
    ret=<optimized out>) at /root/repo/xgboost/src/gbm/gbtree.cc:293
#18 0x00007fff709c9d78 in xgboost::gbm::GBTree::DoBoost (this=0x161671310, 
    p_fmat=0x15f387d30, in_gpair=0x15fa8c0b0, obj=<optimized out>)
    at /root/repo/xgboost/src/gbm/gbtree.cc:180
#19 0x00007fff709dba18 in xgboost::LearnerImpl::UpdateOneIter (this=0x15fa8bf30, 
    iter=<optimized out>, train=0x15f387d30)
    at /root/repo/xgboost/src/learner.cc:474
#20 0x00007fff70955a60 in XGBoosterUpdateOneIter (handle=0x1613fbb90, 
    iter=<optimized out>, dtrain=0x152739fd0)
    at /root/repo/xgboost/src/c_api/c_api.cc:896
#21 0x00007fff97cb928c in ffi_call_LINUX64 ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6
#22 0x00007fff97cb6df4 in ffi_call ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6
#23 0x00007fff97cf4150 in _ctypes_callproc ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/_ctypes.cpython-36m-powerpc64le-linux-gnu.so
#24 0x00007fff97cf4e80 in PyCFuncPtr_call ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/_ctypes.cpython-36m-powerpc64le-linux-gnu.so
#25 0x0000000139c418ac in _PyObject_FastCallDict ()
#26 0x0000000139d1ffe0 in _PyObject_FastCallKeywords ()
#27 0x0000000139d29ee0 in call_function ()
#28 0x0000000139d66bb4 in _PyEval_EvalFrameDefault ()
#29 0x0000000139c1dc34 in PyEval_EvalFrameEx ()
#30 0x0000000139d1dcf8 in _PyEval_EvalCodeWithName ()
#31 0x0000000139d1f9b4 in fast_function ()
Backtrace stopped: frame did not save the PC
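
The native stack bottoms out in xgboost's gpu_hist updater, entered through XGBoosterUpdateOneIter via ctypes. gdb only shows the C++ side; to also capture the Python-side stacks of a hung worker, something like the standard library's faulthandler could be wired into the test processes. A minimal sketch (Linux-only; not part of the current suite):

import faulthandler
import signal

# Dump every thread's Python stack to stderr when this process receives
# SIGUSR1, e.g. `kill -USR1 <pid>` against a hung test worker.
faulthandler.register(signal.SIGUSR1, all_threads=True)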
sh1ng commented 5 years ago

A single process also hangs:

#0  0x00007fffb2904588 in sched_yield () from /usr/lib64/libc.so.6
#1  0x00007fffb29fa5a8 in pthread_yield () from /usr/lib64/libpthread.so.0
#2  0x00007fff86334964 in ncclCpuBarrierOut (comm=0x7ffcc8000e60)
    at enqueue.cu:143
#3  ncclBarrierEnqueueWait (comm=0x7ffcc8000e60) at enqueue.cu:193
#4  0x00007fff86335054 in ncclEnqueueCheck (info=0x7fffef9248c0)
    at enqueue.cu:438
#5  0x00007fff8634fa50 in ncclAllReduce (sendbuff=0x7ffcb322a300, 
    recvbuff=<optimized out>, count=<optimized out>, datatype=<optimized out>, 
    op=<optimized out>, comm=<optimized out>, stream=<optimized out>)
    at collectives/all_reduce.cu:17
#6  0x00007fff86270248 in dh::AllReducer::AllReduceSum (this=0x16b3f41e0, 
    communication_group_idx=<optimized out>, sendbuff=0x7ffcb322a300, 
    recvbuff=0x7ffcb322a300, count=<optimized out>)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:941
#7  0x00007fff8628e734 in xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >::InitRoot (this=0x152aca4c0, p_tree=0x16b442260, 
    gpair_all=<optimized out>, reducer=0x16b3f41e0, num_columns=500)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1128
#8  0x00007fff8628ef5c in xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >::UpdateTree (this=0x152aca4c0, gpair_all=0x1608209a0, 
    p_fmat=0x1522e2100, p_tree=0x16b442260, reducer=0x16b3f41e0)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1162
#9  0x00007fff8626dd20 in operator() (shard=..., idx=<optimized out>, 
    __closure=0x7fffef925130)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1523
#10 void dh::ExecuteIndexShards<std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >, xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*)::{lambda(int, std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >&)#1}>(std::vector<std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >, std::allocator<std::vector> >*, xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*)::{lambda(int, std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >&)#1})::{lambda()#1}::operator() ()
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:1049
#11 0x00007fff8629060c in operator() (__closure=0x7fffef925168)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:1047
#12 SaveCudaContext<dh::ExecuteIndexShards(std::vector<T>*, FunctionT) [with T = std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >; FunctionT = xgboost::tree::GPUHistMakerSpecialised<GradientSumT>::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*) [with GradientSumT = xgboost::detail::GradientPairInternal<double>]::__lambda9]::__lambda3> (
    func=..., this=0x7fffef925180)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:766
#13 ExecuteIndexShards<std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >, xgboost::tree::GPUHistMakerSpecialised<GradientSumT>::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*) [with GradientSumT = xgboost::detail::GradientPairInternal<double>]::__lambda9> (
    f=..., shards=0x16b3f41a8)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:1042
#14 xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::UpdateTree (this=0x16b3f4070, gpair=0x1608209a0, 
    p_fmat=0x1522e2100, p_tree=0x162e48d50)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1520
#15 0x00007fff86290cf4 in xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::Update (this=0x16b3f4070, gpair=0x1608209a0, 
    dmat=0x1522e2100, trees=...)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1408
#16 0x00007fff86290ed8 in xgboost::tree::GPUHistMaker::Update (
    this=<optimized out>, gpair=<optimized out>, dmat=<optimized out>, 
    trees=...) at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1599
#17 0x00007fff860e8b98 in xgboost::gbm::GBTree::BoostNewTrees (
    this=0x16b3fb480, gpair=0x1608209a0, p_fmat=0x1522e2100, 
    bst_group=<optimized out>, ret=<optimized out>)
    at /root/repo/xgboost/src/gbm/gbtree.cc:293
#18 0x00007fff860e9d78 in xgboost::gbm::GBTree::DoBoost (this=0x16b3fb480, 
    p_fmat=0x1522e2100, in_gpair=0x1608209a0, obj=<optimized out>)
    at /root/repo/xgboost/src/gbm/gbtree.cc:180
#19 0x00007fff860fba18 in xgboost::LearnerImpl::UpdateOneIter (
    this=0x160820820, iter=<optimized out>, train=0x1522e2100)
    at /root/repo/xgboost/src/learner.cc:474
#20 0x00007fff86075a60 in XGBoosterUpdateOneIter (handle=0x152ac73f0, 
    iter=<optimized out>, dtrain=0x15130ce80)
    at /root/repo/xgboost/src/c_api/c_api.cc:896
#21 0x00007fffb140928c in ffi_call_LINUX64 ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6
#22 0x00007fffb1406df4 in ffi_call ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6
#23 0x00007fffb1444150 in _ctypes_callproc ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/_ctypes.cpython-36m-powerpc64le-linux-gnu.so
#24 0x00007fffb1444e80 in PyCFuncPtr_call ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/_ctypes.cpython-36m-powerpc64le-linux-gnu.so
#25 0x00000001247e18ac in _PyObject_FastCallDict ()
#26 0x00000001248bffe0 in _PyObject_FastCallKeywords ()
#27 0x00000001248c9ee0 in call_function ()
#28 0x0000000124906bb4 in _PyEval_EvalFrameDefault ()
#29 0x00000001247bdc34 in PyEval_EvalFrameEx ()
#30 0x00000001248bdcf8 in _PyEval_EvalCodeWithName ()
#31 0x00000001248bf9b4 in fast_function ()
Backtrace stopped: frame did not save the PC
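
Same shape as before, but here frame #2 makes the stall explicit: the process is spinning in ncclCpuBarrierOut underneath dh::AllReducer::AllReduceSum, waiting on a barrier that never completes. A minimal sketch of the call path these tests exercise, with random stand-in data (500 columns to match the num_columns=500 in frame #7; n_gpus=-1 is the workaround the tests pass):

import numpy as np
from xgboost import XGBClassifier

# Random binary-classification data standing in for the test fixtures.
X = np.random.randn(10000, 500).astype(np.float32)
y = (np.random.rand(10000) > 0.5).astype(np.int32)

# gpu_hist across all GPUs, as in the failing tests; fit() reaches the
# NCCL AllReduce in DeviceShard::InitRoot shown in the backtrace.
model = XGBClassifier(tree_method='gpu_hist', n_gpus=-1,
                      objective='binary:logistic')
model.fit(X, y)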
sh1ng commented 5 years ago

Still hangs with NCCL 2.4.7:

#0  0x00007fff8ae44588 in sched_yield () from /usr/lib64/libc.so.6
#1  0x00007fff8af3a5a8 in pthread_yield () from /usr/lib64/libpthread.so.0
#2  0x00007fff6674e264 in ncclCpuBarrierOut (comm=0x7ffeb8000dc0)
    at enqueue.cc:143
#3  ncclBarrierEnqueueWait (comm=0x7ffeb8000dc0) at enqueue.cc:193
#4  0x00007fff6674e954 in ncclEnqueueCheck (info=0x7fffd6e13ba0)
    at enqueue.cc:438
#5  0x00007fff66768d70 in ncclAllReduce (sendbuff=0x7ffef229a700, 
    recvbuff=<optimized out>, count=<optimized out>, datatype=<optimized out>, 
    op=<optimized out>, comm=<optimized out>, stream=<optimized out>)
    at collectives/all_reduce.cc:17
#6  0x00007fff6668c348 in dh::AllReducer::AllReduceSum (this=0x16331cf80, 
    communication_group_idx=<optimized out>, sendbuff=0x7ffef229a700, 
    recvbuff=0x7ffef229a700, count=<optimized out>)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:941
#7  0x00007fff666a8db4 in xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >::InitRoot (this=0x16331e370, p_tree=0x16393a980, 
    gpair_all=<optimized out>, reducer=0x16331cf80, num_columns=24)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1128
#8  0x00007fff666a95e4 in xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >::UpdateTree (this=0x16331e370, gpair_all=0x1666bd800, 
    p_fmat=0x160781490, p_tree=0x16393a980, reducer=0x16331cf80)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1162
#9  0x00007fff66689fe0 in operator() (shard=..., idx=<optimized out>, 
    __closure=0x7fffd6e14440)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1523
#10 void dh::ExecuteIndexShards<std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >, xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*)::{lambda(int, std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >&)#1}>(std::vector<std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >, std::allocator<std::vector> >*, xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*)::{lambda(int, std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >&)#1})::{lambda()#1}::operator() ()
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:1049
#11 0x00007fff666aaa8c in operator() (__closure=0x7fffd6e14478)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:1047
#12 SaveCudaContext<dh::ExecuteIndexShards(std::vector<T>*, FunctionT) [with T = std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >; FunctionT = xgboost::tree::GPUHistMakerSpecialised<GradientSumT>::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*) [with GradientSumT = xgboost::detail::GradientPairInternal<double>]::__lambda9]::__lambda3> (
    func=..., this=0x7fffd6e14490)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:766
#13 ExecuteIndexShards<std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >, xgboost::tree::GPUHistMakerSpecialised<GradientSumT>::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*) [with GradientSumT = xgboost::detail::GradientPairInternal<double>]::__lambda9> (
    f=..., shards=0x16331cf48)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:1042
#14 xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::UpdateTree (this=0x16331ce10, gpair=0x1666bd800, 
    p_fmat=0x160781490, p_tree=0x16331d130)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1520
#15 0x00007fff666ab174 in xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::Update (this=0x16331ce10, gpair=0x1666bd800, 
    dmat=0x160781490, trees=...)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1408
#16 0x00007fff666ab358 in xgboost::tree::GPUHistMaker::Update (
    this=<optimized out>, gpair=<optimized out>, dmat=<optimized out>, 
    trees=...) at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1599
#17 0x00007fff6650df38 in xgboost::gbm::GBTree::BoostNewTrees (
    this=0x1666a6990, gpair=0x1666bd800, p_fmat=0x160781490, 
    bst_group=<optimized out>, ret=<optimized out>)
    at /root/repo/xgboost/src/gbm/gbtree.cc:293
#18 0x00007fff6650f118 in xgboost::gbm::GBTree::DoBoost (this=0x1666a6990, 
    p_fmat=0x160781490, in_gpair=0x1666bd800, obj=<optimized out>)
    at /root/repo/xgboost/src/gbm/gbtree.cc:180
#19 0x00007fff66520db8 in xgboost::LearnerImpl::UpdateOneIter (
    this=0x1666bd680, iter=<optimized out>, train=0x160781490)
    at /root/repo/xgboost/src/learner.cc:474
#20 0x00007fff6649adf0 in XGBoosterUpdateOneIter (handle=0x166a42b00, 
    iter=<optimized out>, dtrain=0x160785d00)
    at /root/repo/xgboost/src/c_api/c_api.cc:896
#21 0x00007fff8994928c in ffi_call_LINUX64 ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6
#22 0x00007fff89946df4 in ffi_call ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6
#23 0x00007fff89984150 in _ctypes_callproc ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/_ctypes.cpython-36m-powerpc64le-linux-gnu.so
#24 0x00007fff89984e80 in PyCFuncPtr_call ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/_ctypes.cpython-36m-powerpc64le-linux-gnu.so
#25 0x00000001175c18ac in _PyObject_FastCallDict ()
#26 0x000000011769ffe0 in _PyObject_FastCallKeywords ()
#27 0x00000001176a9ee0 in call_function ()
#28 0x00000001176e6bb4 in _PyEval_EvalFrameDefault ()
#29 0x000000011759dc34 in PyEval_EvalFrameEx ()
#30 0x000000011769dcf8 in _PyEval_EvalCodeWithName ()
#31 0x000000011769f9b4 in fast_function ()
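
The hang survives the NCCL upgrade: the only visible difference is enqueue.cc:143 here versus enqueue.cu:143 on the older build. It would be worth rerunning with NCCL's own diagnostics turned up via the NCCL_DEBUG environment variable, set before the first NCCL call in the process. A sketch:

import os

# Must be set before NCCL initialises, i.e. before the first
# multi-GPU fit() in this process.
os.environ['NCCL_DEBUG'] = 'INFO'
os.environ['NCCL_DEBUG_SUBSYS'] = 'ALL'  # optional per-subsystem detail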

Plus there are a few more errors: http://mr-0xc1:8080/job/h2o4gpu-ppc64le-cuda9/job/PR-775/2/console

15:27:38  =================================== FAILURES ===================================
15:27:38  __________________ TestGPUPredict.test_predict_sklearn_pickle __________________
15:27:38  [gw0] linux -- Python 3.6.4 /opt/h2oai/h2o4gpu/python/bin/python
15:27:38  
15:27:38  self = <unittest.case._Outcome object at 0x7fff709730f0>
15:27:38  test_case = <test_gpu_prediction_pickledmodel.TestGPUPredict testMethod=test_predict_sklearn_pickle>
15:27:38  isTest = True
15:27:38  
15:27:38      @contextlib.contextmanager
15:27:38      def testPartExecutor(self, test_case, isTest=False):
15:27:38          old_success = self.success
15:27:38          self.success = True
15:27:38          try:
15:27:38  >           yield
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/unittest/case.py:59: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <test_gpu_prediction_pickledmodel.TestGPUPredict testMethod=test_predict_sklearn_pickle>
15:27:38  result = <TestCaseFunction 'test_predict_sklearn_pickle'>
15:27:38  
15:27:38      def run(self, result=None):
15:27:38          orig_result = result
15:27:38          if result is None:
15:27:38              result = self.defaultTestResult()
15:27:38              startTestRun = getattr(result, 'startTestRun', None)
15:27:38              if startTestRun is not None:
15:27:38                  startTestRun()
15:27:38      
15:27:38          result.startTest(self)
15:27:38      
15:27:38          testMethod = getattr(self, self._testMethodName)
15:27:38          if (getattr(self.__class__, "__unittest_skip__", False) or
15:27:38              getattr(testMethod, "__unittest_skip__", False)):
15:27:38              # If the class or method was skipped.
15:27:38              try:
15:27:38                  skip_why = (getattr(self.__class__, '__unittest_skip_why__', '')
15:27:38                              or getattr(testMethod, '__unittest_skip_why__', ''))
15:27:38                  self._addSkip(result, self, skip_why)
15:27:38              finally:
15:27:38                  result.stopTest(self)
15:27:38              return
15:27:38          expecting_failure_method = getattr(testMethod,
15:27:38                                             "__unittest_expecting_failure__", False)
15:27:38          expecting_failure_class = getattr(self,
15:27:38                                            "__unittest_expecting_failure__", False)
15:27:38          expecting_failure = expecting_failure_class or expecting_failure_method
15:27:38          outcome = _Outcome(result)
15:27:38          try:
15:27:38              self._outcome = outcome
15:27:38      
15:27:38              with outcome.testPartExecutor(self):
15:27:38                  self.setUp()
15:27:38              if outcome.success:
15:27:38                  outcome.expecting_failure = expecting_failure
15:27:38                  with outcome.testPartExecutor(self, isTest=True):
15:27:38  >                   testMethod()
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/unittest/case.py:605: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <test_gpu_prediction_pickledmodel.TestGPUPredict testMethod=test_predict_sklearn_pickle>
15:27:38  
15:27:38      def test_predict_sklearn_pickle(self):
15:27:38          X, y = makeXy()
15:27:38          Xtest = makeXtest()
15:27:38      
15:27:38          from xgboost import XGBClassifier
15:27:38          kwargs = {}
15:27:38          kwargs['tree_method'] = 'gpu_hist'
15:27:38          kwargs['predictor'] = 'gpu_predictor'
15:27:38          kwargs['silent'] = 0
15:27:38          kwargs['objective'] = 'binary:logistic'
15:27:38          # TODO: workaround, remove it when xgboost is fixed
15:27:38          kwargs['n_gpus'] = -1
15:27:38      
15:27:38          model = XGBClassifier(**kwargs)
15:27:38  >       model.fit(X, y)
15:27:38  
15:27:38  tests/python/open_data/gbm/test_gpu_prediction_pickledmodel.py:212: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
15:27:38         colsample_bynode=1, colsample_bytree=1, ga... reg_lambda=1, scale_pos_weight=1,
15:27:38         seed=None, silent=0, subsample=1, tree_method='gpu_hist',
15:27:38         verbosity=1)
15:27:38  X = array([[ 1.6243453636632417  , -0.6117564136500754  ,
15:27:38          -0.5281717522634558  , ..., -2.2370865111124707  ,
15:27:38       ...6  ,
15:27:38          -0.01719033793579945 , ...,  0.23915453351027363 ,
15:27:38           1.3255634960114144  ,  1.3462142811747388  ]])
15:27:38  y = [0, 1, 0, 1, 0, 1, ...], sample_weight = None, eval_set = None
15:27:38  eval_metric = None, early_stopping_rounds = None
15:27:38  early_stopping_threshold = None, early_stopping_limit = None, verbose = True
15:27:38  xgb_model = None, sample_weight_eval_set = None, callbacks = None
15:27:38  
15:27:38          def fit(self, X, y, sample_weight=None, eval_set=None, eval_metric=None,
15:27:38                  early_stopping_rounds=None, early_stopping_threshold=None, early_stopping_limit=None,
15:27:38                                        verbose=True, xgb_model=None,
15:27:38                  sample_weight_eval_set=None, callbacks=None):
15:27:38              # pylint: disable = attribute-defined-outside-init,arguments-differ
15:27:38              """
15:27:38              Fit gradient boosting classifier
15:27:38      
15:27:38              Parameters
15:27:38              ----------
15:27:38              X : array_like
15:27:38                  Feature matrix
15:27:38              y : array_like
15:27:38                  Labels
15:27:38              sample_weight : array_like
15:27:38                  Weight for each instance
15:27:38              eval_set : list, optional
15:27:38                  A list of (X, y) pairs to use as a validation set for
15:27:38                  early-stopping
15:27:38              sample_weight_eval_set : list, optional
15:27:38                  A list of the form [L_1, L_2, ..., L_n], where each L_i is a list of
15:27:38                  instance weights on the i-th validation set.
15:27:38              eval_metric : str, callable, optional
15:27:38                  If a str, should be a built-in evaluation metric to use. See
15:27:38                  doc/parameter.rst. If callable, a custom evaluation metric. The call
15:27:38                  signature is func(y_predicted, y_true) where y_true will be a
15:27:38                  DMatrix object such that you may need to call the get_label
15:27:38                  method. It must return a str, value pair where the str is a name
15:27:38                  for the evaluation and value is the value of the evaluation
15:27:38                  function. This objective is always minimized.
15:27:38              early_stopping_rounds : int, optional
15:27:38                  Activates early stopping. Validation error needs to decrease at
15:27:38                  least every <early_stopping_rounds> round(s) to continue training.
15:27:38                  Requires at least one item in evals. If there's more than one,
15:27:38                  will use the last. If early stopping occurs, the model will have
15:27:38                  three additional fields: bst.best_score, bst.best_iteration and
15:27:38                  bst.best_ntree_limit (bst.best_ntree_limit is the ntree_limit parameter
15:27:38                  default value in predict method if not any other value is specified).
15:27:38                  (Use bst.best_ntree_limit to get the correct value if num_parallel_tree
15:27:38                  and/or num_class appears in the parameters)
15:27:38              early_stopping_threshold : float
15:27:38                  Sets an optional threshold to smooth the early stopping policy.
15:27:38                  If after early_stopping_rounds iterations, the model hasn't improved
15:27:38                  more than threshold times the score from early_stopping_rounds before,
15:27:38                  then the learning stops.
15:27:38              early_stopping_limit: float
15:27:38                  Sets limit of "threshold times the score from early_stopping_rounds_before"
15:27:38                  to value of limit.
15:27:38              verbose : bool
15:27:38                  If `verbose` and an evaluation set is used, writes the evaluation
15:27:38                  metric measured on the validation set to stderr.
15:27:38              xgb_model : str
15:27:38                  file name of stored xgb model or 'Booster' instance Xgb model to be
15:27:38                  loaded before training (allows training continuation).
15:27:38              callbacks : list of callback functions
15:27:38                  List of callback functions that are applied at end of each iteration.
15:27:38                  It is possible to use predefined callbacks by using :ref:`callback_api`.
15:27:38                  Example:
15:27:38      
15:27:38                  .. code-block:: python
15:27:38      
15:27:38                      [xgb.callback.reset_learning_rate(custom_rates)]
15:27:38              """
15:27:38              evals_result = {}
15:27:38              self.classes_ = np.unique(y)
15:27:38              self.n_classes_ = len(self.classes_)
15:27:38      
15:27:38              xgb_options = self.get_xgb_params()
15:27:38      
15:27:38              if callable(self.objective):
15:27:38                  obj = _objective_decorator(self.objective)
15:27:38              # Use default value. Is it really not used?
15:27:38                  xgb_options["objective"] = "binary:logistic"
15:27:38              else:
15:27:38                  obj = None
15:27:38      
15:27:38              if self.n_classes_ > 2:
15:27:38                  # Switch to using a multiclass objective in the underlying XGB instance
15:27:38                  xgb_options["objective"] = "multi:softprob"
15:27:38                  xgb_options['num_class'] = self.n_classes_
15:27:38      
15:27:38              feval = eval_metric if callable(eval_metric) else None
15:27:38              if eval_metric is not None:
15:27:38                  if callable(eval_metric):
15:27:38                      eval_metric = None
15:27:38                  else:
15:27:38                      xgb_options.update({"eval_metric": eval_metric})
15:27:38      
15:27:38              self._le = XGBLabelEncoder().fit(y)
15:27:38              training_labels = self._le.transform(y)
15:27:38      
15:27:38              if eval_set is not None:
15:27:38                  if sample_weight_eval_set is None:
15:27:38                      sample_weight_eval_set = [None] * len(eval_set)
15:27:38                  evals = list(
15:27:38                      DMatrix(eval_set[i][0], label=self._le.transform(eval_set[i][1]),
15:27:38                              missing=self.missing, weight=sample_weight_eval_set[i],
15:27:38                              nthread=self.n_jobs)
15:27:38                      for i in range(len(eval_set))
15:27:38                  )
15:27:38                  nevals = len(evals)
15:27:38                  eval_names = ["validation_{}".format(i) for i in range(nevals)]
15:27:38                  evals = list(zip(evals, eval_names))
15:27:38              else:
15:27:38                  evals = ()
15:27:38      
15:27:38              self._features_count = X.shape[1]
15:27:38      
15:27:38              if sample_weight is not None:
15:27:38                  train_dmatrix = DMatrix(X, label=training_labels, weight=sample_weight,
15:27:38                                          missing=self.missing, nthread=self.n_jobs)
15:27:38              else:
15:27:38                  train_dmatrix = DMatrix(X, label=training_labels,
15:27:38                                          missing=self.missing, nthread=self.n_jobs)
15:27:38      
15:27:38              self._Booster = train(xgb_options, train_dmatrix, self.get_num_boosting_rounds(),
15:27:38                                    evals=evals, early_stopping_rounds=early_stopping_rounds,
15:27:38                                    early_stopping_threshold=early_stopping_threshold,
15:27:38                                    early_stopping_limit=early_stopping_limit,
15:27:38                                    evals_result=evals_result, obj=obj, feval=feval,
15:27:38                                    verbose_eval=verbose, xgb_model=xgb_model,
15:27:38  >                                 callbacks=callbacks)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/sklearn.py:757: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  params = {'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bynode': 1, ...}
15:27:38  dtrain = <xgboost.core.DMatrix object at 0x7fff708810b8>, num_boost_round = 100
15:27:38  evals = (), obj = None, feval = None, maximize = False
15:27:38  early_stopping_rounds = None, early_stopping_threshold = None
15:27:38  early_stopping_limit = None, evals_result = {}, verbose_eval = True
15:27:38  xgb_model = None
15:27:38  callbacks = [<function print_evaluation.<locals>.callback at 0x7fff708970d0>, <function record_evaluation.<locals>.callback at 0x7fff708971e0>]
15:27:38  learning_rates = None
15:27:38  
15:27:38      def train(params, dtrain, num_boost_round=10, evals=(), obj=None, feval=None,
15:27:38                maximize=False, early_stopping_rounds=None, early_stopping_threshold=None,early_stopping_limit=None,
15:27:38                evals_result=None,
15:27:38                verbose_eval=True, xgb_model=None, callbacks=None, learning_rates=None):
15:27:38          # pylint: disable=too-many-statements,too-many-branches, attribute-defined-outside-init
15:27:38          """Train a booster with given parameters.
15:27:38      
15:27:38          Parameters
15:27:38          ----------
15:27:38          params : dict
15:27:38              Booster params.
15:27:38          dtrain : DMatrix
15:27:38              Data to be trained.
15:27:38          num_boost_round: int
15:27:38              Number of boosting iterations.
15:27:38          evals: list of pairs (DMatrix, string)
15:27:38              List of items to be evaluated during training, this allows user to watch
15:27:38              performance on the validation set.
15:27:38          obj : function
15:27:38              Customized objective function.
15:27:38          feval : function
15:27:38              Customized evaluation function.
15:27:38          maximize : bool
15:27:38              Whether to maximize feval.
15:27:38          early_stopping_rounds: int
15:27:38              Activates early stopping. Validation error needs to decrease at least
15:27:38              every **early_stopping_rounds** round(s) to continue training.
15:27:38              Requires at least one item in **evals**.
15:27:38              If there's more than one, will use the last.
15:27:38              Returns the model from the last iteration (not the best one).
15:27:38              If early stopping occurs, the model will have three additional fields:
15:27:38              ``bst.best_score``, ``bst.best_iteration`` and ``bst.best_ntree_limit``.
15:27:38              (Use ``bst.best_ntree_limit`` to get the correct value if
15:27:38              ``num_parallel_tree`` and/or ``num_class`` appears in the parameters)
15:27:38          early_stopping_threshold : float
15:27:38              Sets an optional threshold to smooth the early stopping policy.
15:27:38              If after early_stopping_rounds iterations, the model hasn't improved
15:27:38              more than threshold times the score from early_stopping_rounds before,
15:27:38              then the learning stops.
15:27:38          early_stopping_limit: float
15:27:38              Sets limit of "threshold times the score from early_stopping_rounds_before"
15:27:38              to value of limit.
15:27:38          evals_result: dict
15:27:38              This dictionary stores the evaluation results of all the items in watchlist.
15:27:38      
15:27:38              Example: with a watchlist containing
15:27:38              ``[(dtest,'eval'), (dtrain,'train')]`` and
15:27:38              a parameter containing ``('eval_metric': 'logloss')``,
15:27:38              the **evals_result** returns
15:27:38      
15:27:38              .. code-block:: python
15:27:38      
15:27:38                  {'train': {'logloss': ['0.48253', '0.35953']},
15:27:38                   'eval': {'logloss': ['0.480385', '0.357756']}}
15:27:38      
15:27:38          verbose_eval : bool or int
15:27:38              Requires at least one item in **evals**.
15:27:38              If **verbose_eval** is True then the evaluation metric on the validation set is
15:27:38              printed at each boosting stage.
15:27:38              If **verbose_eval** is an integer then the evaluation metric on the validation set
15:27:38              is printed at every given **verbose_eval** boosting stage. The last boosting stage
15:27:38              / the boosting stage found by using **early_stopping_rounds** is also printed.
15:27:38              Example: with ``verbose_eval=4`` and at least one item in **evals**, an evaluation metric
15:27:38              is printed every 4 boosting stages, instead of every boosting stage.
15:27:38          learning_rates: list or function (deprecated - use callback API instead)
15:27:38              List of learning rate for each boosting round
15:27:38              or a customized function that calculates eta in terms of
15:27:38              current number of round and the total number of boosting round (e.g. yields
15:27:38              learning rate decay)
15:27:38          xgb_model : file name of stored xgb model or 'Booster' instance
15:27:38              Xgb model to be loaded before training (allows training continuation).
15:27:38          callbacks : list of callback functions
15:27:38              List of callback functions that are applied at end of each iteration.
15:27:38              It is possible to use predefined callbacks by using
15:27:38              :ref:`Callback API <callback_api>`.
15:27:38              Example:
15:27:38      
15:27:38              .. code-block:: python
15:27:38      
15:27:38                  [xgb.callback.reset_learning_rate(custom_rates)]
15:27:38      
15:27:38          Returns
15:27:38          -------
15:27:38          Booster : a trained booster model
15:27:38          """
15:27:38          callbacks = [] if callbacks is None else callbacks
15:27:38      
15:27:38          # Most of the legacy advanced options become callbacks
15:27:38          if isinstance(verbose_eval, bool) and verbose_eval:
15:27:38              callbacks.append(callback.print_evaluation())
15:27:38          else:
15:27:38              if isinstance(verbose_eval, int):
15:27:38                  callbacks.append(callback.print_evaluation(verbose_eval))
15:27:38      
15:27:38          if early_stopping_rounds is not None:
15:27:38              callbacks.append(callback.early_stop(early_stopping_rounds,
15:27:38                                                   early_stopping_threshold,
15:27:38                                                   early_stopping_limit,
15:27:38                                                   maximize=maximize,
15:27:38                                                   verbose=bool(verbose_eval)))
15:27:38          if evals_result is not None:
15:27:38              callbacks.append(callback.record_evaluation(evals_result))
15:27:38      
15:27:38          if learning_rates is not None:
15:27:38              warnings.warn("learning_rates parameter is deprecated - use callback API instead",
15:27:38                            DeprecationWarning)
15:27:38              callbacks.append(callback.reset_learning_rate(learning_rates))
15:27:38      
15:27:38          return _train_internal(params, dtrain,
15:27:38                                 num_boost_round=num_boost_round,
15:27:38                                 evals=evals,
15:27:38                                 obj=obj, feval=feval,
15:27:38  >                              xgb_model=xgb_model, callbacks=callbacks)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/training.py:227: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  params = {'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bynode': 1, ...}
15:27:38  dtrain = <xgboost.core.DMatrix object at 0x7fff708810b8>, num_boost_round = 100
15:27:38  evals = [], obj = None, feval = None, xgb_model = None
15:27:38  callbacks = [<function print_evaluation.<locals>.callback at 0x7fff708970d0>, <function record_evaluation.<locals>.callback at 0x7fff708971e0>]
15:27:38  
15:27:38      def _train_internal(params, dtrain,
15:27:38                          num_boost_round=10, evals=(),
15:27:38                          obj=None, feval=None,
15:27:38                          xgb_model=None, callbacks=None):
15:27:38          """internal training function"""
15:27:38          callbacks = [] if callbacks is None else callbacks
15:27:38          evals = list(evals)
15:27:38          if isinstance(params, dict) \
15:27:38                  and 'eval_metric' in params \
15:27:38                  and isinstance(params['eval_metric'], list):
15:27:38              params = dict((k, v) for k, v in params.items())
15:27:38              eval_metrics = params['eval_metric']
15:27:38              params.pop("eval_metric", None)
15:27:38              params = list(params.items())
15:27:38              for eval_metric in eval_metrics:
15:27:38                  params += [('eval_metric', eval_metric)]
15:27:38      
15:27:38          bst = Booster(params, [dtrain] + [d[0] for d in evals])
15:27:38          nboost = 0
15:27:38          num_parallel_tree = 1
15:27:38      
15:27:38          if xgb_model is not None:
15:27:38              if not isinstance(xgb_model, STRING_TYPES):
15:27:38                  xgb_model = xgb_model.save_raw()
15:27:38              bst = Booster(params, [dtrain] + [d[0] for d in evals], model_file=xgb_model)
15:27:38              nboost = len(bst.get_dump())
15:27:38      
15:27:38          _params = dict(params) if isinstance(params, list) else params
15:27:38      
15:27:38          if 'num_parallel_tree' in _params:
15:27:38              num_parallel_tree = _params['num_parallel_tree']
15:27:38              nboost //= num_parallel_tree
15:27:38          if 'num_class' in _params:
15:27:38              nboost //= _params['num_class']
15:27:38      
15:27:38          # Distributed code: Load the checkpoint from rabit.
15:27:38          version = bst.load_rabit_checkpoint()
15:27:38          assert rabit.get_world_size() != 1 or version == 0
15:27:38          rank = rabit.get_rank()
15:27:38          start_iteration = int(version / 2)
15:27:38          nboost += start_iteration
15:27:38      
15:27:38          callbacks_before_iter = [
15:27:38              cb for cb in callbacks if cb.__dict__.get('before_iteration', False)]
15:27:38          callbacks_after_iter = [
15:27:38              cb for cb in callbacks if not cb.__dict__.get('before_iteration', False)]
15:27:38      
15:27:38          for i in range(start_iteration, num_boost_round):
15:27:38              for cb in callbacks_before_iter:
15:27:38                  cb(CallbackEnv(model=bst,
15:27:38                                 cvfolds=None,
15:27:38                                 iteration=i,
15:27:38                                 begin_iteration=start_iteration,
15:27:38                                 end_iteration=num_boost_round,
15:27:38                                 rank=rank,
15:27:38                                 evaluation_result_list=None))
15:27:38              # Distributed code: need to resume to this point.
15:27:38              # Skip the first update if it is a recovery step.
15:27:38              if version % 2 == 0:
15:27:38  >               bst.update(dtrain, i, obj)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/training.py:74: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <xgboost.core.Booster object at 0x7fff70881080>
15:27:38  dtrain = <xgboost.core.DMatrix object at 0x7fff708810b8>, iteration = 0
15:27:38  fobj = None
15:27:38  
15:27:38      def update(self, dtrain, iteration, fobj=None):
15:27:38          """Update for one iteration, with objective function calculated
15:27:38          internally.  This function should not be called directly by users.
15:27:38      
15:27:38          Parameters
15:27:38          ----------
15:27:38          dtrain : DMatrix
15:27:38              Training data.
15:27:38          iteration : int
15:27:38              Current iteration number.
15:27:38          fobj : function
15:27:38              Customized objective function.
15:27:38      
15:27:38          """
15:27:38          if not isinstance(dtrain, DMatrix):
15:27:38              raise TypeError('invalid training matrix: {}'.format(type(dtrain).__name__))
15:27:38          self._validate_features(dtrain)
15:27:38      
15:27:38          if fobj is None:
15:27:38              _check_call(_LIB.XGBoosterUpdateOneIter(self.handle, ctypes.c_int(iteration),
15:27:38  >                                                   dtrain.handle))
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/core.py:1115: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  ret = -1
15:27:38  
15:27:38      def _check_call(ret):
15:27:38          """Check the return value of C API call
15:27:38      
15:27:38          This function will raise exception when error occurs.
15:27:38          Wrap every API call with this function
15:27:38      
15:27:38          Parameters
15:27:38          ----------
15:27:38          ret : int
15:27:38              return value from API calls
15:27:38          """
15:27:38          if ret != 0:
15:27:38  >           raise XGBoostError(py_str(_LIB.XGBGetLastError()))
15:27:38  E           xgboost.core.XGBoostError: [12:54:50] /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1412: Exception in gpu_hist: NCCL failure :unhandled system error /root/repo/xgboost/src/tree/../common/device_helpers.cuh(896)
15:27:38  E           
15:27:38  E           Stack trace:
15:27:38  E             [bt] (0) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x54) [0x7fff7c9cf984]
15:27:38  E             [bt] (1) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::Update(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, std::vector<xgboost::RegTree*, std::allocator<xgboost::RegTree*> > const&)+0x2c4) [0x7fff7cbeb2e4]
15:27:38  E             [bt] (2) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::tree::GPUHistMaker::Update(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, std::vector<xgboost::RegTree*, std::allocator<xgboost::RegTree*> > const&)+0x28) [0x7fff7cbeb358]
15:27:38  E             [bt] (3) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::gbm::GBTree::BoostNewTrees(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, int, std::vector<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> >, std::allocator<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> > > >*)+0x538) [0x7fff7ca4df38]
15:27:38  E             [bt] (4) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::gbm::GBTree::DoBoost(xgboost::DMatrix*, xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::ObjFunction*)+0xa78) [0x7fff7ca4f118]
15:27:38  E             [bt] (5) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::LearnerImpl::UpdateOneIter(int, xgboost::DMatrix*)+0x508) [0x7fff7ca60db8]
15:27:38  E             [bt] (6) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(XGBoosterUpdateOneIter+0x70) [0x7fff7c9dadf0]
15:27:38  E             [bt] (7) /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6(+0x928c) [0x7fff9be7928c]
15:27:38  E             [bt] (8) /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0xd4) [0x7fff9be76df4]
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/core.py:176: XGBoostError
15:27:38  _______________________ test_sklearn_gbm_classification ________________________
15:27:38  [gw0] linux -- Python 3.6.4 /opt/h2oai/h2o4gpu/python/bin/python
15:27:38  
15:27:38  self = <CallInfo when='call' exception: [12:54:51] /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1412: Exception in gpu_his...8c]
15:27:38    [bt] (8) /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0xd4) [0x7fff9be76df4]
15:27:38  
15:27:38  >
15:27:38  func = <function call_runtest_hook.<locals>.<lambda> at 0x7fff664471e0>
15:27:38  when = 'call', treat_keyboard_interrupt_as_exception = False
15:27:38  
15:27:38      def __init__(self, func, when, treat_keyboard_interrupt_as_exception=False):
15:27:38          #: context of invocation: one of "setup", "call",
15:27:38          #: "teardown", "memocollect"
15:27:38          self.when = when
15:27:38          self.start = time()
15:27:38          try:
15:27:38  >           self.result = func()
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/runner.py:212: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  >       lambda: ihook(item=item, **kwds),
15:27:38          when=when,
15:27:38          treat_keyboard_interrupt_as_exception=item.config.getvalue("usepdb"),
15:27:38      )
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/runner.py:194: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <_HookCaller 'pytest_runtest_call'>, args = ()
15:27:38  kwargs = {'item': <Function 'test_sklearn_gbm_classification'>}
15:27:38  notincall = set()
15:27:38  
15:27:38      def __call__(self, *args, **kwargs):
15:27:38          if args:
15:27:38              raise TypeError("hook calling supports only keyword arguments")
15:27:38          assert not self.is_historic()
15:27:38          if self.spec and self.spec.argnames:
15:27:38              notincall = (
15:27:38                  set(self.spec.argnames) - set(["__multicall__"]) - set(kwargs.keys())
15:27:38              )
15:27:38              if notincall:
15:27:38                  warnings.warn(
15:27:38                      "Argument(s) {} which are declared in the hookspec "
15:27:38                      "can not be found in this hook call".format(tuple(notincall)),
15:27:38                      stacklevel=2,
15:27:38                  )
15:27:38  >       return self._hookexec(self, self.get_hookimpls(), kwargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/hooks.py:289: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <_pytest.config.PytestPluginManager object at 0x7fff9be52e10>
15:27:38  hook = <_HookCaller 'pytest_runtest_call'>
15:27:38  methods = [<HookImpl plugin_name='runner', plugin=<module '_pytest.runner' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-pa...ff9b1c50b8>>, <HookImpl plugin_name='logging-plugin', plugin=<_pytest.logging.LoggingPlugin object at 0x7fff9af97e80>>]
15:27:38  kwargs = {'item': <Function 'test_sklearn_gbm_classification'>}
15:27:38  
15:27:38      def _hookexec(self, hook, methods, kwargs):
15:27:38          # called from all hookcaller instances.
15:27:38          # enable_tracing will set its own wrapping function at self._inner_hookexec
15:27:38  >       return self._inner_hookexec(hook, methods, kwargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/manager.py:68: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  hook = <_HookCaller 'pytest_runtest_call'>
15:27:38  methods = [<HookImpl plugin_name='runner', plugin=<module '_pytest.runner' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-pa...ff9b1c50b8>>, <HookImpl plugin_name='logging-plugin', plugin=<_pytest.logging.LoggingPlugin object at 0x7fff9af97e80>>]
15:27:38  kwargs = {'item': <Function 'test_sklearn_gbm_classification'>}
15:27:38  
15:27:38      self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
15:27:38          methods,
15:27:38          kwargs,
15:27:38  >       firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
15:27:38      )
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/manager.py:62: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  item = <Function 'test_sklearn_gbm_classification'>
15:27:38  
15:27:38      def pytest_runtest_call(item):
15:27:38          _update_current_test_var(item, "call")
15:27:38          sys.last_type, sys.last_value, sys.last_traceback = (None, None, None)
15:27:38          try:
15:27:38  >           item.runtest()
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/runner.py:122: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <Function 'test_sklearn_gbm_classification'>
15:27:38  
15:27:38      def runtest(self):
15:27:38          """ execute the underlying test function. """
15:27:38  >       self.ihook.pytest_pyfunc_call(pyfuncitem=self)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/python.py:1438: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <_HookCaller 'pytest_pyfunc_call'>, args = ()
15:27:38  kwargs = {'pyfuncitem': <Function 'test_sklearn_gbm_classification'>}
15:27:38  notincall = set()
15:27:38  
15:27:38      def __call__(self, *args, **kwargs):
15:27:38          if args:
15:27:38              raise TypeError("hook calling supports only keyword arguments")
15:27:38          assert not self.is_historic()
15:27:38          if self.spec and self.spec.argnames:
15:27:38              notincall = (
15:27:38                  set(self.spec.argnames) - set(["__multicall__"]) - set(kwargs.keys())
15:27:38              )
15:27:38              if notincall:
15:27:38                  warnings.warn(
15:27:38                      "Argument(s) {} which are declared in the hookspec "
15:27:38                      "can not be found in this hook call".format(tuple(notincall)),
15:27:38                      stacklevel=2,
15:27:38                  )
15:27:38  >       return self._hookexec(self, self.get_hookimpls(), kwargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/hooks.py:289: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <_pytest.config.PytestPluginManager object at 0x7fff9be52e10>
15:27:38  hook = <_HookCaller 'pytest_pyfunc_call'>
15:27:38  methods = [<HookImpl plugin_name='python', plugin=<module '_pytest.python' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-pa..., plugin=<module '_pytest.skipping' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/skipping.py'>>]
15:27:38  kwargs = {'pyfuncitem': <Function 'test_sklearn_gbm_classification'>}
15:27:38  
15:27:38      def _hookexec(self, hook, methods, kwargs):
15:27:38          # called from all hookcaller instances.
15:27:38          # enable_tracing will set its own wrapping function at self._inner_hookexec
15:27:38  >       return self._inner_hookexec(hook, methods, kwargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/manager.py:68: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  hook = <_HookCaller 'pytest_pyfunc_call'>
15:27:38  methods = [<HookImpl plugin_name='python', plugin=<module '_pytest.python' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-pa..., plugin=<module '_pytest.skipping' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/skipping.py'>>]
15:27:38  kwargs = {'pyfuncitem': <Function 'test_sklearn_gbm_classification'>}
15:27:38  
15:27:38      self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
15:27:38          methods,
15:27:38          kwargs,
15:27:38  >       firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
15:27:38      )
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/manager.py:62: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  pyfuncitem = <Function 'test_sklearn_gbm_classification'>
15:27:38  
15:27:38      @hookimpl(trylast=True)
15:27:38      def pytest_pyfunc_call(pyfuncitem):
15:27:38          testfunction = pyfuncitem.obj
15:27:38          if pyfuncitem._isyieldedfunction():
15:27:38              testfunction(*pyfuncitem._args)
15:27:38          else:
15:27:38              funcargs = pyfuncitem.funcargs
15:27:38              testargs = {}
15:27:38              for arg in pyfuncitem._fixtureinfo.argnames:
15:27:38                  testargs[arg] = funcargs[arg]
15:27:38  >           testfunction(**testargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/python.py:166: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  >   def test_sklearn_gbm_classification(): test_gbm_classifier_backupsklearn()
15:27:38  
15:27:38  tests/python/open_data/gbm/test_xgb_sklearn_wrapper.py:237: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  backend = 'auto'
15:27:38  
15:27:38      def test_gbm_classifier_backupsklearn(backend='auto'):
15:27:38          df = pd.read_csv("./open_data/creditcard.csv")
15:27:38          X = np.array(df.iloc[:, :df.shape[1] - 1], dtype='float32', order='C')
15:27:38          y = np.array(df.iloc[:, df.shape[1] - 1], dtype='float32', order='C')
15:27:38          import h2o4gpu
15:27:38          Solver = h2o4gpu.GradientBoostingClassifier
15:27:38      
15:27:38          # Run the h2o4gpu GradientBoostingClassifier
15:27:38          gbm = Solver(backend=backend, random_state=1234)
15:27:38          print("h2o4gpu fit()")
15:27:38  >       gbm.fit(X, y)
15:27:38  
15:27:38  tests/python/open_data/gbm/test_xgb_sklearn_wrapper.py:186: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <h2o4gpu.solvers.xgboost.GradientBoostingClassifier object at 0x7fff97c5e278>
15:27:38  X = array([[1.0000e+00, 2.0000e+04, 2.0000e+00, ..., 0.0000e+00, 0.0000e+00,
15:27:38          0.0000e+00],
15:27:38         [2.0000e+00, 1.20...000e+03],
15:27:38         [2.3999e+04, 2.0000e+04, 1.0000e+00, ..., 1.0000e+03, 0.0000e+00,
15:27:38          0.0000e+00]], dtype=float32)
15:27:38  y = array([1., 1., 0., ..., 0., 0., 0.], dtype=float32), sample_weight = None
15:27:38  
15:27:38      def fit(self, X, y=None, sample_weight=None):
15:27:38  >       res = self.model.fit(X, y, sample_weight)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/h2o4gpu/solvers/xgboost.py:1081: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
15:27:38         colsample_bynode=1, colsample_bytree=1.0, ...lambda=1, scale_pos_weight=1,
15:27:38         seed=None, silent=True, subsample=1.0, tree_method='gpu_hist',
15:27:38         verbosity=1)
15:27:38  X = array([[1.0000e+00, 2.0000e+04, 2.0000e+00, ..., 0.0000e+00, 0.0000e+00,
15:27:38          0.0000e+00],
15:27:38         [2.0000e+00, 1.20...000e+03],
15:27:38         [2.3999e+04, 2.0000e+04, 1.0000e+00, ..., 1.0000e+03, 0.0000e+00,
15:27:38          0.0000e+00]], dtype=float32)
15:27:38  y = array([1., 1., 0., ..., 0., 0., 0.], dtype=float32), sample_weight = None
15:27:38  eval_set = None, eval_metric = None, early_stopping_rounds = None
15:27:38  early_stopping_threshold = None, early_stopping_limit = None, verbose = True
15:27:38  xgb_model = None, sample_weight_eval_set = None, callbacks = None
15:27:38  
15:27:38          def fit(self, X, y, sample_weight=None, eval_set=None, eval_metric=None,
15:27:38                  early_stopping_rounds=None, early_stopping_threshold=None, early_stopping_limit=None,
15:27:38                                        verbose=True, xgb_model=None,
15:27:38                  sample_weight_eval_set=None, callbacks=None):
15:27:38              # pylint: disable = attribute-defined-outside-init,arguments-differ
15:27:38              """
15:27:38              Fit gradient boosting classifier
15:27:38      
15:27:38              Parameters
15:27:38              ----------
15:27:38              X : array_like
15:27:38                  Feature matrix
15:27:38              y : array_like
15:27:38                  Labels
15:27:38              sample_weight : array_like
15:27:38                  Weight for each instance
15:27:38              eval_set : list, optional
15:27:38                  A list of (X, y) pairs to use as a validation set for
15:27:38                  early-stopping
15:27:38              sample_weight_eval_set : list, optional
15:27:38                  A list of the form [L_1, L_2, ..., L_n], where each L_i is a list of
15:27:38                  instance weights on the i-th validation set.
15:27:38              eval_metric : str, callable, optional
15:27:38                  If a str, should be a built-in evaluation metric to use. See
15:27:38                  doc/parameter.rst. If callable, a custom evaluation metric. The call
15:27:38                  signature is func(y_predicted, y_true) where y_true will be a
15:27:38                  DMatrix object such that you may need to call the get_label
15:27:38                  method. It must return a str, value pair where the str is a name
15:27:38                  for the evaluation and value is the value of the evaluation
15:27:38                  function. This objective is always minimized.
15:27:38              early_stopping_rounds : int, optional
15:27:38                  Activates early stopping. Validation error needs to decrease at
15:27:38                  least every <early_stopping_rounds> round(s) to continue training.
15:27:38                  Requires at least one item in evals. If there's more than one,
15:27:38                  will use the last. If early stopping occurs, the model will have
15:27:38                  three additional fields: bst.best_score, bst.best_iteration and
15:27:38                  bst.best_ntree_limit (bst.best_ntree_limit is the ntree_limit parameter
15:27:38                  default value in predict method if not any other value is specified).
15:27:38                  (Use bst.best_ntree_limit to get the correct value if num_parallel_tree
15:27:38                  and/or num_class appears in the parameters)
15:27:38              early_stopping_threshold : float
15:27:38                  Sets an optional threshold to smooth the early stopping policy.
15:27:38                  If, after early_stopping_rounds iterations, the model hasn't improved
15:27:38                  by more than threshold times the score from early_stopping_rounds
15:27:38                  before, then learning stops.
15:27:38              early_stopping_limit : float
15:27:38                  Caps "threshold times the score from early_stopping_rounds before"
15:27:38                  at this value.
15:27:38              verbose : bool
15:27:38                  If `verbose` and an evaluation set is used, writes the evaluation
15:27:38                  metric measured on the validation set to stderr.
15:27:38              xgb_model : str
15:27:38                  file name of stored xgb model or 'Booster' instance Xgb model to be
15:27:38                  loaded before training (allows training continuation).
15:27:38              callbacks : list of callback functions
15:27:38                  List of callback functions that are applied at end of each iteration.
15:27:38                  It is possible to use predefined callbacks by using :ref:`callback_api`.
15:27:38                  Example:
15:27:38      
15:27:38                  .. code-block:: python
15:27:38      
15:27:38                      [xgb.callback.reset_learning_rate(custom_rates)]
15:27:38              """
15:27:38              evals_result = {}
15:27:38              self.classes_ = np.unique(y)
15:27:38              self.n_classes_ = len(self.classes_)
15:27:38      
15:27:38              xgb_options = self.get_xgb_params()
15:27:38      
15:27:38              if callable(self.objective):
15:27:38                  obj = _objective_decorator(self.objective)
15:27:38                  # Use the default value. Is it really unused?
15:27:38                  xgb_options["objective"] = "binary:logistic"
15:27:38              else:
15:27:38                  obj = None
15:27:38      
15:27:38              if self.n_classes_ > 2:
15:27:38                  # Switch to using a multiclass objective in the underlying XGB instance
15:27:38                  xgb_options["objective"] = "multi:softprob"
15:27:38                  xgb_options['num_class'] = self.n_classes_
15:27:38      
15:27:38              feval = eval_metric if callable(eval_metric) else None
15:27:38              if eval_metric is not None:
15:27:38                  if callable(eval_metric):
15:27:38                      eval_metric = None
15:27:38                  else:
15:27:38                      xgb_options.update({"eval_metric": eval_metric})
15:27:38      
15:27:38              self._le = XGBLabelEncoder().fit(y)
15:27:38              training_labels = self._le.transform(y)
15:27:38      
15:27:38              if eval_set is not None:
15:27:38                  if sample_weight_eval_set is None:
15:27:38                      sample_weight_eval_set = [None] * len(eval_set)
15:27:38                  evals = list(
15:27:38                      DMatrix(eval_set[i][0], label=self._le.transform(eval_set[i][1]),
15:27:38                              missing=self.missing, weight=sample_weight_eval_set[i],
15:27:38                              nthread=self.n_jobs)
15:27:38                      for i in range(len(eval_set))
15:27:38                  )
15:27:38                  nevals = len(evals)
15:27:38                  eval_names = ["validation_{}".format(i) for i in range(nevals)]
15:27:38                  evals = list(zip(evals, eval_names))
15:27:38              else:
15:27:38                  evals = ()
15:27:38      
15:27:38              self._features_count = X.shape[1]
15:27:38      
15:27:38              if sample_weight is not None:
15:27:38                  train_dmatrix = DMatrix(X, label=training_labels, weight=sample_weight,
15:27:38                                          missing=self.missing, nthread=self.n_jobs)
15:27:38              else:
15:27:38                  train_dmatrix = DMatrix(X, label=training_labels,
15:27:38                                          missing=self.missing, nthread=self.n_jobs)
15:27:38      
15:27:38              self._Booster = train(xgb_options, train_dmatrix, self.get_num_boosting_rounds(),
15:27:38                                    evals=evals, early_stopping_rounds=early_stopping_rounds,
15:27:38                                    early_stopping_threshold=early_stopping_threshold,
15:27:38                                    early_stopping_limit=early_stopping_limit,
15:27:38                                    evals_result=evals_result, obj=obj, feval=feval,
15:27:38                                    verbose_eval=verbose, xgb_model=xgb_model,
15:27:38  >                                 callbacks=callbacks)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/sklearn.py:757: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  params = {'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bynode': 1, ...}
15:27:38  dtrain = <xgboost.core.DMatrix object at 0x7fff5eaee4e0>, num_boost_round = 100
15:27:38  evals = (), obj = None, feval = None, maximize = False
15:27:38  early_stopping_rounds = None, early_stopping_threshold = None
15:27:38  early_stopping_limit = None, evals_result = {}, verbose_eval = True
15:27:38  xgb_model = None
15:27:38  callbacks = [<function print_evaluation.<locals>.callback at 0x7fff5eb1f378>, <function record_evaluation.<locals>.callback at 0x7fff5eb1f488>]
15:27:38  learning_rates = None
15:27:38  
15:27:38      def train(params, dtrain, num_boost_round=10, evals=(), obj=None, feval=None,
15:27:38            maximize=False, early_stopping_rounds=None, early_stopping_threshold=None, early_stopping_limit=None,
15:27:38                evals_result=None,
15:27:38                verbose_eval=True, xgb_model=None, callbacks=None, learning_rates=None):
15:27:38          # pylint: disable=too-many-statements,too-many-branches, attribute-defined-outside-init
15:27:38          """Train a booster with given parameters.
15:27:38      
15:27:38          Parameters
15:27:38          ----------
15:27:38          params : dict
15:27:38              Booster params.
15:27:38          dtrain : DMatrix
15:27:38              Data to be trained.
15:27:38          num_boost_round: int
15:27:38              Number of boosting iterations.
15:27:38          evals: list of pairs (DMatrix, string)
15:27:38              List of items to be evaluated during training, this allows user to watch
15:27:38              performance on the validation set.
15:27:38          obj : function
15:27:38              Customized objective function.
15:27:38          feval : function
15:27:38              Customized evaluation function.
15:27:38          maximize : bool
15:27:38              Whether to maximize feval.
15:27:38          early_stopping_rounds: int
15:27:38              Activates early stopping. Validation error needs to decrease at least
15:27:38              every **early_stopping_rounds** round(s) to continue training.
15:27:38              Requires at least one item in **evals**.
15:27:38              If there's more than one, will use the last.
15:27:38              Returns the model from the last iteration (not the best one).
15:27:38              If early stopping occurs, the model will have three additional fields:
15:27:38              ``bst.best_score``, ``bst.best_iteration`` and ``bst.best_ntree_limit``.
15:27:38              (Use ``bst.best_ntree_limit`` to get the correct value if
15:27:38              ``num_parallel_tree`` and/or ``num_class`` appears in the parameters)
15:27:38          early_stopping_threshold : float
15:27:38              Sets an optional threshold to smooth the early stopping policy.
15:27:38              If, after early_stopping_rounds iterations, the model hasn't improved
15:27:38              by more than threshold times the score from early_stopping_rounds
15:27:38              before, then learning stops.
15:27:38          early_stopping_limit : float
15:27:38              Caps "threshold times the score from early_stopping_rounds before"
15:27:38              at this value.
15:27:38          evals_result: dict
15:27:38              This dictionary stores the evaluation results of all the items in watchlist.
15:27:38      
15:27:38              Example: with a watchlist containing
15:27:38              ``[(dtest,'eval'), (dtrain,'train')]`` and
15:27:38              a parameter containing ``('eval_metric': 'logloss')``,
15:27:38              the **evals_result** returns
15:27:38      
15:27:38              .. code-block:: python
15:27:38      
15:27:38                  {'train': {'logloss': ['0.48253', '0.35953']},
15:27:38                   'eval': {'logloss': ['0.480385', '0.357756']}}
15:27:38      
15:27:38          verbose_eval : bool or int
15:27:38              Requires at least one item in **evals**.
15:27:38              If **verbose_eval** is True then the evaluation metric on the validation set is
15:27:38              printed at each boosting stage.
15:27:38              If **verbose_eval** is an integer then the evaluation metric on the validation set
15:27:38              is printed at every given **verbose_eval** boosting stage. The last boosting stage
15:27:38              / the boosting stage found by using **early_stopping_rounds** is also printed.
15:27:38              Example: with ``verbose_eval=4`` and at least one item in **evals**, an evaluation metric
15:27:38              is printed every 4 boosting stages, instead of every boosting stage.
15:27:38          learning_rates: list or function (deprecated - use callback API instead)
15:27:38              List of learning rate for each boosting round
15:27:38              or a customized function that calculates eta in terms of
15:27:38              current number of round and the total number of boosting round (e.g. yields
15:27:38              learning rate decay)
15:27:38          xgb_model : file name of stored xgb model or 'Booster' instance
15:27:38              Xgb model to be loaded before training (allows training continuation).
15:27:38          callbacks : list of callback functions
15:27:38              List of callback functions that are applied at end of each iteration.
15:27:38              It is possible to use predefined callbacks by using
15:27:38              :ref:`Callback API <callback_api>`.
15:27:38              Example:
15:27:38      
15:27:38              .. code-block:: python
15:27:38      
15:27:38                  [xgb.callback.reset_learning_rate(custom_rates)]
15:27:38      
15:27:38          Returns
15:27:38          -------
15:27:38          Booster : a trained booster model
15:27:38          """
15:27:38          callbacks = [] if callbacks is None else callbacks
15:27:38      
15:27:38          # Most legacy advanced options become callbacks
15:27:38          if isinstance(verbose_eval, bool) and verbose_eval:
15:27:38              callbacks.append(callback.print_evaluation())
15:27:38          else:
15:27:38              if isinstance(verbose_eval, int):
15:27:38                  callbacks.append(callback.print_evaluation(verbose_eval))
15:27:38      
15:27:38          if early_stopping_rounds is not None:
15:27:38              callbacks.append(callback.early_stop(early_stopping_rounds,
15:27:38                                                   early_stopping_threshold,
15:27:38                                                   early_stopping_limit,
15:27:38                                                   maximize=maximize,
15:27:38                                                   verbose=bool(verbose_eval)))
15:27:38          if evals_result is not None:
15:27:38              callbacks.append(callback.record_evaluation(evals_result))
15:27:38      
15:27:38          if learning_rates is not None:
15:27:38              warnings.warn("learning_rates parameter is deprecated - use callback API instead",
15:27:38                            DeprecationWarning)
15:27:38              callbacks.append(callback.reset_learning_rate(learning_rates))
15:27:38      
15:27:38          return _train_internal(params, dtrain,
15:27:38                                 num_boost_round=num_boost_round,
15:27:38                                 evals=evals,
15:27:38                                 obj=obj, feval=feval,
15:27:38  >                              xgb_model=xgb_model, callbacks=callbacks)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/training.py:227: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  params = {'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bynode': 1, ...}
15:27:38  dtrain = <xgboost.core.DMatrix object at 0x7fff5eaee4e0>, num_boost_round = 100
15:27:38  evals = [], obj = None, feval = None, xgb_model = None
15:27:38  callbacks = [<function print_evaluation.<locals>.callback at 0x7fff5eb1f378>, <function record_evaluation.<locals>.callback at 0x7fff5eb1f488>]
15:27:38  
15:27:38      def _train_internal(params, dtrain,
15:27:38                          num_boost_round=10, evals=(),
15:27:38                          obj=None, feval=None,
15:27:38                          xgb_model=None, callbacks=None):
15:27:38          """internal training function"""
15:27:38          callbacks = [] if callbacks is None else callbacks
15:27:38          evals = list(evals)
15:27:38          if isinstance(params, dict) \
15:27:38                  and 'eval_metric' in params \
15:27:38                  and isinstance(params['eval_metric'], list):
15:27:38              params = dict((k, v) for k, v in params.items())
15:27:38              eval_metrics = params['eval_metric']
15:27:38              params.pop("eval_metric", None)
15:27:38              params = list(params.items())
15:27:38              for eval_metric in eval_metrics:
15:27:38                  params += [('eval_metric', eval_metric)]
15:27:38      
15:27:38          bst = Booster(params, [dtrain] + [d[0] for d in evals])
15:27:38          nboost = 0
15:27:38          num_parallel_tree = 1
15:27:38      
15:27:38          if xgb_model is not None:
15:27:38              if not isinstance(xgb_model, STRING_TYPES):
15:27:38                  xgb_model = xgb_model.save_raw()
15:27:38              bst = Booster(params, [dtrain] + [d[0] for d in evals], model_file=xgb_model)
15:27:38              nboost = len(bst.get_dump())
15:27:38      
15:27:38          _params = dict(params) if isinstance(params, list) else params
15:27:38      
15:27:38          if 'num_parallel_tree' in _params:
15:27:38              num_parallel_tree = _params['num_parallel_tree']
15:27:38              nboost //= num_parallel_tree
15:27:38          if 'num_class' in _params:
15:27:38              nboost //= _params['num_class']
15:27:38      
15:27:38          # Distributed code: Load the checkpoint from rabit.
15:27:38          version = bst.load_rabit_checkpoint()
15:27:38          assert rabit.get_world_size() != 1 or version == 0
15:27:38          rank = rabit.get_rank()
15:27:38          start_iteration = int(version / 2)
15:27:38          nboost += start_iteration
15:27:38      
15:27:38          callbacks_before_iter = [
15:27:38              cb for cb in callbacks if cb.__dict__.get('before_iteration', False)]
15:27:38          callbacks_after_iter = [
15:27:38              cb for cb in callbacks if not cb.__dict__.get('before_iteration', False)]
15:27:38      
15:27:38          for i in range(start_iteration, num_boost_round):
15:27:38              for cb in callbacks_before_iter:
15:27:38                  cb(CallbackEnv(model=bst,
15:27:38                                 cvfolds=None,
15:27:38                                 iteration=i,
15:27:38                                 begin_iteration=start_iteration,
15:27:38                                 end_iteration=num_boost_round,
15:27:38                                 rank=rank,
15:27:38                                 evaluation_result_list=None))
15:27:38              # Distributed code: need to resume to this point.
15:27:38              # Skip the first update if it is a recovery step.
15:27:38              if version % 2 == 0:
15:27:38  >               bst.update(dtrain, i, obj)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/training.py:74: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <xgboost.core.Booster object at 0x7fff5eaee550>
15:27:38  dtrain = <xgboost.core.DMatrix object at 0x7fff5eaee4e0>, iteration = 0
15:27:38  fobj = None
15:27:38  
15:27:38      def update(self, dtrain, iteration, fobj=None):
15:27:38          """Update for one iteration, with objective function calculated
15:27:38          internally.  This function should not be called directly by users.
15:27:38      
15:27:38          Parameters
15:27:38          ----------
15:27:38          dtrain : DMatrix
15:27:38              Training data.
15:27:38          iteration : int
15:27:38              Current iteration number.
15:27:38          fobj : function
15:27:38              Customized objective function.
15:27:38      
15:27:38          """
15:27:38          if not isinstance(dtrain, DMatrix):
15:27:38              raise TypeError('invalid training matrix: {}'.format(type(dtrain).__name__))
15:27:38          self._validate_features(dtrain)
15:27:38      
15:27:38          if fobj is None:
15:27:38              _check_call(_LIB.XGBoosterUpdateOneIter(self.handle, ctypes.c_int(iteration),
15:27:38  >                                                   dtrain.handle))
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/core.py:1115: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  ret = -1
15:27:38  
15:27:38      def _check_call(ret):
15:27:38          """Check the return value of C API call
15:27:38      
15:27:38          This function will raise exception when error occurs.
15:27:38          Wrap every API call with this function
15:27:38      
15:27:38          Parameters
15:27:38          ----------
15:27:38          ret : int
15:27:38              return value from API calls
15:27:38          """
15:27:38          if ret != 0:
15:27:38  >           raise XGBoostError(py_str(_LIB.XGBGetLastError()))
15:27:38  E           xgboost.core.XGBoostError: [12:54:51] /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1412: Exception in gpu_hist: NCCL failure :unhandled system error /root/repo/xgboost/src/tree/../common/device_helpers.cuh(896)
15:27:38  E           
15:27:38  E           Stack trace:
15:27:38  E             [bt] (0) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x54) [0x7fff7c9cf984]
15:27:38  E             [bt] (1) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::Update(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, std::vector<xgboost::RegTree*, std::allocator<xgboost::RegTree*> > const&)+0x2c4) [0x7fff7cbeb2e4]
15:27:38  E             [bt] (2) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::tree::GPUHistMaker::Update(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, std::vector<xgboost::RegTree*, std::allocator<xgboost::RegTree*> > const&)+0x28) [0x7fff7cbeb358]
15:27:38  E             [bt] (3) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::gbm::GBTree::BoostNewTrees(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, int, std::vector<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> >, std::allocator<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> > > >*)+0x538) [0x7fff7ca4df38]
15:27:38  E             [bt] (4) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::gbm::GBTree::DoBoost(xgboost::DMatrix*, xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::ObjFunction*)+0xa78) [0x7fff7ca4f118]
15:27:38  E             [bt] (5) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::LearnerImpl::UpdateOneIter(int, xgboost::DMatrix*)+0x508) [0x7fff7ca60db8]
15:27:38  E             [bt] (6) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(XGBoosterUpdateOneIter+0x70) [0x7fff7c9dadf0]
15:27:38  E             [bt] (7) /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6(+0x928c) [0x7fff9be7928c]
15:27:38  E             [bt] (8) /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0xd4) [0x7fff9be76df4]
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/core.py:176: XGBoostError
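
The failure above is the key signal: `XGBoosterUpdateOneIter` dies inside `gpu_hist` with an NCCL "unhandled system error" raised from `device_helpers.cuh(896)`. A minimal sketch for isolating this, assuming the hangs/crashes come from NCCL initialization on ppc64le rather than `gpu_hist` itself: pin training to a single GPU so no NCCL communicator is ever created. The synthetic data and exact parameter set here are illustrative assumptions; `n_gpus` is the same fork-specific flag the covtype test below passes as `-1`.

```python
# Hedged repro sketch: if this single-GPU run trains cleanly while the
# multi-GPU (n_gpus=-1) runs above fail, NCCL setup is the likely culprit.
import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 10).astype(np.float32)  # synthetic stand-in data
y = (np.random.rand(1000) > 0.5).astype(np.float32)

dtrain = xgb.DMatrix(X, label=y)
params = {
    'objective': 'binary:logistic',
    'tree_method': 'gpu_hist',
    'n_gpus': 1,   # restrict to one device: no NCCL communicator needed
    'gpu_id': 0,
}
bst = xgb.train(params, dtrain, num_boost_round=10)
print(bst.eval(dtrain))
```
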
15:27:38  _____________________________ test_xgboost_covtype _____________________________
15:27:38  [gw0] linux -- Python 3.6.4 /opt/h2oai/h2o4gpu/python/bin/python
15:27:38  
15:27:38  self = <CallInfo when='call' exception: [12:57:47] /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1412: Exception in gpu_his...8c]
15:27:38    [bt] (8) /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0xd4) [0x7fff9be76df4]
15:27:38  
15:27:38  >
15:27:38  func = <function call_runtest_hook.<locals>.<lambda> at 0x7fff502b2840>
15:27:38  when = 'call', treat_keyboard_interrupt_as_exception = False
15:27:38  
15:27:38      def __init__(self, func, when, treat_keyboard_interrupt_as_exception=False):
15:27:38          #: context of invocation: one of "setup", "call",
15:27:38          #: "teardown", "memocollect"
15:27:38          self.when = when
15:27:38          self.start = time()
15:27:38          try:
15:27:38  >           self.result = func()
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/runner.py:212: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  >       lambda: ihook(item=item, **kwds),
15:27:38          when=when,
15:27:38          treat_keyboard_interrupt_as_exception=item.config.getvalue("usepdb"),
15:27:38      )
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/runner.py:194: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <_HookCaller 'pytest_runtest_call'>, args = ()
15:27:38  kwargs = {'item': <Function 'test_xgboost_covtype'>}, notincall = set()
15:27:38  
15:27:38      def __call__(self, *args, **kwargs):
15:27:38          if args:
15:27:38              raise TypeError("hook calling supports only keyword arguments")
15:27:38          assert not self.is_historic()
15:27:38          if self.spec and self.spec.argnames:
15:27:38              notincall = (
15:27:38                  set(self.spec.argnames) - set(["__multicall__"]) - set(kwargs.keys())
15:27:38              )
15:27:38              if notincall:
15:27:38                  warnings.warn(
15:27:38                      "Argument(s) {} which are declared in the hookspec "
15:27:38                      "can not be found in this hook call".format(tuple(notincall)),
15:27:38                      stacklevel=2,
15:27:38                  )
15:27:38  >       return self._hookexec(self, self.get_hookimpls(), kwargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/hooks.py:289: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <_pytest.config.PytestPluginManager object at 0x7fff9be52e10>
15:27:38  hook = <_HookCaller 'pytest_runtest_call'>
15:27:38  methods = [<HookImpl plugin_name='runner', plugin=<module '_pytest.runner' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-pa...ff9b1c50b8>>, <HookImpl plugin_name='logging-plugin', plugin=<_pytest.logging.LoggingPlugin object at 0x7fff9af97e80>>]
15:27:38  kwargs = {'item': <Function 'test_xgboost_covtype'>}
15:27:38  
15:27:38      def _hookexec(self, hook, methods, kwargs):
15:27:38          # called from all hookcaller instances.
15:27:38          # enable_tracing will set its own wrapping function at self._inner_hookexec
15:27:38  >       return self._inner_hookexec(hook, methods, kwargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/manager.py:68: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  hook = <_HookCaller 'pytest_runtest_call'>
15:27:38  methods = [<HookImpl plugin_name='runner', plugin=<module '_pytest.runner' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-pa...ff9b1c50b8>>, <HookImpl plugin_name='logging-plugin', plugin=<_pytest.logging.LoggingPlugin object at 0x7fff9af97e80>>]
15:27:38  kwargs = {'item': <Function 'test_xgboost_covtype'>}
15:27:38  
15:27:38      self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
15:27:38          methods,
15:27:38          kwargs,
15:27:38  >       firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
15:27:38      )
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/manager.py:62: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  item = <Function 'test_xgboost_covtype'>
15:27:38  
15:27:38      def pytest_runtest_call(item):
15:27:38          _update_current_test_var(item, "call")
15:27:38          sys.last_type, sys.last_value, sys.last_traceback = (None, None, None)
15:27:38          try:
15:27:38  >           item.runtest()
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/runner.py:122: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <Function 'test_xgboost_covtype'>
15:27:38  
15:27:38      def runtest(self):
15:27:38          """ execute the underlying test function. """
15:27:38  >       self.ihook.pytest_pyfunc_call(pyfuncitem=self)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/python.py:1438: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <_HookCaller 'pytest_pyfunc_call'>, args = ()
15:27:38  kwargs = {'pyfuncitem': <Function 'test_xgboost_covtype'>}, notincall = set()
15:27:38  
15:27:38      def __call__(self, *args, **kwargs):
15:27:38          if args:
15:27:38              raise TypeError("hook calling supports only keyword arguments")
15:27:38          assert not self.is_historic()
15:27:38          if self.spec and self.spec.argnames:
15:27:38              notincall = (
15:27:38                  set(self.spec.argnames) - set(["__multicall__"]) - set(kwargs.keys())
15:27:38              )
15:27:38              if notincall:
15:27:38                  warnings.warn(
15:27:38                      "Argument(s) {} which are declared in the hookspec "
15:27:38                      "can not be found in this hook call".format(tuple(notincall)),
15:27:38                      stacklevel=2,
15:27:38                  )
15:27:38  >       return self._hookexec(self, self.get_hookimpls(), kwargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/hooks.py:289: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <_pytest.config.PytestPluginManager object at 0x7fff9be52e10>
15:27:38  hook = <_HookCaller 'pytest_pyfunc_call'>
15:27:38  methods = [<HookImpl plugin_name='python', plugin=<module '_pytest.python' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-pa..., plugin=<module '_pytest.skipping' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/skipping.py'>>]
15:27:38  kwargs = {'pyfuncitem': <Function 'test_xgboost_covtype'>}
15:27:38  
15:27:38      def _hookexec(self, hook, methods, kwargs):
15:27:38          # called from all hookcaller instances.
15:27:38          # enable_tracing will set its own wrapping function at self._inner_hookexec
15:27:38  >       return self._inner_hookexec(hook, methods, kwargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/manager.py:68: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  hook = <_HookCaller 'pytest_pyfunc_call'>
15:27:38  methods = [<HookImpl plugin_name='python', plugin=<module '_pytest.python' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-pa..., plugin=<module '_pytest.skipping' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/skipping.py'>>]
15:27:38  kwargs = {'pyfuncitem': <Function 'test_xgboost_covtype'>}
15:27:38  
15:27:38      self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
15:27:38          methods,
15:27:38          kwargs,
15:27:38  >       firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
15:27:38      )
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/manager.py:62: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  pyfuncitem = <Function 'test_xgboost_covtype'>
15:27:38  
15:27:38      @hookimpl(trylast=True)
15:27:38      def pytest_pyfunc_call(pyfuncitem):
15:27:38          testfunction = pyfuncitem.obj
15:27:38          if pyfuncitem._isyieldedfunction():
15:27:38              testfunction(*pyfuncitem._args)
15:27:38          else:
15:27:38              funcargs = pyfuncitem.funcargs
15:27:38              testargs = {}
15:27:38              for arg in pyfuncitem._fixtureinfo.argnames:
15:27:38                  testargs[arg] = funcargs[arg]
15:27:38  >           testfunction(**testargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/python.py:166: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  >   def test_xgboost_covtype(): fun()
15:27:38  
15:27:38  tests/python/open_data/gbm/test_xgboost.py:65: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38      def fun():
15:27:38          import xgboost as xgb
15:27:38          import numpy as np
15:27:38          from sklearn.datasets import fetch_covtype
15:27:38          from sklearn.model_selection import train_test_split
15:27:38          import time
15:27:38      
15:27:38          # Fetch dataset using sklearn
15:27:38          cov = fetch_covtype()
15:27:38          X = cov.data
15:27:38          y = cov.target
15:27:38      
15:27:38          # Create 0.75/0.25 train/test split
15:27:38          X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, train_size=0.75,
15:27:38                                                              random_state=42)
15:27:38      
15:27:38          # Specify sufficient boosting iterations to reach a minimum
15:27:38          num_round = 10
15:27:38      
15:27:38          # Leave most parameters as default
15:27:38          param = {'objective': 'multi:softmax',  # Specify multiclass classification
15:27:38                   'num_class': 8,  # Number of possible output classes
15:27:38                   'tree_method': 'gpu_hist',  # Use GPU accelerated algorithm
15:27:38                 # TODO: workaround, remove it when xgboost is fixed
15:27:38                   'n_gpus': -1,
15:27:38                   }
15:27:38      
15:27:38          # Convert input data from numpy to XGBoost format
15:27:38          dtrain = xgb.DMatrix(X_train, label=y_train, nthread=-1)
15:27:38          dtest = xgb.DMatrix(X_test, label=y_test, nthread=-1)
15:27:38      
15:27:38          gpu_res = {}  # Store accuracy result
15:27:38          tmp = time.time()
15:27:38          # Train model
15:27:38          xgb.train(param, dtrain, num_round, evals=[
15:27:38  >                 (dtest, 'test')], evals_result=gpu_res)
15:27:38  
15:27:38  tests/python/open_data/gbm/test_xgboost.py:53: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  params = {'n_gpus': -1, 'num_class': 8, 'objective': 'multi:softmax', 'tree_method': 'gpu_hist'}
15:27:38  dtrain = <xgboost.core.DMatrix object at 0x7fff937aeb38>, num_boost_round = 10
15:27:38  evals = [(<xgboost.core.DMatrix object at 0x7fff708449b0>, 'test')], obj = None
15:27:38  feval = None, maximize = False, early_stopping_rounds = None
15:27:38  early_stopping_threshold = None, early_stopping_limit = None, evals_result = {}
15:27:38  verbose_eval = True, xgb_model = None
15:27:38  callbacks = [<function print_evaluation.<locals>.callback at 0x7fff5e99f378>, <function record_evaluation.<locals>.callback at 0x7fff5e99f488>]
15:27:38  learning_rates = None
15:27:38  
15:27:38      def train(params, dtrain, num_boost_round=10, evals=(), obj=None, feval=None,
15:27:38            maximize=False, early_stopping_rounds=None, early_stopping_threshold=None, early_stopping_limit=None,
15:27:38                evals_result=None,
15:27:38                verbose_eval=True, xgb_model=None, callbacks=None, learning_rates=None):
15:27:38          # pylint: disable=too-many-statements,too-many-branches, attribute-defined-outside-init
15:27:38          """Train a booster with given parameters.
15:27:38      
15:27:38          Parameters
15:27:38          ----------
15:27:38          params : dict
15:27:38              Booster params.
15:27:38          dtrain : DMatrix
15:27:38              Data to be trained.
15:27:38          num_boost_round: int
15:27:38              Number of boosting iterations.
15:27:38          evals: list of pairs (DMatrix, string)
15:27:38              List of items to be evaluated during training, this allows user to watch
15:27:38              performance on the validation set.
15:27:38          obj : function
15:27:38              Customized objective function.
15:27:38          feval : function
15:27:38              Customized evaluation function.
15:27:38          maximize : bool
15:27:38              Whether to maximize feval.
15:27:38          early_stopping_rounds: int
15:27:38              Activates early stopping. Validation error needs to decrease at least
15:27:38              every **early_stopping_rounds** round(s) to continue training.
15:27:38              Requires at least one item in **evals**.
15:27:38              If there's more than one, will use the last.
15:27:38              Returns the model from the last iteration (not the best one).
15:27:38              If early stopping occurs, the model will have three additional fields:
15:27:38              ``bst.best_score``, ``bst.best_iteration`` and ``bst.best_ntree_limit``.
15:27:38              (Use ``bst.best_ntree_limit`` to get the correct value if
15:27:38              ``num_parallel_tree`` and/or ``num_class`` appears in the parameters)
15:27:38          early_stopping_threshold : float
15:27:38              Sets an optional threshold to smooth the early stopping policy.
15:27:38              If, after early_stopping_rounds iterations, the model hasn't improved
15:27:38              by more than threshold times the score from early_stopping_rounds
15:27:38              before, then learning stops.
15:27:38          early_stopping_limit : float
15:27:38              Caps "threshold times the score from early_stopping_rounds before"
15:27:38              at this value.
15:27:38          evals_result: dict
15:27:38              This dictionary stores the evaluation results of all the items in watchlist.
15:27:38      
15:27:38              Example: with a watchlist containing
15:27:38              ``[(dtest,'eval'), (dtrain,'train')]`` and
15:27:38              a parameter containing ``('eval_metric': 'logloss')``,
15:27:38              the **evals_result** returns
15:27:38      
15:27:38              .. code-block:: python
15:27:38      
15:27:38                  {'train': {'logloss': ['0.48253', '0.35953']},
15:27:38                   'eval': {'logloss': ['0.480385', '0.357756']}}
15:27:38      
15:27:38          verbose_eval : bool or int
15:27:38              Requires at least one item in **evals**.
15:27:38              If **verbose_eval** is True then the evaluation metric on the validation set is
15:27:38              printed at each boosting stage.
15:27:38              If **verbose_eval** is an integer then the evaluation metric on the validation set
15:27:38              is printed at every given **verbose_eval** boosting stage. The last boosting stage
15:27:38              / the boosting stage found by using **early_stopping_rounds** is also printed.
15:27:38              Example: with ``verbose_eval=4`` and at least one item in **evals**, an evaluation metric
15:27:38              is printed every 4 boosting stages, instead of every boosting stage.
15:27:38          learning_rates: list or function (deprecated - use callback API instead)
15:27:38              List of learning rate for each boosting round
15:27:38              or a customized function that calculates eta in terms of
15:27:38              current number of round and the total number of boosting round (e.g. yields
15:27:38              learning rate decay)
15:27:38          xgb_model : file name of stored xgb model or 'Booster' instance
15:27:38              Xgb model to be loaded before training (allows training continuation).
15:27:38          callbacks : list of callback functions
15:27:38              List of callback functions that are applied at end of each iteration.
15:27:38              It is possible to use predefined callbacks by using
15:27:38              :ref:`Callback API <callback_api>`.
15:27:38              Example:
15:27:38      
15:27:38              .. code-block:: python
15:27:38      
15:27:38                  [xgb.callback.reset_learning_rate(custom_rates)]
15:27:38      
15:27:38          Returns
15:27:38          -------
15:27:38          Booster : a trained booster model
15:27:38          """
15:27:38          callbacks = [] if callbacks is None else callbacks
15:27:38      
15:27:38          # Most legacy advanced options become callbacks
15:27:38          if isinstance(verbose_eval, bool) and verbose_eval:
15:27:38              callbacks.append(callback.print_evaluation())
15:27:38          else:
15:27:38              if isinstance(verbose_eval, int):
15:27:38                  callbacks.append(callback.print_evaluation(verbose_eval))
15:27:38      
15:27:38          if early_stopping_rounds is not None:
15:27:38              callbacks.append(callback.early_stop(early_stopping_rounds,
15:27:38                                                   early_stopping_threshold,
15:27:38                                                   early_stopping_limit,
15:27:38                                                   maximize=maximize,
15:27:38                                                   verbose=bool(verbose_eval)))
15:27:38          if evals_result is not None:
15:27:38              callbacks.append(callback.record_evaluation(evals_result))
15:27:38      
15:27:38          if learning_rates is not None:
15:27:38              warnings.warn("learning_rates parameter is deprecated - use callback API instead",
15:27:38                            DeprecationWarning)
15:27:38              callbacks.append(callback.reset_learning_rate(learning_rates))
15:27:38      
15:27:38          return _train_internal(params, dtrain,
15:27:38                                 num_boost_round=num_boost_round,
15:27:38                                 evals=evals,
15:27:38                                 obj=obj, feval=feval,
15:27:38  >                              xgb_model=xgb_model, callbacks=callbacks)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/training.py:227: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  params = {'n_gpus': -1, 'num_class': 8, 'objective': 'multi:softmax', 'tree_method': 'gpu_hist'}
15:27:38  dtrain = <xgboost.core.DMatrix object at 0x7fff937aeb38>, num_boost_round = 10
15:27:38  evals = [(<xgboost.core.DMatrix object at 0x7fff708449b0>, 'test')], obj = None
15:27:38  feval = None, xgb_model = None
15:27:38  callbacks = [<function print_evaluation.<locals>.callback at 0x7fff5e99f378>, <function record_evaluation.<locals>.callback at 0x7fff5e99f488>]
15:27:38  
15:27:38      def _train_internal(params, dtrain,
15:27:38                          num_boost_round=10, evals=(),
15:27:38                          obj=None, feval=None,
15:27:38                          xgb_model=None, callbacks=None):
15:27:38          """internal training function"""
15:27:38          callbacks = [] if callbacks is None else callbacks
15:27:38          evals = list(evals)
15:27:38          if isinstance(params, dict) \
15:27:38                  and 'eval_metric' in params \
15:27:38                  and isinstance(params['eval_metric'], list):
15:27:38              params = dict((k, v) for k, v in params.items())
15:27:38              eval_metrics = params['eval_metric']
15:27:38              params.pop("eval_metric", None)
15:27:38              params = list(params.items())
15:27:38              for eval_metric in eval_metrics:
15:27:38                  params += [('eval_metric', eval_metric)]
15:27:38      
15:27:38          bst = Booster(params, [dtrain] + [d[0] for d in evals])
15:27:38          nboost = 0
15:27:38          num_parallel_tree = 1
15:27:38      
15:27:38          if xgb_model is not None:
15:27:38              if not isinstance(xgb_model, STRING_TYPES):
15:27:38                  xgb_model = xgb_model.save_raw()
15:27:38              bst = Booster(params, [dtrain] + [d[0] for d in evals], model_file=xgb_model)
15:27:38              nboost = len(bst.get_dump())
15:27:38      
15:27:38          _params = dict(params) if isinstance(params, list) else params
15:27:38      
15:27:38          if 'num_parallel_tree' in _params:
15:27:38              num_parallel_tree = _params['num_parallel_tree']
15:27:38              nboost //= num_parallel_tree
15:27:38          if 'num_class' in _params:
15:27:38              nboost //= _params['num_class']
15:27:38      
15:27:38          # Distributed code: Load the checkpoint from rabit.
15:27:38          version = bst.load_rabit_checkpoint()
15:27:38          assert rabit.get_world_size() != 1 or version == 0
15:27:38          rank = rabit.get_rank()
15:27:38          start_iteration = int(version / 2)
15:27:38          nboost += start_iteration
15:27:38      
15:27:38          callbacks_before_iter = [
15:27:38              cb for cb in callbacks if cb.__dict__.get('before_iteration', False)]
15:27:38          callbacks_after_iter = [
15:27:38              cb for cb in callbacks if not cb.__dict__.get('before_iteration', False)]
15:27:38      
15:27:38          for i in range(start_iteration, num_boost_round):
15:27:38              for cb in callbacks_before_iter:
15:27:38                  cb(CallbackEnv(model=bst,
15:27:38                                 cvfolds=None,
15:27:38                                 iteration=i,
15:27:38                                 begin_iteration=start_iteration,
15:27:38                                 end_iteration=num_boost_round,
15:27:38                                 rank=rank,
15:27:38                                 evaluation_result_list=None))
15:27:38              # Distributed code: need to resume to this point.
15:27:38              # Skip the first update if it is a recovery step.
15:27:38              if version % 2 == 0:
15:27:38  >               bst.update(dtrain, i, obj)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/training.py:74: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <xgboost.core.Booster object at 0x7fff708864e0>
15:27:38  dtrain = <xgboost.core.DMatrix object at 0x7fff937aeb38>, iteration = 0
15:27:38  fobj = None
15:27:38  
15:27:38      def update(self, dtrain, iteration, fobj=None):
15:27:38          """Update for one iteration, with objective function calculated
15:27:38          internally.  This function should not be called directly by users.
15:27:38      
15:27:38          Parameters
15:27:38          ----------
15:27:38          dtrain : DMatrix
15:27:38              Training data.
15:27:38          iteration : int
15:27:38              Current iteration number.
15:27:38          fobj : function
15:27:38              Customized objective function.
15:27:38      
15:27:38          """
15:27:38          if not isinstance(dtrain, DMatrix):
15:27:38              raise TypeError('invalid training matrix: {}'.format(type(dtrain).__name__))
15:27:38          self._validate_features(dtrain)
15:27:38      
15:27:38          if fobj is None:
15:27:38              _check_call(_LIB.XGBoosterUpdateOneIter(self.handle, ctypes.c_int(iteration),
15:27:38  >                                                   dtrain.handle))
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/core.py:1115: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  ret = -1
15:27:38  
15:27:38      def _check_call(ret):
15:27:38          """Check the return value of C API call
15:27:38      
15:27:38          This function will raise exception when error occurs.
15:27:38          Wrap every API call with this function
15:27:38      
15:27:38          Parameters
15:27:38          ----------
15:27:38          ret : int
15:27:38              return value from API calls
15:27:38          """
15:27:38          if ret != 0:
15:27:38  >           raise XGBoostError(py_str(_LIB.XGBGetLastError()))
15:27:38  E           xgboost.core.XGBoostError: [12:57:47] /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1412: Exception in gpu_hist: NCCL failure :unhandled system error /root/repo/xgboost/src/tree/../common/device_helpers.cuh(896)
15:27:38  E           
15:27:38  E           Stack trace:
15:27:38  E             [bt] (0) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x54) [0x7fff7c9cf984]
15:27:38  E             [bt] (1) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::Update(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, std::vector<xgboost::RegTree*, std::allocator<xgboost::RegTree*> > const&)+0x2c4) [0x7fff7cbeb2e4]
15:27:38  E             [bt] (2) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::tree::GPUHistMaker::Update(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, std::vector<xgboost::RegTree*, std::allocator<xgboost::RegTree*> > const&)+0x28) [0x7fff7cbeb358]
15:27:38  E             [bt] (3) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::gbm::GBTree::BoostNewTrees(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, int, std::vector<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> >, std::allocator<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> > > >*)+0x538) [0x7fff7ca4df38]
15:27:38  E             [bt] (4) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::gbm::GBTree::DoBoost(xgboost::DMatrix*, xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::ObjFunction*)+0x3f4) [0x7fff7ca4ea94]
15:27:38  E             [bt] (5) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::LearnerImpl::UpdateOneIter(int, xgboost::DMatrix*)+0x508) [0x7fff7ca60db8]
15:27:38  E             [bt] (6) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(XGBoosterUpdateOneIter+0x70) [0x7fff7c9dadf0]
15:27:38  E             [bt] (7) /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6(+0x928c) [0x7fff9be7928c]
15:27:38  E             [bt] (8) /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0xd4) [0x7fff9be76df4]
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/core.py:176: XGBoostError
15:27:38  ------------------------------ Captured log call -------------------------------
15:27:38  covtype.py                 111 INFO     Downloading https://ndownloader.figshare.com/files/5976039
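
Same NCCL failure for `test_xgboost_covtype`, which explicitly asks for all devices via `'n_gpus': -1`. As a hedged workaround sketch (not a fix in xgboost): masking the process down to one visible GPU before xgboost initializes CUDA keeps `gpu_hist` on the single-device path even with `n_gpus=-1`. The environment-variable approach is standard CUDA behavior; whether it actually avoids the hang on these ppc boxes is an assumption to test.

```python
# Hypothetical guard for the GPU tests: expose a single device so that
# tree_method='gpu_hist' never has to bring up NCCL across GPUs.
import os
os.environ.setdefault('CUDA_VISIBLE_DEVICES', '0')

import xgboost as xgb  # import after setting the env var so it takes effect
```
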
15:27:38  _________________________ test_sklearn_gbm_regression __________________________
15:27:38  [gw2] linux -- Python 3.6.4 /opt/h2oai/h2o4gpu/python/bin/python
15:27:38  
15:27:38  self = <CallInfo when='call' exception: [13:05:11] /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1412: Exception in gpu_his...8c]
15:27:38    [bt] (8) /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0xd4) [0x7fffaf1e6df4]
15:27:38  
15:27:38  >
15:27:38  func = <function call_runtest_hook.<locals>.<lambda> at 0x7fff763f8950>
15:27:38  when = 'call', treat_keyboard_interrupt_as_exception = False
15:27:38  
15:27:38      def __init__(self, func, when, treat_keyboard_interrupt_as_exception=False):
15:27:38          #: context of invocation: one of "setup", "call",
15:27:38          #: "teardown", "memocollect"
15:27:38          self.when = when
15:27:38          self.start = time()
15:27:38          try:
15:27:38  >           self.result = func()
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/runner.py:212: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  >       lambda: ihook(item=item, **kwds),
15:27:38          when=when,
15:27:38          treat_keyboard_interrupt_as_exception=item.config.getvalue("usepdb"),
15:27:38      )
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/runner.py:194: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <_HookCaller 'pytest_runtest_call'>, args = ()
15:27:38  kwargs = {'item': <Function 'test_sklearn_gbm_regression'>}, notincall = set()
15:27:38  
15:27:38      def __call__(self, *args, **kwargs):
15:27:38          if args:
15:27:38              raise TypeError("hook calling supports only keyword arguments")
15:27:38          assert not self.is_historic()
15:27:38          if self.spec and self.spec.argnames:
15:27:38              notincall = (
15:27:38                  set(self.spec.argnames) - set(["__multicall__"]) - set(kwargs.keys())
15:27:38              )
15:27:38              if notincall:
15:27:38                  warnings.warn(
15:27:38                      "Argument(s) {} which are declared in the hookspec "
15:27:38                      "can not be found in this hook call".format(tuple(notincall)),
15:27:38                      stacklevel=2,
15:27:38                  )
15:27:38  >       return self._hookexec(self, self.get_hookimpls(), kwargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/hooks.py:289: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <_pytest.config.PytestPluginManager object at 0x7fffaf1c2e10>
15:27:38  hook = <_HookCaller 'pytest_runtest_call'>
15:27:38  methods = [<HookImpl plugin_name='runner', plugin=<module '_pytest.runner' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-pa...ffae5a0ba8>>, <HookImpl plugin_name='logging-plugin', plugin=<_pytest.logging.LoggingPlugin object at 0x7fffae56f240>>]
15:27:38  kwargs = {'item': <Function 'test_sklearn_gbm_regression'>}
15:27:38  
15:27:38      def _hookexec(self, hook, methods, kwargs):
15:27:38          # called from all hookcaller instances.
15:27:38          # enable_tracing will set its own wrapping function at self._inner_hookexec
15:27:38  >       return self._inner_hookexec(hook, methods, kwargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/manager.py:68: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  hook = <_HookCaller 'pytest_runtest_call'>
15:27:38  methods = [<HookImpl plugin_name='runner', plugin=<module '_pytest.runner' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-pa...ffae5a0ba8>>, <HookImpl plugin_name='logging-plugin', plugin=<_pytest.logging.LoggingPlugin object at 0x7fffae56f240>>]
15:27:38  kwargs = {'item': <Function 'test_sklearn_gbm_regression'>}
15:27:38  
15:27:38      self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
15:27:38          methods,
15:27:38          kwargs,
15:27:38  >       firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
15:27:38      )
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/manager.py:62: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  item = <Function 'test_sklearn_gbm_regression'>
15:27:38  
15:27:38      def pytest_runtest_call(item):
15:27:38          _update_current_test_var(item, "call")
15:27:38          sys.last_type, sys.last_value, sys.last_traceback = (None, None, None)
15:27:38          try:
15:27:38  >           item.runtest()
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/runner.py:122: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <Function 'test_sklearn_gbm_regression'>
15:27:38  
15:27:38      def runtest(self):
15:27:38          """ execute the underlying test function. """
15:27:38  >       self.ihook.pytest_pyfunc_call(pyfuncitem=self)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/python.py:1438: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <_HookCaller 'pytest_pyfunc_call'>, args = ()
15:27:38  kwargs = {'pyfuncitem': <Function 'test_sklearn_gbm_regression'>}
15:27:38  notincall = set()
15:27:38  
15:27:38      def __call__(self, *args, **kwargs):
15:27:38          if args:
15:27:38              raise TypeError("hook calling supports only keyword arguments")
15:27:38          assert not self.is_historic()
15:27:38          if self.spec and self.spec.argnames:
15:27:38              notincall = (
15:27:38                  set(self.spec.argnames) - set(["__multicall__"]) - set(kwargs.keys())
15:27:38              )
15:27:38              if notincall:
15:27:38                  warnings.warn(
15:27:38                      "Argument(s) {} which are declared in the hookspec "
15:27:38                      "can not be found in this hook call".format(tuple(notincall)),
15:27:38                      stacklevel=2,
15:27:38                  )
15:27:38  >       return self._hookexec(self, self.get_hookimpls(), kwargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/hooks.py:289: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <_pytest.config.PytestPluginManager object at 0x7fffaf1c2e10>
15:27:38  hook = <_HookCaller 'pytest_pyfunc_call'>
15:27:38  methods = [<HookImpl plugin_name='python', plugin=<module '_pytest.python' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-pa..., plugin=<module '_pytest.skipping' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/skipping.py'>>]
15:27:38  kwargs = {'pyfuncitem': <Function 'test_sklearn_gbm_regression'>}
15:27:38  
15:27:38      def _hookexec(self, hook, methods, kwargs):
15:27:38          # called from all hookcaller instances.
15:27:38          # enable_tracing will set its own wrapping function at self._inner_hookexec
15:27:38  >       return self._inner_hookexec(hook, methods, kwargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/manager.py:68: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  hook = <_HookCaller 'pytest_pyfunc_call'>
15:27:38  methods = [<HookImpl plugin_name='python', plugin=<module '_pytest.python' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-pa..., plugin=<module '_pytest.skipping' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/skipping.py'>>]
15:27:38  kwargs = {'pyfuncitem': <Function 'test_sklearn_gbm_regression'>}
15:27:38  
15:27:38      self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
15:27:38          methods,
15:27:38          kwargs,
15:27:38  >       firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
15:27:38      )
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/manager.py:62: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  pyfuncitem = <Function 'test_sklearn_gbm_regression'>
15:27:38  
15:27:38      @hookimpl(trylast=True)
15:27:38      def pytest_pyfunc_call(pyfuncitem):
15:27:38          testfunction = pyfuncitem.obj
15:27:38          if pyfuncitem._isyieldedfunction():
15:27:38              testfunction(*pyfuncitem._args)
15:27:38          else:
15:27:38              funcargs = pyfuncitem.funcargs
15:27:38              testargs = {}
15:27:38              for arg in pyfuncitem._fixtureinfo.argnames:
15:27:38                  testargs[arg] = funcargs[arg]
15:27:38  >           testfunction(**testargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/python.py:166: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  >   def test_sklearn_gbm_regression(): test_gbm_regressor_backupsklearn()
15:27:38  
15:27:38  tests/python/open_data/gbm/test_xgb_sklearn_wrapper.py:241: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  backend = 'auto'
15:27:38  
15:27:38      def test_gbm_regressor_backupsklearn(backend='auto'):
15:27:38          df = pd.read_csv("./open_data/simple.txt", delim_whitespace=True)
15:27:38          X = np.array(df.iloc[:, :df.shape[1] - 1], dtype='float32', order='C')
15:27:38          y = np.array(df.iloc[:, df.shape[1] - 1], dtype='float32', order='C')
15:27:38          import h2o4gpu
15:27:38          Solver = h2o4gpu.GradientBoostingRegressor
15:27:38      
15:27:38          # Run the h2o4gpu version of Gradient Boosting regression
15:27:38          gbm = Solver(backend=backend, random_state=1234)
15:27:38          print("h2o4gpu fit()")
15:27:38  >       gbm.fit(X, y)
15:27:38  
15:27:38  tests/python/open_data/gbm/test_xgb_sklearn_wrapper.py:136: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <h2o4gpu.solvers.xgboost.GradientBoostingRegressor object at 0x7fff764e2e48>
15:27:38  X = array([[ 8.  ,  0.45,  0.  ,  1.  ,  0.  ,  0.  ,  1.  ,  0.  ,  0.  ,
15:27:38           0.  ],
15:27:38         [ 7.  ,  0.99,  1.  ,  1...
15:27:38           0.  ],
15:27:38         [ 7.  ,  0.88,  0.  ,  1.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
15:27:38           1.  ]], dtype=float32)
15:27:38  y = array([1., 0., 1., 0., 0., 0., 1., 1.], dtype=float32), sample_weight = None
15:27:38  
15:27:38      def fit(self, X, y=None, sample_weight=None):
15:27:38  >       res = self.model.fit(X, y, sample_weight)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/h2o4gpu/solvers/xgboost.py:1541: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
15:27:38         colsample_bynode=1, colsample_bytree=1.0, g...lambda=1, scale_pos_weight=1,
15:27:38         seed=None, silent=True, subsample=1.0, tree_method='gpu_hist',
15:27:38         verbosity=1)
15:27:38  X = array([[ 8.  ,  0.45,  0.  ,  1.  ,  0.  ,  0.  ,  1.  ,  0.  ,  0.  ,
15:27:38           0.  ],
15:27:38         [ 7.  ,  0.99,  1.  ,  1...
15:27:38           0.  ],
15:27:38         [ 7.  ,  0.88,  0.  ,  1.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
15:27:38           1.  ]], dtype=float32)
15:27:38  y = array([1., 0., 1., 0., 0., 0., 1., 1.], dtype=float32), sample_weight = None
15:27:38  eval_set = None, eval_metric = None, early_stopping_rounds = None
15:27:38  early_stopping_threshold = None, early_stopping_limit = None, verbose = True
15:27:38  xgb_model = None, sample_weight_eval_set = None, callbacks = None
15:27:38  
15:27:38          def fit(self, X, y, sample_weight=None, eval_set=None, eval_metric=None,
15:27:38                  early_stopping_rounds=None, early_stopping_threshold=None, early_stopping_limit=None, verbose=True, xgb_model=None,
15:27:38                  sample_weight_eval_set=None, callbacks=None):
15:27:38              # pylint: disable=missing-docstring,invalid-name,attribute-defined-outside-init
15:27:38              """
15:27:38              Fit the gradient boosting model
15:27:38      
15:27:38              Parameters
15:27:38              ----------
15:27:38              X : array_like
15:27:38                  Feature matrix
15:27:38              y : array_like
15:27:38                  Labels
15:27:38              sample_weight : array_like
15:27:38                  instance weights
15:27:38              eval_set : list, optional
15:27:38                  A list of (X, y) tuple pairs to use as a validation set for
15:27:38                  early-stopping
15:27:38              sample_weight_eval_set : list, optional
15:27:38                  A list of the form [L_1, L_2, ..., L_n], where each L_i is a list of
15:27:38                  instance weights on the i-th validation set.
15:27:38              eval_metric : str, callable, optional
15:27:38                  If a str, should be a built-in evaluation metric to use. See
15:27:38                  doc/parameter.rst. If callable, a custom evaluation metric. The call
15:27:38                  signature is func(y_predicted, y_true), where y_true will be a
15:27:38                  DMatrix object, so you may need to call its get_label
15:27:38                  method. It must return a (str, value) pair, where the str is a name
15:27:38                  for the evaluation and value is the value of the evaluation
15:27:38                  function. This objective is always minimized.
15:27:38              early_stopping_rounds : int
15:27:38                  Activates early stopping. Validation error needs to decrease at
15:27:38                  least every <early_stopping_rounds> round(s) to continue training.
15:27:38                  Requires at least one item in evals.  If there's more than one,
15:27:38                  will use the last. Returns the model from the last iteration
15:27:38                  (not the best one). If early stopping occurs, the model will
15:27:38                  have three additional fields: bst.best_score, bst.best_iteration
15:27:38                  and bst.best_ntree_limit.
15:27:38                  (Use bst.best_ntree_limit to get the correct value if num_parallel_tree
15:27:38                  and/or num_class appears in the parameters)
15:27:38              early_stopping_threshold : float
15:27:38                  Sets an optional threshold to smooth the early stopping policy.
15:27:38                  If, after early_stopping_rounds iterations, the model hasn't improved
15:27:38                  by more than threshold times the score from early_stopping_rounds before,
15:27:38                  then learning stops.
15:27:38              early_stopping_limit: float
15:27:38                  Caps "threshold times the score from early_stopping_rounds before"
15:27:38                  at this limit value.
15:27:38              verbose : bool
15:27:38                  If `verbose` and an evaluation set is used, writes the evaluation
15:27:38                  metric measured on the validation set to stderr.
15:27:38              xgb_model : str
15:27:38                  file name of stored xgb model or 'Booster' instance Xgb model to be
15:27:38                  loaded before training (allows training continuation).
15:27:38              callbacks : list of callback functions
15:27:38                  List of callback functions that are applied at end of each iteration.
15:27:38                  It is possible to use predefined callbacks by using :ref:`callback_api`.
15:27:38                  Example:
15:27:38      
15:27:38                  .. code-block:: python
15:27:38      
15:27:38                      [xgb.callback.reset_learning_rate(custom_rates)]
15:27:38              """
15:27:38              if sample_weight is not None:
15:27:38                  trainDmatrix = DMatrix(X, label=y, weight=sample_weight,
15:27:38                                         missing=self.missing, nthread=self.n_jobs)
15:27:38              else:
15:27:38                  trainDmatrix = DMatrix(X, label=y, missing=self.missing, nthread=self.n_jobs)
15:27:38      
15:27:38              evals_result = {}
15:27:38      
15:27:38              if eval_set is not None:
15:27:38                  if sample_weight_eval_set is None:
15:27:38                      sample_weight_eval_set = [None] * len(eval_set)
15:27:38                  evals = list(
15:27:38                      DMatrix(eval_set[i][0], label=eval_set[i][1], missing=self.missing,
15:27:38                              weight=sample_weight_eval_set[i], nthread=self.n_jobs)
15:27:38                      for i in range(len(eval_set)))
15:27:38                  evals = list(zip(evals, ["validation_{}".format(i) for i in
15:27:38                                           range(len(evals))]))
15:27:38              else:
15:27:38                  evals = ()
15:27:38      
15:27:38              params = self.get_xgb_params()
15:27:38      
15:27:38              if callable(self.objective):
15:27:38                  obj = _objective_decorator(self.objective)
15:27:38                  params["objective"] = "reg:linear"
15:27:38              else:
15:27:38                  obj = None
15:27:38      
15:27:38              feval = eval_metric if callable(eval_metric) else None
15:27:38              if eval_metric is not None:
15:27:38                  if callable(eval_metric):
15:27:38                      eval_metric = None
15:27:38                  else:
15:27:38                      params.update({'eval_metric': eval_metric})
15:27:38      
15:27:38              self._Booster = train(params, trainDmatrix,
15:27:38                                    self.get_num_boosting_rounds(), evals=evals,
15:27:38                                    early_stopping_rounds=early_stopping_rounds,
15:27:38                                    early_stopping_threshold=early_stopping_threshold,
15:27:38                                    early_stopping_limit=early_stopping_limit,
15:27:38                                    evals_result=evals_result, obj=obj, feval=feval,
15:27:38                                    verbose_eval=verbose, xgb_model=xgb_model,
15:27:38  >                                 callbacks=callbacks)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/sklearn.py:406: 
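(Side note on the fit() signature quoted above: the early-stopping parameters only take effect when an eval_set is supplied; the failing test passes neither. A hedged, minimal sketch of the intended usage — synthetic data, illustrative values only:)

```python
# Sketch, not from the failing test: exercising the quoted fit()
# early-stopping parameters with a held-out eval_set.
import numpy as np
import xgboost as xgb

X = np.random.rand(100, 10).astype(np.float32)
y = np.random.rand(100).astype(np.float32)

model = xgb.XGBRegressor(n_estimators=100, tree_method='hist')  # 'gpu_hist' on GPU
model.fit(
    X[:80], y[:80],
    eval_set=[(X[80:], y[80:])],   # validation pair required for early stopping
    eval_metric='rmse',            # built-in metric; a callable is also accepted
    early_stopping_rounds=10,      # stop after 10 rounds without improvement
    verbose=False,
)
# per the docstring above, the booster then carries best_score,
# best_iteration and best_ntree_limit
```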
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  params = {'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bynode': 1, ...}
15:27:38  dtrain = <xgboost.core.DMatrix object at 0x7fff76409f60>, num_boost_round = 100
15:27:38  evals = (), obj = None, feval = None, maximize = False
15:27:38  early_stopping_rounds = None, early_stopping_threshold = None
15:27:38  early_stopping_limit = None, evals_result = {}, verbose_eval = True
15:27:38  xgb_model = None
15:27:38  callbacks = [<function print_evaluation.<locals>.callback at 0x7fff76202510>, <function record_evaluation.<locals>.callback at 0x7fff76202620>]
15:27:38  learning_rates = None
15:27:38  
15:27:38      def train(params, dtrain, num_boost_round=10, evals=(), obj=None, feval=None,
15:27:38                maximize=False, early_stopping_rounds=None, early_stopping_threshold=None,early_stopping_limit=None,
15:27:38                evals_result=None,
15:27:38                verbose_eval=True, xgb_model=None, callbacks=None, learning_rates=None):
15:27:38          # pylint: disable=too-many-statements,too-many-branches, attribute-defined-outside-init
15:27:38          """Train a booster with given parameters.
15:27:38      
15:27:38          Parameters
15:27:38          ----------
15:27:38          params : dict
15:27:38              Booster params.
15:27:38          dtrain : DMatrix
15:27:38              Data to be trained.
15:27:38          num_boost_round: int
15:27:38              Number of boosting iterations.
15:27:38          evals: list of pairs (DMatrix, string)
15:27:38              List of items to be evaluated during training; this allows the user to watch
15:27:38              performance on the validation set.
15:27:38          obj : function
15:27:38              Customized objective function.
15:27:38          feval : function
15:27:38              Customized evaluation function.
15:27:38          maximize : bool
15:27:38              Whether to maximize feval.
15:27:38          early_stopping_rounds: int
15:27:38              Activates early stopping. Validation error needs to decrease at least
15:27:38              every **early_stopping_rounds** round(s) to continue training.
15:27:38              Requires at least one item in **evals**.
15:27:38              If there's more than one, will use the last.
15:27:38              Returns the model from the last iteration (not the best one).
15:27:38              If early stopping occurs, the model will have three additional fields:
15:27:38              ``bst.best_score``, ``bst.best_iteration`` and ``bst.best_ntree_limit``.
15:27:38              (Use ``bst.best_ntree_limit`` to get the correct value if
15:27:38              ``num_parallel_tree`` and/or ``num_class`` appears in the parameters)
15:27:38          early_stopping_threshold : float
15:27:38              Sets an optional threshold to smooth the early stopping policy.
15:27:38              If, after early_stopping_rounds iterations, the model hasn't improved
15:27:38              by more than threshold times the score from early_stopping_rounds before,
15:27:38              then learning stops.
15:27:38          early_stopping_limit: float
15:27:38              Caps "threshold times the score from early_stopping_rounds before"
15:27:38              at this limit value.
15:27:38          evals_result: dict
15:27:38              This dictionary stores the evaluation results of all the items in watchlist.
15:27:38      
15:27:38              Example: with a watchlist containing
15:27:38              ``[(dtest,'eval'), (dtrain,'train')]`` and
15:27:38              a parameter containing ``('eval_metric': 'logloss')``,
15:27:38              the **evals_result** returns
15:27:38      
15:27:38              .. code-block:: python
15:27:38      
15:27:38                  {'train': {'logloss': ['0.48253', '0.35953']},
15:27:38                   'eval': {'logloss': ['0.480385', '0.357756']}}
15:27:38      
15:27:38          verbose_eval : bool or int
15:27:38              Requires at least one item in **evals**.
15:27:38              If **verbose_eval** is True then the evaluation metric on the validation set is
15:27:38              printed at each boosting stage.
15:27:38              If **verbose_eval** is an integer then the evaluation metric on the validation set
15:27:38              is printed at every given **verbose_eval** boosting stage. The last boosting stage
15:27:38              / the boosting stage found by using **early_stopping_rounds** is also printed.
15:27:38              Example: with ``verbose_eval=4`` and at least one item in **evals**, an evaluation metric
15:27:38              is printed every 4 boosting stages, instead of every boosting stage.
15:27:38          learning_rates: list or function (deprecated - use callback API instead)
15:27:38              List of learning rate for each boosting round
15:27:38              or a customized function that calculates eta in terms of
15:27:38              current number of round and the total number of boosting round (e.g. yields
15:27:38              learning rate decay)
15:27:38          xgb_model : file name of stored xgb model or 'Booster' instance
15:27:38              Xgb model to be loaded before training (allows training continuation).
15:27:38          callbacks : list of callback functions
15:27:38              List of callback functions that are applied at end of each iteration.
15:27:38              It is possible to use predefined callbacks by using
15:27:38              :ref:`Callback API <callback_api>`.
15:27:38              Example:
15:27:38      
15:27:38              .. code-block:: python
15:27:38      
15:27:38                  [xgb.callback.reset_learning_rate(custom_rates)]
15:27:38      
15:27:38          Returns
15:27:38          -------
15:27:38          Booster : a trained booster model
15:27:38          """
15:27:38          callbacks = [] if callbacks is None else callbacks
15:27:38      
15:27:38          # Most legacy advanced options become callbacks
15:27:38          if isinstance(verbose_eval, bool) and verbose_eval:
15:27:38              callbacks.append(callback.print_evaluation())
15:27:38          else:
15:27:38              if isinstance(verbose_eval, int):
15:27:38                  callbacks.append(callback.print_evaluation(verbose_eval))
15:27:38      
15:27:38          if early_stopping_rounds is not None:
15:27:38              callbacks.append(callback.early_stop(early_stopping_rounds,
15:27:38                                                   early_stopping_threshold,
15:27:38                                                   early_stopping_limit,
15:27:38                                                   maximize=maximize,
15:27:38                                                   verbose=bool(verbose_eval)))
15:27:38          if evals_result is not None:
15:27:38              callbacks.append(callback.record_evaluation(evals_result))
15:27:38      
15:27:38          if learning_rates is not None:
15:27:38              warnings.warn("learning_rates parameter is deprecated - use callback API instead",
15:27:38                            DeprecationWarning)
15:27:38              callbacks.append(callback.reset_learning_rate(learning_rates))
15:27:38      
15:27:38          return _train_internal(params, dtrain,
15:27:38                                 num_boost_round=num_boost_round,
15:27:38                                 evals=evals,
15:27:38                                 obj=obj, feval=feval,
15:27:38  >                              xgb_model=xgb_model, callbacks=callbacks)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/training.py:227: 
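(For comparison with the docstring above, a minimal sketch of a direct train() call with a watchlist; synthetic data, and evals_result is filled in place:)

```python
# Sketch of the low-level train() path the wrapper delegates to.
import numpy as np
import xgboost as xgb

dtrain = xgb.DMatrix(np.random.rand(80, 10), label=np.random.rand(80))
dvalid = xgb.DMatrix(np.random.rand(20, 10), label=np.random.rand(20))

evals_result = {}
bst = xgb.train(
    {'eval_metric': 'rmse', 'tree_method': 'hist'},
    dtrain,
    num_boost_round=10,
    evals=[(dtrain, 'train'), (dvalid, 'eval')],
    evals_result=evals_result,  # ends up like {'train': {'rmse': [...]}, 'eval': {...}}
    verbose_eval=4,             # print the metric every 4 boosting stages
)
```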
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  params = {'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bynode': 1, ...}
15:27:38  dtrain = <xgboost.core.DMatrix object at 0x7fff76409f60>, num_boost_round = 100
15:27:38  evals = [], obj = None, feval = None, xgb_model = None
15:27:38  callbacks = [<function print_evaluation.<locals>.callback at 0x7fff76202510>, <function record_evaluation.<locals>.callback at 0x7fff76202620>]
15:27:38  
15:27:38      def _train_internal(params, dtrain,
15:27:38                          num_boost_round=10, evals=(),
15:27:38                          obj=None, feval=None,
15:27:38                          xgb_model=None, callbacks=None):
15:27:38          """internal training function"""
15:27:38          callbacks = [] if callbacks is None else callbacks
15:27:38          evals = list(evals)
15:27:38          if isinstance(params, dict) \
15:27:38                  and 'eval_metric' in params \
15:27:38                  and isinstance(params['eval_metric'], list):
15:27:38              params = dict((k, v) for k, v in params.items())
15:27:38              eval_metrics = params['eval_metric']
15:27:38              params.pop("eval_metric", None)
15:27:38              params = list(params.items())
15:27:38              for eval_metric in eval_metrics:
15:27:38                  params += [('eval_metric', eval_metric)]
15:27:38      
15:27:38          bst = Booster(params, [dtrain] + [d[0] for d in evals])
15:27:38          nboost = 0
15:27:38          num_parallel_tree = 1
15:27:38      
15:27:38          if xgb_model is not None:
15:27:38              if not isinstance(xgb_model, STRING_TYPES):
15:27:38                  xgb_model = xgb_model.save_raw()
15:27:38              bst = Booster(params, [dtrain] + [d[0] for d in evals], model_file=xgb_model)
15:27:38              nboost = len(bst.get_dump())
15:27:38      
15:27:38          _params = dict(params) if isinstance(params, list) else params
15:27:38      
15:27:38          if 'num_parallel_tree' in _params:
15:27:38              num_parallel_tree = _params['num_parallel_tree']
15:27:38              nboost //= num_parallel_tree
15:27:38          if 'num_class' in _params:
15:27:38              nboost //= _params['num_class']
15:27:38      
15:27:38          # Distributed code: Load the checkpoint from rabit.
15:27:38          version = bst.load_rabit_checkpoint()
15:27:38          assert rabit.get_world_size() != 1 or version == 0
15:27:38          rank = rabit.get_rank()
15:27:38          start_iteration = int(version / 2)
15:27:38          nboost += start_iteration
15:27:38      
15:27:38          callbacks_before_iter = [
15:27:38              cb for cb in callbacks if cb.__dict__.get('before_iteration', False)]
15:27:38          callbacks_after_iter = [
15:27:38              cb for cb in callbacks if not cb.__dict__.get('before_iteration', False)]
15:27:38      
15:27:38          for i in range(start_iteration, num_boost_round):
15:27:38              for cb in callbacks_before_iter:
15:27:38                  cb(CallbackEnv(model=bst,
15:27:38                                 cvfolds=None,
15:27:38                                 iteration=i,
15:27:38                                 begin_iteration=start_iteration,
15:27:38                                 end_iteration=num_boost_round,
15:27:38                                 rank=rank,
15:27:38                                 evaluation_result_list=None))
15:27:38              # Distributed code: need to resume to this point.
15:27:38              # Skip the first update if it is a recovery step.
15:27:38              if version % 2 == 0:
15:27:38  >               bst.update(dtrain, i, obj)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/training.py:74: 
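(The before/after split in _train_internal above is driven by a plain attribute on the callback function; a hedged sketch with an illustrative name:)

```python
# Sketch: a custom callback that runs before bst.update(), matching the
# cb.__dict__.get('before_iteration', False) check quoted above.
def log_iteration(env):
    # env is the CallbackEnv namedtuple built in the training loop
    print("starting round", env.iteration, "of", env.end_iteration)

log_iteration.before_iteration = True  # lands in callbacks_before_iter

# passed as train(..., callbacks=[log_iteration])
```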
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <xgboost.core.Booster object at 0x7fff761eeda0>
15:27:38  dtrain = <xgboost.core.DMatrix object at 0x7fff76409f60>, iteration = 0
15:27:38  fobj = None
15:27:38  
15:27:38      def update(self, dtrain, iteration, fobj=None):
15:27:38          """Update for one iteration, with objective function calculated
15:27:38          internally.  This function should not be called directly by users.
15:27:38      
15:27:38          Parameters
15:27:38          ----------
15:27:38          dtrain : DMatrix
15:27:38              Training data.
15:27:38          iteration : int
15:27:38              Current iteration number.
15:27:38          fobj : function
15:27:38              Customized objective function.
15:27:38      
15:27:38          """
15:27:38          if not isinstance(dtrain, DMatrix):
15:27:38              raise TypeError('invalid training matrix: {}'.format(type(dtrain).__name__))
15:27:38          self._validate_features(dtrain)
15:27:38      
15:27:38          if fobj is None:
15:27:38              _check_call(_LIB.XGBoosterUpdateOneIter(self.handle, ctypes.c_int(iteration),
15:27:38  >                                                   dtrain.handle))
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/core.py:1115: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  ret = -1
15:27:38  
15:27:38      def _check_call(ret):
15:27:38          """Check the return value of C API call
15:27:38      
15:27:38          This function will raise exception when error occurs.
15:27:38          Wrap every API call with this function
15:27:38      
15:27:38          Parameters
15:27:38          ----------
15:27:38          ret : int
15:27:38              return value from API calls
15:27:38          """
15:27:38          if ret != 0:
15:27:38  >           raise XGBoostError(py_str(_LIB.XGBGetLastError()))
15:27:38  E           xgboost.core.XGBoostError: [13:05:11] /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1412: Exception in gpu_hist: NCCL failure :unhandled system error /root/repo/xgboost/src/tree/../common/device_helpers.cuh(896)
15:27:38  E           
15:27:38  E           Stack trace:
15:27:38  E             [bt] (0) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x54) [0x7fff8bd2f984]
15:27:38  E             [bt] (1) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::Update(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, std::vector<xgboost::RegTree*, std::allocator<xgboost::RegTree*> > const&)+0x2c4) [0x7fff8bf4b2e4]
15:27:38  E             [bt] (2) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::tree::GPUHistMaker::Update(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, std::vector<xgboost::RegTree*, std::allocator<xgboost::RegTree*> > const&)+0x28) [0x7fff8bf4b358]
15:27:38  E             [bt] (3) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::gbm::GBTree::BoostNewTrees(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, int, std::vector<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> >, std::allocator<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> > > >*)+0x538) [0x7fff8bdadf38]
15:27:38  E             [bt] (4) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::gbm::GBTree::DoBoost(xgboost::DMatrix*, xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::ObjFunction*)+0xa78) [0x7fff8bdaf118]
15:27:38  E             [bt] (5) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::LearnerImpl::UpdateOneIter(int, xgboost::DMatrix*)+0x508) [0x7fff8bdc0db8]
15:27:38  E             [bt] (6) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(XGBoosterUpdateOneIter+0x70) [0x7fff8bd3adf0]
15:27:38  E             [bt] (7) /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6(+0x928c) [0x7fffaf1e928c]
15:27:38  E             [bt] (8) /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0xd4) [0x7fffaf1e6df4]
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/core.py:176: XGBoostError
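(Both failures in this log bottom out in the same place: XGBoosterUpdateOneIter with tree_method='gpu_hist' dying in NCCL init, "unhandled system error" from device_helpers.cuh. A hedged repro sketch, outside pytest, with NCCL's standard debug logging turned on — requires a CUDA GPU, and since the failures come from parallel xdist workers such as [gw3] below, running several copies concurrently may matter:)

```python
# Repro sketch for the NCCL failure above, not a confirmed fix.
# NCCL_DEBUG is a standard NCCL env var; set it before libxgboost
# loads NCCL so the "unhandled system error" gets a real explanation.
import os
os.environ['NCCL_DEBUG'] = 'INFO'

import numpy as np
import xgboost as xgb

dtrain = xgb.DMatrix(np.random.rand(8, 10).astype(np.float32),
                     label=np.random.rand(8).astype(np.float32))
# same path as the failing tests: gpu_hist -> GPUHistMaker::Update -> NCCL
bst = xgb.train({'tree_method': 'gpu_hist'}, dtrain, num_boost_round=1)
print("gpu_hist update succeeded")
```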
15:27:38  _______________________ test_gbm_regressor_backupsklearn _______________________
15:27:38  [gw3] linux -- Python 3.6.4 /opt/h2oai/h2o4gpu/python/bin/python
15:27:38  
15:27:38  self = <CallInfo when='call' exception: [13:09:20] /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1412: Exception in gpu_his...8c]
15:27:38    [bt] (8) /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0xd4) [0x7fffb9b06df4]
15:27:38  
15:27:38  >
15:27:38  func = <function call_runtest_hook.<locals>.<lambda> at 0x7fffb8c1dd90>
15:27:38  when = 'call', treat_keyboard_interrupt_as_exception = False
15:27:38  
15:27:38      def __init__(self, func, when, treat_keyboard_interrupt_as_exception=False):
15:27:38          #: context of invocation: one of "setup", "call",
15:27:38          #: "teardown", "memocollect"
15:27:38          self.when = when
15:27:38          self.start = time()
15:27:38          try:
15:27:38  >           self.result = func()
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/runner.py:212: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  >       lambda: ihook(item=item, **kwds),
15:27:38          when=when,
15:27:38          treat_keyboard_interrupt_as_exception=item.config.getvalue("usepdb"),
15:27:38      )
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/runner.py:194: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <_HookCaller 'pytest_runtest_call'>, args = ()
15:27:38  kwargs = {'item': <Function 'test_gbm_regressor_backupsklearn'>}
15:27:38  notincall = set()
15:27:38  
15:27:38      def __call__(self, *args, **kwargs):
15:27:38          if args:
15:27:38              raise TypeError("hook calling supports only keyword arguments")
15:27:38          assert not self.is_historic()
15:27:38          if self.spec and self.spec.argnames:
15:27:38              notincall = (
15:27:38                  set(self.spec.argnames) - set(["__multicall__"]) - set(kwargs.keys())
15:27:38              )
15:27:38              if notincall:
15:27:38                  warnings.warn(
15:27:38                      "Argument(s) {} which are declared in the hookspec "
15:27:38                      "can not be found in this hook call".format(tuple(notincall)),
15:27:38                      stacklevel=2,
15:27:38                  )
15:27:38  >       return self._hookexec(self, self.get_hookimpls(), kwargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/hooks.py:289: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <_pytest.config.PytestPluginManager object at 0x7fffb9abfe10>
15:27:38  hook = <_HookCaller 'pytest_runtest_call'>
15:27:38  methods = [<HookImpl plugin_name='runner', plugin=<module '_pytest.runner' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-pa...ffb8e34eb8>>, <HookImpl plugin_name='logging-plugin', plugin=<_pytest.logging.LoggingPlugin object at 0x7fffb8c20f28>>]
15:27:38  kwargs = {'item': <Function 'test_gbm_regressor_backupsklearn'>}
15:27:38  
15:27:38      def _hookexec(self, hook, methods, kwargs):
15:27:38          # called from all hookcaller instances.
15:27:38          # enable_tracing will set its own wrapping function at self._inner_hookexec
15:27:38  >       return self._inner_hookexec(hook, methods, kwargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/manager.py:68: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  hook = <_HookCaller 'pytest_runtest_call'>
15:27:38  methods = [<HookImpl plugin_name='runner', plugin=<module '_pytest.runner' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-pa...ffb8e34eb8>>, <HookImpl plugin_name='logging-plugin', plugin=<_pytest.logging.LoggingPlugin object at 0x7fffb8c20f28>>]
15:27:38  kwargs = {'item': <Function 'test_gbm_regressor_backupsklearn'>}
15:27:38  
15:27:38      self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
15:27:38          methods,
15:27:38          kwargs,
15:27:38  >       firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
15:27:38      )
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/manager.py:62: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  item = <Function 'test_gbm_regressor_backupsklearn'>
15:27:38  
15:27:38      def pytest_runtest_call(item):
15:27:38          _update_current_test_var(item, "call")
15:27:38          sys.last_type, sys.last_value, sys.last_traceback = (None, None, None)
15:27:38          try:
15:27:38  >           item.runtest()
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/runner.py:122: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <Function 'test_gbm_regressor_backupsklearn'>
15:27:38  
15:27:38      def runtest(self):
15:27:38          """ execute the underlying test function. """
15:27:38  >       self.ihook.pytest_pyfunc_call(pyfuncitem=self)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/python.py:1438: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <_HookCaller 'pytest_pyfunc_call'>, args = ()
15:27:38  kwargs = {'pyfuncitem': <Function 'test_gbm_regressor_backupsklearn'>}
15:27:38  notincall = set()
15:27:38  
15:27:38      def __call__(self, *args, **kwargs):
15:27:38          if args:
15:27:38              raise TypeError("hook calling supports only keyword arguments")
15:27:38          assert not self.is_historic()
15:27:38          if self.spec and self.spec.argnames:
15:27:38              notincall = (
15:27:38                  set(self.spec.argnames) - set(["__multicall__"]) - set(kwargs.keys())
15:27:38              )
15:27:38              if notincall:
15:27:38                  warnings.warn(
15:27:38                      "Argument(s) {} which are declared in the hookspec "
15:27:38                      "can not be found in this hook call".format(tuple(notincall)),
15:27:38                      stacklevel=2,
15:27:38                  )
15:27:38  >       return self._hookexec(self, self.get_hookimpls(), kwargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/hooks.py:289: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <_pytest.config.PytestPluginManager object at 0x7fffb9abfe10>
15:27:38  hook = <_HookCaller 'pytest_pyfunc_call'>
15:27:38  methods = [<HookImpl plugin_name='python', plugin=<module '_pytest.python' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-pa..., plugin=<module '_pytest.skipping' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/skipping.py'>>]
15:27:38  kwargs = {'pyfuncitem': <Function 'test_gbm_regressor_backupsklearn'>}
15:27:38  
15:27:38      def _hookexec(self, hook, methods, kwargs):
15:27:38          # called from all hookcaller instances.
15:27:38          # enable_tracing will set its own wrapping function at self._inner_hookexec
15:27:38  >       return self._inner_hookexec(hook, methods, kwargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/manager.py:68: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  hook = <_HookCaller 'pytest_pyfunc_call'>
15:27:38  methods = [<HookImpl plugin_name='python', plugin=<module '_pytest.python' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-pa..., plugin=<module '_pytest.skipping' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/skipping.py'>>]
15:27:38  kwargs = {'pyfuncitem': <Function 'test_gbm_regressor_backupsklearn'>}
15:27:38  
15:27:38      self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
15:27:38          methods,
15:27:38          kwargs,
15:27:38  >       firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
15:27:38      )
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/manager.py:62: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  pyfuncitem = <Function 'test_gbm_regressor_backupsklearn'>
15:27:38  
15:27:38      @hookimpl(trylast=True)
15:27:38      def pytest_pyfunc_call(pyfuncitem):
15:27:38          testfunction = pyfuncitem.obj
15:27:38          if pyfuncitem._isyieldedfunction():
15:27:38              testfunction(*pyfuncitem._args)
15:27:38          else:
15:27:38              funcargs = pyfuncitem.funcargs
15:27:38              testargs = {}
15:27:38              for arg in pyfuncitem._fixtureinfo.argnames:
15:27:38                  testargs[arg] = funcargs[arg]
15:27:38  >           testfunction(**testargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/python.py:166: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  backend = 'auto'
15:27:38  
15:27:38      def test_gbm_regressor_backupsklearn(backend='auto'):
15:27:38          df = pd.read_csv("./open_data/simple.txt", delim_whitespace=True)
15:27:38          X = np.array(df.iloc[:, :df.shape[1] - 1], dtype='float32', order='C')
15:27:38          y = np.array(df.iloc[:, df.shape[1] - 1], dtype='float32', order='C')
15:27:38          import h2o4gpu
15:27:38          Solver = h2o4gpu.GradientBoostingRegressor
15:27:38      
15:27:38          # Run the h2o4gpu version of Gradient Boosting regression
15:27:38          gbm = Solver(backend=backend, random_state=1234)
15:27:38          print("h2o4gpu fit()")
15:27:38  >       gbm.fit(X, y)
15:27:38  
15:27:38  tests/python/open_data/gbm/test_xgb_sklearn_wrapper.py:136: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <h2o4gpu.solvers.xgboost.GradientBoostingRegressor object at 0x7fff81057f98>
15:27:38  X = array([[ 8.  ,  0.45,  0.  ,  1.  ,  0.  ,  0.  ,  1.  ,  0.  ,  0.  ,
15:27:38           0.  ],
15:27:38         [ 7.  ,  0.99,  1.  ,  1...
15:27:38           0.  ],
15:27:38         [ 7.  ,  0.88,  0.  ,  1.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
15:27:38           1.  ]], dtype=float32)
15:27:38  y = array([1., 0., 1., 0., 0., 0., 1., 1.], dtype=float32), sample_weight = None
15:27:38  
15:27:38      def fit(self, X, y=None, sample_weight=None):
15:27:38  >       res = self.model.fit(X, y, sample_weight)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/h2o4gpu/solvers/xgboost.py:1541: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
15:27:38         colsample_bynode=1, colsample_bytree=1.0, g...lambda=1, scale_pos_weight=1,
15:27:38         seed=None, silent=True, subsample=1.0, tree_method='gpu_hist',
15:27:38         verbosity=1)
15:27:38  X = array([[ 8.  ,  0.45,  0.  ,  1.  ,  0.  ,  0.  ,  1.  ,  0.  ,  0.  ,
15:27:38           0.  ],
15:27:38         [ 7.  ,  0.99,  1.  ,  1...
15:27:38           0.  ],
15:27:38         [ 7.  ,  0.88,  0.  ,  1.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
15:27:38           1.  ]], dtype=float32)
15:27:38  y = array([1., 0., 1., 0., 0., 0., 1., 1.], dtype=float32), sample_weight = None
15:27:38  eval_set = None, eval_metric = None, early_stopping_rounds = None
15:27:38  early_stopping_threshold = None, early_stopping_limit = None, verbose = True
15:27:38  xgb_model = None, sample_weight_eval_set = None, callbacks = None
15:27:38  
15:27:38          def fit(self, X, y, sample_weight=None, eval_set=None, eval_metric=None,
15:27:38                  early_stopping_rounds=None, early_stopping_threshold=None, early_stopping_limit=None, verbose=True, xgb_model=None,
15:27:38                  sample_weight_eval_set=None, callbacks=None):
15:27:38              # pylint: disable=missing-docstring,invalid-name,attribute-defined-outside-init
15:27:38              """
15:27:38              Fit the gradient boosting model
15:27:38      
15:27:38              Parameters
15:27:38              ----------
15:27:38              X : array_like
15:27:38                  Feature matrix
15:27:38              y : array_like
15:27:38                  Labels
15:27:38              sample_weight : array_like
15:27:38                  instance weights
15:27:38              eval_set : list, optional
15:27:38                  A list of (X, y) tuple pairs to use as a validation set for
15:27:38                  early-stopping
15:27:38              sample_weight_eval_set : list, optional
15:27:38                  A list of the form [L_1, L_2, ..., L_n], where each L_i is a list of
15:27:38                  instance weights on the i-th validation set.
15:27:38              eval_metric : str, callable, optional
15:27:38                  If a str, should be a built-in evaluation metric to use. See
15:27:38                  doc/parameter.rst. If callable, a custom evaluation metric. The call
15:27:38                  signature is func(y_predicted, y_true), where y_true will be a
15:27:38                  DMatrix object, so you may need to call its get_label
15:27:38                  method. It must return a (str, value) pair, where the str is a name
15:27:38                  for the evaluation and value is the value of the evaluation
15:27:38                  function. This objective is always minimized.
15:27:38              early_stopping_rounds : int
15:27:38                  Activates early stopping. Validation error needs to decrease at
15:27:38                  least every <early_stopping_rounds> round(s) to continue training.
15:27:38                  Requires at least one item in evals.  If there's more than one,
15:27:38                  will use the last. Returns the model from the last iteration
15:27:38                  (not the best one). If early stopping occurs, the model will
15:27:38                  have three additional fields: bst.best_score, bst.best_iteration
15:27:38                  and bst.best_ntree_limit.
15:27:38                  (Use bst.best_ntree_limit to get the correct value if num_parallel_tree
15:27:38                  and/or num_class appears in the parameters)
15:27:38              early_stopping_threshold : float
15:27:38                  Sets an optional threshold to smooth the early stopping policy.
15:27:38                  If, after early_stopping_rounds iterations, the model hasn't improved
15:27:38                  by more than threshold times the score from early_stopping_rounds before,
15:27:38                  then learning stops.
15:27:38              early_stopping_limit: float
15:27:38                  Caps "threshold times the score from early_stopping_rounds before"
15:27:38                  at this limit value.
15:27:38              verbose : bool
15:27:38                  If `verbose` and an evaluation set is used, writes the evaluation
15:27:38                  metric measured on the validation set to stderr.
15:27:38              xgb_model : str
15:27:38                  file name of stored xgb model or 'Booster' instance Xgb model to be
15:27:38                  loaded before training (allows training continuation).
15:27:38              callbacks : list of callback functions
15:27:38                  List of callback functions that are applied at end of each iteration.
15:27:38                  It is possible to use predefined callbacks by using :ref:`callback_api`.
15:27:38                  Example:
15:27:38      
15:27:38                  .. code-block:: python
15:27:38      
15:27:38                      [xgb.callback.reset_learning_rate(custom_rates)]
15:27:38              """
15:27:38              if sample_weight is not None:
15:27:38                  trainDmatrix = DMatrix(X, label=y, weight=sample_weight,
15:27:38                                         missing=self.missing, nthread=self.n_jobs)
15:27:38              else:
15:27:38                  trainDmatrix = DMatrix(X, label=y, missing=self.missing, nthread=self.n_jobs)
15:27:38      
15:27:38              evals_result = {}
15:27:38      
15:27:38              if eval_set is not None:
15:27:38                  if sample_weight_eval_set is None:
15:27:38                      sample_weight_eval_set = [None] * len(eval_set)
15:27:38                  evals = list(
15:27:38                      DMatrix(eval_set[i][0], label=eval_set[i][1], missing=self.missing,
15:27:38                              weight=sample_weight_eval_set[i], nthread=self.n_jobs)
15:27:38                      for i in range(len(eval_set)))
15:27:38                  evals = list(zip(evals, ["validation_{}".format(i) for i in
15:27:38                                           range(len(evals))]))
15:27:38              else:
15:27:38                  evals = ()
15:27:38      
15:27:38              params = self.get_xgb_params()
15:27:38      
15:27:38              if callable(self.objective):
15:27:38                  obj = _objective_decorator(self.objective)
15:27:38                  params["objective"] = "reg:linear"
15:27:38              else:
15:27:38                  obj = None
15:27:38      
15:27:38              feval = eval_metric if callable(eval_metric) else None
15:27:38              if eval_metric is not None:
15:27:38                  if callable(eval_metric):
15:27:38                      eval_metric = None
15:27:38                  else:
15:27:38                      params.update({'eval_metric': eval_metric})
15:27:38      
15:27:38              self._Booster = train(params, trainDmatrix,
15:27:38                                    self.get_num_boosting_rounds(), evals=evals,
15:27:38                                    early_stopping_rounds=early_stopping_rounds,
15:27:38                                    early_stopping_threshold=early_stopping_threshold,
15:27:38                                    early_stopping_limit=early_stopping_limit,
15:27:38                                    evals_result=evals_result, obj=obj, feval=feval,
15:27:38                                    verbose_eval=verbose, xgb_model=xgb_model,
15:27:38  >                                 callbacks=callbacks)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/sklearn.py:406: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  params = {'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bynode': 1, ...}
15:27:38  dtrain = <xgboost.core.DMatrix object at 0x7fff81024390>, num_boost_round = 100
15:27:38  evals = (), obj = None, feval = None, maximize = False
15:27:38  early_stopping_rounds = None, early_stopping_threshold = None
15:27:38  early_stopping_limit = None, evals_result = {}, verbose_eval = True
15:27:38  xgb_model = None
15:27:38  callbacks = [<function print_evaluation.<locals>.callback at 0x7fff803f8ea0>, <function record_evaluation.<locals>.callback at 0x7fff803fe048>]
15:27:38  learning_rates = None
15:27:38  
15:27:38      def train(params, dtrain, num_boost_round=10, evals=(), obj=None, feval=None,
15:27:38                maximize=False, early_stopping_rounds=None, early_stopping_threshold=None,early_stopping_limit=None,
15:27:38                evals_result=None,
15:27:38                verbose_eval=True, xgb_model=None, callbacks=None, learning_rates=None):
15:27:38          # pylint: disable=too-many-statements,too-many-branches, attribute-defined-outside-init
15:27:38          """Train a booster with given parameters.
15:27:38      
15:27:38          Parameters
15:27:38          ----------
15:27:38          params : dict
15:27:38              Booster params.
15:27:38          dtrain : DMatrix
15:27:38              Data to be trained.
15:27:38          num_boost_round: int
15:27:38              Number of boosting iterations.
15:27:38          evals: list of pairs (DMatrix, string)
15:27:38              List of items to be evaluated during training; this allows the user to watch
15:27:38              performance on the validation set.
15:27:38          obj : function
15:27:38              Customized objective function.
15:27:38          feval : function
15:27:38              Customized evaluation function.
15:27:38          maximize : bool
15:27:38              Whether to maximize feval.
15:27:38          early_stopping_rounds: int
15:27:38              Activates early stopping. Validation error needs to decrease at least
15:27:38              every **early_stopping_rounds** round(s) to continue training.
15:27:38              Requires at least one item in **evals**.
15:27:38              If there's more than one, will use the last.
15:27:38              Returns the model from the last iteration (not the best one).
15:27:38              If early stopping occurs, the model will have three additional fields:
15:27:38              ``bst.best_score``, ``bst.best_iteration`` and ``bst.best_ntree_limit``.
15:27:38              (Use ``bst.best_ntree_limit`` to get the correct value if
15:27:38              ``num_parallel_tree`` and/or ``num_class`` appears in the parameters)
15:27:38          early_stopping_threshold : float
15:27:38              Sets an optional threshold to smooth the early stopping policy.
15:27:38              If, after early_stopping_rounds iterations, the model hasn't improved
15:27:38              by more than threshold times the score from early_stopping_rounds before,
15:27:38              then learning stops.
15:27:38          early_stopping_limit: float
15:27:38              Caps "threshold times the score from early_stopping_rounds before"
15:27:38              at this limit value.
15:27:38          evals_result: dict
15:27:38              This dictionary stores the evaluation results of all the items in watchlist.
15:27:38      
15:27:38              Example: with a watchlist containing
15:27:38              ``[(dtest,'eval'), (dtrain,'train')]`` and
15:27:38              a parameter containing ``('eval_metric': 'logloss')``,
15:27:38              the **evals_result** returns
15:27:38      
15:27:38              .. code-block:: python
15:27:38      
15:27:38                  {'train': {'logloss': ['0.48253', '0.35953']},
15:27:38                   'eval': {'logloss': ['0.480385', '0.357756']}}
15:27:38      
15:27:38          verbose_eval : bool or int
15:27:38              Requires at least one item in **evals**.
15:27:38              If **verbose_eval** is True then the evaluation metric on the validation set is
15:27:38              printed at each boosting stage.
15:27:38              If **verbose_eval** is an integer then the evaluation metric on the validation set
15:27:38              is printed at every given **verbose_eval** boosting stage. The last boosting stage
15:27:38              / the boosting stage found by using **early_stopping_rounds** is also printed.
15:27:38              Example: with ``verbose_eval=4`` and at least one item in **evals**, an evaluation metric
15:27:38              is printed every 4 boosting stages, instead of every boosting stage.
15:27:38          learning_rates: list or function (deprecated - use callback API instead)
15:27:38              List of learning rate for each boosting round
15:27:38              or a customized function that calculates eta in terms of
15:27:38              current number of round and the total number of boosting round (e.g. yields
15:27:38              learning rate decay)
15:27:38          xgb_model : file name of stored xgb model or 'Booster' instance
15:27:38              Xgb model to be loaded before training (allows training continuation).
15:27:38          callbacks : list of callback functions
15:27:38              List of callback functions that are applied at end of each iteration.
15:27:38              It is possible to use predefined callbacks by using
15:27:38              :ref:`Callback API <callback_api>`.
15:27:38              Example:
15:27:38      
15:27:38              .. code-block:: python
15:27:38      
15:27:38                  [xgb.callback.reset_learning_rate(custom_rates)]
15:27:38      
15:27:38          Returns
15:27:38          -------
15:27:38          Booster : a trained booster model
15:27:38          """
15:27:38          callbacks = [] if callbacks is None else callbacks
15:27:38      
15:27:38          # Most legacy advanced options become callbacks
15:27:38          if isinstance(verbose_eval, bool) and verbose_eval:
15:27:38              callbacks.append(callback.print_evaluation())
15:27:38          else:
15:27:38              if isinstance(verbose_eval, int):
15:27:38                  callbacks.append(callback.print_evaluation(verbose_eval))
15:27:38      
15:27:38          if early_stopping_rounds is not None:
15:27:38              callbacks.append(callback.early_stop(early_stopping_rounds,
15:27:38                                                   early_stopping_threshold,
15:27:38                                                   early_stopping_limit,
15:27:38                                                   maximize=maximize,
15:27:38                                                   verbose=bool(verbose_eval)))
15:27:38          if evals_result is not None:
15:27:38              callbacks.append(callback.record_evaluation(evals_result))
15:27:38      
15:27:38          if learning_rates is not None:
15:27:38              warnings.warn("learning_rates parameter is deprecated - use callback API instead",
15:27:38                            DeprecationWarning)
15:27:38              callbacks.append(callback.reset_learning_rate(learning_rates))
15:27:38      
15:27:38          return _train_internal(params, dtrain,
15:27:38                                 num_boost_round=num_boost_round,
15:27:38                                 evals=evals,
15:27:38                                 obj=obj, feval=feval,
15:27:38  >                              xgb_model=xgb_model, callbacks=callbacks)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/training.py:227: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  params = {'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bynode': 1, ...}
15:27:38  dtrain = <xgboost.core.DMatrix object at 0x7fff81024390>, num_boost_round = 100
15:27:38  evals = [], obj = None, feval = None, xgb_model = None
15:27:38  callbacks = [<function print_evaluation.<locals>.callback at 0x7fff803f8ea0>, <function record_evaluation.<locals>.callback at 0x7fff803fe048>]
15:27:38  
15:27:38      def _train_internal(params, dtrain,
15:27:38                          num_boost_round=10, evals=(),
15:27:38                          obj=None, feval=None,
15:27:38                          xgb_model=None, callbacks=None):
15:27:38          """internal training function"""
15:27:38          callbacks = [] if callbacks is None else callbacks
15:27:38          evals = list(evals)
15:27:38          if isinstance(params, dict) \
15:27:38                  and 'eval_metric' in params \
15:27:38                  and isinstance(params['eval_metric'], list):
15:27:38              params = dict((k, v) for k, v in params.items())
15:27:38              eval_metrics = params['eval_metric']
15:27:38              params.pop("eval_metric", None)
15:27:38              params = list(params.items())
15:27:38              for eval_metric in eval_metrics:
15:27:38                  params += [('eval_metric', eval_metric)]
15:27:38      
15:27:38          bst = Booster(params, [dtrain] + [d[0] for d in evals])
15:27:38          nboost = 0
15:27:38          num_parallel_tree = 1
15:27:38      
15:27:38          if xgb_model is not None:
15:27:38              if not isinstance(xgb_model, STRING_TYPES):
15:27:38                  xgb_model = xgb_model.save_raw()
15:27:38              bst = Booster(params, [dtrain] + [d[0] for d in evals], model_file=xgb_model)
15:27:38              nboost = len(bst.get_dump())
15:27:38      
15:27:38          _params = dict(params) if isinstance(params, list) else params
15:27:38      
15:27:38          if 'num_parallel_tree' in _params:
15:27:38              num_parallel_tree = _params['num_parallel_tree']
15:27:38              nboost //= num_parallel_tree
15:27:38          if 'num_class' in _params:
15:27:38              nboost //= _params['num_class']
15:27:38      
15:27:38          # Distributed code: Load the checkpoint from rabit.
15:27:38          version = bst.load_rabit_checkpoint()
15:27:38          assert rabit.get_world_size() != 1 or version == 0
15:27:38          rank = rabit.get_rank()
15:27:38          start_iteration = int(version / 2)
15:27:38          nboost += start_iteration
15:27:38      
15:27:38          callbacks_before_iter = [
15:27:38              cb for cb in callbacks if cb.__dict__.get('before_iteration', False)]
15:27:38          callbacks_after_iter = [
15:27:38              cb for cb in callbacks if not cb.__dict__.get('before_iteration', False)]
15:27:38      
15:27:38          for i in range(start_iteration, num_boost_round):
15:27:38              for cb in callbacks_before_iter:
15:27:38                  cb(CallbackEnv(model=bst,
15:27:38                                 cvfolds=None,
15:27:38                                 iteration=i,
15:27:38                                 begin_iteration=start_iteration,
15:27:38                                 end_iteration=num_boost_round,
15:27:38                                 rank=rank,
15:27:38                                 evaluation_result_list=None))
15:27:38              # Distributed code: need to resume to this point.
15:27:38              # Skip the first update if it is a recovery step.
15:27:38              if version % 2 == 0:
15:27:38  >               bst.update(dtrain, i, obj)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/training.py:74: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <xgboost.core.Booster object at 0x7fff805efe80>
15:27:38  dtrain = <xgboost.core.DMatrix object at 0x7fff81024390>, iteration = 0
15:27:38  fobj = None
15:27:38  
15:27:38      def update(self, dtrain, iteration, fobj=None):
15:27:38          """Update for one iteration, with objective function calculated
15:27:38          internally.  This function should not be called directly by users.
15:27:38      
15:27:38          Parameters
15:27:38          ----------
15:27:38          dtrain : DMatrix
15:27:38              Training data.
15:27:38          iteration : int
15:27:38              Current iteration number.
15:27:38          fobj : function
15:27:38              Customized objective function.
15:27:38      
15:27:38          """
15:27:38          if not isinstance(dtrain, DMatrix):
15:27:38              raise TypeError('invalid training matrix: {}'.format(type(dtrain).__name__))
15:27:38          self._validate_features(dtrain)
15:27:38      
15:27:38          if fobj is None:
15:27:38              _check_call(_LIB.XGBoosterUpdateOneIter(self.handle, ctypes.c_int(iteration),
15:27:38  >                                                   dtrain.handle))
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/core.py:1115: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  ret = -1
15:27:38  
15:27:38      def _check_call(ret):
15:27:38          """Check the return value of C API call
15:27:38      
15:27:38          This function will raise exception when error occurs.
15:27:38          Wrap every API call with this function
15:27:38      
15:27:38          Parameters
15:27:38          ----------
15:27:38          ret : int
15:27:38              return value from API calls
15:27:38          """
15:27:38          if ret != 0:
15:27:38  >           raise XGBoostError(py_str(_LIB.XGBGetLastError()))
15:27:38  E           xgboost.core.XGBoostError: [13:09:20] /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1412: Exception in gpu_hist: NCCL failure :unhandled system error /root/repo/xgboost/src/tree/../common/device_helpers.cuh(896)
15:27:38  E           
15:27:38  E           Stack trace:
15:27:38  E             [bt] (0) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x54) [0x7fff9664f984]
15:27:38  E             [bt] (1) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::Update(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, std::vector<xgboost::RegTree*, std::allocator<xgboost::RegTree*> > const&)+0x2c4) [0x7fff9686b2e4]
15:27:38  E             [bt] (2) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::tree::GPUHistMaker::Update(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, std::vector<xgboost::RegTree*, std::allocator<xgboost::RegTree*> > const&)+0x28) [0x7fff9686b358]
15:27:38  E             [bt] (3) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::gbm::GBTree::BoostNewTrees(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, int, std::vector<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> >, std::allocator<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> > > >*)+0x538) [0x7fff966cdf38]
15:27:38  E             [bt] (4) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::gbm::GBTree::DoBoost(xgboost::DMatrix*, xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::ObjFunction*)+0xa78) [0x7fff966cf118]
15:27:38  E             [bt] (5) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::LearnerImpl::UpdateOneIter(int, xgboost::DMatrix*)+0x508) [0x7fff966e0db8]
15:27:38  E             [bt] (6) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(XGBoosterUpdateOneIter+0x70) [0x7fff9665adf0]
15:27:38  E             [bt] (7) /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6(+0x928c) [0x7fffb9b0928c]
15:27:38  E             [bt] (8) /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0xd4) [0x7fffb9b06df4]
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/core.py:176: XGBoostError
15:27:38  ____________ tests/python/open_data/gbm/test_xgb_sklearn_wrapper.py ____________
15:27:38  [gw4] linux -- Python 3.6.4 /opt/h2oai/h2o4gpu/python/bin/python
15:27:38  Worker 'gw4' crashed while running 'tests/python/open_data/gbm/test_xgb_sklearn_wrapper.py::test_gbm_classifier_backupsklearn'
15:27:38  _____________________ test_sklearn_drf_regression_h2o4gpu ______________________
15:27:38  [gw0] linux -- Python 3.6.4 /opt/h2oai/h2o4gpu/python/bin/python
15:27:38  
15:27:38  self = <CallInfo when='call' exception: [13:24:10] /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1412: Exception in gpu_his...8c]
15:27:38    [bt] (8) /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0xd4) [0x7fff9be76df4]
15:27:38  
15:27:38  >
15:27:38  func = <function call_runtest_hook.<locals>.<lambda> at 0x7fff50376a60>
15:27:38  when = 'call', treat_keyboard_interrupt_as_exception = False
15:27:38  
15:27:38      def __init__(self, func, when, treat_keyboard_interrupt_as_exception=False):
15:27:38          #: context of invocation: one of "setup", "call",
15:27:38          #: "teardown", "memocollect"
15:27:38          self.when = when
15:27:38          self.start = time()
15:27:38          try:
15:27:38  >           self.result = func()
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/runner.py:212: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  >       lambda: ihook(item=item, **kwds),
15:27:38          when=when,
15:27:38          treat_keyboard_interrupt_as_exception=item.config.getvalue("usepdb"),
15:27:38      )
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/runner.py:194: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <_HookCaller 'pytest_runtest_call'>, args = ()
15:27:38  kwargs = {'item': <Function 'test_sklearn_drf_regression_h2o4gpu'>}
15:27:38  notincall = set()
15:27:38  
15:27:38      def __call__(self, *args, **kwargs):
15:27:38          if args:
15:27:38              raise TypeError("hook calling supports only keyword arguments")
15:27:38          assert not self.is_historic()
15:27:38          if self.spec and self.spec.argnames:
15:27:38              notincall = (
15:27:38                  set(self.spec.argnames) - set(["__multicall__"]) - set(kwargs.keys())
15:27:38              )
15:27:38              if notincall:
15:27:38                  warnings.warn(
15:27:38                      "Argument(s) {} which are declared in the hookspec "
15:27:38                      "can not be found in this hook call".format(tuple(notincall)),
15:27:38                      stacklevel=2,
15:27:38                  )
15:27:38  >       return self._hookexec(self, self.get_hookimpls(), kwargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/hooks.py:289: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <_pytest.config.PytestPluginManager object at 0x7fff9be52e10>
15:27:38  hook = <_HookCaller 'pytest_runtest_call'>
15:27:38  methods = [<HookImpl plugin_name='runner', plugin=<module '_pytest.runner' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-pa...ff9b1c50b8>>, <HookImpl plugin_name='logging-plugin', plugin=<_pytest.logging.LoggingPlugin object at 0x7fff9af97e80>>]
15:27:38  kwargs = {'item': <Function 'test_sklearn_drf_regression_h2o4gpu'>}
15:27:38  
15:27:38      def _hookexec(self, hook, methods, kwargs):
15:27:38          # called from all hookcaller instances.
15:27:38          # enable_tracing will set its own wrapping function at self._inner_hookexec
15:27:38  >       return self._inner_hookexec(hook, methods, kwargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/manager.py:68: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  hook = <_HookCaller 'pytest_runtest_call'>
15:27:38  methods = [<HookImpl plugin_name='runner', plugin=<module '_pytest.runner' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-pa...ff9b1c50b8>>, <HookImpl plugin_name='logging-plugin', plugin=<_pytest.logging.LoggingPlugin object at 0x7fff9af97e80>>]
15:27:38  kwargs = {'item': <Function 'test_sklearn_drf_regression_h2o4gpu'>}
15:27:38  
15:27:38      self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
15:27:38          methods,
15:27:38          kwargs,
15:27:38  >       firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
15:27:38      )
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/manager.py:62: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  item = <Function 'test_sklearn_drf_regression_h2o4gpu'>
15:27:38  
15:27:38      def pytest_runtest_call(item):
15:27:38          _update_current_test_var(item, "call")
15:27:38          sys.last_type, sys.last_value, sys.last_traceback = (None, None, None)
15:27:38          try:
15:27:38  >           item.runtest()
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/runner.py:122: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <Function 'test_sklearn_drf_regression_h2o4gpu'>
15:27:38  
15:27:38      def runtest(self):
15:27:38          """ execute the underlying test function. """
15:27:38  >       self.ihook.pytest_pyfunc_call(pyfuncitem=self)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/python.py:1438: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <_HookCaller 'pytest_pyfunc_call'>, args = ()
15:27:38  kwargs = {'pyfuncitem': <Function 'test_sklearn_drf_regression_h2o4gpu'>}
15:27:38  notincall = set()
15:27:38  
15:27:38      def __call__(self, *args, **kwargs):
15:27:38          if args:
15:27:38              raise TypeError("hook calling supports only keyword arguments")
15:27:38          assert not self.is_historic()
15:27:38          if self.spec and self.spec.argnames:
15:27:38              notincall = (
15:27:38                  set(self.spec.argnames) - set(["__multicall__"]) - set(kwargs.keys())
15:27:38              )
15:27:38              if notincall:
15:27:38                  warnings.warn(
15:27:38                      "Argument(s) {} which are declared in the hookspec "
15:27:38                      "can not be found in this hook call".format(tuple(notincall)),
15:27:38                      stacklevel=2,
15:27:38                  )
15:27:38  >       return self._hookexec(self, self.get_hookimpls(), kwargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/hooks.py:289: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <_pytest.config.PytestPluginManager object at 0x7fff9be52e10>
15:27:38  hook = <_HookCaller 'pytest_pyfunc_call'>
15:27:38  methods = [<HookImpl plugin_name='python', plugin=<module '_pytest.python' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-pa..., plugin=<module '_pytest.skipping' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/skipping.py'>>]
15:27:38  kwargs = {'pyfuncitem': <Function 'test_sklearn_drf_regression_h2o4gpu'>}
15:27:38  
15:27:38      def _hookexec(self, hook, methods, kwargs):
15:27:38          # called from all hookcaller instances.
15:27:38          # enable_tracing will set its own wrapping function at self._inner_hookexec
15:27:38  >       return self._inner_hookexec(hook, methods, kwargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/manager.py:68: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  hook = <_HookCaller 'pytest_pyfunc_call'>
15:27:38  methods = [<HookImpl plugin_name='python', plugin=<module '_pytest.python' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-pa..., plugin=<module '_pytest.skipping' from '/opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/skipping.py'>>]
15:27:38  kwargs = {'pyfuncitem': <Function 'test_sklearn_drf_regression_h2o4gpu'>}
15:27:38  
15:27:38      self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
15:27:38          methods,
15:27:38          kwargs,
15:27:38  >       firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
15:27:38      )
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/pluggy/manager.py:62: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  pyfuncitem = <Function 'test_sklearn_drf_regression_h2o4gpu'>
15:27:38  
15:27:38      @hookimpl(trylast=True)
15:27:38      def pytest_pyfunc_call(pyfuncitem):
15:27:38          testfunction = pyfuncitem.obj
15:27:38          if pyfuncitem._isyieldedfunction():
15:27:38              testfunction(*pyfuncitem._args)
15:27:38          else:
15:27:38              funcargs = pyfuncitem.funcargs
15:27:38              testargs = {}
15:27:38              for arg in pyfuncitem._fixtureinfo.argnames:
15:27:38                  testargs[arg] = funcargs[arg]
15:27:38  >           testfunction(**testargs)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/_pytest/python.py:166: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  >   def test_sklearn_drf_regression_h2o4gpu(): test_drf_classifier_backupsklearn(backend='h2o4gpu')
15:27:38  
15:27:38  tests/python/open_data/gbm/test_xgb_sklearn_wrapper.py:235: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  backend = 'h2o4gpu'
15:27:38  
15:27:38      def test_drf_classifier_backupsklearn(backend='auto'):
15:27:38          df = pd.read_csv("./open_data/creditcard.csv")
15:27:38          X = np.array(df.iloc[:, :df.shape[1] - 1], dtype='float32', order='C')
15:27:38          y = np.array(df.iloc[:, df.shape[1] - 1], dtype='float32', order='C')
15:27:38          import h2o4gpu
15:27:38          Solver = h2o4gpu.RandomForestClassifier
15:27:38      
15:27:38          # Run the h2o4gpu version of RandomForestClassifier
15:27:38          drf = Solver(backend=backend, random_state=1234, oob_score=True, n_estimators=10)
15:27:38          print("h2o4gpu fit()")
15:27:38  >       drf.fit(X, y)
15:27:38  
15:27:38  tests/python/open_data/gbm/test_xgb_sklearn_wrapper.py:75: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <h2o4gpu.solvers.xgboost.RandomForestClassifier object at 0x7fff5ea88940>
15:27:38  X = array([[1.0000e+00, 2.0000e+04, 2.0000e+00, ..., 0.0000e+00, 0.0000e+00,
15:27:38          0.0000e+00],
15:27:38         [2.0000e+00, 1.20...000e+03],
15:27:38         [2.3999e+04, 2.0000e+04, 1.0000e+00, ..., 1.0000e+03, 0.0000e+00,
15:27:38          0.0000e+00]], dtype=float32)
15:27:38  y = array([1., 1., 0., ..., 0., 0., 0.], dtype=float32), sample_weight = None
15:27:38  
15:27:38      def fit(self, X, y=None, sample_weight=None):
15:27:38  >       res = self.model.fit(X, y, sample_weight)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/h2o4gpu/solvers/xgboost.py:317: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
15:27:38         colsample_bynode=1, colsample_bytree=1.0, ...lambda=1, scale_pos_weight=1,
15:27:38         seed=None, silent=True, subsample=1.0, tree_method='gpu_hist',
15:27:38         verbosity=1)
15:27:38  X = array([[1.0000e+00, 2.0000e+04, 2.0000e+00, ..., 0.0000e+00, 0.0000e+00,
15:27:38          0.0000e+00],
15:27:38         [2.0000e+00, 1.20...000e+03],
15:27:38         [2.3999e+04, 2.0000e+04, 1.0000e+00, ..., 1.0000e+03, 0.0000e+00,
15:27:38          0.0000e+00]], dtype=float32)
15:27:38  y = array([1., 1., 0., ..., 0., 0., 0.], dtype=float32), sample_weight = None
15:27:38  eval_set = None, eval_metric = None, early_stopping_rounds = None
15:27:38  early_stopping_threshold = None, early_stopping_limit = None, verbose = True
15:27:38  xgb_model = None, sample_weight_eval_set = None, callbacks = None
15:27:38  
15:27:38          def fit(self, X, y, sample_weight=None, eval_set=None, eval_metric=None,
15:27:38                  early_stopping_rounds=None, early_stopping_threshold=None, early_stopping_limit=None,
15:27:38                  verbose=True, xgb_model=None,
15:27:38                  sample_weight_eval_set=None, callbacks=None):
15:27:38              # pylint: disable = attribute-defined-outside-init,arguments-differ
15:27:38              """
15:27:38              Fit gradient boosting classifier
15:27:38      
15:27:38              Parameters
15:27:38              ----------
15:27:38              X : array_like
15:27:38                  Feature matrix
15:27:38              y : array_like
15:27:38                  Labels
15:27:38              sample_weight : array_like
15:27:38                  Weight for each instance
15:27:38              eval_set : list, optional
15:27:38                  A list of (X, y) pairs to use as a validation set for
15:27:38                  early-stopping
15:27:38              sample_weight_eval_set : list, optional
15:27:38                  A list of the form [L_1, L_2, ..., L_n], where each L_i is a list of
15:27:38                  instance weights on the i-th validation set.
15:27:38              eval_metric : str, callable, optional
15:27:38                  If a str, should be a built-in evaluation metric to use. See
15:27:38                  doc/parameter.rst. If callable, a custom evaluation metric. The call
15:27:38                  signature is func(y_predicted, y_true) where y_true will be a
15:27:38                  DMatrix object such that you may need to call the get_label
15:27:38                  method. It must return a str, value pair where the str is a name
15:27:38                  for the evaluation and value is the value of the evaluation
15:27:38                  function. This objective is always minimized.
15:27:38              early_stopping_rounds : int, optional
15:27:38                  Activates early stopping. Validation error needs to decrease at
15:27:38                  least every <early_stopping_rounds> round(s) to continue training.
15:27:38                  Requires at least one item in evals. If there's more than one,
15:27:38                  will use the last. If early stopping occurs, the model will have
15:27:38                  three additional fields: bst.best_score, bst.best_iteration and
15:27:38                  bst.best_ntree_limit (bst.best_ntree_limit is used as the default
15:27:38                  ntree_limit in the predict method unless another value is specified).
15:27:38                  (Use bst.best_ntree_limit to get the correct value if num_parallel_tree
15:27:38                  and/or num_class appears in the parameters)
15:27:38              early_stopping_threshold : float
15:27:38                  Sets an optional threshold to smooth the early stopping policy.
15:27:38                  If, after early_stopping_rounds iterations, the model has not
15:27:38                  improved by more than threshold times the score from
15:27:38                  early_stopping_rounds before, training stops.
15:27:38              early_stopping_limit : float
15:27:38                  Caps "threshold times the score from early_stopping_rounds before"
15:27:38                  at the given limit.
15:27:38              verbose : bool
15:27:38                  If `verbose` and an evaluation set is used, writes the evaluation
15:27:38                  metric measured on the validation set to stderr.
15:27:38              xgb_model : str
15:27:38                  file name of stored xgb model or 'Booster' instance Xgb model to be
15:27:38                  loaded before training (allows training continuation).
15:27:38              callbacks : list of callback functions
15:27:38                  List of callback functions that are applied at end of each iteration.
15:27:38                  It is possible to use predefined callbacks by using :ref:`callback_api`.
15:27:38                  Example:
15:27:38      
15:27:38                  .. code-block:: python
15:27:38      
15:27:38                      [xgb.callback.reset_learning_rate(custom_rates)]
15:27:38              """
15:27:38              evals_result = {}
15:27:38              self.classes_ = np.unique(y)
15:27:38              self.n_classes_ = len(self.classes_)
15:27:38      
15:27:38              xgb_options = self.get_xgb_params()
15:27:38      
15:27:38              if callable(self.objective):
15:27:38                  obj = _objective_decorator(self.objective)
15:27:38                  # Use the default value. Is it really unused?
15:27:38                  xgb_options["objective"] = "binary:logistic"
15:27:38              else:
15:27:38                  obj = None
15:27:38      
15:27:38              if self.n_classes_ > 2:
15:27:38                  # Switch to using a multiclass objective in the underlying XGB instance
15:27:38                  xgb_options["objective"] = "multi:softprob"
15:27:38                  xgb_options['num_class'] = self.n_classes_
15:27:38      
15:27:38              feval = eval_metric if callable(eval_metric) else None
15:27:38              if eval_metric is not None:
15:27:38                  if callable(eval_metric):
15:27:38                      eval_metric = None
15:27:38                  else:
15:27:38                      xgb_options.update({"eval_metric": eval_metric})
15:27:38      
15:27:38              self._le = XGBLabelEncoder().fit(y)
15:27:38              training_labels = self._le.transform(y)
15:27:38      
15:27:38              if eval_set is not None:
15:27:38                  if sample_weight_eval_set is None:
15:27:38                      sample_weight_eval_set = [None] * len(eval_set)
15:27:38                  evals = list(
15:27:38                      DMatrix(eval_set[i][0], label=self._le.transform(eval_set[i][1]),
15:27:38                              missing=self.missing, weight=sample_weight_eval_set[i],
15:27:38                              nthread=self.n_jobs)
15:27:38                      for i in range(len(eval_set))
15:27:38                  )
15:27:38                  nevals = len(evals)
15:27:38                  eval_names = ["validation_{}".format(i) for i in range(nevals)]
15:27:38                  evals = list(zip(evals, eval_names))
15:27:38              else:
15:27:38                  evals = ()
15:27:38      
15:27:38              self._features_count = X.shape[1]
15:27:38      
15:27:38              if sample_weight is not None:
15:27:38                  train_dmatrix = DMatrix(X, label=training_labels, weight=sample_weight,
15:27:38                                          missing=self.missing, nthread=self.n_jobs)
15:27:38              else:
15:27:38                  train_dmatrix = DMatrix(X, label=training_labels,
15:27:38                                          missing=self.missing, nthread=self.n_jobs)
15:27:38      
15:27:38              self._Booster = train(xgb_options, train_dmatrix, self.get_num_boosting_rounds(),
15:27:38                                    evals=evals, early_stopping_rounds=early_stopping_rounds,
15:27:38                                    early_stopping_threshold=early_stopping_threshold,
15:27:38                                    early_stopping_limit=early_stopping_limit,
15:27:38                                    evals_result=evals_result, obj=obj, feval=feval,
15:27:38                                    verbose_eval=verbose, xgb_model=xgb_model,
15:27:38  >                                 callbacks=callbacks)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/sklearn.py:757: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  params = {'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bynode': 1, ...}
15:27:38  dtrain = <xgboost.core.DMatrix object at 0x7fff5ea88518>, num_boost_round = 10
15:27:38  evals = (), obj = None, feval = None, maximize = False
15:27:38  early_stopping_rounds = None, early_stopping_threshold = None
15:27:38  early_stopping_limit = None, evals_result = {}, verbose_eval = True
15:27:38  xgb_model = None
15:27:38  callbacks = [<function print_evaluation.<locals>.callback at 0x7fff50376158>, <function record_evaluation.<locals>.callback at 0x7fff50376bf8>]
15:27:38  learning_rates = None
15:27:38  
15:27:38      def train(params, dtrain, num_boost_round=10, evals=(), obj=None, feval=None,
15:27:38                maximize=False, early_stopping_rounds=None, early_stopping_threshold=None, early_stopping_limit=None,
15:27:38                evals_result=None,
15:27:38                verbose_eval=True, xgb_model=None, callbacks=None, learning_rates=None):
15:27:38          # pylint: disable=too-many-statements,too-many-branches, attribute-defined-outside-init
15:27:38          """Train a booster with given parameters.
15:27:38      
15:27:38          Parameters
15:27:38          ----------
15:27:38          params : dict
15:27:38              Booster params.
15:27:38          dtrain : DMatrix
15:27:38              Data to be trained.
15:27:38          num_boost_round: int
15:27:38              Number of boosting iterations.
15:27:38          evals: list of pairs (DMatrix, string)
15:27:38              List of items to be evaluated during training, this allows user to watch
15:27:38              performance on the validation set.
15:27:38          obj : function
15:27:38              Customized objective function.
15:27:38          feval : function
15:27:38              Customized evaluation function.
15:27:38          maximize : bool
15:27:38              Whether to maximize feval.
15:27:38          early_stopping_rounds: int
15:27:38              Activates early stopping. Validation error needs to decrease at least
15:27:38              every **early_stopping_rounds** round(s) to continue training.
15:27:38              Requires at least one item in **evals**.
15:27:38              If there's more than one, will use the last.
15:27:38              Returns the model from the last iteration (not the best one).
15:27:38              If early stopping occurs, the model will have three additional fields:
15:27:38              ``bst.best_score``, ``bst.best_iteration`` and ``bst.best_ntree_limit``.
15:27:38              (Use ``bst.best_ntree_limit`` to get the correct value if
15:27:38              ``num_parallel_tree`` and/or ``num_class`` appears in the parameters)
15:27:38          early_stopping_threshold : float
15:27:38              Sets an optional threshold to smooth the early stopping policy.
15:27:38              If, after early_stopping_rounds iterations, the model has not
15:27:38              improved by more than threshold times the score from
15:27:38              early_stopping_rounds before, training stops.
15:27:38          early_stopping_limit : float
15:27:38              Caps "threshold times the score from early_stopping_rounds before"
15:27:38              at the given limit.
15:27:38          evals_result: dict
15:27:38              This dictionary stores the evaluation results of all the items in watchlist.
15:27:38      
15:27:38              Example: with a watchlist containing
15:27:38              ``[(dtest,'eval'), (dtrain,'train')]`` and
15:27:38              a parameter containing ``('eval_metric': 'logloss')``,
15:27:38              the **evals_result** returns
15:27:38      
15:27:38              .. code-block:: python
15:27:38      
15:27:38                  {'train': {'logloss': ['0.48253', '0.35953']},
15:27:38                   'eval': {'logloss': ['0.480385', '0.357756']}}
15:27:38      
15:27:38          verbose_eval : bool or int
15:27:38              Requires at least one item in **evals**.
15:27:38              If **verbose_eval** is True then the evaluation metric on the validation set is
15:27:38              printed at each boosting stage.
15:27:38              If **verbose_eval** is an integer then the evaluation metric on the validation set
15:27:38              is printed at every given **verbose_eval** boosting stage. The last boosting stage
15:27:38              / the boosting stage found by using **early_stopping_rounds** is also printed.
15:27:38              Example: with ``verbose_eval=4`` and at least one item in **evals**, an evaluation metric
15:27:38              is printed every 4 boosting stages, instead of every boosting stage.
15:27:38          learning_rates: list or function (deprecated - use callback API instead)
15:27:38              List of learning rates for each boosting round,
15:27:38              or a customized function that calculates eta in terms of the
15:27:38              current round number and the total number of boosting rounds
15:27:38              (e.g. to implement learning rate decay)
15:27:38          xgb_model : file name of stored xgb model or 'Booster' instance
15:27:38              Xgb model to be loaded before training (allows training continuation).
15:27:38          callbacks : list of callback functions
15:27:38              List of callback functions that are applied at end of each iteration.
15:27:38              It is possible to use predefined callbacks by using
15:27:38              :ref:`Callback API <callback_api>`.
15:27:38              Example:
15:27:38      
15:27:38              .. code-block:: python
15:27:38      
15:27:38                  [xgb.callback.reset_learning_rate(custom_rates)]
15:27:38      
15:27:38          Returns
15:27:38          -------
15:27:38          Booster : a trained booster model
15:27:38          """
15:27:38          callbacks = [] if callbacks is None else callbacks
15:27:38      
15:27:38          # Most of the legacy advanced options become callbacks
15:27:38          if isinstance(verbose_eval, bool) and verbose_eval:
15:27:38              callbacks.append(callback.print_evaluation())
15:27:38          else:
15:27:38              if isinstance(verbose_eval, int):
15:27:38                  callbacks.append(callback.print_evaluation(verbose_eval))
15:27:38      
15:27:38          if early_stopping_rounds is not None:
15:27:38              callbacks.append(callback.early_stop(early_stopping_rounds,
15:27:38                                                   early_stopping_threshold,
15:27:38                                                   early_stopping_limit,
15:27:38                                                   maximize=maximize,
15:27:38                                                   verbose=bool(verbose_eval)))
15:27:38          if evals_result is not None:
15:27:38              callbacks.append(callback.record_evaluation(evals_result))
15:27:38      
15:27:38          if learning_rates is not None:
15:27:38              warnings.warn("learning_rates parameter is deprecated - use callback API instead",
15:27:38                            DeprecationWarning)
15:27:38              callbacks.append(callback.reset_learning_rate(learning_rates))
15:27:38      
15:27:38          return _train_internal(params, dtrain,
15:27:38                                 num_boost_round=num_boost_round,
15:27:38                                 evals=evals,
15:27:38                                 obj=obj, feval=feval,
15:27:38  >                              xgb_model=xgb_model, callbacks=callbacks)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/training.py:227: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  params = {'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bynode': 1, ...}
15:27:38  dtrain = <xgboost.core.DMatrix object at 0x7fff5ea88518>, num_boost_round = 10
15:27:38  evals = [], obj = None, feval = None, xgb_model = None
15:27:38  callbacks = [<function print_evaluation.<locals>.callback at 0x7fff50376158>, <function record_evaluation.<locals>.callback at 0x7fff50376bf8>]
15:27:38  
15:27:38      def _train_internal(params, dtrain,
15:27:38                          num_boost_round=10, evals=(),
15:27:38                          obj=None, feval=None,
15:27:38                          xgb_model=None, callbacks=None):
15:27:38          """internal training function"""
15:27:38          callbacks = [] if callbacks is None else callbacks
15:27:38          evals = list(evals)
15:27:38          if isinstance(params, dict) \
15:27:38                  and 'eval_metric' in params \
15:27:38                  and isinstance(params['eval_metric'], list):
15:27:38              params = dict((k, v) for k, v in params.items())
15:27:38              eval_metrics = params['eval_metric']
15:27:38              params.pop("eval_metric", None)
15:27:38              params = list(params.items())
15:27:38              for eval_metric in eval_metrics:
15:27:38                  params += [('eval_metric', eval_metric)]
15:27:38      
15:27:38          bst = Booster(params, [dtrain] + [d[0] for d in evals])
15:27:38          nboost = 0
15:27:38          num_parallel_tree = 1
15:27:38      
15:27:38          if xgb_model is not None:
15:27:38              if not isinstance(xgb_model, STRING_TYPES):
15:27:38                  xgb_model = xgb_model.save_raw()
15:27:38              bst = Booster(params, [dtrain] + [d[0] for d in evals], model_file=xgb_model)
15:27:38              nboost = len(bst.get_dump())
15:27:38      
15:27:38          _params = dict(params) if isinstance(params, list) else params
15:27:38      
15:27:38          if 'num_parallel_tree' in _params:
15:27:38              num_parallel_tree = _params['num_parallel_tree']
15:27:38              nboost //= num_parallel_tree
15:27:38          if 'num_class' in _params:
15:27:38              nboost //= _params['num_class']
15:27:38      
15:27:38          # Distributed code: Load the checkpoint from rabit.
15:27:38          version = bst.load_rabit_checkpoint()
15:27:38          assert rabit.get_world_size() != 1 or version == 0
15:27:38          rank = rabit.get_rank()
15:27:38          start_iteration = int(version / 2)
15:27:38          nboost += start_iteration
15:27:38      
15:27:38          callbacks_before_iter = [
15:27:38              cb for cb in callbacks if cb.__dict__.get('before_iteration', False)]
15:27:38          callbacks_after_iter = [
15:27:38              cb for cb in callbacks if not cb.__dict__.get('before_iteration', False)]
15:27:38      
15:27:38          for i in range(start_iteration, num_boost_round):
15:27:38              for cb in callbacks_before_iter:
15:27:38                  cb(CallbackEnv(model=bst,
15:27:38                                 cvfolds=None,
15:27:38                                 iteration=i,
15:27:38                                 begin_iteration=start_iteration,
15:27:38                                 end_iteration=num_boost_round,
15:27:38                                 rank=rank,
15:27:38                                 evaluation_result_list=None))
15:27:38              # Distributed code: need to resume to this point.
15:27:38              # Skip the first update if it is a recovery step.
15:27:38              if version % 2 == 0:
15:27:38  >               bst.update(dtrain, i, obj)
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/training.py:74: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  self = <xgboost.core.Booster object at 0x7fff5ea88f28>
15:27:38  dtrain = <xgboost.core.DMatrix object at 0x7fff5ea88518>, iteration = 0
15:27:38  fobj = None
15:27:38  
15:27:38      def update(self, dtrain, iteration, fobj=None):
15:27:38          """Update for one iteration, with objective function calculated
15:27:38          internally.  This function should not be called directly by users.
15:27:38      
15:27:38          Parameters
15:27:38          ----------
15:27:38          dtrain : DMatrix
15:27:38              Training data.
15:27:38          iteration : int
15:27:38              Current iteration number.
15:27:38          fobj : function
15:27:38              Customized objective function.
15:27:38      
15:27:38          """
15:27:38          if not isinstance(dtrain, DMatrix):
15:27:38              raise TypeError('invalid training matrix: {}'.format(type(dtrain).__name__))
15:27:38          self._validate_features(dtrain)
15:27:38      
15:27:38          if fobj is None:
15:27:38              _check_call(_LIB.XGBoosterUpdateOneIter(self.handle, ctypes.c_int(iteration),
15:27:38  >                                                   dtrain.handle))
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/core.py:1115: 
15:27:38  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:27:38  
15:27:38  ret = -1
15:27:38  
15:27:38      def _check_call(ret):
15:27:38          """Check the return value of C API call
15:27:38      
15:27:38          This function will raise exception when error occurs.
15:27:38          Wrap every API call with this function
15:27:38      
15:27:38          Parameters
15:27:38          ----------
15:27:38          ret : int
15:27:38              return value from API calls
15:27:38          """
15:27:38          if ret != 0:
15:27:38  >           raise XGBoostError(py_str(_LIB.XGBGetLastError()))
15:27:38  E           xgboost.core.XGBoostError: [13:24:10] /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1412: Exception in gpu_hist: NCCL failure :unhandled system error /root/repo/xgboost/src/tree/../common/device_helpers.cuh(896)
15:27:38  E           
15:27:38  E           Stack trace:
15:27:38  E             [bt] (0) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x54) [0x7fff7c9cf984]
15:27:38  E             [bt] (1) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::Update(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, std::vector<xgboost::RegTree*, std::allocator<xgboost::RegTree*> > const&)+0x2c4) [0x7fff7cbeb2e4]
15:27:38  E             [bt] (2) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::tree::GPUHistMaker::Update(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, std::vector<xgboost::RegTree*, std::allocator<xgboost::RegTree*> > const&)+0x28) [0x7fff7cbeb358]
15:27:38  E             [bt] (3) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::gbm::GBTree::BoostNewTrees(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, int, std::vector<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> >, std::allocator<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> > > >*)+0x538) [0x7fff7ca4df38]
15:27:38  E             [bt] (4) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::gbm::GBTree::DoBoost(xgboost::DMatrix*, xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::ObjFunction*)+0xa78) [0x7fff7ca4f118]
15:27:38  E             [bt] (5) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::LearnerImpl::UpdateOneIter(int, xgboost::DMatrix*)+0x508) [0x7fff7ca60db8]
15:27:38  E             [bt] (6) /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(XGBoosterUpdateOneIter+0x70) [0x7fff7c9dadf0]
15:27:38  E             [bt] (7) /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6(+0x928c) [0x7fff9be7928c]
15:27:38  E             [bt] (8) /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0xd4) [0x7fff9be76df4]
15:27:38  
15:27:38  /opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/core.py:176: XGBoostError
15:27:38  ____________ tests/python/open_data/gbm/test_xgb_sklearn_wrapper.py ____________
15:27:38  [gw1] linux -- Python 3.6.4 /opt/h2oai/h2o4gpu/python/bin/python
15:27:38  Worker 'gw1' crashed while running 'tests/python/open_data/gbm/test_xgb_sklearn_wrapper.py::test_sklearn_gbm_regression_h2o4gpu'
15:27:38  -------- generated xml file: /repo/build/test-reports/h2o4gpu-test.xml ---------
15:27:38  ========================== slowest 10 test durations ===========================
15:27:38  692.15s call     tests/python/open_data/gbm/test_lightgbm.py::test_lightgbm_cpu_airlines_year[1987-dart]
15:27:38  676.94s call     tests/python/open_data/gbm/test_lightgbm.py::test_lightgbm_cpu_airlines_year[1987-gbdt]
15:27:38  513.64s call     tests/python/open_data/glm/test_glm_sklearn.py::TestGlmSklearn::test_glm_sklearn_gpu_data6
15:27:38  403.38s call     tests/python/open_data/glm/test_glm_sklearn.py::TestGlmSklearn::test_glm_sklearn_gpu_data10
15:27:38  354.06s call     tests/python/open_data/system/test_freeing_memory.py::test_pca
15:27:38  269.96s call     tests/python/open_data/gbm/test_lightgbm.py::test_lightgbm_cpu[dart]
15:27:38  174.62s call     tests/python/open_data/gbm/test_xgboost.py::test_xgboost_covtype
15:27:38  151.63s call     tests/python/open_data/glm/test_elastic_net_ptr_driver.py::test_elastic_net_ptr_driver
15:27:38  116.58s call     tests/python/open_data/glm/test_elasticnet_sklearn_wrapper.py::test_sklearn_ridge
15:27:38  107.56s call     tests/python/open_data/glm/test_glm_sklearn.py::TestGlmSklearn::test_glm_sklearn_gpu_data8
15:27:38  ============= 8 failed, 178 passed, 18 skipped in 2082.03 seconds ==============
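
One way to get more signal out of these intermittent NCCL failures (a sketch, not something the suite does today: the data shapes are made up, and `n_gpus` is the 0.8x-era xgboost switch for the multi-GPU path) is to loop the same gpu_hist path with NCCL's own debug logging enabled:

# Hedged repro sketch: hammer the gpu_hist/NCCL AllReduce path the tracebacks
# point at, with NCCL logging on, to catch the "unhandled system error" live.
import os
os.environ["NCCL_DEBUG"] = "INFO"        # NCCL env var: logs ring setup and errors
os.environ["NCCL_DEBUG_SUBSYS"] = "ALL"  # NCCL env var: maximum verbosity

import numpy as np
import xgboost as xgb

X = np.random.rand(100000, 10).astype(np.float32)
y = (np.random.rand(100000) > 0.5).astype(np.float32)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "tree_method": "gpu_hist",       # same updater as the failing tests
    "n_gpus": -1,                    # 0.8x-era switch: all visible GPUs, i.e. the NCCL path
    "objective": "binary:logistic",
}

for trial in range(50):              # the failure is intermittent, so loop
    xgb.train(params, dtrain, num_boost_round=10)
    print("trial", trial, "ok")
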
sh1ng commented 5 years ago

Also happens on x86_64 with NCCL 2.4.7, but much less often; the worker ends up spinning in ncclCpuBarrierOut, per the gdb backtrace after the sketch below.
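
A possible stopgap while this is chased down (my sketch, not something the tests do today): pin each worker to a single GPU so gpu_hist has no cross-device reduction to hang on. `CUDA_VISIBLE_DEVICES` is the standard CUDA device mask; `n_gpus` is again the 0.8x-era xgboost parameter:

# Hedged workaround sketch: restrict training to one GPU so there is no
# multi-GPU NCCL AllReduce of the kind the backtrace below is stuck in.
import os
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")  # set before CUDA initializes

import numpy as np
import xgboost as xgb

X = np.random.rand(10000, 10).astype(np.float32)
y = (np.random.rand(10000) > 0.5).astype(np.float32)

params = {
    "tree_method": "gpu_hist",
    "n_gpus": 1,                     # 0.8x-era parameter; keep everything on one device
    "objective": "binary:logistic",
}
bst = xgb.train(params, xgb.DMatrix(X, label=y), num_boost_round=10)
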

#0  0x00007fb7f36cbd47 in sched_yield () from /usr/lib64/libc.so.6
#1  0x00007fb7687a79f5 in ncclCpuBarrierOut (comm=0x7fb650000e50)
    at enqueue.cc:143
#2  ncclBarrierEnqueueWait (comm=0x7fb650000e50) at enqueue.cc:193
#3  0x00007fb7687a7f6f in ncclEnqueueCheck (info=info@entry=0x7ffd815cbe90)
    at enqueue.cc:438
#4  0x00007fb7687bbc0d in ncclAllReduce (sendbuff=0x7fb677201800, 
    recvbuff=<optimized out>, count=<optimized out>, datatype=ncclFloat32, 
    op=<optimized out>, comm=<optimized out>, stream=0x557f9f74dc70)
    at collectives/all_reduce.cc:17
#5  0x00007fb768704f71 in dh::AllReducer::AllReduceSum (
    this=this@entry=0x557fa3cb41d0, communication_group_idx=0, 
    sendbuff=0x7fb677201800, recvbuff=0x7fb677201800, count=count@entry=2)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:941
#6  0x00007fb768720842 in xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >::InitRoot (this=this@entry=0x557fa3bd4c70, 
    p_tree=p_tree@entry=0x557fa3c86a50, 
    gpair_all=gpair_all@entry=0x557f953b9510, 
    reducer=reducer@entry=0x557fa3cb41d0, num_columns=10)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1128
#7  0x00007fb768720ef9 in xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >::UpdateTree (this=0x557fa3bd4c70, 
    gpair_all=0x557f953b9510, p_fmat=0x557f9f097ea0, p_tree=0x557fa3c86a50, 
    reducer=0x557fa3cb41d0)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1162
#8  0x00007fb768703b0f in operator() (shard=..., idx=<optimized out>, 
    __closure=<optimized out>)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1523
#9  void dh::ExecuteIndexShards<std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >, xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*)::{lambda(int, std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >&)#1}>(std::vector<std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >, std::allocator<std::vector> >*, xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*)::{lambda(int, std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >&)#1})::{lambda()#1}::operator() ()
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:1049
#10 0x00007fb768722bb4 in operator() (__closure=0x7ffd815cc6e0)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:1047
#11 SaveCudaContext<dh::ExecuteIndexShards(std::vector<T>*, FunctionT) [with T = std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >; FunctionT = xgboost::tree::GPUHistMakerSpecialised<GradientSumT>::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*) [with GradientSumT = xgboost::detail::GradientPairInternal<double>]::__lambda9]::__lambda3> (
    func=..., this=0x7ffd815cc6d0)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:766
#12 ExecuteIndexShards<std::unique_ptr<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> >, std::default_delete<xgboost::tree::DeviceShard<xgboost::detail::GradientPairInternal<double> > > >, xgboost::tree::GPUHistMakerSpecialised<GradientSumT>::UpdateTree(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, xgboost::RegTree*) [with GradientSumT = xgboost::detail::GradientPairInternal<double>]::__lambda9> (
    f=..., shards=0x557fa3cb4198)
    at /root/repo/xgboost/src/tree/../common/device_helpers.cuh:1042
#13 xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::UpdateTree (this=this@entry=0x557fa3cb4060, 
    gpair=gpair@entry=0x557f953b9510, p_fmat=p_fmat@entry=0x557f9f097ea0, 
    p_tree=0x557fa3cb4510)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1520
#14 0x00007fb7687232d1 in xgboost::tree::GPUHistMakerSpecialised<xgboost::detail::GradientPairInternal<double> >::Update (this=0x557fa3cb4060, 
    gpair=0x557f953b9510, dmat=0x557f9f097ea0, trees=...)
    at /root/repo/xgboost/src/tree/updater_gpu_hist.cu:1408
#15 0x00007fb7685b0726 in xgboost::gbm::GBTree::BoostNewTrees (
    this=this@entry=0x557f953878a0, gpair=gpair@entry=0x557f953b9510, 
    p_fmat=p_fmat@entry=0x557f9f097ea0, bst_group=bst_group@entry=0, 
    ret=ret@entry=0x7ffd815ccf60) at /root/repo/xgboost/src/gbm/gbtree.cc:293
#16 0x00007fb7685b1acc in xgboost::gbm::GBTree::DoBoost (this=0x557f953878a0, 
    p_fmat=0x557f9f097ea0, in_gpair=0x557f953b9510, obj=<optimized out>)
    at /root/repo/xgboost/src/gbm/gbtree.cc:180
#17 0x00007fb7685c38d3 in xgboost::LearnerImpl::UpdateOneIter (
    this=<optimized out>, iter=<optimized out>, train=0x557f9f097ea0)
    at /root/repo/xgboost/src/learner.cc:474
#18 0x00007fb768544925 in XGBoosterUpdateOneIter (handle=0x557f9537bbd0, 
    iter=0, dtrain=0x557f95038100) at /root/repo/xgboost/src/c_api/c_api.cc:896
#19 0x00007fb7eab67ec0 in ffi_call_unix64 ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6
#20 0x00007fb7eab6787d in ffi_call ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/../../libffi.so.6
#21 0x00007fb7ead7cdee in _ctypes_callproc ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so
#22 0x00007fb7ead7d825 in PyCFuncPtr_call ()
   from /opt/h2oai/h2o4gpu/python/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so
#23 0x0000557f915971bb in _PyObject_FastCallDict ()
#24 0x0000557f91624d3e in call_function ()
#25 0x0000557f9164919a in _PyEval_EvalFrameDefault ()
#26 0x0000557f9161d9a6 in _PyEval_EvalCodeWithName ()
#27 0x0000557f9161ea11 in fast_function ()
#28 0x0000557f91624cc5 in call_function ()
#29 0x0000557f9164919a in _PyEval_EvalFrameDefault ()
#30 0x0000557f9161d9a6 in _PyEval_EvalCodeWithName ()
#31 0x0000557f9161ea11 in fast_function ()
#32 0x0000557f91624cc5 in call_function ()
#33 0x0000557f91649eb1 in _PyEval_EvalFrameDefault ()
#34 0x0000557f9161d9a6 in _PyEval_EvalCodeWithName ()
#35 0x0000557f9161ea11 in fast_function ()
#36 0x0000557f91624cc5 in call_function ()
#37 0x0000557f91649eb1 in _PyEval_EvalFrameDefault ()
#38 0x0000557f9161ddfe in _PyEval_EvalCodeWithName ()
#39 0x0000557f9161ea11 in fast_function ()
#40 0x0000557f91624cc5 in call_function ()
#41 0x0000557f9164919a in _PyEval_EvalFrameDefault ()