h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

java.lang.NullPointerException when train XGBoost model with GPU instance #8717

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

I tried to train an XGBoost model on a GPU instance (P3.16xlarge), but training failed with the following error:

OSError: Job with key $03017f000001c68bffffffff$_908b7c8320882d5230bc213e87c44498 failed with an exception: java.lang.NullPointerException
stacktrace: 
java.lang.NullPointerException
    at hex.tree.xgboost.matrix.SparseMatrixFactory$NestedArrayPointer.set(SparseMatrixFactory.java:87)
    at hex.tree.xgboost.matrix.SparseMatrixFactory$InitializeCSRMatrixFromChunkIdsMrFun.map(SparseMatrixFactory.java:166)
    at water.LocalMR.compute2(LocalMR.java:84)
    at water.LocalMR.compute2(LocalMR.java:76)
    at water.LocalMR.compute2(LocalMR.java:76)
    at water.LocalMR.compute2(LocalMR.java:76)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1417)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

H2O version: 3.26.0.2; features: 595

Checking whether there is an H2O instance running at http://localhost:35781 ..... not found.
Attempting to start a local H2O server...
  Java Version: java version "1.8.0_162"; Java(TM) SE Runtime Environment (build 1.8.0_162-b12); Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)
  Starting server from /usr/local/lib/python3.6/site-packages/h2o/backend/bin/h2o.jar
  Ice root: workspace_wish_v18_36_gpu/h2o.train.WISH_V18_36/2019-09-27_20-53-43
  JVM stdout: /tmp/tmpfv52e8j9/h2o_hadoop_started_from_python.out
  JVM stderr: /tmp/tmpfv52e8j9/h2o_hadoop_started_from_python.err
  Server is running at http://127.0.0.1:35781
Connecting to H2O server at http://127.0.0.1:35781 ... successful.
--------------------------  ---------------------------------------------------
H2O cluster uptime:         01 secs
H2O cluster timezone:       Etc/UTC
H2O data parsing timezone:  UTC
H2O cluster version:        3.26.0.2
H2O cluster version age:    2 months
H2O cluster name:           H2O_from_python_hadoop_sp3kyr
H2O cluster total nodes:    1
H2O cluster free memory:    368 Gb
H2O cluster total cores:    64
H2O cluster allowed cores:  64
H2O cluster status:         accepting new members, healthy
H2O connection url:         http://127.0.0.1:35781
H2O connection proxy:
H2O internal security:      False
H2O API Extensions:         Amazon S3, XGBoost, Algos, AutoML, Core V3, Core V4
Python version:             3.6.8 final
--------------------------  ---------------------------------------------------
Parse progress: |█████████████████████████████████████████████████████████| 100%
xgboost Model Build progress: |██ (failed)
h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-6921
Assignee: UNASSIGNED
Reporter: Hongzhao Zhu
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A