h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Flow :=> Airlines dataset => Build models glm/gbm/dl => water.DException$DistributedException: from /172.16.2.183:54321; by class water.fvec.RollupStats$ComputeRollupsTask; class java.lang.NullPointerException: null #13595

Closed: exalate-issue-sync[bot] closed this issue 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Steps to reproduce :

Hadoop : hadoop jar h2odriver.jar -nodes 5 -mapperXmx 20g -output march24-4 -flow_dir hdfs:///user/neeraja/myflow-1

1) Import airlines_all.csv from HDFS.
2) In Flow (mr-0xd9:54321):
a) getFrames
b) getFrame "airlines_all.hex"
c) Click on build model: assist buildModel, null, training_frame: "airlines_all.hex"
d) Build GLM model:
buildModel 'glm', {"destination_key":"glm-076a86b8-0340-4cf5-8899-773d57e6c5be","training_frame":"airlines_all.hex","ignored_columns":["Year","Month","DayofMonth","DayOfWeek","DepTime","ArrTime","TailNum","ActualElapsedTime","CRSElapsedTime","AirTime","ArrDelay","DepDelay","Distance","TaxiIn","TaxiOut","Cancelled","CancellationCode","Diverted","CarrierDelay","WeatherDelay","NASDelay","SecurityDelay","LateAircraftDelay"],"dropNA20Cols":false,"response_column":"IsDepDelayed","solver":"ADMM","max_iter":50,"beta_eps":0,"standardize":true,"family":"binomial","n_folds":0,"balance_classes":false,"link":"family_default","tweedie_variance_power":"NaN","tweedie_link_power":"NaN","alpha":[0.5],"lambda":[0.00001],"lambda_search":false,"use_all_factor_levels":false,"class_sampling_factors":[],"max_after_balance_size":5,"prior1":0,"nlambdas":-1,"lambda_min_ratio":-1}

e) getModel "glm-076a86b8-0340-4cf5-8899-773d57e6c5be"
f) predict model: "glm-076a86b8-0340-4cf5-8899-773d57e6c5be"
g) predict model: "glm-076a86b8-0340-4cf5-8899-773d57e6c5be", frame: "airlines_all.hex", destination_key: "prediction-b77188f5-577f-40e4-88df-1be3068bc496"
h) Build GBM model:
buildModel 'gbm', {"destination_key":"gbm-eecbe7ba-675e-4cff-9865-6e3d5f7c32a6","training_frame":"airlines_all.hex","ignored_columns":["Year","Month","DayofMonth","DayOfWeek","DepTime","ArrTime","TailNum","ActualElapsedTime","CRSElapsedTime","AirTime","ArrDelay","DepDelay","Distance","TaxiIn","TaxiOut","Cancelled","CancellationCode","Diverted","CarrierDelay","WeatherDelay","NASDelay","SecurityDelay","LateAircraftDelay"],"dropNA20Cols":false,"response_column":"IsDepDelayed","ntrees":50,"max_depth":5,"min_rows":10,"nbins":20,"learn_rate":0.1,"loss":"bernoulli","balance_classes":false,"class_sampling_factors":[],"max_after_balance_size":5,"seed":0}

i) getModel "gbm-eecbe7ba-675e-4cff-9865-6e3d5f7c32a6"
j) predict model: "gbm-eecbe7ba-675e-4cff-9865-6e3d5f7c32a6"
k) predict model: "gbm-eecbe7ba-675e-4cff-9865-6e3d5f7c32a6", frame: "airlines_all.hex", destination_key: "prediction-aa7a58a9-16e6-46ae-a041-fd342c324ed0"
l) Build deep learning model:
buildModel 'deeplearning', {"destination_key":"deeplearning-4fe61e32-2077-42cb-8466-8d5f7be5366d","training_frame":"airlines_all.hex","ignored_columns":["Year","Month","DayofMonth","DayOfWeek","DepTime","ArrTime","TailNum","ActualElapsedTime","CRSElapsedTime","AirTime","ArrDelay","DepDelay","Distance","TaxiIn","TaxiOut","Cancelled","CancellationCode","Diverted","CarrierDelay","WeatherDelay","NASDelay","SecurityDelay","LateAircraftDelay"],"dropNA20Cols":false,"response_column":"IsDepDelayed","n_folds":0,"activation":"Rectifier","hidden":[50,50],"epochs":"1","variable_importances":false,"replicate_training_data":true,"balance_classes":false,"checkpoint":"","use_all_factor_levels":true,"train_samples_per_iteration":-2,"adaptive_rate":true,"rho":0.99,"epsilon":1e-8,"input_dropout_ratio":0,"hidden_dropout_ratios":[],"l1":0,"l2":0,"score_interval":5,"score_training_samples":10000,"score_validation_samples":0,"autoencoder":false,"class_sampling_factors":[],"max_after_balance_size":5,"keep_cross_validation_splits":false,"override_with_best_model":true,"target_ratio_comm_to_comp":0.02,"seed":3949238726006768600,"rate":0.005,"rate_annealing":0.000001,"rate_decay":1,"momentum_start":0,"momentum_ramp":1000000,"momentum_stable":0,"nesterov_accelerated_gradient":true,"max_w2":"Infinity","initial_weight_distribution":"UniformAdaptive","initial_weight_scale":1,"loss":"CrossEntropy","score_duty_cycle":0.1,"classification_stop":0,"regression_stop":0.000001,"max_hit_ratio_k":10,"score_validation_sampling":"Uniform","diagnostics":true,"fast_mode":true,"ignore_const_cols":true,"force_load_balance":true,"single_node_mode":false,"shuffle_training_data":false,"missing_values_handling":"MeanImputation","quiet_mode":false,"max_confusion_matrix_size":20,"sparse":false,"col_major":false,"average_activation":0,"sparsity_beta":0,"max_categorical_features":2147483647,"reproducible":false}
m) getModel "deeplearning-4fe61e32-2077-42cb-8466-8d5f7be5366d"
n) Predict: predict model: "deeplearning-4fe61e32-2077-42cb-8466-8d5f7be5366d"

o) The Frames drop-down shows an empty list.
p) getFrames gives the error below:

Error evaluating cell

Error calling GET /3/Frames.json with opts null

water.DException$DistributedException: from /172.16.2.183:54321; by class water.fvec.RollupStats$ComputeRollupsTask; class java.lang.NullPointerException: null

Stack trace:
water.DException$DistributedException: from /172.16.2.183:54321; by class water.fvec.RollupStats$ComputeRollupsTask; class java.lang.NullPointerException: null (java.lang.RuntimeException)
water.fvec.RollupStats.get(RollupStats.java:275)
water.fvec.RollupStats.get(RollupStats.java:284)
water.fvec.Vec.rollupStats(Vec.java:545)
water.fvec.Vec.byteSize(Vec.java:514)
water.fvec.Frame.byteSize(Frame.java:368)
water.api.FrameV2.<init>(FrameV2.java:195)
water.api.FramesBase.fillFromImpl(FramesBase.java:60)
water.api.FramesHandler.list(FramesHandler.java:100)
sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:606)
water.api.Handler.handle(Handler.java:57)
water.api.RequestServer.handle(RequestServer.java:632)
water.api.RequestServer.serve(RequestServer.java:590)
water.NanoHTTPD$HTTPSession.run(NanoHTTPD.java:434)
java.lang.Thread.run(Thread.java:745)
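
The exception message above packs the reporting node, the task class, and the root cause into a single string. A minimal sketch of pulling those apart, assuming the `from /host:port; by class X; class Y: msg` layout seen in this report holds in general. The function name `parse_distributed_exception` is made up for illustration and is not part of H2O:

```python
import re

# One "hop" of the message layout seen in this report (an assumption):
#   "<ExceptionClass>: from /<ip>:<port>; by class <TaskClass>; class "
HOP = re.compile(
    r"(?P<exc>[\w.$]+): from /(?P<node>[\d.]+:\d+); "
    r"by class (?P<task>[\w.$]+); class "
)

def parse_distributed_exception(message):
    """Split a (possibly nested) message into hops and a root cause."""
    hops = []
    pos = 0
    while True:
        m = HOP.match(message, pos)
        if m is None:
            break
        hops.append((m.group("node"), m.group("task")))
        pos = m.end()
    # Whatever remains is "<RootExceptionClass>: <message>".
    root_class, _, root_msg = message[pos:].partition(": ")
    return hops, root_class, root_msg

msg = ("water.DException$DistributedException: from /172.16.2.183:54321; "
       "by class water.fvec.RollupStats$ComputeRollupsTask; "
       "class java.lang.NullPointerException: null")

hops, root_class, root_msg = parse_distributed_exception(msg)
for node, task in hops:
    print(node, task)
print(root_class, root_msg)
```

Because the loop keeps matching hops until the pattern stops applying, the same sketch should also cope with a message in which one DistributedException wraps another.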

Please check attached logs/stack trace

exalate-issue-sync[bot] commented 1 year ago

Neeraja Madabhushi commented:

"D-UDP-Recv" prio=9 tid=27 java.lang.Thread.State: RUNNABLE

at sun.nio.ch.DatagramChannelImpl.receive0(Native Method)
at sun.nio.ch.DatagramChannelImpl.receiveIntoNativeBuffer(DatagramChannelImpl.java:425)
at sun.nio.ch.DatagramChannelImpl.receive(DatagramChannelImpl.java:403)
at sun.nio.ch.DatagramChannelImpl.receive(DatagramChannelImpl.java:356)
at water.AutoBuffer.<init>(AutoBuffer.java:88)
at water.UDPReceiverThread.run(UDPReceiverThread.java:53)

"IPC Client (167378624) connection to /172.16.2.182:45494 from job_1427144101512_1044" daemon prio=5 tid=12 java.lang.Thread.State: TIMED_WAITING

at java.lang.Object.wait(Native Method)
at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:903)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:948)

"FJ-2-43" daemon prio=9 tid=15092 java.lang.Thread.State: WAITING

at sun.misc.Unsafe.park(Native Method)
at jsr166y.ForkJoinPool.scan(ForkJoinPool.java:1594)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

"TCP-/172.16.2.188:54321-1" prio=9 tid=485 java.lang.Thread.State: RUNNABLE

at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at water.AutoBuffer.getImpl(AutoBuffer.java:529)
at water.AutoBuffer.getSz(AutoBuffer.java:515)
at water.AutoBuffer.getPort(AutoBuffer.java:852)
at water.AutoBuffer.<init>(AutoBuffer.java:116)
at water.TCPReceiverThread$TCPReaderThread.run(TCPReceiverThread.java:102)

"FJ-0-41" daemon prio=9 tid=18542 java.lang.Thread.State: WAITING

at sun.misc.Unsafe.park(Native Method)
at jsr166y.ForkJoinPool.scan(ForkJoinPool.java:1594)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

"Finalizer" daemon prio=8 tid=3 java.lang.Thread.State: WAITING

at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"FJ-125-47" daemon prio=9 tid=18873 java.lang.Thread.State: WAITING

at sun.misc.Unsafe.park(Native Method)
at jsr166y.ForkJoinPool.scan(ForkJoinPool.java:1594)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

"main" prio=5 tid=1 java.lang.Thread.State: RUNNABLE

at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
at java.net.ServerSocket.implAccept(ServerSocket.java:530)
at java.net.ServerSocket.accept(ServerSocket.java:498)
at water.hadoop.h2omapper.run2(h2omapper.java:426)
at water.hadoop.h2omapper.run(h2omapper.java:457)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

"TCP-/172.16.2.189:54321-0" prio=9 tid=367 java.lang.Thread.State: RUNNABLE

at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at water.AutoBuffer.getImpl(AutoBuffer.java:529)
at water.AutoBuffer.getSz(AutoBuffer.java:515)
at water.AutoBuffer.getPort(AutoBuffer.java:852)
at water.AutoBuffer.<init>(AutoBuffer.java:116)
at water.TCPReceiverThread$TCPReaderThread.run(TCPReceiverThread.java:102)

"FJ-122-55" daemon prio=9 tid=18080 java.lang.Thread.State: WAITING

at sun.misc.Unsafe.park(Native Method)
at jsr166y.ForkJoinPool.scan(ForkJoinPool.java:1594)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

"LeaseRenewer:neeraja@mr-0xd6.0xdata.loc" daemon prio=5 tid=24 java.lang.Thread.State: TIMED_WAITING

at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:438)
at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
at java.lang.Thread.run(Thread.java:745)

"IPC Parameter Sending Thread #1" daemon prio=5 tid=15977 java.lang.Thread.State: TIMED_WAITING

at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

"TCP-Accept" prio=9 tid=33 java.lang.Thread.State: RUNNABLE

at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241)
at water.TCPReceiverThread.run(TCPReceiverThread.java:47)

"IPC Client (167378624) connection to mr-0xd6.0xdata.loc/172.16.2.186:8020 from neeraja" daemon prio=5 tid=18864 java.lang.Thread.State: TIMED_WAITING

at java.lang.Object.wait(Native Method)
at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:903)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:948)

"FJ-124-23" daemon prio=9 tid=18093 java.lang.Thread.State: WAITING

at sun.misc.Unsafe.park(Native Method)
at jsr166y.ForkJoinPool.scan(ForkJoinPool.java:1594)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

"FJ-119-27" daemon prio=9 tid=18872 java.lang.Thread.State: WAITING

at sun.misc.Unsafe.park(Native Method)
at jsr166y.ForkJoinPool.scan(ForkJoinPool.java:1594)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

"ACKTimeout" prio=9 tid=32 java.lang.Thread.State: WAITING

at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at java.util.concurrent.DelayQueue.take(DelayQueue.java:220)
at water.H2ONode$AckAckTimeOutThread.run(H2ONode.java:365)

"Reference Handler" daemon prio=10 tid=2 java.lang.Thread.State: WAITING

at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)

"MemCleaner" daemon prio=8 tid=29 java.lang.Thread.State: TIMED_WAITING

at java.lang.Object.wait(Native Method)
at water.Cleaner.block_store_cleaner(Cleaner.java:30)
at water.Cleaner.run(Cleaner.java:78)

"FJ-121-11" daemon prio=9 tid=18090 java.lang.Thread.State: WAITING

at sun.misc.Unsafe.park(Native Method)
at jsr166y.ForkJoinPool.scan(ForkJoinPool.java:1594)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

"FJ-123-1" daemon prio=9 tid=18874 java.lang.Thread.State: WAITING

at sun.misc.Unsafe.park(Native Method)
at jsr166y.ForkJoinPool.scan(ForkJoinPool.java:1594)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

"TCP-/172.16.2.189:54321-1" prio=9 tid=402 java.lang.Thread.State: RUNNABLE

at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at water.AutoBuffer.getImpl(AutoBuffer.java:529)
at water.AutoBuffer.getSz(AutoBuffer.java:515)
at water.AutoBuffer.getPort(AutoBuffer.java:852)
at water.AutoBuffer.<init>(AutoBuffer.java:116)
at water.TCPReceiverThread$TCPReaderThread.run(TCPReceiverThread.java:102)

"communication thread" daemon prio=5 tid=18 java.lang.Thread.State: TIMED_WAITING

at java.lang.Object.wait(Native Method)
at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:719)
at java.lang.Thread.run(Thread.java:745)

"Thread-11" daemon prio=5 tid=23 java.lang.Thread.State: TIMED_WAITING

at java.lang.Object.wait(Native Method)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:502)

"Signal Dispatcher" daemon prio=9 tid=4 java.lang.Thread.State: RUNNABLE

"Timer for 'MapTask' metrics system" daemon prio=5 tid=11 java.lang.Thread.State: TIMED_WAITING

at java.lang.Object.wait(Native Method)
at java.util.TimerThread.mainLoop(Timer.java:552)
at java.util.TimerThread.run(Timer.java:505)

"NanoHTTPD Thread" daemon prio=5 tid=38 java.lang.Thread.State: RUNNABLE

at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
at java.net.ServerSocket.implAccept(ServerSocket.java:530)
at java.net.ServerSocket.accept(ServerSocket.java:498)
at water.NanoHTTPD$1.run(NanoHTTPD.java:210)
at java.lang.Thread.run(Thread.java:745)

"FJ-119-13" daemon prio=9 tid=18871 java.lang.Thread.State: RUNNABLE

at java.lang.Thread.dumpThreads(Native Method)
at java.lang.Thread.getAllStackTraces(Thread.java:1640)
at water.util.JStackCollectorTask.setupLocal(JStackCollectorTask.java:29)
at water.MRTask.setupLocal0(MRTask.java:339)
at water.MRTask.dinvoke(MRTask.java:282)
at water.RPC$RPCCall.compute2(RPC.java:333)
at water.H2O$H2OCountedCompleter.compute(H2O.java:631)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

"Thread for syncLogs" daemon prio=5 tid=17 java.lang.Thread.State: TIMED_WAITING

at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

"UDPTimeout" prio=5 tid=31 java.lang.Thread.State: WAITING

at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.DelayQueue.take(DelayQueue.java:209)
at water.UDPTimeOutThread.run(UDPTimeOutThread.java:25)

"org.apache.hadoop.hdfs.PeerCache@45fae22b" daemon prio=5 tid=22 java.lang.Thread.State: TIMED_WAITING

at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.PeerCache.run(PeerCache.java:245)
at org.apache.hadoop.hdfs.PeerCache.access$000(PeerCache.java:41)
at org.apache.hadoop.hdfs.PeerCache$1.run(PeerCache.java:119)
at java.lang.Thread.run(Thread.java:745)

"TCP-/172.16.2.183:54321-1" prio=9 tid=487 java.lang.Thread.State: RUNNABLE

at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at water.AutoBuffer.getImpl(AutoBuffer.java:529)
at water.AutoBuffer.getSz(AutoBuffer.java:515)
at water.AutoBuffer.getPort(AutoBuffer.java:852)
at water.AutoBuffer.<init>(AutoBuffer.java:116)
at water.TCPReceiverThread$TCPReaderThread.run(TCPReceiverThread.java:102)

"FJ-123-37" daemon prio=9 tid=18088 java.lang.Thread.State: TIMED_WAITING

at sun.misc.Unsafe.park(Native Method)
at jsr166y.ForkJoinPool.idleAwaitWork(ForkJoinPool.java:1626)
at jsr166y.ForkJoinPool.scan(ForkJoinPool.java:1579)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

"TCP-/172.16.2.184:54321-0" prio=9 tid=486 java.lang.Thread.State: RUNNABLE

at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at water.AutoBuffer.getImpl(AutoBuffer.java:529)
at water.AutoBuffer.getSz(AutoBuffer.java:515)
at water.AutoBuffer.getPort(AutoBuffer.java:852)
at water.AutoBuffer.<init>(AutoBuffer.java:116)
at water.TCPReceiverThread$TCPReaderThread.run(TCPReceiverThread.java:102)

"TCP-/172.16.2.188:54321-0" prio=9 tid=357 java.lang.Thread.State: RUNNABLE

at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at water.AutoBuffer.getImpl(AutoBuffer.java:529)
at water.AutoBuffer.getSz(AutoBuffer.java:515)
at water.AutoBuffer.getPort(AutoBuffer.java:852)
at water.AutoBuffer.<init>(AutoBuffer.java:116)
at water.TCPReceiverThread$TCPReaderThread.run(TCPReceiverThread.java:102)

"FJ-125-27" daemon prio=9 tid=14080 java.lang.Thread.State: TIMED_WAITING

at sun.misc.Unsafe.park(Native Method)
at jsr166y.ForkJoinPool.idleAwaitWork(ForkJoinPool.java:1626)
at jsr166y.ForkJoinPool.scan(ForkJoinPool.java:1579)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

"Thread-13" prio=5 tid=25 java.lang.Thread.State: TIMED_WAITING

at java.lang.Thread.sleep(Native Method)
at water.hadoop.h2omapper$CounterThread.run(h2omapper.java:270)

"FJ-120-3" daemon prio=9 tid=18057 java.lang.Thread.State: WAITING

at sun.misc.Unsafe.park(Native Method)
at jsr166y.ForkJoinPool.scan(ForkJoinPool.java:1594)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

"FJ-126-15" daemon prio=9 tid=30 java.lang.Thread.State: WAITING

at sun.misc.Unsafe.park(Native Method)
at jsr166y.ForkJoinPool.scan(ForkJoinPool.java:1594)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

"FJ-1-153" daemon prio=4 tid=18060 java.lang.Thread.State: WAITING

at sun.misc.Unsafe.park(Native Method)
at jsr166y.ForkJoinPool.scan(ForkJoinPool.java:1594)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

"TCP-/172.16.2.183:54321-0" prio=9 tid=408 java.lang.Thread.State: RUNNABLE

at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at water.AutoBuffer.getImpl(AutoBuffer.java:529)
at water.AutoBuffer.getSz(AutoBuffer.java:515)
at water.AutoBuffer.getPort(AutoBuffer.java:852)
at water.AutoBuffer.<init>(AutoBuffer.java:116)
at water.TCPReceiverThread$TCPReaderThread.run(TCPReceiverThread.java:102)

"Heartbeat" daemon prio=10 tid=35 java.lang.Thread.State: TIMED_WAITING

at java.lang.Thread.sleep(Native Method)
at water.HeartBeatThread.run(HeartBeatThread.java:160)

"TCP-/172.16.2.184:54321-1" prio=9 tid=488 java.lang.Thread.State: RUNNABLE

at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at water.AutoBuffer.getImpl(AutoBuffer.java:529)
at water.AutoBuffer.getSz(AutoBuffer.java:515)
at water.AutoBuffer.getPort(AutoBuffer.java:852)
at water.AutoBuffer.<init>(AutoBuffer.java:116)
at water.TCPReceiverThread$TCPReaderThread.run(TCPReceiverThread.java:102)
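
A dump of this size is easier to scan once it is grouped by thread state. The following is a rough sketch, assuming every thread header follows the `"name" [daemon] prio=N tid=N java.lang.Thread.State: STATE` layout used in this dump. The helper name `summarize` and the two-thread sample are illustrative only:

```python
import re
from collections import Counter

# Header layout assumed from the dump in this comment.
HEADER = re.compile(
    r'^"(?P<name>[^"]+)"\s+(?:daemon\s+)?prio=\d+\s+tid=(?P<tid>\d+)\s+'
    r'java\.lang\.Thread\.State:\s+(?P<state>\w+)'
)

def summarize(dump_text):
    """Return (thread count per state, top stack frame per thread name)."""
    states = Counter()
    top_frame = {}
    current = None
    for line in dump_text.splitlines():
        line = line.strip()
        m = HEADER.match(line)
        if m:
            current = m.group("name")
            states[m.group("state")] += 1
        elif line.startswith("at ") and current is not None \
                and current not in top_frame:
            # First "at ..." line after a header is the innermost frame.
            top_frame[current] = line[3:]
    return states, top_frame

sample = '''"FJ-2-43" daemon prio=9 tid=15092 java.lang.Thread.State: WAITING

at sun.misc.Unsafe.park(Native Method)
at jsr166y.ForkJoinPool.scan(ForkJoinPool.java:1594)

"TCP-Accept" prio=9 tid=33 java.lang.Thread.State: RUNNABLE

at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241)
'''

states, top_frame = summarize(sample)
print(dict(states))
print(top_frame["TCP-Accept"])
```

Threads that are parked in `ForkJoinPool.scan` or blocked in a socket read are idle, so filtering those out of `top_frame` leaves the few threads that were doing work when the dump was taken.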

exalate-issue-sync[bot] commented 1 year ago

Cliff Click commented: Can you reproduce it? Cliff

exalate-issue-sync[bot] commented 1 year ago

Neeraja Madabhushi commented: Couldn't reproduce it.

Could be related to https://0xdata.atlassian.net/browse/PUBDEV-604, where an additional frame was still around and something got corrupted.

While trying to reproduce, I verified that no unnecessary frames were left behind, as happened in PUBDEV-604.

exalate-issue-sync[bot] commented 1 year ago

Cliff Click commented: Hoping some other fix got this (e.g. recent Job-key update or recent compressed-chunk fix)

Re-open if we see this stack trace again

exalate-issue-sync[bot] commented 1 year ago

Neeraja Madabhushi commented: I am seeing the RollupStats exception consistently when I run getFrames on my cluster at 172.16.2.189:55555.

Steps to reproduce :

1) Go to 172.16.2.189:55555
2) getFrames

Error evaluating cell

Error calling GET /3/Frames with opts null

water.DException$DistributedException: from /172.16.2.188:55555; by class water.fvec.RollupStats$ComputeRollupsTask; class water.DException$DistributedException: from /172.16.2.189:55555; by class water.fvec.RollupStats$Roll; class java.lang.NullPointerException: null

Stack trace:
water.DException$DistributedException: from /172.16.2.188:55555; by class water.fvec.RollupStats$ComputeRollupsTask; class water.DException$DistributedException: from /172.16.2.189:55555; by class water.fvec.RollupStats$Roll; class java.lang.NullPointerException: null (java.lang.RuntimeException)
water.fvec.RollupStats.get(RollupStats.java:278)
water.fvec.RollupStats.get(RollupStats.java:287)
water.fvec.Vec.rollupStats(Vec.java:565)
water.fvec.Vec.checksum_impl(Vec.java:584)
water.Keyed.checksum(Keyed.java:52)
water.fvec.Frame.checksum_impl(Frame.java:382)
water.Keyed.checksum(Keyed.java:52)
water.api.FrameV3.<init>(FrameV3.java:223)
water.api.FramesBase.fillFromImpl(FramesBase.java:66)
water.api.FramesHandler.list(FramesHandler.java:88)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:606)
water.api.Handler.handle(Handler.java:57)
water.api.RequestServer.handle(RequestServer.java:668)
water.api.RequestServer.serve(RequestServer.java:604)
water.NanoHTTPD$HTTPSession.run(NanoHTTPD.java:434)
java.lang.Thread.run(Thread.java:745)

exalate-issue-sync[bot] commented 1 year ago

Cliff Click commented: Seen it with the Python demo: a little Python/Rapids munging and a broken Frame gets leaked. Asking for all frames returns the broken frame and crashes in Rollups. Cliff

exalate-issue-sync[bot] commented 1 year ago

Cliff Click commented: The Python quantiles call leaves a broken frame in the cluster. Any "get all frames" call then picks up the broken frame and the rollups die.
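
The failure mode described here, where one broken item aborts a bulk listing, can be illustrated without a cluster. This is only a sketch under stated assumptions: `find_broken`, `fake_fetch`, and the key `leaked_tmp.hex` are hypothetical stand-ins, and how the keys are obtained on a real cluster is left open:

```python
def find_broken(keys, fetch):
    """Fetch each key on its own; return (ok_keys, {bad_key: error})."""
    ok, broken = [], {}
    for key in keys:
        try:
            fetch(key)
            ok.append(key)
        except Exception as exc:
            # A bulk "get all frames" call would abort at this point;
            # here the failure is recorded and the loop moves on.
            broken[key] = repr(exc)
    return ok, broken

# Hypothetical stand-in for a per-frame request to the cluster.
def fake_fetch(key):
    if key == "leaked_tmp.hex":
        raise RuntimeError("NullPointerException in rollups")
    return {"key": key}

ok, broken = find_broken(["airlines_all.hex", "leaked_tmp.hex"], fake_fetch)
print(ok)
print(broken)
```

Probing items one at a time trades a single request for many, but it names the offending frame instead of failing the whole listing.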

exalate-issue-sync[bot] commented 1 year ago

Cliff Click commented: Believe fixed with recent changes. Please repro & reopen as needed. Thanks Cliff

DinukaH2O commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-603
Assignee: Cliff Click
Reporter: Neeraja Madabhushi
State: Resolved
Fix Version: N/A
Attachments: Available (Count: 4)
Development PRs: N/A

Attachments From Jira

Attachment Name: h2ologs_20150324_033521.zip
Attached By: Neeraja Madabhushi
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-603/h2ologs_20150324_033521.zip

Attachment Name: h2ologs_20150423_121210.zip
Attached By: Neeraja Madabhushi
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-603/h2ologs_20150423_121210.zip

Attachment Name: Screen Shot 2015-03-24 at 3.08.17 PM.png
Attached By: Neeraja Madabhushi
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-603/Screen Shot 2015-03-24 at 3.08.17 PM.png

Attachment Name: Screen Shot 2015-03-24 at 3.09.01 PM.png
Attached By: Neeraja Madabhushi
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-603/Screen Shot 2015-03-24 at 3.09.01 PM.png