h2oai / sparkling-water

Sparkling Water provides H2O functionality inside Spark cluster
https://docs.h2o.ai/sparkling-water/3.3/latest-stable/doc/index.html
Apache License 2.0

Two users getting the same H2O Notebook #5728

Open arunaryasomayajula opened 6 months ago

arunaryasomayajula commented 6 months ago
  1. User 1 starts SW cluster 1 (multiple nodes) with the Flow UI service on a certain port (for example, 000.000.000.001:54321).
  2. For some reason (timeout, OOM, etc.), SW cluster 1 dies and 000.000.000.001:54321 is released.
  3. In Spectrum Conductor, the status of cluster 1 is still "started", and it still shows the Flow UI link (000.000.000.001:54321).
  4. User 2 starts SW cluster 2, which takes 000.000.000.001:54321 and assigns its Flow UI service to this port.
  5. User 1 and user 2 now both see the same Flow UI on 000.000.000.001:54321 from Spectrum Conductor, and it is the one served by cluster 2.
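One way to detect the stale link from steps 3 to 5 is to ask the node behind the Flow UI address which cluster it belongs to before showing the link. The sketch below is only an illustration, not something Spectrum Conductor or Sparkling Water does today. It assumes that H2O's REST endpoint `/3/Cloud` returns JSON with a `cloud_name` field, and that the caller recorded the cluster name when the cluster was started (for example through `spark.ext.h2o.cloud.name`, if the deployment sets it). The function names `fetch_cloud_name` and `link_is_stale` are hypothetical, and an HTTPS cluster may need extra certificate handling that is not shown here.

```python
import json
from urllib.request import urlopen


def fetch_cloud_name(base_url, timeout=5.0):
    """Return the cloud name reported by the H2O node at base_url.

    Assumption: GET <base_url>/3/Cloud answers with JSON that contains
    a "cloud_name" field.
    """
    with urlopen(f"{base_url}/3/Cloud", timeout=timeout) as resp:
        return json.load(resp).get("cloud_name")


def link_is_stale(base_url, expected_cloud_name, fetch=fetch_cloud_name):
    """Return True if the Flow UI link no longer points at the expected cluster.

    The link counts as stale when the node is unreachable, answers with
    something that is not valid JSON, or reports a different cloud name
    (the port was released and then taken by another cluster).
    """
    try:
        actual = fetch(base_url)
    except (OSError, ValueError):
        # URLError and socket timeouts are OSError subclasses;
        # a JSON decoding failure is a ValueError.
        return True
    return actual != expected_cloud_name
```

With a check like this, user 1's link to 000.000.000.001:54321 would be reported as stale once cluster 2 owns the port, because the name returned by the node would no longer match the name recorded for cluster 1.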

Sparkling Water Context:

I suspect the Flow UI crashed for some reason and port 54323 was released at Feb/20 05:02:30.

H2OContext has been closed! Please create a new H2OContext to a healthy and reachable (web enabled) H2O cluster.
    at ai.h2o.sparkling.H2OContext$$anon$1.run(H2OContext.scala:359)
Caused by: ai.h2o.sparkling.backend.exceptions.RestApiNotReachableException: H2O node https://10.119.198.87:54323 is not reachable.

The AIMD H2O notebook started at Feb/21 08:11:31, and its Flow UI bound to the freed port 54323.
