lifeomic / sparkflow

Easy-to-use library to bring TensorFlow to Apache Spark
MIT License

Running in Zeppelin results in non-progressing execution #32

Open · PowerToThePeople111 opened 5 years ago

PowerToThePeople111 commented 5 years ago

Hey guys,

Just a minor thing: since I run most of my analyses through a Zeppelin frontend, I also wanted to use it for training models with SparkFlow. Sadly, though, the training process of the well-known MNIST example does not run through: it starts off fine (as it would in a shell) and at some point just hangs, producing no further output or error.
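For reference, the code is essentially the MNIST example from the project README, roughly as below (my local paths and hyperparameters may differ slightly):

```python
from sparkflow.graph_utils import build_graph
from sparkflow.tensorflow_async import SparkAsyncDL
import tensorflow as tf
from pyspark.ml.feature import VectorAssembler, OneHotEncoder
from pyspark.ml.pipeline import Pipeline

# Simple TensorFlow network: two dense ReLU layers and a softmax loss.
def small_model():
    x = tf.placeholder(tf.float32, shape=[None, 784], name='x')
    y = tf.placeholder(tf.float32, shape=[None, 10], name='y')
    layer1 = tf.layers.dense(x, 256, activation=tf.nn.relu)
    layer2 = tf.layers.dense(layer1, 256, activation=tf.nn.relu)
    out = tf.layers.dense(layer2, 10)
    z = tf.argmax(out, 1, name='out')
    loss = tf.losses.softmax_cross_entropy(y, out)
    return loss

df = spark.read.option("inferSchema", "true").csv('mnist_train.csv')
mg = build_graph(small_model)

# Assemble the 784 pixel columns into a feature vector and
# one-hot encode the label column (first column of the CSV).
va = VectorAssembler(inputCols=df.columns[1:785], outputCol='features')
encoded = OneHotEncoder(inputCol='_c0', outputCol='labels', dropLast=False)

spark_model = SparkAsyncDL(
    inputCol='features',
    tensorflowGraph=mg,
    tfInput='x:0',
    tfLabel='y:0',
    tfOutput='out:0',
    tfLearningRate=.001,
    iters=20,
    predictionCol='predicted',
    labelCol='labels',
    verbose=1
)

p = Pipeline(stages=[va, encoded, spark_model]).fit(df)
```

In Zeppelin, the output stops after the following lines: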


W0710 16:11:49.977241 139741029619520 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/tensorflow_async.py:20: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

W0710 16:11:49.977596 139741029619520 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/tensorflow_async.py:21: The name tf.train.RMSPropOptimizer is deprecated. Please use tf.compat.v1.train.RMSPropOptimizer instead.

W0710 16:11:49.977807 139741029619520 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/tensorflow_async.py:22: The name tf.train.MomentumOptimizer is deprecated. Please use tf.compat.v1.train.MomentumOptimizer instead.

W0710 16:11:49.978009 139741029619520 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/tensorflow_async.py:23: The name tf.train.AdadeltaOptimizer is deprecated. Please use tf.compat.v1.train.AdadeltaOptimizer instead.

W0710 16:11:50.300260 139741029619520 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/HogwildSparkModel.py:137: The name tf.MetaGraphDef is deprecated. Please use tf.compat.v1.MetaGraphDef instead.

WARNING: Logging before flag parsing goes to stderr.
W0710 16:11:52.099525 139883655739200 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/HogwildSparkModel.py:188: The name tf.train.Server is deprecated. Please use tf.distribute.Server instead.

2019-07-10 16:11:52.100034: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2019-07-10 16:11:52.119754: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300100000 Hz
2019-07-10 16:11:52.120280: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x561152a547a0 executing computations on platform Host. Devices:
2019-07-10 16:11:52.120302: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
E0710 16:11:52.121669655   16576 socket_utils_common_posix.cc:198] check for SO_REUSEPORT: {"created":"@1562775112.121656125","description":"SO_REUSEPORT unavailable on compiling system","file":"external/grpc/src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":166}
2019-07-10 16:11:52.121849: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:250] Initialize GrpcChannelCache for job local -> {0 -> localhost:42003}
2019-07-10 16:11:52.123072: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:365] Started server with target: grpc://localhost:42003
W0710 16:11:52.125344 139883655739200 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/HogwildSparkModel.py:191: The name tf.train.import_meta_graph is deprecated. Please use tf.compat.v1.train.import_meta_graph instead.

W0710 16:11:52.203922 139883655739200 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/HogwildSparkModel.py:192: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.

W0710 16:11:52.204156 139883655739200 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/HogwildSparkModel.py:192: The name tf.GraphKeys is deprecated. Please use tf.compat.v1.GraphKeys instead.

W0710 16:11:52.204305 139883655739200 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/HogwildSparkModel.py:193: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.

W0710 16:11:52.350754 139883655739200 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/HogwildSparkModel.py:197: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

W0710 16:11:52.351631 139883655739200 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/HogwildSparkModel.py:199: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

When I execute the same code in the PySpark shell, I also get the following lines, followed by the output of the training process itself.

2019-07-10 16:11:52.390889: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
 * Serving Flask app "sparkflow.HogwildSparkModel" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off

Do you have any idea why that happens? I explicitly allowed multiple contexts by setting spark.driver.allowMultipleContexts to true in the Spark/PySpark interpreter settings.
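For completeness, setting that property programmatically would look something like the sketch below (in my setup it is set through the Zeppelin interpreter UI rather than in code):

```python
from pyspark import SparkConf, SparkContext

# Equivalent of the Zeppelin interpreter property, if the context
# were created by hand instead of by the interpreter.
conf = SparkConf().set("spark.driver.allowMultipleContexts", "true")
sc = SparkContext(conf=conf)
```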