jupyter / docker-stacks

Ready-to-run Docker images containing Jupyter applications
https://jupyter-docker-stacks.readthedocs.io

Unexpected console message from IVY and spark #1423

Closed mbalduini closed 2 years ago

mbalduini commented 3 years ago

Description

When running a simple cell to create a Spark session (see the code below), annoying messages from Ivy (related to the packages configuration) and from the Spark startup are printed to the console.

[Screenshot 2021-08-09 at 13 03 56: Ivy and Spark startup messages shown in the notebook output]

I tried to change the log4j configuration for Spark and used the logging lib to set the global logging level, with no results.

Any help in completely removing the messages shown in the picture?

mathbunnyru commented 3 years ago

I think the issue is that now, when something is printed to stderr, jupyter shows it in a red box.

What you can do is:

import os
import sys

# Discard everything written to stderr by pointing it at /dev/null
f = open(os.devnull, 'w')
sys.stderr = f

Note that this redirects stderr to /dev/null, so if you want to see it again later, you need to save the previous value and restore it.
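
For example (a minimal sketch of the save-and-restore idea):

import os
import sys

# Save the original stream so it can be restored afterwards
original_stderr = sys.stderr
sys.stderr = open(os.devnull, 'w')

# ... run the noisy code here ...

# Restore the original stream
sys.stderr.close()
sys.stderr = original_stderr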

mbalduini commented 3 years ago

Thank you for the suggestion @mathbunnyru, but the proposed solution doesn't seem to work. No change in the output.

Any chance to specify the redirection only for a specific source?

mathbunnyru commented 3 years ago

Could you please make your question reproducible by other people? No one wants to copy-paste the code from the screenshot. Also, please tell us why you downgraded java and how you did it.

mbalduini commented 3 years ago

Got it, you are right.

from pyspark.sql import SparkSession

spark_jars = (
    "org.apache.hadoop:hadoop-aws:3.2.0,"
    "org.postgresql:postgresql:42.2.18,"
    "org.apache.spark:spark-avro_2.12:3.0.1,"
    "org.apache.spark:spark-streaming-kafka-0-10_2.11:2.4.5,"
    "org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1,"
    "org.apache.kafka:kafka-clients:2.6.0,"
    "com.databricks:spark-xml_2.12:0.12.0"
)

spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("test-edu")
    .config("spark.jars.packages", spark_jars)
    .getOrCreate()
)

spark


Below is the Ivy output printed to the console:

Ivy Default Cache set to: /home/jovyan/.ivy2/cache
The jars for the packages stored in: /home/jovyan/.ivy2/jars
org.apache.hadoop#hadoop-aws added as a dependency
org.postgresql#postgresql added as a dependency
org.apache.spark#spark-avro_2.12 added as a dependency
org.apache.spark#spark-streaming-kafka-0-10_2.11 added as a dependency
org.apache.spark#spark-sql-kafka-0-10_2.12 added as a dependency
org.apache.kafka#kafka-clients added as a dependency
com.databricks#spark-xml_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-b38a7466-c91b-4810-b1f8-64ad6781a4d4;1.0
    confs: [default]
    found org.apache.hadoop#hadoop-aws;3.2.0 in central
    found com.amazonaws#aws-java-sdk-bundle;1.11.375 in central
    found org.postgresql#postgresql;42.2.18 in central
    found org.checkerframework#checker-qual;3.5.0 in central
    found org.apache.spark#spark-avro_2.12;3.0.1 in central
    found org.spark-project.spark#unused;1.0.0 in central
    found org.apache.spark#spark-streaming-kafka-0-10_2.11;2.4.5 in central
    found org.apache.spark#spark-sql-kafka-0-10_2.12;3.0.1 in central
    found org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.0.1 in central
    found org.apache.commons#commons-pool2;2.6.2 in central
    found org.apache.kafka#kafka-clients;2.6.0 in central
    found com.github.luben#zstd-jni;1.4.4-7 in central
    found org.lz4#lz4-java;1.7.1 in central
    found org.xerial.snappy#snappy-java;1.1.7.3 in central
    found org.slf4j#slf4j-api;1.7.30 in central
    found com.databricks#spark-xml_2.12;0.12.0 in central
    found commons-io#commons-io;2.8.0 in central
    found org.glassfish.jaxb#txw2;2.3.3 in central
    found org.apache.ws.xmlschema#xmlschema-core;2.2.5 in central
:: resolution report :: resolve 491ms :: artifacts dl 13ms
    :: modules in use:
    com.amazonaws#aws-java-sdk-bundle;1.11.375 from central in [default]
    com.databricks#spark-xml_2.12;0.12.0 from central in [default]
    com.github.luben#zstd-jni;1.4.4-7 from central in [default]
    commons-io#commons-io;2.8.0 from central in [default]
    org.apache.commons#commons-pool2;2.6.2 from central in [default]
    org.apache.hadoop#hadoop-aws;3.2.0 from central in [default]
    org.apache.kafka#kafka-clients;2.6.0 from central in [default]
    org.apache.spark#spark-avro_2.12;3.0.1 from central in [default]
    org.apache.spark#spark-sql-kafka-0-10_2.12;3.0.1 from central in [default]
    org.apache.spark#spark-streaming-kafka-0-10_2.11;2.4.5 from central in [default]
    org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.0.1 from central in [default]
    org.apache.ws.xmlschema#xmlschema-core;2.2.5 from central in [default]
    org.checkerframework#checker-qual;3.5.0 from central in [default]
    org.glassfish.jaxb#txw2;2.3.3 from central in [default]
    org.lz4#lz4-java;1.7.1 from central in [default]
    org.postgresql#postgresql;42.2.18 from central in [default]
    org.slf4j#slf4j-api;1.7.30 from central in [default]
    org.spark-project.spark#unused;1.0.0 from central in [default]
    org.xerial.snappy#snappy-java;1.1.7.3 from central in [default]
    :: evicted modules:
    org.apache.kafka#kafka-clients;2.0.0 by [org.apache.kafka#kafka-clients;2.6.0] in [default]
    org.apache.kafka#kafka-clients;2.4.1 by [org.apache.kafka#kafka-clients;2.6.0] in [default]

|                  |            modules            ||   artifacts   |
|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
|      default     |   21  |   0   |   0   |   2   ||   19  |   0   |
---------------------------------------------------------------------

:: retrieving :: org.apache.spark#spark-submit-parent-b38a7466-c91b-4810-b1f8-64ad6781a4d4
    confs: [default]
    0 artifacts copied, 19 already retrieved (0kB/26ms)
21/08/09 12:19:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).



The console output related to Ivy operations appears with both Java versions (the default one and the one I downgraded to).

mbalduini commented 3 years ago

Any update on this issue?

I tested the code with the most recent version and the behaviour persists.

mathbunnyru commented 3 years ago

@mbalduini I've tried several solutions, but I didn't find anything that works. I think you need to somehow configure the pyspark.sql logger (I haven't used pyspark, which is why I can't help you further).

mathbunnyru commented 3 years ago

Another option would be to somehow configure JupyterLab / the JupyterLab cell not to show stderr (or something like this). I don't know if it's easily possible.
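
One per-cell idea worth mentioning (a sketch only: IPython's %%capture magic intercepts the Python-level sys.stdout/sys.stderr streams, so it may well not catch output the JVM writes directly to the underlying file descriptors):

%%capture captured
# Everything this cell writes through sys.stdout / sys.stderr is stored
# in `captured` instead of being displayed.
print("this line is captured, not shown")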

mbalduini commented 2 years ago

Hi @mathbunnyru, thank you for your effort. Unfortunately, I tried several options too, but no success yet, even with the latest release.

Do you have any further information or suggestions to cope with this problem?

romainx commented 2 years ago

Hello @mbalduini and @mathbunnyru,

I have looked into this problem in more depth. The modification of the notebook output comes from one of the changes made in release 6.0.0 of ipykernel.

All outputs to stdout/stderr should now be captured, including those from subprocesses and from compiled libraries (blas, lapack....). In the notebook server, some output that would previously go only to the notebook logs will now go both to the notebook logs and to the notebook outputs.
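
A quick way to observe this fd-level capture (a minimal sketch): the write below appears in the cell output even though sys.stderr is never touched, which is also why reassigning sys.stderr earlier in this thread had no effect.

import os

# Bypasses sys.stderr entirely and writes straight to file descriptor 2;
# with ipykernel >= 6.0 this still shows up in the notebook cell output.
os.write(2, b"written directly to fd 2\n")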

A subsequent fix provides a way to restore the previous behavior: the new capture behavior can be disabled through the capture_fd_output flag. See the following comment for more detail -> https://github.com/ipython/ipykernel/issues/795#issuecomment-960642330.

You can configure this by turning the flag off in your IPython profile.

# create a default profile
ipython profile create

Edit the file ~/.ipython/profile_default/ipython_kernel_config.py and add the following line.

c.IPKernelApp.capture_fd_output = False

That's it! All the output from Java, Spark, and Ivy will no longer be displayed in the notebook, only in the logs. We have to check if we could / should do something here to provide this configuration by default. @mathbunnyru what is your opinion?

mathbunnyru commented 2 years ago

@romainx nice!

I think we can try to add this file to the pyspark image (it will also be included in all-spark). These logs are noisy, everyone using Spark sees them, and they don't add much value when everything goes right. A sketch of what this could look like is below.
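
For reference, a minimal sketch of what shipping this default in the image could look like (a Dockerfile fragment; the system-wide /etc/ipython location is an assumption for illustration, not the final PR):

# Sketch: ship a system-wide IPython kernel config that disables fd
# capture, so Ivy/Spark startup noise goes only to the container logs.
RUN mkdir -p /etc/ipython && \
    echo "c.IPKernelApp.capture_fd_output = False" >> /etc/ipython/ipython_kernel_config.py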

romainx commented 2 years ago

@mathbunnyru 👍 And in fact they still appear in the container logs even after this change. I will draft a PR for that (a good opportunity to start contributing here again 😄).