mbalduini closed this issue 2 years ago
I think the issue is that now, when something is printed to stderr, jupyter shows it in a red box.
What you can do is:
import os
import sys
f = open(os.devnull, 'w')
sys.stderr = f
Note that this sets stderr to /dev/null, so if you want to see it later, you need to save and restore the previous value.
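Spelled out, the save-and-restore could look like this (a minimal sketch in plain Python, nothing Spark-specific):

```python
import os
import sys

# Keep a reference to the current stderr so it can be restored later.
original_stderr = sys.stderr
devnull = open(os.devnull, 'w')

sys.stderr = devnull            # anything written to sys.stderr is discarded
print("hidden", file=sys.stderr)

sys.stderr = original_stderr    # restore the previous value
devnull.close()
```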
Thank you for the suggestion @mathbunnyru, but the proposed solution doesn't seem to work; there is no change in the output.
Is there any way to apply the redirection only to a specific source?
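For what it's worth, the standard library does offer a scoped redirection via `contextlib.redirect_stderr`, but it only intercepts Python-level writes to `sys.stderr`, not output written directly to file descriptor 2 by the JVM or native libraries (which is likely why it doesn't help here). A sketch:

```python
import io
import sys
from contextlib import redirect_stderr

buf = io.StringIO()
with redirect_stderr(buf):
    # Only Python-level writes are captured inside this block;
    # JVM/native output bypasses it entirely.
    print("noisy warning", file=sys.stderr)

captured = buf.getvalue()
```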
Could you please make your question reproducible by other people? No one wants to copy-paste the code from the screenshot. Also, please tell us why you downgraded java and how you did it.
Got it, you are right.
I downgraded to Java 8 to avoid the additional warnings shown below:
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/usr/local/spark-3.1.2-bin-hadoop3.2/jars/spark-unsafe_2.12-3.1.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
I downgraded by installing the OpenJDK version with sudo apt-get install openjdk-8-jre
and then selecting that version with the sudo update-alternatives --config java
command.
Here is the code I used to create the Spark session with additional packages:
from pyspark.sql import SparkSession
spark_jars = "org.apache.hadoop:hadoop-aws:3.2.0,org.postgresql:postgresql:42.2.18,org.apache.spark:spark-avro_2.12:3.0.1,org.apache.spark:spark-streaming-kafka-0-10_2.11:2.4.5,org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1,org.apache.kafka:kafka-clients:2.6.0,com.databricks:spark-xml_2.12:0.12.0"
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("test-edu")
    .config("spark.jars.packages", spark_jars)
    .getOrCreate()
)
spark
Here is the Ivy output:
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 21 | 0 | 0 | 2 || 19 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-b38a7466-c91b-4810-b1f8-64ad6781a4d4
confs: [default]
0 artifacts copied, 19 already retrieved (0kB/26ms)
21/08/09 12:19:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
The console output related to the Ivy operations appears with both Java versions.
Any update on this issue?
I tested the code with the most recent version and the behaviour persists
@mbalduini I've tried several solutions, but I didn't find anything that works. I think you need to somehow configure the pyspark.sql logger (I haven't used pyspark, so I can't help you further).
Another option would be to somehow configure JupyterLab (or the JupyterLab cell) not to show stderr, or something like that. I don't know if that's easily possible.
Hi @mathbunnyru, thank you for your effort. Unfortunately, I also tried several options, but with no success yet, even with the latest release.
Do you have any further information or suggestions to cope with this problem?
Hello @mbalduini and @mathbunnyru,
I have looked into this problem in more depth. The change in the notebook output comes from one of the changes made in release 6.0.0 of ipykernel:
all output to stdout/stderr is now captured, including output from subprocesses and compiled libraries (BLAS, LAPACK, ...). In the notebook server, some output that previously went only to the notebook logs now goes both to the notebook logs and into the notebook outputs.
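The key difference is that ipykernel now captures at the file-descriptor level rather than just reassigning sys.stderr. A rough standard-library illustration of fd-level redirection (a sketch, not ipykernel's actual code) shows why subprocess output is affected too:

```python
import os
import subprocess
import sys
import tempfile

saved_fd = os.dup(2)                 # keep the real stderr fd
capture = tempfile.TemporaryFile()
os.dup2(capture.fileno(), 2)         # redirect file descriptor 2 itself
try:
    # A child process writing to stderr is captured too,
    # unlike a plain `sys.stderr = ...` reassignment.
    subprocess.run([sys.executable, "-c",
                    "import sys; sys.stderr.write('noisy')"])
finally:
    os.dup2(saved_fd, 2)             # restore the original stderr
    os.close(saved_fd)

capture.seek(0)
captured = capture.read().decode()
```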
A subsequent fix provides a way to restore the previous behavior: disable the new capture behavior by turning off the capture_fd_output
flag, see the following comment for more detail -> https://github.com/ipython/ipykernel/issues/795#issuecomment-960642330.
You can configure it by turning it off in your IPython profile.
# create a default profile
ipython profile create
Edit the file ~/.ipython/profile_default/ipython_kernel_config.py
and add the following line.
c.IPKernelApp.capture_fd_output = False
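Put together, both steps can be done from a shell (assuming the default profile location ~/.ipython/profile_default):

```shell
# Create the default IPython profile (a no-op if it already exists)
ipython profile create

# Disable fd-level output capture for the kernel
echo "c.IPKernelApp.capture_fd_output = False" >> ~/.ipython/profile_default/ipython_kernel_config.py
```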
That's it! The output from Java, Spark, and Ivy will no longer be displayed in the notebook, only in the logs. We have to check whether we could / should do something here to provide this configuration by default. @mathbunnyru, what is your opinion?
@romainx nice!
I think we can try to add this file to the pyspark image (it will also be included in all-spark). These logs are noisy, everyone using Spark sees them, and they don't add much value when everything goes right.
@mathbunnyru 👍 And in fact they still appear in the container logs even after this change. I will draft a PR for that (this will be the opportunity to start my contributions here again 😄).
Description
Running a simple cell to create a Spark session (see code below) causes annoying messages from Ivy (related to the packages config) and from Spark startup to be printed to the console.
I tried changing the log4j configuration for Spark and using the
logging
lib to set the global logging level, with no results. Any help in completely removing the messages shown in the picture?