jupyter / docker-stacks

Ready-to-run Docker images containing Jupyter applications
https://jupyter-docker-stacks.readthedocs.io

Unexpected console message from IVY and spark #1423

Closed mbalduini closed 2 years ago

mbalduini commented 3 years ago

Description

When running a simple cell to create a Spark session (see the code below), annoying messages from Ivy (related to the packages configuration) and from the Spark startup are printed to the console.

[Screenshot 2021-08-09 at 13 03 56: Ivy and Spark startup messages shown in the notebook output]

I tried to change the log4j configuration for Spark and used the logging lib to set the global logging level, with no results.

Any help in completely removing the messages shown in the picture?

mathbunnyru commented 3 years ago

I think the issue is that now, when something is printed to stderr, jupyter shows it in a red box.

What you can do is:

import os
import sys

# Discard everything written to stderr by pointing it at /dev/null
f = open(os.devnull, 'w')
sys.stderr = f

Note that this redirects stderr to /dev/null, so if you want to see it again later, you need to save the previous value and restore it.
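
For example (a minimal sketch of the save-and-restore idea):

import os
import sys

# Save the original stream so it can be restored afterwards
original_stderr = sys.stderr
sys.stderr = open(os.devnull, 'w')

# ... run the noisy code here ...

# Restore the original stream
sys.stderr.close()
sys.stderr = original_stderr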

mbalduini commented 3 years ago

Thank you for the suggestion @mathbunnyru, but the proposed solution doesn't seem to work. No change in the output.

Any chance to specify the redirection only for a specific source?

mathbunnyru commented 3 years ago

Could you please make your question reproducible by other people? No one wants to copy-paste the code from the screenshot. Also, please tell us why you downgraded java and how you did it.

mbalduini commented 3 years ago

Got it, you are right.

from pyspark.sql import SparkSession

spark_jars = (
    "org.apache.hadoop:hadoop-aws:3.2.0,"
    "org.postgresql:postgresql:42.2.18,"
    "org.apache.spark:spark-avro_2.12:3.0.1,"
    "org.apache.spark:spark-streaming-kafka-0-10_2.11:2.4.5,"
    "org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1,"
    "org.apache.kafka:kafka-clients:2.6.0,"
    "com.databricks:spark-xml_2.12:0.12.0"
)

spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("test-edu")
    .config("spark.jars.packages", spark_jars)
    .getOrCreate()
)

spark


Below is the Ivy output printed to the console:

Ivy Default Cache set to: /home/jovyan/.ivy2/cache
The jars for the packages stored in: /home/jovyan/.ivy2/jars
org.apache.hadoop#hadoop-aws added as a dependency
org.postgresql#postgresql added as a dependency
org.apache.spark#spark-avro_2.12 added as a dependency
org.apache.spark#spark-streaming-kafka-0-10_2.11 added as a dependency
org.apache.spark#spark-sql-kafka-0-10_2.12 added as a dependency
org.apache.kafka#kafka-clients added as a dependency
com.databricks#spark-xml_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-b38a7466-c91b-4810-b1f8-64ad6781a4d4;1.0
    confs: [default]
    found org.apache.hadoop#hadoop-aws;3.2.0 in central
    found com.amazonaws#aws-java-sdk-bundle;1.11.375 in central
    found org.postgresql#postgresql;42.2.18 in central
    found org.checkerframework#checker-qual;3.5.0 in central
    found org.apache.spark#spark-avro_2.12;3.0.1 in central
    found org.spark-project.spark#unused;1.0.0 in central
    found org.apache.spark#spark-streaming-kafka-0-10_2.11;2.4.5 in central
    found org.apache.spark#spark-sql-kafka-0-10_2.12;3.0.1 in central
    found org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.0.1 in central
    found org.apache.commons#commons-pool2;2.6.2 in central
    found org.apache.kafka#kafka-clients;2.6.0 in central
    found com.github.luben#zstd-jni;1.4.4-7 in central
    found org.lz4#lz4-java;1.7.1 in central
    found org.xerial.snappy#snappy-java;1.1.7.3 in central
    found org.slf4j#slf4j-api;1.7.30 in central
    found com.databricks#spark-xml_2.12;0.12.0 in central
    found commons-io#commons-io;2.8.0 in central
    found org.glassfish.jaxb#txw2;2.3.3 in central
    found org.apache.ws.xmlschema#xmlschema-core;2.2.5 in central
:: resolution report :: resolve 491ms :: artifacts dl 13ms
    :: modules in use:
    com.amazonaws#aws-java-sdk-bundle;1.11.375 from central in [default]
    com.databricks#spark-xml_2.12;0.12.0 from central in [default]
    com.github.luben#zstd-jni;1.4.4-7 from central in [default]
    commons-io#commons-io;2.8.0 from central in [default]
    org.apache.commons#commons-pool2;2.6.2 from central in [default]
    org.apache.hadoop#hadoop-aws;3.2.0 from central in [default]
    org.apache.kafka#kafka-clients;2.6.0 from central in [default]
    org.apache.spark#spark-avro_2.12;3.0.1 from central in [default]
    org.apache.spark#spark-sql-kafka-0-10_2.12;3.0.1 from central in [default]
    org.apache.spark#spark-streaming-kafka-0-10_2.11;2.4.5 from central in [default]
    org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.0.1 from central in [default]
    org.apache.ws.xmlschema#xmlschema-core;2.2.5 from central in [default]
    org.checkerframework#checker-qual;3.5.0 from central in [default]
    org.glassfish.jaxb#txw2;2.3.3 from central in [default]
    org.lz4#lz4-java;1.7.1 from central in [default]
    org.postgresql#postgresql;42.2.18 from central in [default]
    org.slf4j#slf4j-api;1.7.30 from central in [default]
    org.spark-project.spark#unused;1.0.0 from central in [default]
    org.xerial.snappy#snappy-java;1.1.7.3 from central in [default]
    :: evicted modules:
    org.apache.kafka#kafka-clients;2.0.0 by [org.apache.kafka#kafka-clients;2.6.0] in [default]
    org.apache.kafka#kafka-clients;2.4.1 by [org.apache.kafka#kafka-clients;2.6.0] in [default]

|                  |            modules            ||   artifacts   |
|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
|      default     |   21  |   0   |   0   |   2   ||   19  |   0   |
---------------------------------------------------------------------

:: retrieving :: org.apache.spark#spark-submit-parent-b38a7466-c91b-4810-b1f8-64ad6781a4d4
    confs: [default]
    0 artifacts copied, 19 already retrieved (0kB/26ms)
21/08/09 12:19:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).



The console output related to Ivy operations appears with both Java versions (the default one and the one I downgraded to).

mbalduini commented 3 years ago

Any update on this issue?

I tested the code with the most recent version and the behaviour persists.

mathbunnyru commented 3 years ago

@mbalduini I've tried several solutions, but I didn't find anything that works. I think you need to somehow configure the pyspark.sql logger (I haven't used pyspark, which is why I can't help you further).

mathbunnyru commented 3 years ago

Another option would be to somehow configure JupyterLab / the JupyterLab cell not to show stderr (or something like this). I don't know if it's easily possible.
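
One per-cell idea worth mentioning (a sketch only: IPython's %%capture magic intercepts the Python-level sys.stdout/sys.stderr streams, so it may well not catch output the JVM writes directly to the underlying file descriptors):

%%capture captured
# Everything this cell writes through sys.stdout / sys.stderr is stored
# in `captured` instead of being displayed.
print("this line is captured, not shown")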

mbalduini commented 2 years ago

Hi @mathbunnyru, thank you for your effort. Unfortunately, I tried several options too, but no success yet, even with the latest release.

Do you have any further information or suggestions to cope with this problem?

romainx commented 2 years ago

Hello @mbalduini and @mathbunnyru,

I have looked into this problem in more depth. The modification of the notebook output comes from one of the changes made in release 6.0.0 of ipykernel.

All outputs to stdout/stderr should now be captured, including those from subprocesses and from compiled libraries (blas, lapack....). In the notebook server, some output that would previously go only to the notebook logs will now go both to the notebook logs and to the notebook outputs.
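
A quick way to observe this fd-level capture (a minimal sketch): the write below appears in the cell output even though sys.stderr is never touched, which is also why reassigning sys.stderr earlier in this thread had no effect.

import os

# Bypasses sys.stderr entirely and writes straight to file descriptor 2;
# with ipykernel >= 6.0 this still shows up in the notebook cell output.
os.write(2, b"written directly to fd 2\n")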

A subsequent fix provides a way to restore the previous behavior: the new capture behavior can be disabled through the capture_fd_output flag. See the following comment for more detail -> https://github.com/ipython/ipykernel/issues/795#issuecomment-960642330.

You can configure this by turning the flag off in your IPython profile.

# create a default profile
ipython profile create

Edit the file ~/.ipython/profile_default/ipython_kernel_config.py and add the following line.

c.IPKernelApp.capture_fd_output = False

That's it! All the output from Java, Spark, and Ivy will no longer be displayed in the notebook, only in the logs. We have to check if we could / should do something here to provide this configuration by default. @mathbunnyru what is your opinion?

mathbunnyru commented 2 years ago

@romainx nice!

I think we can try to add this file to the pyspark image (it will also be included in all-spark). These logs are noisy, everyone using Spark sees them, and they don't add much value when everything goes right. A sketch of what this could look like is below.
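
For reference, a minimal sketch of what shipping this default in the image could look like (a Dockerfile fragment; the system-wide /etc/ipython location is an assumption for illustration, not the final PR):

# Sketch: ship a system-wide IPython kernel config that disables fd
# capture, so Ivy/Spark startup noise goes only to the container logs.
RUN mkdir -p /etc/ipython && \
    echo "c.IPKernelApp.capture_fd_output = False" >> /etc/ipython/ipython_kernel_config.py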

romainx commented 2 years ago

@mathbunnyru 👍 And in fact they still appear in the container logs even after this change. I will draft a PR for that (a good opportunity to start contributing here again 😄).