Open VickyTheViking opened 6 months ago
Hey @VickyTheViking, thanks for your contribution!
I'm reviewing your plugin, but I've found that it's not working properly. Specifically, it looks like the fingerprinting for the main port (8080) is working, but not for the other two ports. I tried to do some quick troubleshooting, but I'm not familiar with Spark.
Here are the issues to fix:

- `update.sh` should be in the `spark/` folder, not `spark/app`
- `update.sh` should have the executable bit set
- The `spark-worker` container fails to start. Docker logs show `ERROR Utils: Failed to create directory /opt/spark/work`, which appears to be a permission issue. I was able to bypass it by adding `user: root` in the docker compose file, but I don't know if this is the correct way to fix such an issue. This seems to fix the issue with port 8081.
- The `docker exec` command fails, so the Python script is never executed and port 4040 remains unreachable.

Feel free to reach out.
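As a sketch, the `user: root` workaround would look roughly like the following compose fragment; the service name, image tag, and command are assumptions on my part, not taken from the plugin files:

```yaml
# Hypothetical docker-compose fragment; only the permission workaround is the point here.
services:
  spark-worker:
    image: apache/spark:3.1.3   # assumed tag
    user: root                  # lets the worker create /opt/spark/work
    command: /opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077
```

An alternative that avoids running the container as root would be to pre-create the directory with the right owner in a small derived image (e.g. `RUN mkdir -p /opt/spark/work && chown spark /opt/spark/work`), but I haven't verified which approach fits this plugin best.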
~ Savio (Doyensec)
Hi @lokiuox, thank you for the review.
I fixed the items you pointed out. As for the permission error, I searched for the best way to fix it, and the best way is what you already did: setting `user: root`. Since we only run Spark to collect fingerprints, this does not introduce any security issue. The example I provided runs the Spark core and, because of an infinite loop, waits until it is closed; during that time the master and worker dashboards are reachable via ports 8080 and 8081 (the master and worker UIs), and port 4040 serves the application dashboard UI.
The Spark image does not include Python, but it ships Java components that can launch Python scripts. For example, I can run the Fibonacci example with this command:

```shell
docker exec -d spark-master /opt/spark/bin/spark-submit --master spark://spark-master:7077 /opt/spark/examples/src/main/python/fib.py
```

In this example we run the `fib.py` example with `spark-submit`.
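For context, the Fibonacci logic such a `fib.py` might contain can be sketched as plain Python. The file's actual contents are an assumption here; the Spark wiring is shown only as comments, since it needs `pyspark` and a running master:

```python
# Hypothetical sketch of what a fib.py example might compute.
# from pyspark import SparkContext  # assumed available when run via spark-submit

def fib(n: int) -> int:
    """Return the n-th Fibonacci number (fib(0) = 0, fib(1) = 1)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

if __name__ == "__main__":
    # With Spark, the map could be distributed across workers, e.g.:
    # sc = SparkContext(appName="fib")
    # print(sc.parallelize(range(10)).map(fib).collect())
    print([fib(n) for n in range(10)])  # standalone fallback
```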
Hey @VickyTheViking, thanks for the update, you still have to address the following issues:

- `user: root` in the docker compose file to fix the `spark-worker` container
- `docker exec` command failing

This is what I get when I try to manually reproduce the workflow and launch the `docker exec` command:
```
$ docker exec -it spark-master /opt/spark/bin/spark-submit --master spark://spark-master:7077 /opt/spark/examples/src/main/python/fib.py
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.3.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
24/07/08 17:51:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.io.IOException: Cannot run program "python3": error=2, No such file or directory
	at java.base/java.lang.ProcessBuilder.start(Unknown Source)
	at java.base/java.lang.ProcessBuilder.start(Unknown Source)
	at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:97)
	at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.base/java.lang.reflect.Method.invoke(Unknown Source)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: error=2, No such file or directory
	at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
	at java.base/java.lang.ProcessImpl.<init>(Unknown Source)
	at java.base/java.lang.ProcessImpl.start(Unknown Source)
	... 16 more
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
```
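The stack trace above boils down to `python3` missing from the image's `PATH`. One way this could be addressed (a sketch under assumptions, not the plugin's actual fix) is a small derived image that installs it:

```dockerfile
# Hypothetical Dockerfile; the base tag and package manager are assumptions
# (apache/spark 3.x images are Debian-based).
FROM apache/spark:3.1.3
USER root
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3 \
 && rm -rf /var/lib/apt/lists/*
USER spark
```

Alternatively, a JVM-based example (e.g. one launched via `/opt/spark/bin/run-example`) would start a SparkContext without needing Python at all.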
Hi, dear Tsunami team. Apache Spark exposes different web UIs depending on how it is run. After hours of searching, I found a way to run it so that all the web UIs are accessible. So in this pull request we have:

1. Master web UI
2. Worker web UI
3. Application web UI (runs only while a SparkContext is running)

All of these are fingerprinted in one run of each version. I used `apache/spark` as the base Docker image because it covers more versions than the official `_/spark` Docker repo (the images do not otherwise differ from each other). Versions without a Docker image were ignored.
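To make the three UIs concrete, the port mapping might look like this compose-level sketch; the service names, image tag, and commands are assumptions rather than the exact files in this pull request:

```yaml
# Hypothetical docker-compose fragment exposing the three Spark web UIs.
services:
  spark-master:
    image: apache/spark:3.1.3
    command: /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master
    ports:
      - "8080:8080"   # 1. Master web UI
      - "4040:4040"   # 3. Application web UI, up only while a SparkContext runs
  spark-worker:
    image: apache/spark:3.1.3
    command: /opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077
    ports:
      - "8081:8081"   # 2. Worker web UI
```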