Open VickyTheViking opened 6 months ago
Hey @VickyTheViking, thanks for your contribution!
I'm reviewing your plugin, but I've found that it's not working properly. Specifically, it looks like the fingerprinting for the main port (8080) is working, but not for the other two ports. I tried to do some quick troubleshooting, but I'm not familiar with Spark.
Here are the issues to fix:

- `update.sh` should be in the `spark/` folder, not `spark/app`
- `update.sh` should have the executable bit set
- The `spark-worker` container fails to start. Docker logs show `ERROR Utils: Failed to create directory /opt/spark/work`, which appears to be a permission issue. I was able to bypass it by adding `user: root` in the docker compose file, but I don't know if this is the correct way to fix such an issue. This seems to fix the issue with port 8081.
- The `docker exec` command fails, so the Python script is never executed and port 4040 remains unreachable.

Feel free to reach out.
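As a sketch, the `user: root` workaround would look roughly like the following compose fragment; the service name, image tag, and command are assumptions on my part, not taken from the plugin files:

```yaml
# Hypothetical docker-compose fragment; only the permission workaround is the point here.
services:
  spark-worker:
    image: apache/spark:3.1.3   # assumed tag
    user: root                  # lets the worker create /opt/spark/work
    command: /opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077
```

An alternative that avoids running the container as root would be to pre-create the directory with the right owner in a small derived image (e.g. `RUN mkdir -p /opt/spark/work && chown spark /opt/spark/work`), but I haven't verified which approach fits this plugin best.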
~ Savio (Doyensec)
Hi @lokiuox, thank you for the review.
I fixed the items you pointed out. As for the permission error, I searched for the best way to fix it, and the best way is what you already did: setting `user: root`. Since we only run Spark to collect fingerprints, this does not introduce any security issue. The example I provided runs the Spark core and, because of an infinite loop, waits until it is closed; during that time the master and worker dashboards are reachable via ports 8080 and 8081 (the master and worker UIs), and port 4040 serves the application dashboard UI.
The Spark image does not include Python, but it ships Java components that can launch Python scripts. For example, I can run the Fibonacci example with this command:

```shell
docker exec -d spark-master /opt/spark/bin/spark-submit --master spark://spark-master:7077 /opt/spark/examples/src/main/python/fib.py
```

In this example we run the `fib.py` example with `spark-submit`.
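For context, the Fibonacci logic such a `fib.py` might contain can be sketched as plain Python. The file's actual contents are an assumption here; the Spark wiring is shown only as comments, since it needs `pyspark` and a running master:

```python
# Hypothetical sketch of what a fib.py example might compute.
# from pyspark import SparkContext  # assumed available when run via spark-submit

def fib(n: int) -> int:
    """Return the n-th Fibonacci number (fib(0) = 0, fib(1) = 1)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

if __name__ == "__main__":
    # With Spark, the map could be distributed across workers, e.g.:
    # sc = SparkContext(appName="fib")
    # print(sc.parallelize(range(10)).map(fib).collect())
    print([fib(n) for n in range(10)])  # standalone fallback
```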
Hey @VickyTheViking, thanks for the update, you still have to address the following issues:

- `user: root` in the docker compose file to fix the `spark-worker` container
- `docker exec` command failing

This is what I get when I try to manually reproduce the workflow and launch the `docker exec` command:
```
$ docker exec -it spark-master /opt/spark/bin/spark-submit --master spark://spark-master:7077 /opt/spark/examples/src/main/python/fib.py
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.3.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
24/07/08 17:51:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.io.IOException: Cannot run program "python3": error=2, No such file or directory
	at java.base/java.lang.ProcessBuilder.start(Unknown Source)
	at java.base/java.lang.ProcessBuilder.start(Unknown Source)
	at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:97)
	at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.base/java.lang.reflect.Method.invoke(Unknown Source)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: error=2, No such file or directory
	at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
	at java.base/java.lang.ProcessImpl.<init>(Unknown Source)
	at java.base/java.lang.ProcessImpl.start(Unknown Source)
	... 16 more
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
```
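The stack trace above boils down to `python3` missing from the image's `PATH`. One way this could be addressed (a sketch under assumptions, not the plugin's actual fix) is a small derived image that installs it:

```dockerfile
# Hypothetical Dockerfile; the base tag and package manager are assumptions
# (apache/spark 3.x images are Debian-based).
FROM apache/spark:3.1.3
USER root
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3 \
 && rm -rf /var/lib/apt/lists/*
USER spark
```

Alternatively, a JVM-based example (e.g. one launched via `/opt/spark/bin/run-example`) would start a SparkContext without needing Python at all.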
Hi, dear Tsunami team. Apache Spark exposes different web UIs depending on how it is run. After hours of searching, I found a way to run it so that all the web UIs are accessible. So in this pull request we have:

1. Master web UI
2. Worker web UI
3. Application web UI (runs only while a SparkContext is running)

All of these are fingerprinted in one run of each version. I used `apache/spark` as the base Docker image because it covers more versions than the official `_/spark` Docker repo (the images do not otherwise differ from each other). Versions without a Docker image were ignored.
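To make the three UIs concrete, the port mapping might look like this compose-level sketch; the service names, image tag, and commands are assumptions rather than the exact files in this pull request:

```yaml
# Hypothetical docker-compose fragment exposing the three Spark web UIs.
services:
  spark-master:
    image: apache/spark:3.1.3
    command: /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master
    ports:
      - "8080:8080"   # 1. Master web UI
      - "4040:4040"   # 3. Application web UI, up only while a SparkContext runs
  spark-worker:
    image: apache/spark:3.1.3
    command: /opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077
    ports:
      - "8081:8081"   # 2. Worker web UI
```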