qubole / sparklens

Qubole Sparklens tool for performance tuning Apache Spark
http://sparklens.qubole.com
Apache License 2.0

Executor IDs not found when analyzing spark history log #27

Open alexlipa91 opened 5 years ago

alexlipa91 commented 5 years ago

I am running sparklens with a Spark history file passed as a parameter, and I get a bunch of java.util.NoSuchElementException: key not found exceptions from com.qubole.sparklens.QuboleJobListener.onExecutorRemoved. Note that after several exceptions, the analysis completes fine.

If I understood correctly, this code is triggered when an executor is removed and looks up the executor's data in the executorMap, which assumes that the executor id is already present in the map. That makes sense. Do you have any idea why this happens? In my stack trace I see this happening for 4-5 executor ids. Would it make sense to access the map with a get, or to catch this exception, so that it does not pollute the output of the analysis?
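To make the suggestion concrete, here is a minimal sketch of the guarded-lookup pattern. It is written in Python purely for illustration and is not sparklens code (the real listener is a Scala class); `ExecutorTracker`, its fields, and its method names are hypothetical stand-ins for the listener's executor bookkeeping.

```python
import logging

logger = logging.getLogger("listener_sketch")


class ExecutorTracker:
    """Hypothetical stand-in for the listener's executor bookkeeping."""

    def __init__(self):
        # executor id -> {"start": ms, "end": ms or None}
        self.executor_map = {}
        # ids that were removed without a matching "added" event
        self.unknown_removed = []

    def on_executor_added(self, executor_id, time_ms):
        self.executor_map[executor_id] = {"start": time_ms, "end": None}

    def on_executor_removed(self, executor_id, time_ms):
        # dict.get returns None for a missing key instead of raising,
        # so an unknown id is recorded and logged rather than thrown.
        info = self.executor_map.get(executor_id)
        if info is None:
            self.unknown_removed.append(executor_id)
            logger.warning("executor %s removed but never added", executor_id)
            return False
        info["end"] = time_ms
        return True
```

Recording the unknown ids instead of silently dropping them keeps the output clean while still leaving a trace of the mismatch for later debugging.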

I am running on Spark 2.3 with speculation and dynamic allocation enabled (not sure if the latter is related to the problem).

The full stack trace is here: https://www.pastiebin.com/5bef1290999d2

UPDATE: I also see other exceptions being raised that look like the following:

Exception in thread "pool-4-thread-3" java.util.NoSuchElementException: key not found: executorRuntime

This happens for several different thread names.

iamrohit commented 5 years ago

Hi @alexlipa91! Thanks for reporting this issue.

For the executor-removed issue, I suspect that the Spark installation you are using to process the event log file is different from the one that was used to create it. Key "158" doesn't look like a valid executor id. Different distributions sometimes change the data stored in the executor events. I can confirm this by looking at the relevant JSON snippet from the event log file.
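One way to pull out the relevant events is a small script like the sketch below. It assumes the event log is uncompressed, newline-delimited JSON and that executor events carry the "Event" and "Executor ID" fields as in stock Apache Spark; a modified distribution may use different fields, which is exactly what this check would expose. The function name is hypothetical.

```python
import json


def unmatched_executor_removals(lines):
    """Return executor ids that have a removed event but no earlier added event."""
    added = set()
    unmatched = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(event, dict):
            continue
        kind = event.get("Event")
        executor_id = event.get("Executor ID")  # assumed field name
        if kind == "SparkListenerExecutorAdded":
            added.add(executor_id)
        elif kind == "SparkListenerExecutorRemoved" and executor_id not in added:
            unmatched.append(executor_id)
    return unmatched
```

Passing an open file handle for the event log to `unmatched_executor_removals` should list the ids, such as "158", for which a removal appears without a preceding add; the matching raw JSON lines are the snippet to share.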

The issue with the missing key "runtime" is likely happening because we have a stage with 0 tasks. This is a bug. We will fix this.

Thanks!