stephanecollot / sparkmon

Spark Monitoring
MIT License

Error after launching the monitoring (pandas lib) #234

Open borel-git opened 1 year ago

borel-git commented 1 year ago

Hello, I would like to use this tool to monitor my application, which uses Spark. Sometimes it works, but often I get this error:

  app <sparkmon.application.Application object at 0x000001B31A49F1C0>
[]
mon <SparkMon(Thread-1, initial)>
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Users\mborel\AppData\Roaming\Python\Python310\site-packages\pandas\core\indexes\base.py", line 3652, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 2606, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 2630, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\mborel\AppData\Roaming\Python\Python310\site-packages\sparkmon\monitor.py", line 129, in run
    self.application.log_all()
  File "C:\Users\mborel\AppData\Roaming\Python\Python310\site-packages\sparkmon\application.py", line 231, in log_all
    self.log_executors_info()
  File "C:\Users\mborel\AppData\Roaming\Python\Python310\site-packages\sparkmon\application.py", line 85, in log_executors_info
    self.timeseries_db[now] = self.parse_executors(executors_df)
  File "C:\Users\mborel\AppData\Roaming\Python\Python310\site-packages\sparkmon\application.py", line 122, in parse_executors
    peakMemoryMetrics_df = pd.DataFrame.from_records(peakMemoryMetrics, index=peakMemoryMetrics.index)
  File "C:\Users\mborel\AppData\Roaming\Python\Python310\site-packages\pandas\core\frame.py", line 2266, in from_records
    arrays, arr_columns = to_arrays(data, columns)
  File "C:\Users\mborel\AppData\Roaming\Python\Python310\site-packages\pandas\core\internals\construction.py", line 829, in to_arrays
    if isinstance(data[0], (list, tuple)):
  File "C:\Users\mborel\AppData\Roaming\Python\Python310\site-packages\pandas\core\series.py", line 1007, in __getitem__
    return self._get_value(key)
  File "C:\Users\mborel\AppData\Roaming\Python\Python310\site-packages\pandas\core\series.py", line 1116, in _get_value
    loc = self.index.get_loc(label)
  File "C:\Users\mborel\AppData\Roaming\Python\Python310\site-packages\pandas\core\indexes\base.py", line 3654, in get_loc
    raise KeyError(key) from err
KeyError: 0
Traceback (most recent call last):
  File "C:\Projects\PURE\pure-perf\scripts\sparkmonitoring", line 18, in <module>
    time.sleep(1240)
KeyboardInterrupt

Thanks in advance for your help. To be honest, I don't understand why I sometimes get results (sparkmon.png etc.) but often get this error. Regards, Mickael

cmditch commented 7 months ago

Also experiencing this error. I noticed it started happening once I turned on this Spark config to get Python stats: "spark.executor.processTreeMetrics.enabled": "true"
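
For anyone trying to reproduce this outside Spark: the traceback shows pandas evaluating `data[0]` on a Series inside `DataFrame.from_records`, which is a label lookup, so it fails when the Series index has no label 0. The sketch below is only an assumption about what `peakMemoryMetrics` looks like when the error occurs (a Series of dicts whose integer index starts at 1). The metric values are made up, and the workaround is a suggestion, not the project's actual fix.

```python
import pandas as pd

# Made-up stand-in for the peakMemoryMetrics column: one dict per executor.
# The index deliberately lacks the label 0, as if the first row were missing.
peak_memory_metrics = pd.Series(
    [
        {"JVMHeapMemory": 10, "JVMOffHeapMemory": 1},
        {"JVMHeapMemory": 20, "JVMOffHeapMemory": 2},
    ],
    index=[1, 2],
)

# Same call shape as application.py line 122 in the traceback above.
try:
    pd.DataFrame.from_records(peak_memory_metrics, index=peak_memory_metrics.index)
    raised_key_error = False
except KeyError:
    # data[0] is looked up by label on the Series, and 0 is not a label here.
    raised_key_error = True

# Possible workaround (an assumption, not sparkmon's code): pass a plain list,
# so that element 0 is positional, and reattach the original index.
fixed = pd.DataFrame(peak_memory_metrics.tolist(), index=peak_memory_metrics.index)

print("KeyError raised:", raised_key_error)
print(fixed)
```

If this matches the failing case, it would also explain why the error is intermittent: it would depend on whether the row labelled 0 has `peakMemoryMetrics` at the moment the executors are polled.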