radical-cybertools / radical.pilot

RADICAL-Pilot
http://radical-cybertools.github.io/radical-pilot/index.html

Misleading RAPTOR message for pyfuncs and suppressed error message. #3011

Open AymenFJA opened 1 year ago

This issue concerns the RAPTOR worker code, specifically this line: https://github.com/radical-cybertools/radical.pilot/blob/9190fa38783191167156365ac2e7bcd508ab2c15/src/radical/pilot/raptor/worker.py#L358

If that line fails, our RAPTOR logger prints the message defined here: https://github.com/radical-cybertools/radical.pilot/blob/9190fa38783191167156365ac2e7bcd508ab2c15/src/radical/pilot/raptor/worker.py#L361

Two problems here:

  1. The error raised by get_func_attr is suppressed, and we only report a hard-coded message. We should at least report the actual error message, even if not the entire exception.
  2. I tested it with a Python function whose module dependencies were unavailable in the namespace where the function was deserialized, and I printed the exception, roughly like this:
    except Exception as e:
        self._log.warn('function is not a PythonTask [%s] ', uid)
        self._log.warn(e)

With that change, this is the output from our raptor.worker:

1691803277.803 : master.000000.worker.0000 : 28773 : 140004894422784 : WARNING  : function is not a PythonTask [task.000000]
1691803277.803 : master.000000.worker.0000 : 28773 : 140004894422784 : WARNING  : No module named 'mpi_funcs'
1691803277.804 : master.000000.worker.0000 : 28777 : 139872066201344 : DEBUG    : orig args: [] : {}

As you can see, the first line is what RAPTOR currently reports: it marks the function as not a PythonTask, which is incorrect because this task was serialized by our function serializer.

The second line shows the actual reason deserialization failed: No module named 'mpi_funcs', a module that my function requires at execution time.
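
Here is a minimal sketch of the kind of reporting I have in mind, under some loud assumptions: pickle stands in for our function serializer, and try_load_function, describe_failure, and the hand-written payload are hypothetical names that do not exist in radical.pilot. It only illustrates that the log line can name the task and carry the original error instead of a fixed "not a PythonTask" string.

```python
import logging
import pickle

log = logging.getLogger("raptor.worker.sketch")  # hypothetical logger name


def describe_failure(uid, exc):
    """Build a message that names the task and keeps the original error text."""
    return "cannot deserialize function for %s: %s: %s" % (
        uid, type(exc).__name__, exc)


def try_load_function(uid, payload):
    """Return (func, None) on success, or (None, message) on failure.

    Assumption: pickle is only a stand-in for the real function serializer.
    """
    try:
        return pickle.loads(payload), None
    except Exception as exc:
        # Report the real cause instead of a hard-coded message.
        msg = describe_failure(uid, exc)
        log.warning(msg)
        return None, msg


# Hand-written pickle that refers to `mpi_funcs.my_func` by name. Loading it
# forces an import of `mpi_funcs`, which is assumed not to be installed here.
missing_module_payload = b"cmpi_funcs\nmy_func\n."

func, error = try_load_function("task.000000", missing_module_payload)
print(error)
```

With something along these lines, a missing dependency would show up in the worker log as an import error tied to the task uid, and the "not a PythonTask" wording could be reserved for payloads that really were not produced by the function serializer.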