radical-cybertools / radical.pilot

RADICAL-Pilot
http://radical-cybertools.github.io/radical-pilot/index.html

Feature/mpi raptor tutorial #3044

Closed AymenFJA closed 11 months ago

AymenFJA commented 1 year ago

Adding RAPTOR MPI tutorial to raptor.ipynb (related to #3040)

mturilli commented 12 months ago

@AymenFJA, did we agree to create a separate notebook, so as to separate possible failures of the non-MPI RAPTOR worker from those of the MPI worker? I remember we discussed this, but I may be wrong about the agreed-upon conclusion.

codecov[bot] commented 12 months ago

Codecov Report

Merging #3044 (5440bf1) into devel (b42fcd1) will not change coverage. The diff coverage is n/a.

@@           Coverage Diff           @@
##            devel    #3044   +/-   ##
=======================================
  Coverage   43.90%   43.90%           
=======================================
  Files          96       96           
  Lines       10563    10563           
=======================================
  Hits         4638     4638           
  Misses       5925     5925           


AymenFJA commented 12 months ago

@AymenFJA, did we agree to create a separate notebook, so as to separate possible failures of the non-MPI RAPTOR worker from those of the MPI worker? I remember we discussed this, but I may be wrong about the agreed-upon conclusion.

@mturilli OK, there might have been a misunderstanding. Happy to create a separate notebook with the RAPTOR MPI tutorial only.

mturilli commented 12 months ago

@AymenFJA thanks. Please check why multipilot is failing: download the notebook's sessions from the summary of the action run and inspect them.

AymenFJA commented 12 months ago

@mturilli two points here:

  1. Did we turn off session logging in our tests on purpose? All of the *.log files in the session folder are empty.
  2. The sessions do not seem to have failed: I checked the tasks, and they ran fine and produced output. The failure comes from the notebook conversion step:
    Run . testenv/bin/activate
    [NbConvertApp] Converting notebook docs/source/tutorials/multiple_pilots.ipynb to notebook
    [NbConvertApp] Writing 12299 bytes to docs/source/tutorials/multiple_pilots.ipynb
    [NbConvertApp] Converting notebook docs/source/tutorials/multiple_pilots.ipynb to notebook
    Error: The action has timed out.

mtitov commented 11 months ago

@AymenFJA the debug messages are in the sandboxes, so you can go through the failed run there.
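A minimal sketch for locating those messages once a session has been downloaded and unpacked: it walks the session folder, including any nested sandbox directories, and separates empty *.log files from non-empty ones. The function name and the assumption that sandboxes sit below the session folder are illustrative only; the actual layout of the downloaded artifact may differ.

```python
from pathlib import Path


def summarize_logs(session_dir):
    """Split all *.log files below session_dir into (empty, non_empty) lists."""
    empty, non_empty = [], []
    # rglob also descends into nested directories such as sandboxes
    for path in sorted(Path(session_dir).rglob('*.log')):
        if path.stat().st_size == 0:
            empty.append(path)
        else:
            non_empty.append(path)
    return empty, non_empty


# Usage (the path is hypothetical):
#   empty, non_empty = summarize_logs('downloaded_session')
#   for p in non_empty:
#       print(p, p.stat().st_size, 'bytes')
```

Only the non-empty files are worth opening when going through a failed run.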

AymenFJA commented 11 months ago

The PR tests are failing with a "no module named mpi4py" error at the RAPTOR worker level. The reason is that we need to instruct our pilot to create an env with mpi4py installed in it. We will update the tutorial to show how to do that, which will also make users aware of this option.
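A rough sketch of the idea, not the tutorial's final code: the runnable part only checks which modules are missing from the current interpreter and builds an environment spec as plain data. The env name `raptor_mpi_env` is hypothetical, and the `prepare_env` call in the comments reflects the RADICAL-Pilot API as I understand it, so it should be checked against the API documentation.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be imported here."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Assumed environment spec: a virtualenv with mpi4py installed into it.
# The env name is hypothetical.
ENV_NAME = 'raptor_mpi_env'
ENV_SPEC = {'type': 'venv', 'setup': ['mpi4py']}

# With a live pilot object, the spec would be handed to RADICAL-Pilot
# roughly like this (not executed here; verify the keys in the API docs):
#
#   pilot.prepare_env(env_name=ENV_NAME, env_spec=ENV_SPEC)
#
# Workers or tasks that need mpi4py would then reference the env by name,
# e.g. via the `named_env` attribute of their description.

print('missing here:', missing_modules(['mpi4py']))
```

Keeping mpi4py in the prepared env rather than in the package requirements means only the workers that need MPI pay for the installation.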

mtitov commented 11 months ago

@AymenFJA we don't need mpi4py in requirements then, right? (since we install it in a dedicated env)

AymenFJA commented 11 months ago

@AymenFJA we don't need mpi4py in requirements then, right? (since we install it in a dedicated env)

Agreed and done.

mtitov commented 11 months ago

@AymenFJA the RAPTOR tutorial needs to be reset: it was merged with executed cells. Please create another PR for that (and please do the same for the "Configuration" tutorial in that PR).
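One possible way to reset a notebook, sketched with the standard library only: it assumes the notebook is stored in the nbformat 4 JSON layout, where each code cell carries an `outputs` list and an `execution_count`. The function names are illustrative. `jupyter nbconvert --clear-output --inplace <notebook>` should give an equivalent result from the command line.

```python
import json


def clear_outputs(notebook):
    """Reset all code cells of an nbformat-4 notebook dict in place."""
    for cell in notebook.get('cells', []):
        if cell.get('cell_type') == 'code':
            cell['outputs'] = []             # drop stored results
            cell['execution_count'] = None   # serialized as JSON null
    return notebook


def reset_notebook_file(path):
    """Load a notebook file, clear its outputs, and write it back."""
    with open(path, encoding='utf-8') as f:
        nb = json.load(f)
    clear_outputs(nb)
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write('\n')


# Usage (the path is hypothetical):
#   reset_notebook_file('docs/source/tutorials/some_tutorial.ipynb')
```

Markdown cells and notebook metadata are left untouched, so the diff in the follow-up PR should contain only the removed outputs and execution counts.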