radical-collaboration / hpc-workflows

NSF16514 EarthCube Project - Award Number:1639694

ENTK get_started.py on local machine #81

Closed uvaaland closed 5 years ago

uvaaland commented 5 years ago

I am getting started with ENTK on my local Mac following the steps at: https://radicalentk.readthedocs.io/en/latest/user_guide/get_started.html

RabbitMQ and MongoDB have been successfully installed, but I keep getting the following error when I run python get_started.py:

EnTK session: re.session.nat-oitwireless-inside-vapornet100-c-21874.Princeton.EDU.brosten.017920.0005
Creating AppManager                                                           ok
Validating and assigning resource manager                                     ok
Setting up RabbitMQ system                                                    ok
2019-01-24 12:52:58,549: radical.entk.resource_manager.0000: MainProcess                     : pmgr.0000.subscriber._state_sub_cb: ERROR   : Pilot has failed
All components created
Update: Pipeline pipeline.0000 in state SCHEDULING
Update: Stage stage.0000 in state SCHEDULING
Update: Task task.0000 in state SCHEDULING
Update: Task task.0000 in state SCHEDULED
Update: Stage stage.0000 in state SCHEDULED
/Users/brosten/miniconda3/envs/entk/lib/python2.7/site-packages/pymongo/topology.py:149: UserWarning: MongoClient opened before fork. Create MongoClient only after forking. See PyMongo's documentation for details: http://api.mongodb.org/python/current/faq.html#is-pymongo-fork-safe
  "MongoClient opened before fork. Create MongoClient only "
All components terminated

As you see, the Pilot fails.

To further look into this, I went to the RADICAL-Pilot documentation and tried running the 00_getting_started.py script at: https://radicalpilot.readthedocs.io/en/latest/user_guide/00_getting_started.html, which works. So I don't understand why the Pilot fails in this ENTK example.

I have run the script with RADICAL_ENTK_VERBOSE=DEBUG, but did not get much further. Let me know if you need more information than the above message and I will provide it.

Thanks!

Uno

vivek-bala commented 5 years ago

Hey Uno, thanks for trying it out!

Can you rerun the EnTK example with another environment variable set up: export RADICAL_PILOT_VERBOSE=DEBUG?

You should have two folders that you can send to me; they contain all the log files. A "client" folder (1) named re.session.* will be created in the directory containing the EnTK script. A "remote" folder (2) with the same name will be created in $HOME/radical.pilot.sandbox/.

Also if you made any changes to the example script or any configs, let me know.
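For anyone collecting the same logs, here is a minimal sketch of packing both folders into a single archive. It assumes the two locations described above; the function name bundle_session_logs and the client/ and remote/ prefixes inside the archive are hypothetical choices made for illustration, not part of EnTK or RADICAL-Pilot.

```python
import glob
import os
import tarfile


def bundle_session_logs(client_dir, sandbox_dir, out_path):
    """Pack every re.session.* folder from both locations into one tar.gz.

    client_dir  : directory containing the EnTK script (assumed location)
    sandbox_dir : pilot sandbox, e.g. $HOME/radical.pilot.sandbox (assumed)
    out_path    : archive to create
    Returns the list of folders that were packed.
    """
    folders = []
    for label, root in (("client", client_dir), ("remote", sandbox_dir)):
        for path in sorted(glob.glob(os.path.join(root, "re.session.*"))):
            if os.path.isdir(path):
                folders.append((label, path))

    with tarfile.open(out_path, "w:gz") as archive:
        for label, path in folders:
            # The two folders share a name, so prefix each with its origin
            # (hypothetical layout) to keep them apart inside the archive.
            arcname = os.path.join(label, os.path.basename(path))
            archive.add(path, arcname=arcname)

    return [path for _, path in folders]
```

It could be called as bundle_session_logs(os.getcwd(), os.path.expanduser("~/radical.pilot.sandbox"), "entk_logs.tar.gz") from the directory containing the EnTK script.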

uvaaland commented 5 years ago

Hi, Vivek!

Thanks for taking the time. I reran the example with export RADICAL_PILOT_VERBOSE=DEBUG and the folders that you asked for can be found as a tar.gz at the following link:

https://drive.google.com/file/d/1qoKY8zIoflzTCQ-dKgpZIzE6-UX85ffF/view?usp=sharing

I have not made any changes to the example script. I got it locally in my virtual environment (as suggested in the tutorial).

vivek-bala commented 5 years ago

Thanks! Are you running everything in a virtual box or some sort of sandbox?

uvaaland commented 5 years ago

I am running it in a conda environment and have also tried it using virtualenv and get the same result in both cases. I am not using a VM on top of this.

vivek-bala commented 5 years ago

Ah found it. Seems like you're using python3:

Fatal Python error: initsite: Failed to import the site module
Traceback (most recent call last):
  File "/Users/brosten/radical.pilot.sandbox/ve.local.localhost.0.50.21/lib/python3.7/site.py", line 67, in <module>
    import os
  File "/Users/brosten/radical.pilot.sandbox/ve.local.localhost.0.50.21/lib/python3.7/os.py", line 661, in <module>
    from _collections_abc import MutableMapping
ModuleNotFoundError: No module named '_collections_abc'
virtualenv-1.9/virtualenv.py:1188: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
Using base prefix '/Users/brosten/miniconda3'

We don't yet support python3. Can you try with python2.7, please? You probably ran the RP example with python2.7. Let me know how it goes!

uvaaland commented 5 years ago

Thanks for pointing me in the right direction. This is very interesting because I am running it with python 2.7 in a conda environment. However, it appears that when the pilot starts, it creates its own virtual environment:

# cmd: /Users/brosten/miniconda3/bin/python virtualenv-1.9/virtualenv.py /Users/brosten/radical.pilot.sandbox/ve.local.localhost.0.50.21

but for some reason it uses a python executable outside my conda environment to do so: /Users/brosten/miniconda3/bin/python.

I have a good idea of how to fix this. However, I think this behavior is pretty interesting. One would probably want the pilot to pick the same executable as the program was run with. This issue might only appear if using a python2 environment in Anaconda3, but I will take a bit of time to see if I can get to the bottom of it.

Thanks again, Vivek!

vivek-bala commented 5 years ago

Thanks for checking that! You're right that the pilot creates its own VE. The commands that create this VE are written so that they work across different HPC systems (Titan, Summit, etc.). The VE on your machine or laptop might not be the right one to use on the HPC, hence a fresh VE on the remote side.

The pilot-side VE picks up whichever python is loaded by default when you open a new terminal. So if python3 is loaded by default, check the settings that make it so; often that is $HOME/.bashrc. Hope that helps.
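A quick way to check which interpreter a shell would hand to the pilot is to look up the first python on PATH and ask it for its version. The sketch below uses only the Python standard library; the helper names find_default_python and python_version are hypothetical. Note the assumption: it inspects the PATH of the process it runs in, so to mimic the pilot it should be run from a freshly opened terminal, without activating a conda or virtualenv environment first.

```python
import os
import shutil
import subprocess


def find_default_python(search_path=None):
    """Return the first `python` executable on a PATH string, or None."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    return shutil.which("python", path=search_path)


def python_version(executable):
    """Ask an interpreter for its own version string, e.g. 'X.Y.Z'."""
    # The one-liner is valid in both python2 and python3 interpreters.
    code = "import sys; print('.'.join(str(n) for n in sys.version_info[:3]))"
    output = subprocess.check_output([executable, "-c", code])
    return output.decode().strip()
```

If find_default_python() returns a path such as the miniconda3 one in the log above and python_version() reports a 3.x release, the pilot's VE will be built with python3 regardless of the environment the EnTK script itself was started from.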

uvaaland commented 5 years ago

Yes, then we are on the same page. That makes perfect sense to me. To work around this, I will create a virtual machine that runs python2 as the default and play with EnTK there.

I'll leave the ticket open until I get my VM set up and do a test run, but I expect this to take care of it.

uvaaland commented 5 years ago

Successfully ran the example on a VM.

Final question, is there a file where you can see what the program printed during execution, e.g. "Hello World" in this example?

vivek-bala commented 5 years ago

Awesome! The standard output should be written to a STDOUT file located inside $HOME/radical.pilot.sandbox/re.session.*/pilot.0000/unit.*. You can bring this file to the client side by using the data movement properties.

You can add task.download_output_data = ['STDOUT'] to the EnTK script. This would bring that file to where your EnTK script resides.
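To inspect the output in place instead of downloading it, the STDOUT files can also be located directly in the sandbox. This is a minimal sketch that assumes the directory layout quoted above (re.session.*/pilot.0000/unit.*/STDOUT); the helper names find_unit_stdout and read_unit_stdout are hypothetical and not part of EnTK.

```python
import glob
import os


def find_unit_stdout(sandbox_root):
    """List STDOUT files of all units under the pilot sandbox.

    Assumes the layout <sandbox_root>/re.session.*/pilot.0000/unit.*/STDOUT.
    """
    pattern = os.path.join(
        sandbox_root, "re.session.*", "pilot.0000", "unit.*", "STDOUT"
    )
    return sorted(glob.glob(pattern))


def read_unit_stdout(sandbox_root):
    """Map each STDOUT path to its text content."""
    outputs = {}
    for path in find_unit_stdout(sandbox_root):
        with open(path) as handle:
            outputs[path] = handle.read()
    return outputs
```

Calling read_unit_stdout(os.path.expanduser("~/radical.pilot.sandbox")) would return one entry per unit that wrote a STDOUT file.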

uvaaland commented 5 years ago

Great, thank you for your help! :)