radical-cybertools / ExTASY


Modules loaded on ARCHER for coco kernel are broken #236

Closed ibethune closed 8 years ago

ibethune commented 8 years ago

Hi, due to some module changes on ARCHER(!) the modules we try to load for the coco kernel no longer work.

I think that, to make the code more robust, we should load just the following modules in the pre_exec for ARCHER in coco.py. None of them specifies a version except the main Python module, which is unlikely to change and which we need to pin because we require Python 2.x rather than 3.x:

module load python-compute/2.7.6
module load pc-numpy
module load pc-scipy
module load pc-coco
module load pc-netcdf4-python

I tested this combination and it works fine.
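
For illustration, the pre_exec entry could be assembled roughly as in the sketch below. Only the module names come from the list above; the names ARCHER_COCO_MODULES, build_pre_exec and archer_pre_exec are hypothetical, and the actual layout of the machine configuration in coco.py may differ.

```python
# Hypothetical sketch: the module names are taken from the list above,
# but the variable and function names are assumptions, not the real
# structure of coco.py.
ARCHER_COCO_MODULES = [
    "python-compute/2.7.6",  # only the Python 2.x interpreter is pinned
    "pc-numpy",
    "pc-scipy",
    "pc-coco",
    "pc-netcdf4-python",
]


def build_pre_exec(modules):
    """Turn module names into the shell commands run before the kernel."""
    return ["module load {0}".format(name) for name in modules]


archer_pre_exec = build_pre_exec(ARCHER_COCO_MODULES)
```

Keeping the names in one list means a future module rename on ARCHER would need a change in a single place.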

ashkurti commented 8 years ago

I have problems with coco now (https://gist.github.com/ashkurti/0c09f3d245c917321707), most probably due to the change of names and versions of the modules on ARCHER. I think this is the same scenario that @vivek-bala also pointed out yesterday.

The solution of @ibethune looks the most straightforward to me now.

ibethune commented 8 years ago

Same thing for the tleap.py kernel. According to my reading of the code, we should be able to use only:

module load python-compute/2.7.6
module load amber

vivek-bala commented 8 years ago

I think tleap fails because of changes in the coco folder structure. The import at https://github.com/radical-cybertools/ExTASY/blob/master/examples/coco_amber/helper_scripts/postexec.py#L4 needs to be changed to: from coco import script
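
If the helper script has to run against both the new and an older coco layout, a tolerant import is one option. This is only a sketch: the function name import_script is hypothetical, and the older import path "from extasy import script" is an assumption about what the line looked like before the folder structure changed.

```python
def import_script():
    """Return the `script` object from the first layout that is installed.

    Tries the new layout first; the `extasy` fallback is an assumed
    older layout, not something confirmed by the repository.
    """
    try:
        from coco import script  # new folder structure
        return script
    except ImportError:
        pass
    try:
        from extasy import script  # assumed older location
        return script
    except ImportError:
        return None  # neither layout is available
```

The simpler fix, and the one proposed above, is to change the single import line in postexec.py.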

vivek-bala commented 8 years ago

@ashkurti could you run it again with RADICAL_ENMD_VERBOSE=info and RADICAL_PILOT_VERBOSE=debug and paste the log, please?
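
One way to set both variables and capture the log in a file is sketched below. The two variable names and values come from the request above; the helper names verbose_env and run_with_logs, and any command passed to them, are hypothetical.

```python
import os
import subprocess


def verbose_env(base=None):
    """Copy an environment and set the two log levels requested above."""
    env = dict(os.environ if base is None else base)
    env["RADICAL_ENMD_VERBOSE"] = "info"
    env["RADICAL_PILOT_VERBOSE"] = "debug"
    return env


def run_with_logs(command, log_path):
    """Run `command` (a list of arguments) and write stdout and stderr to a file."""
    with open(log_path, "w") as log:
        return subprocess.call(
            command, env=verbose_env(), stdout=log, stderr=subprocess.STDOUT
        )
```

For example, run_with_logs(["python", "my_workflow.py"], "run.log") would leave the full output in run.log, where my_workflow.py stands in for whatever script launches the workflow.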

ibethune commented 8 years ago

OK, then pc-coco and its dependencies are indeed needed for tleap.py as well (with the same versions as for coco.py).

ibethune commented 8 years ago

I can confirm that after changing the import to "from coco import script" and fixing the modules as above, the coco-amber workflow works as expected.

ashkurti commented 8 years ago

The job I launched with RADICAL_ENMD_VERBOSE=info and RADICAL_PILOT_VERBOSE=debug is taking a long time (almost an hour and it has not finished yet). No errors have been detected so far; the output up to now is at https://gist.github.com/ashkurti/0ba635c8230ed8a82efa.

vivek-bala commented 8 years ago

Thanks @ibethune

ashkurti commented 8 years ago

The job that I launched before is pending now (https://gist.github.com/ashkurti/99cffb63f2b22bda0a8a). We have not encountered pending problems for a long time.

ashkurti commented 8 years ago

So I assume the same problem should have occurred on Stampede as well (that is, the failure at "from extasy import script"). Any idea why it does not show up on Stampede?

ibethune commented 8 years ago

Hi Vivek, none of the three changes discussed in this ticket (coco.py, tleap.py and postexec.py) is in the master / extasy_data branches yet. Please let me know when this is done so I can retest.

vivek-bala commented 8 years ago

This can be tested again. Please reinstall the master branch of enmd for this.

ibethune commented 8 years ago

The coco/amber kernels are now working - thanks!