radical-cybertools / ExTASY

Gromacs/LSDMap fails on ARCHER in data staging step #157

Closed ibethune closed 9 years ago

ibethune commented 9 years ago

Hi all, I did a test of Gromacs/LSDMap on ARCHER, and it failed during the 'StagingOutput' step from the first Analysis CU:

ExTASY version :  0.1.3.1-beta-15-gf2a2457
Session UID: 55196431d7bf7579ecc3f07e
[Callback]: ComputePilot 'None' state changed to Launching.
Pilot UID       : 55196431d7bf7579ecc3f080
...
Starting Analysis
[Callback]: ComputeUnit '551965f7d7bf7579ecc3f09b' state changed to PendingInputStaging.
[Callback]: ComputeUnit '551965f7d7bf7579ecc3f09b' state changed to StagingInput.
[Callback]: ComputeUnit '551965f7d7bf7579ecc3f09b' state changed to PendingExecution.
[Callback]: ComputeUnit '551965f7d7bf7579ecc3f09b' state changed to Scheduling.
[Callback]: ComputeUnit '551965f7d7bf7579ecc3f09b' state changed to Executing.
[Callback]: ComputeUnit '551965f7d7bf7579ecc3f09b' state changed to StagingOutput.
[Callback]: ComputeUnit '551965f7d7bf7579ecc3f09b' state changed to Failed.

The failure looks suspiciously like those seen in #156:

2015:03:30 16:05:07 radical.pilot.MainProcess: [DEBUG   ] read : [   19] [  159] ( ls /work/e290/e290/e290ib/radical.pilot.sandbox/pilot-55196431d7bf7579ecc ^H3f080/unit-551965f7d7bf7579ecc3f09b/lsdmap.log\nCouldn't send packet: Broken pipe\n)
2015:03:30 16:05:08 radical.pilot.MainProcess: [ERROR   ] Output transfer failed: read from process failed '[Errno 5] Input/output error' : (s   --:-- ETA/home/h012/ibethune/testlsdmap2/input.gro     100%  105KB 104.7KB/s   00:00
sftp>  ls /work/e290/e290/e290ib/radical.pilot.sandbox/pilot-55196431d7bf7579ecc ^H3f080/unit-551965f7d7bf7579ecc3f09b/lsdmap.log
Couldn't send packet: Broken pipe

Note I have the .saga.cfg file set up here:

(testcoco8)[ibethune@hydra testlsdmap2]$ cat ~/.saga.cfg
[saga.util.pty]
ssh_share_mode = no

ibethune commented 9 years ago

Fixed by the .saga.cfg file once the typo in the section name was corrected: it should be [saga.utils.pty], not [saga.util.pty].

I think we should be making this behaviour the default, as it has affected several of us now.
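
Since the workaround silently does nothing when the section name is misspelled, a quick check of the file can save a failed run. The following is only a minimal sketch: it assumes the file is plain INI that Python's standard `configparser` can read, and the helper name `check_saga_cfg` is hypothetical, not part of SAGA or RADICAL-Pilot.

```python
import configparser

# Section name as corrected in this thread (the original file used "saga.util.pty").
EXPECTED_SECTION = "saga.utils.pty"


def check_saga_cfg(text):
    """Return a list of problems found in the given .saga.cfg contents.

    Hypothetical helper for illustration; it only checks the one setting
    discussed in this issue.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    problems = []
    if not parser.has_section(EXPECTED_SECTION):
        problems.append(
            "missing section [%s]; found sections: %s"
            % (EXPECTED_SECTION, parser.sections())
        )
    elif parser.get(EXPECTED_SECTION, "ssh_share_mode", fallback=None) != "no":
        problems.append("ssh_share_mode is not set to 'no'")
    return problems
```

To check a real file, read it first, for example `check_saga_cfg(open(os.path.expanduser("~/.saga.cfg")).read())` after `import os`; an empty list means the setting is in place as described above.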

vivek-bala commented 9 years ago

This should be fixed in the latest RP. ssh share mode has been made the default for CentOS.