radical-cybertools / ExTASY

connection #233

Closed by euhruska 8 years ago

euhruska commented 8 years ago

Some of my jobs failed because my wifi is slightly unstable and reconnects from time to time. Is there a way to make sure a job doesn't fail because of that? I'm now using tmux, but that doesn't help. I still want to launch the jobs from my laptop for convenience. Is there an option to tolerate a lost connection for a longer time? I was not sure where to post this issue.

andre-merzky commented 8 years ago

Hi again,

This is more of an issue on the pilot layer than on the ExTASY layer, as RP is not really able to deal with dropped connections. We have a couple of open tickets to address those, but it is unlikely that we will have that implemented quickly, and I honestly doubt that RP will be very resilient in the end.

What we should be able to do relatively quickly (on the order of the next two release cycles) is make the MongoDB connection somewhat more stable and resilient, so that as long as RP is not in the process of staging data, it should be able to tolerate network drops of limited frequency and duration.
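
To illustrate what tolerating drops "of limited frequency and duration" could mean in practice, here is a minimal sketch of retrying an operation with exponential backoff. It is not RP code and says nothing about how RP will implement this; `call_with_retry` and all of its parameters are hypothetical names chosen for the example. It assumes the failing call raises a connection-type exception; with pymongo one would probably pass `pymongo.errors.AutoReconnect` in `retry_on`, but that is an assumption about usage, not something RP does today.

```python
import time


def call_with_retry(operation, retries=5, base_delay=1.0, max_delay=30.0,
                    retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation(); on a connection-type error, wait and try again.

    Hypothetical helper for illustration only, not part of RP or ExTASY.
    The wait doubles after each failure, capped at max_delay seconds.
    """
    delay = base_delay
    for attempt in range(1, retries + 1):
        try:
            return operation()
        except retry_on:
            if attempt == retries:
                # The outage outlasted the retry budget: give up and re-raise.
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

With the defaults, an outage is tolerated for roughly 1 + 2 + 4 + 8 = 15 seconds of waiting before the error is re-raised. Raising `retries` or `max_delay` extends that window, at the cost of noticing a real failure later.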

Sorry for not being able to give a more positive answer :/

Best, Andre.

euhruska commented 8 years ago

Thanks