grisu / gricli

Grisu commandline client

Errors downloading jobs #108

Closed vladimir-mencl-eresearch closed 12 years ago

vladimir-mencl-eresearch commented 13 years ago

Hi,

As I was downloading a batch of 2000 jobs (from BeSTGRID-DEV with gricli-dev.jar), I got the following error for three of the jobs:

downloading job R-bubb-pi050-0966
cannot access remote file system: Could not access file: gsiftp://ng2.auckland.ac.nz/home/grid-bestgrid/C_NZ_O_BeSTGRID_OU_University_of_Canterbury_CN_Vladimir_Mencl/grisu-dev/R-bubb-pi050-0966/a-0966.e129496: Unknown message with code "Can't connect to filesystem gsiftp://ng2.auckland.ac.nz/home/grid-bestgrid/C_NZ_O_BeSTGRID_OU_University_of_Canterbury_CN_Vladimir_Mencl: Unknown message with code "vfs.provider.gridftp/connect.error ".".
download command was unsuccessful
downloading job R-bubb-pi050-0967
cleaning job R-bubb-pi050-0967

I very much appreciate the download did not stop - I could just later return and download the three jobs again.
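To find which jobs need a second attempt, the failed names could be pulled out of the saved client output. The following is a minimal sketch, assuming the output contains the two markers seen in the excerpt above ("downloading job <name>" and "download command was unsuccessful"); the function name `failed_jobs` is hypothetical and the real gricli output may differ between versions.

```python
import re

# Assumed log markers, taken from the output quoted in this report;
# the real client output format may differ.
EVENT = re.compile(r"downloading job (\S+)|download command was unsuccessful")


def failed_jobs(log_text):
    """Return names of jobs whose download was reported as unsuccessful."""
    failed = []
    current = None
    for match in EVENT.finditer(log_text):
        if match.group(1):
            # A "downloading job <name>" marker: remember the job in progress.
            current = match.group(1)
        elif current is not None:
            # The failure marker applies to the most recent job seen.
            failed.append(current)
            current = None
    return failed
```

The returned list can then be fed back into a second download run for just those jobs.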

This looks like a transient error connecting to ng2.auckland.ac.nz. It could have been an issue on that host (or perhaps just the connection pool being exhausted?).
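If the failure really is transient, one common way to ride it out is to retry the transfer a few times with a growing delay. This is only a sketch of that general technique, not how Grisu or gricli handles downloads: `download` is a hypothetical callable that raises an exception on failure, and the attempt count and delays are arbitrary assumptions.

```python
import time


def download_with_retry(download, job_name, attempts=3, base_delay=1.0,
                        sleep=time.sleep):
    """Call download(job_name), retrying on failure with exponential backoff.

    `download` is a hypothetical stand-in for whatever performs the
    transfer; it is expected to raise an exception when it fails.
    """
    for attempt in range(attempts):
        try:
            return download(job_name)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists only so the delay can be replaced in tests; in normal use it can be left at its default.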

Cheers, Vlad

yuriyh commented 13 years ago

I have changed the max number of open files, both on ng2.auckland and gram5.ceres. Will see if that helps.
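How the limit was changed on those hosts is not stated here. As a rough illustration of what the limit is, a process on a Unix-like system can inspect (and, up to the hard limit, raise) its own open-file limit with Python's standard `resource` module. The helper names below are hypothetical, and this sketch does not change any system-wide setting.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit_to_hard():
    """Try to raise the soft limit to the hard limit; return the resulting limits."""
    soft, hard = open_file_limits()
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms refuse this; leave the limits unchanged.
            pass
    return open_file_limits()
```

A server that runs out of file descriptors under many parallel transfers would typically start refusing new connections, which would be consistent with the connect error reported above, although that cause is not confirmed in this thread.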

vladimir-mencl-eresearch commented 13 years ago

Maybe also increase it for the backend host?

yuriyh commented 13 years ago

Done for dev. Prod will probably need a restart, so I will change that at the next outage.

Excerpts from the message of Thu Aug 04 16:36:15 +1200 2011:

Maybe also increase it for the backend host?

makkus commented 13 years ago

What about this ticket? Can it be closed?

vladimir-mencl-eresearch commented 13 years ago

Hmmm..... increasing the number of open files sounds like a good idea to fix this.

Yuriy's post says the number of open files was also increased for the backend (dev immediately, prod likely by now).

Yuriy, do you think we should also change this at all NG2s? Could you please send a post to bestgrid-operators documenting what you did and what other site admins should do?

Cheers, Vlad

makkus commented 12 years ago

@yuriyh, anything you want to add?

Added tests for this:
https://github.com/grisu/grit/blob/develop/test-templates/download_job.groovy
https://github.com/grisu/grit/blob/develop/src/main/groovy/grisu/tests/tests/DownloadJobTest.groovy

Will run a set of those now, we'll see how it goes...

makkus commented 12 years ago

Looks good so far. 1000 jobs submitted and subsequently downloaded without error. That was with only one 10 kB input file, though.

Trying 750 jobs with a 10 MB and also a 20 MB input file. Will download 10 in parallel; should be finished tomorrow...
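For illustration, a bounded-parallel download run of this kind can be sketched with a thread pool that records failures instead of stopping. This is not the code in the linked Groovy tests: `download` is a hypothetical callable that raises on failure, and the default of 10 workers simply mirrors the figure mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(download, job_names, max_parallel=10):
    """Run download(name) for every job, at most max_parallel at a time.

    Returns (succeeded, failed), where failed maps job name to the exception.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(download, name): name for name in job_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()
                succeeded.append(name)
            except Exception as error:
                # Record the failure and keep going with the other jobs.
                failed[name] = error
    return succeeded, failed
```

Keeping the failed names separate makes it easy to rerun only those jobs afterwards.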

makkus commented 12 years ago

No errors in my tests (so far), which means I couldn't reproduce the issue, at least with the version of Grisu that is going to be released tomorrow. Will close for now...