Closed vladimir-mencl-eresearch closed 12 years ago
I have changed the maximum number of open files on both ng2.auckland and gram5.ceres. Will see if that helps.
Maybe also increase it for the backend host?
Done for dev. Prod will probably need a restart, so I will change that at the next outage.
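For anyone checking their own hosts: the thread does not record how the limit was changed, so the following is only a minimal sketch for reading the limit a running process actually sees. It assumes a Unix host with Python available; `open_file_limits` is a name made up for this example. Resource limits are normally inherited when a process starts, which is one likely reason a running service needs a restart before a new limit takes effect.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file-descriptor limits of this process."""
    # RLIMIT_NOFILE is the per-process cap on open file descriptors.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


if __name__ == "__main__":
    soft, hard = open_file_limits()
    # The soft limit is the one enforced; the hard limit is the ceiling
    # an unprivileged process may raise the soft limit to.
    print(f"soft limit: {soft}, hard limit: {hard}")
```

Running this from the same account and environment that starts the service gives a rough idea of what the service will get after its next restart.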
Excerpts from reply+i-1300390-d945b04d08c2b8218e16db664763843b5d559fba's message of Thu Aug 04 16:36:15 +1200 2011:
Maybe also increase it for the backend host?
What about this ticket? Can it be closed?
Hmmm... increasing the number of open files sounds like a good way to fix this.
Yuriy's post says the open-file limit was also increased on the backend (dev immediately, prod likely by now).
Yuriy, do you think we should also change this on all NG2s? Could you please send a post to bestgrid-operators documenting what you did and what other site admins should do?
Cheers, Vlad
@yuriyh, anything you want to add?
Added tests for this: https://github.com/grisu/grit/blob/develop/test-templates/download_job.groovy https://github.com/grisu/grit/blob/develop/src/main/groovy/grisu/tests/tests/DownloadJobTest.groovy
Will run a set of those now, we'll see how it goes...
Looks good so far. 1000 jobs submitted and subsequently downloaded without error. There was only one 10 kB input file, though.
Trying 750 jobs with a 10 and also a 20 MB input file. Will download 10 in parallel; should be finished tomorrow...
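To illustrate the "download 10 in parallel" setup, here is a rough sketch of bounded parallel downloads. It is not the Grisu or grit test code; `download_all` and the `download_job` callable are hypothetical names, and the worker count of 10 simply mirrors the number mentioned above. A failure in one job is recorded and the remaining jobs continue.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(job_names, download_job, max_parallel=10):
    """Download every job, running at most max_parallel downloads at once.

    download_job is a caller-supplied callable (hypothetical here) that
    takes a job name and raises an exception on failure.
    Returns (succeeded, failed): a list of names and a dict name -> exception.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # Map each future back to its job name so failures can be reported.
        futures = {pool.submit(download_job, name): name for name in job_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()
                succeeded.append(name)
            except Exception as exc:
                # Record the failure and keep going with the other jobs.
                failed[name] = exc
    return succeeded, failed
```

Capping the pool size also bounds how many connections (and file descriptors) the client holds open at any one time.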
No errors in my tests (so far), which means I couldn't reproduce the issue at least with the version of Grisu that is going to be released tomorrow. Will close for now...
Hi,
As I was downloading a batch of 2000 jobs (from BeSTGRID-DEV with gricli-dev.jar), I got the following error for three of the jobs:
downloading job R-bubb-pi050-0966
cannot access remote file system: Could not access file: gsiftp://ng2.auckland.ac.nz/home/grid-bestgrid/C_NZ_O_BeSTGRID_OU_University_of_Canterbury_CN_Vladimir_Mencl/grisu-dev/R-bubb-pi050-0966/a-0966.e129496: Unknown message with code "Can't connect to filesystem gsiftp://ng2.auckland.ac.nz/home/grid-bestgrid/C_NZ_O_BeSTGRID_OU_University_of_Canterbury_CN_Vladimir_Mencl: Unknown message with code "vfs.provider.gridftp/connect.error ".".
download command was unsuccessful
downloading job R-bubb-pi050-0967
cleaning job R-bubb-pi050-0967
I very much appreciate the download did not stop - I could just later return and download the three jobs again.
This looks like a transient error connecting to ng2.auckland.ac.nz. It could have been an issue on that host, or perhaps the connection pool was just exhausted?
Cheers, Vlad
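If the connect errors really are transient, retrying the affected jobs after a short pause may be enough. The sketch below shows that idea only; it does not reflect how gricli handles failures. `download_with_retry` and `download_job` are hypothetical names, and treating `OSError` as the "transient" error type is an assumption made for the example.

```python
import time


def download_with_retry(job_name, download_job, attempts=3,
                        delay_seconds=1.0, sleep=time.sleep):
    """Call download_job(job_name), retrying on OSError with a growing pause.

    download_job is a caller-supplied callable (hypothetical here).
    Re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return download_job(job_name)
        except OSError as exc:
            last_error = exc
            if attempt < attempts:
                # Linear backoff: wait longer after each failed attempt.
                sleep(delay_seconds * attempt)
    raise last_error
```

The `sleep` parameter exists only so the pause can be replaced in tests; in normal use the default `time.sleep` applies.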