jglobus / JGlobus

jGlobus is a collection of Java client libraries for Globus® Toolkit security, GRAM, and GridFTP.
http://www.globus.org/toolkit/jglobus/
Apache License 2.0

GridFTPClient's extendedMultipleTransfer hangs when transferring more than 1000 small files #134

Open: felipeleao opened this issue 10 years ago

felipeleao commented 10 years ago

I'm developing an application with jGlobus (GridFTP) for transferring "lots of small files" (LOSF). The user selects a source directory on one endpoint and a destination directory on another. Once the user's access to both directories has been checked, the application recursively lists every file under the source directory, including files in subdirectories, and populates a String[] with each file's full (canonical) path, for example:

```java
String[] sourceFiles = new String[] {
    "/home/felipeleao88/losf/file1.txt",
    "/home/felipeleao88/losf/file2.txt",
    "/home/felipeleao88/losf/file3.txt",
    "/home/felipeleao88/losf/subdir/sub_file1.txt",
    "/home/felipeleao88/losf/subdir/sub_file2.txt",
    "/home/felipeleao88/losf/subdir/sub_file3.txt",
    "/home/felipeleao88/losf/subdir/anotherdir/sub_sub_file1.txt",
    "/home/felipeleao88/losf/subdir/anotherdir/sub_sub_file2.txt"
};
```

A second String[] holds the path that each file in sourceFiles should take at the destination. These two arrays are what GridFTPClient's extendedMultipleTransfer() requires (I'm using the full overload, which also takes offsets and file sizes). Everything works perfectly when I send a single file or a short list of files, but when the list is very large extendedMultipleTransfer() misbehaves.
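The destination array can be derived mechanically from the source array by swapping the source root for the destination root. The sketch below is a hypothetical helper (TransferArrays is not part of jGlobus) that does this and also builds an all-zero offsets array for a transfer that is not being resumed. The destination root used in the example, "/data/losf_copy", is an assumed value for illustration only.

```java
/**
 * Hypothetical helper, not part of jGlobus: builds the destination paths and
 * offsets that go alongside sourceFiles in extendedMultipleTransfer().
 */
final class TransferArrays {

    private TransferArrays() {
    }

    /** Re-roots each source path: the srcRoot prefix is replaced by dstRoot. */
    static String[] toDestinationPaths(String srcRoot, String dstRoot, String[] sourceFiles) {
        // Normalise both roots so they end with a trailing slash.
        String src = srcRoot.endsWith("/") ? srcRoot : srcRoot + "/";
        String dst = dstRoot.endsWith("/") ? dstRoot : dstRoot + "/";
        String[] destinationFiles = new String[sourceFiles.length];
        for (int i = 0; i < sourceFiles.length; i++) {
            String path = sourceFiles[i];
            if (!path.startsWith(src)) {
                throw new IllegalArgumentException("Not under " + src + ": " + path);
            }
            // Keep the relative part (e.g. "subdir/sub_file1.txt") so the
            // directory layout is reproduced at the destination.
            destinationFiles[i] = dst + path.substring(src.length());
        }
        return destinationFiles;
    }

    /** Offsets for a fresh, non-resumed transfer: every file starts at byte 0. */
    static long[] zeroOffsets(int count) {
        return new long[count]; // Java initialises long[] elements to 0
    }
}
```

With these, "/home/felipeleao88/losf/subdir/sub_file1.txt" maps to "/data/losf_copy/subdir/sub_file1.txt", and index i of every array keeps referring to the same file.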

When the arrays describe more than a thousand files, the method simply stops responding after a while: it neither throws an exception nor returns with an error. It starts the transfer, transfers around 100 files, and then hangs. I tried 800, 850, 950 and several other counts, and those all work: every file is transferred. But when my list has more than a thousand files, it hangs.
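Since lists of 800 to 950 files completed in these tests, one possible workaround (untested here, and not a fix for the underlying hang) is to split a large list into batches below that size and issue one extendedMultipleTransfer() call per batch. The hypothetical Batches helper below only computes the index ranges and copies out the matching slices of the parallel arrays; the batch size of 500 in the example is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: cuts the parallel transfer arrays into fixed-size batches. */
final class Batches {

    private Batches() {
    }

    /** Splits [0, total) into consecutive half-open ranges {start, end} of at most batchSize. */
    static List<int[]> ranges(int total, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<int[]> result = new ArrayList<>();
        for (int start = 0; start < total; start += batchSize) {
            result.add(new int[] {start, Math.min(start + batchSize, total)});
        }
        return result;
    }

    /** Copies one batch out of a String[] (source or destination paths). */
    static String[] slice(String[] all, int[] range) {
        return Arrays.copyOfRange(all, range[0], range[1]);
    }

    /** Copies one batch out of a long[] (offsets or sizes). */
    static long[] slice(long[] all, int[] range) {
        return Arrays.copyOfRange(all, range[0], range[1]);
    }
}
```

For 1,200 files, ranges(1200, 500) yields {0, 500}, {500, 1000} and {1000, 1200}. Each range would be sliced out of sourceFiles, destinationFiles, resumeOffsets and resumeSizes and passed to its own call; whether setPassive()/setActive() must be repeated before each call is an assumption to verify against the jGlobus documentation.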

It usually transfers around 100 files before hanging, but the number is not consistent: I have also seen it freeze after 77, 96, 107 and 104 files. My tests use generated bulk files of 2 MB each, and I have already checked that both endpoints have free disk space.

I used netstat to watch the open TCP connections. The application correctly opens the number of parallel TCP connections I asked for (4), and they remain active even after the method starts hanging. Eventually I have to kill the transfer manually from the command line.
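Because the call blocks silently with its connections still open, the application cannot tell a slow transfer from a stalled one. A minimal sketch, assuming progress is reported through the MarkerListener and MultipleTransferCompleteListener callbacks, is a watchdog that records the time of the last callback and fires a handler once no progress has been seen for a given timeout. StallWatchdog is a hypothetical class; what the handler does (for example closing both GridFTPClient instances) is left to the caller.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: detects when a transfer has stopped reporting progress. */
final class StallWatchdog implements AutoCloseable {
    private final AtomicLong lastProgressNanos = new AtomicLong(System.nanoTime());
    private final long timeoutNanos;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "stall-watchdog");
                t.setDaemon(true); // never keep the JVM alive on our account
                return t;
            });

    StallWatchdog(long timeout, TimeUnit unit) {
        this.timeoutNanos = unit.toNanos(timeout);
    }

    /** Call from markerArrived(...) and transferComplete(...) to record progress. */
    void touch() {
        lastProgressNanos.set(System.nanoTime());
    }

    /** True if no progress has been recorded for longer than the timeout. */
    boolean isStalled(long nowNanos) {
        return nowNanos - lastProgressNanos.get() > timeoutNanos;
    }

    /** Polls every period and runs onStall once, after which polling stops. */
    void start(long period, TimeUnit unit, Runnable onStall) {
        scheduler.scheduleAtFixedRate(() -> {
            if (isStalled(System.nanoTime())) {
                onStall.run();
                scheduler.shutdown(); // cancels further periodic runs
            }
        }, period, period, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Using a monotonic System.nanoTime() timestamp rather than wall-clock time keeps the check immune to clock adjustments during long transfers.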

Both endpoints run CentOS 6.3, and the application itself runs on Fedora 20. The application uses the following jGlobus Maven dependencies:

- org.jglobus:JGlobus-Core:2.0.4
- org.jglobus:JGlobus-GridFTP:2.0.0
- org.jglobus:JGlobus-Tomcat:2.0.4
- org.jglobus:myproxy:2.0.6

Here is my call to extendedMultipleTransfer(). Note: gridFtpClientSrc and gridFtpClientDst are both GridFTPClient instances.

```java
// Set passive and active FTP modes
gridFtpClientSrc.setPassive();
HostPort hp = gridFtpClientDst.setPassive();
gridFtpClientSrc.setActive(hp);

// Start the transfer
gridFtpClientSrc.extendedMultipleTransfer(
        resumeOffsets,     // long[] with source offsets
        resumeSizes,       // long[] with source file lengths
        sourceFiles,       // String[] with source file paths
        gridFtpClientDst,  // GridFTPClient for the destination
        resumeOffsets,     // long[] with destination offsets
        destinationFiles,  // String[] with destination file paths
        new MarkerListener() {
            public void markerArrived(Marker arg0) {
                // simple control method omitted
            }
        },
        new MultipleTransferCompleteListener() {
            public void transferComplete(MultipleTransferComplete mtc) {
                // simple control method omitted
            }
        });
```
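To pin down exactly where a run stalls, the omitted transferComplete() body could record which files have finished. The sketch below assumes transferComplete() is invoked once per finished file and that the file's index into sourceFiles is available there; which accessor of MultipleTransferComplete provides that index has not been confirmed, so that wiring is an assumption. CompletionTracker is a hypothetical class.

```java
import java.util.BitSet;

/** Hypothetical helper: records which of the N files have reported completion. */
final class CompletionTracker {
    private final int total;
    private final BitSet done;

    CompletionTracker(int total) {
        this.total = total;
        this.done = new BitSet(total);
    }

    /** Call from transferComplete(...) with the index of the file that finished. */
    synchronized void markDone(int index) {
        if (index < 0 || index >= total) {
            throw new IndexOutOfBoundsException("index " + index + " outside 0.." + (total - 1));
        }
        done.set(index);
    }

    /** Number of distinct files reported complete so far. */
    synchronized int completedCount() {
        return done.cardinality();
    }

    /** Index of the first file not yet reported complete, or -1 if all are done. */
    synchronized int firstPending() {
        int next = done.nextClearBit(0);
        return next < total ? next : -1;
    }
}
```

When the hang occurs, logging completedCount() and sourceFiles[firstPending()] shows whether the transfer always stops on the same file or at a varying position, which matches the varying counts (77, 96, 104, 107) reported above.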