veeso / suppaftp

a super FTP/FTPS client library for Rust with support for both passive and active mode
Apache License 2.0

[QUESTION] - How to properly use resume_transfer #42

Closed shueit closed 10 months ago

shueit commented 1 year ago

Hi,

Thank you for your work on this crate!

I'm using it to build a custom FTP client with a priority queue of files to upload and download, and I'm having trouble restarting a file transfer from a given offset. I call resume_transfer, but when I then call either put_file or retr_as_stream, the transfer restarts from the beginning each time, as if resume_transfer had never been called.

async fn restart_upload(mut ftp_stream: AsyncFtpStream, mut file: &async_std::fs::File, destination_filename: String) {
    let uploaded_bytes = ftp_stream.size(destination_filename.clone()).await
        .expect("No file at the given path on the FTP server");

    let _ = ftp_stream.resume_transfer(uploaded_bytes).await.expect("Resume failed");
    let uploaded_bytes = ftp_stream.put_file(destination_filename, &mut file).await.expect("Failed to upload the file");
                                 // Acts as if resume_transfer had never been called
}

For the upload, I managed to find a workaround using append_file and the Seek trait on File:

async fn restart_upload(mut ftp_stream: AsyncFtpStream, mut file: &async_std::fs::File, destination_filename: String) {
    let uploaded_bytes = ftp_stream.size(destination_filename.clone()).await
        .expect("No file at the given path on the FTP server");

    // Move the cursor to offset `uploaded_bytes`, counted from the start of the file
    let _seeked_bytes = file.seek(SeekFrom::Start(uploaded_bytes as u64)).await
        .expect("Could not move the cursor to given position");

    ftp_stream.append_file(destination_filename.as_str(), &mut file).await.expect("Failed to upload the file");
}
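The seek-then-append idea can be sketched without a server. This is only an illustration under assumptions: it uses synchronous `std::io` instead of async_std so that it stays self-contained, `append_remaining` is a hypothetical helper (not part of suppaftp), and the `remote` writer merely stands in for the data connection that append_file would open.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Hypothetical helper: sends only the bytes the remote side is missing.
/// `remote` is a stand-in for an append-mode upload target (an assumption).
fn append_remaining<R: Read + Seek, W: Write>(
    local: &mut R,
    remote: &mut W,
    already_uploaded: u64,
) -> io::Result<u64> {
    // Position the local source just past the bytes the server already has.
    local.seek(SeekFrom::Start(already_uploaded))?;
    // Stream everything from that position to the end of the source.
    io::copy(local, remote)
}
```

With a local source of "ABCDE", a remote side that already holds "AB", and `already_uploaded` set to 2, only "CDE" is sent.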

But I can't do the same thing with the DataStream I get back from the retr_as_stream call. At the moment, I am doing a full re-download and throwing away the part I already have with io::sink, but it's not optimal at all :/

async fn restart_download(mut ftp_stream: AsyncFtpStream, local_file_path: String, wanted_filename: String) {
    let mut file = async_std::fs::OpenOptions::new()
        .append(true)        
        .open(local_file_path.as_str()).await.expect("Could not open file");
    let file_size = file.metadata().await.unwrap().len();

    let _ = ftp_stream.transfer_type(Binary).await.expect("Could not set transfer type to binary");
    let _ = ftp_stream.resume_transfer(file_size as usize).await.expect("Resume failed");

    let mut data_stream = ftp_stream.retr_as_stream(wanted_filename).await
        .expect("Could not find the given file on remote server");

    // Discard the bytes already downloaded. This implies a full re-download
    async_std::io::copy(&mut data_stream.by_ref().take(file_size), &mut io::sink()).await.expect("Could not skip the bytes");

    // Read the rest
    async_std::io::copy(&mut data_stream, &mut file).await.expect("Error during file downloading");

    ftp_stream.finalize_retr_stream(data_stream).await.unwrap();
}

Am I missing something?
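For reference, the discard-then-copy workaround can be reduced to a small synchronous sketch. The assumptions here are that `std::io` replaces async_std for self-containment and that `skip_then_copy` is a hypothetical name. It also shows why the approach is wasteful: the skipped prefix still has to be read in full from the stream.

```rust
use std::io::{self, Read, Write};

/// Hypothetical helper: reads and discards the first `have` bytes of `remote`,
/// then copies the rest into `local`. The discarded bytes are still transferred.
fn skip_then_copy<R: Read, W: Write>(
    remote: &mut R,
    local: &mut W,
    have: u64,
) -> io::Result<u64> {
    // Drain the prefix we already hold into a sink.
    let skipped = io::copy(&mut remote.by_ref().take(have), &mut io::sink())?;
    if skipped < have {
        // The remote stream is shorter than the local copy: nothing sane to resume.
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "remote stream ended before the resume offset",
        ));
    }
    // Copy only the missing tail.
    io::copy(remote, local)
}
```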

veeso commented 1 year ago

AFAIK you have to seek your file from the REST position first.

shueit commented 1 year ago

In which case? I'm already doing that to restart an upload. For restarting a download, though, even if I seek my local file to the last downloaded byte, the io::copy from data_stream appends the whole remote file after what I already have, not just the missing part. For example, if my local file contains "AB" and the remote file is "ABCDE", I end up with "ABABCDE" locally.
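The "AB" / "ABCDE" case can be reproduced with a toy model. This is a sketch and not suppaftp code: `served_bytes` and `resume_download` are hypothetical names, and the server is modelled as a byte slice that either honours the REST offset or ignores it.

```rust
/// Toy model of RETR: the bytes a server would send.
/// `rest_offset` is Some(n) when a preceding REST n was honoured,
/// and None when it was ignored or never sent.
fn served_bytes(remote: &[u8], rest_offset: Option<usize>) -> &[u8] {
    let start = rest_offset.unwrap_or(0).min(remote.len());
    &remote[start..]
}

/// Appends whatever the modelled server sends to the local copy.
fn resume_download(local: &mut Vec<u8>, remote: &[u8], rest_honoured: bool) {
    let offset = if rest_honoured { Some(local.len()) } else { None };
    local.extend_from_slice(served_bytes(remote, offset));
}
```

When the offset is ignored, the local copy grows from "AB" to "ABABCDE"; when it is honoured, the result is "ABCDE". Seeking the local file cannot compensate for a server that restarts from byte 0.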

veeso commented 1 year ago

try with Write instead of copy

shueit commented 1 year ago

I didn't find any write method in the io module, and the fs one overwrites the existing file anyway, so I went with this piece of code in place of the copy lines:

    let mut buf: Vec<u8> = Vec::with_capacity(remaining_bytes);
    data_stream.read_to_end(&mut buf).await.unwrap();
    file.write_all(&buf).await.unwrap(); // write_all, because write may stop after a partial write

But I'm not sure I understood what you meant by trying with Write instead.

Either way, buf still receives the entire file, and since everything is held in memory, memory consumption gets really high with big files.
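On the memory point, a copy through a fixed-size buffer keeps usage bounded whatever the file size. The sketch below assumes synchronous `std::io` for self-containment (async_std exposes analogous read and write_all methods), and `copy_in_chunks` is a hypothetical name.

```rust
use std::io::{self, Read, Write};

/// Hypothetical helper: copies `src` into `dst` through one reusable buffer,
/// so at most `chunk_size` bytes of payload are held in memory at a time.
fn copy_in_chunks<R: Read, W: Write>(
    src: &mut R,
    dst: &mut W,
    chunk_size: usize,
) -> io::Result<u64> {
    // Guard against a zero-length buffer, which would look like end of stream.
    let mut buf = vec![0u8; chunk_size.max(1)];
    let mut total = 0u64;
    loop {
        let n = match src.read(&mut buf) {
            Ok(0) => break, // end of stream
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // write_all retries until the whole chunk has been written.
        dst.write_all(&buf[..n])?;
        total += n as u64;
    }
    Ok(total)
}
```

This bounds memory only. It does not avoid re-downloading the prefix, which still depends on the server applying the REST offset.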

shueit commented 1 year ago

After comparing the server-side logs for the FileZilla client and this crate, I found only one difference in the command order when doing REST. FileZilla sends PASV -> REST -> RETR, but your crate sends REST -> PASV -> RETR.

I added a temporary function to your source code as a quick-and-dirty test to reproduce FileZilla's order, basically resume_transfer and retr_as_stream merged into a single function, and it worked!

    pub async fn retr_as_stream_re<S: AsRef<str>>(
        &mut self,
        file_name: S,
        offset: usize
    ) -> FtpResult<DataStream<T>> {
        let addr = self.pasv().await?;
        self.perform(Command::Rest(offset)).await?;
        self.read_response(Status::RequestFilePending).await?;

        self.perform(Command::Retr(file_name.as_ref().to_string())).await?;
        let data_stream = TcpStream::connect(addr)
            .await
            .map_err(FtpError::ConnectionError)?;

        self.read_response_in(&[Status::AboutToSend, Status::AlreadyOpen])
            .await?;
        Ok(DataStream::Tcp(data_stream))
    }

I didn't test it for uploads, but I'm pretty sure it's the same problem, since the code and the command order are much the same from what I saw.

If you could fix this it would be much appreciated ^^

Edit: RFC 3659 stipulates that:

The REST command must be the last command issued before the data transfer command that is to cause a restarted, rather than a complete, file transfer. The effect of issuing a REST command at any other time is undefined. The server-PI may react to a badly positioned REST command by issuing an error response to the following command, not being a restartable data transfer command, or it may save the restart value and apply it to the next data transfer command, or it may silently ignore the inappropriate restart attempt.

Source: https://datatracker.ietf.org/doc/html/rfc3659#section-5.3
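The ordering rule quoted above can be expressed as a small check over a control-channel command sequence. This is a sketch under assumptions: the function names are hypothetical, commands are modelled as plain strings, and only RETR and STOR are treated as restartable transfer commands for the purpose of the illustration.

```rust
/// Commands for a resumed passive-mode download, in the order FileZilla
/// was observed to send them: PASV, then REST, then RETR.
fn resumed_retr_commands(file_name: &str, offset: u64) -> Vec<String> {
    vec![
        "PASV".to_string(),
        format!("REST {}", offset),
        format!("RETR {}", file_name),
    ]
}

/// Returns true when every REST is immediately followed by a transfer command.
/// Assumption for this sketch: only RETR and STOR count as restartable.
fn rest_is_well_placed(commands: &[String]) -> bool {
    commands.iter().enumerate().all(|(i, cmd)| {
        if !cmd.starts_with("REST ") {
            return true;
        }
        match commands.get(i + 1) {
            Some(next) => next.starts_with("RETR ") || next.starts_with("STOR "),
            None => false, // a trailing REST restarts nothing
        }
    })
}
```

The sequence PASV, REST, RETR passes the check, while REST, PASV, RETR fails it because PASV sits between REST and the transfer command.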