yoshidan / google-cloud-rust

Google Cloud Client Libraries for Rust.

Storing a file greater than 2GB doesn't work #136

Open ixchelchakchel opened 1 year ago

ixchelchakchel commented 1 year ago

Hi, I am trying to use the library to store a large file (> 2GB) and am having trouble doing so. Here is my code for reference:

use google_cloud_default::WithAuthExt;
use google_cloud_storage::client::{Client, ClientConfig};
use google_cloud_storage::http::objects::upload::{Media, UploadObjectRequest, UploadType};

#[tokio::main]
async fn main() {
    let config = ClientConfig::default().with_auth().await
        .expect("failed getting credential");
    let client = Client::new(config);
    let data = vec![1u8; 3_000_000_000]; // ~2.8 GiB of data
    println!("Storing {:.2}GiB of file", data.len() as f64 / 1073741824f64);
    let upload_type = UploadType::Simple(Media::new("test_file.bin"));
    client
        .upload_object(
            &UploadObjectRequest {
                bucket: "my_bucket_name".to_string(),
                ..Default::default()
            },
            data,
            &upload_type,
            None
        )
        .await.expect("failed storing file");
    println!("Stored successfully");
}

This doesn't work: the upload appears to hang with no network activity, and I have waited for over 2 hours. If I reduce the data to less than 2GB (let data = vec![1; 2_000_000_000];), it works.

Any ideas on how to troubleshoot this would be really helpful.

yoshidan commented 1 year ago

You may be able to work around this with reqwest options (see the sketch at the end of this comment), but you probably don't want to upload that much data as a single in-memory buffer, so it is better to use upload_streamed_object. You can send more than 2GB of data in chunks of a few MB, as shown in the sample code below.

        use std::io;

        // 1,000 chunks of 3MB each = 3GB in total
        let chunks = vec![vec![1u8; 3_000_000]; 1_000];
        let chunks: Vec<Result<Vec<u8>, io::Error>> = chunks.into_iter().map(Ok).collect();
        let stream = futures_util::stream::iter(chunks);

        // content_length tells the server the total object size up front
        let mut media = Media::new("test_file.bin");
        media.content_length = Some(3_000_000_000);
        let upload_type = UploadType::Simple(media);

        let result = client
            .upload_streamed_object(
                &UploadObjectRequest {
                    bucket: bucket_name.to_string(),
                    ..Default::default()
                },
                stream,
                &upload_type,
            )
            .await;

This method cannot resume from the middle if an error occurs partway through the upload, so a resumable upload (prepare_resumable_upload) is recommended instead.
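On the reqwest options mentioned above, here is a minimal sketch. It assumes your version of ClientConfig exposes an http field that accepts a custom reqwest::Client; the field name and its availability are assumptions to verify against your release.

use std::time::Duration;
use google_cloud_default::WithAuthExt;
use google_cloud_storage::client::{Client, ClientConfig};

#[tokio::main]
async fn main() {
    let mut config = ClientConfig::default()
        .with_auth()
        .await
        .expect("failed getting credential");
    // Assumption: `http` accepts a custom reqwest::Client (verify the field
    // name in your version of google-cloud-storage).
    config.http = Some(
        reqwest::Client::builder()
            .timeout(Duration::from_secs(3600)) // allow a long-running upload
            .build()
            .expect("failed building reqwest client"),
    );
    let client = Client::new(config);
    // ... proceed with upload_object / upload_streamed_object as above ...
    let _ = client;
}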

ixchelchakchel commented 1 year ago

Thanks, I will try this. What's the recommended way to do a fixed number of retries for the upload/download/remove methods in case of a network failure, a 503 error, or other errors? In the meantime I will see whether I can use the backon crate for the retries: https://docs.rs/backon/latest/backon/

yoshidan commented 1 year ago

We do not prescribe a recommended method for performing a fixed number of retries; using backon, it can be accomplished with the following code.

use backon::{ExponentialBuilder, Retryable};
use google_cloud_storage::client::Client as StorageClient;
use google_cloud_storage::http::objects::upload::{UploadObjectRequest, UploadType};
use google_cloud_storage::http::objects::Object;
use google_cloud_storage::http::resumable_upload_client::ChunkSize;
use google_cloud_storage::http::Error;

async fn upload(client: StorageClient, bucket_name: &str) {
    let metadata = Object {
        name: "testfile".to_string(),
        content_type: Some("video/mp4".to_string()),
        ..Default::default()
    };

    // start the resumable upload session
    let uploader = client
        .prepare_resumable_upload(
            &UploadObjectRequest {
                bucket: bucket_name.to_string(),
                ..Default::default()
            },
            &UploadType::Multipart(Box::new(metadata)),
        )
        .await
        .unwrap();

    // split the data into chunks; every chunk except the last must be a
    // multiple of 256 KiB, so chunk1 is exactly 256 KiB
    let chunk1_data: Vec<u8> = (0..256 * 1024).map(|i| (i % 256) as u8).collect();
    let chunk2_data: Vec<u8> = (1..256 * 1024 + 50).map(|i| (i % 256) as u8).collect();
    let total_size = Some(chunk1_data.len() as u64 + chunk2_data.len() as u64);
    let chunk1 = ChunkSize::new(0, chunk1_data.len() as u64 - 1, total_size);

    // upload chunk1, retrying with exponential backoff on retriable errors;
    // chunk2_data would be uploaded the same way with its own ChunkSize
    let upload_chunk1 = || async {
        uploader
            .clone()
            .upload_multiple_chunk(chunk1_data.clone(), &chunk1)
            .await
    };
    upload_chunk1
        .retry(&ExponentialBuilder::default())
        .when(|e: &Error| match e {
            Error::Response(e) => e.is_retriable(),
            _ => false,
        })
        .await
        .unwrap();
}
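The same pattern also covers the download and remove methods asked about above. A minimal sketch for a whole-object download under the same assumptions (bucket and object names are placeholders, and the ExponentialBuilder defaults bound the number of attempts):

use backon::{ExponentialBuilder, Retryable};
use google_cloud_storage::client::Client;
use google_cloud_storage::http::objects::download::Range;
use google_cloud_storage::http::objects::get::GetObjectRequest;
use google_cloud_storage::http::Error;

// Download a whole object, retrying only on retriable response errors.
async fn download(client: &Client, bucket: &str, object: &str) -> Vec<u8> {
    let req = GetObjectRequest {
        bucket: bucket.to_string(),
        object: object.to_string(),
        ..Default::default()
    };
    let fetch = || async { client.download_object(&req, &Range::default()).await };
    fetch
        .retry(&ExponentialBuilder::default())
        .when(|e: &Error| match e {
            Error::Response(e) => e.is_retriable(),
            _ => false,
        })
        .await
        .expect("download failed after retries")
}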