sreeise / graph-rs-sdk

Microsoft Graph API Client And Identity Platform Client in Rust
MIT License
114 stars 30 forks source link

InvalidData: stream did not contain valid UTF-8 #465

Closed samvv closed 3 months ago

samvv commented 9 months ago

Describe the bug

I think this is a bug though I'm not entirely sure.

I get:

Custom {
    kind: InvalidData,
    error: "stream did not contain valid UTF-8",
}

To Reproduce

Taken from the examples:

let res = self
    .graph
    .drive(DRIVE_ID)
    .item_by_path(PATH_TO_DRIVE_ITEM)
    .update_items_content(BodyRead::from_async_read(reader).await?)
    .send()

reader is a variable that implements tokio::io::AsyncRead and may be used to read anything, not just Unicode.

Expected behavior

No error. A binary file should successfully upload.

Desktop:

Additional context

Still working on the same file synchronization service. It's almost finished.

sreeise commented 9 months ago

Describe the bug

I think this is a bug though I'm not entirely sure.

I get:

Custom {
    kind: InvalidData,
    error: "stream did not contain valid UTF-8",
}

To Reproduce

Taken from the examples:

let res = self
    .graph
    .drive(DRIVE_ID)
    .item_by_path(PATH_TO_DRIVE_ITEM)
    .update_items_content(BodyRead::from_async_read(reader).await?)
    .send()

reader is a variable that implements tokio::io::AsyncRead and may be used to read anything, not just Unicode.

Expected behavior

No error. A binary file should successfully upload.

Desktop:

* SDK version: v2.0.0-beta.0

* OS: Windows

* Rust: rustc 1.77.0-nightly

* Editor: VSCode

Additional context

Still working on the same file synchronization service. It's almost finished.

Im guessing this is because the Content-Type header is not set for either application/octet-stream or it may even work with text/plain.

Have you tried changing the Content-Type header? If you do try it can you let me know if this work? Not saying its something you should or shouldn't have to do but it would help debug exactly whats happening in your situation.

samvv commented 9 months ago

I will try it, thanks.

samvv commented 8 months ago

Small update: I've made the change to the source code, but I've yet to test it out. I'll keep you posted.

samvv commented 8 months ago

Nope it keeps failing. Tried both text/plain and application/octet-stream.

sreeise commented 6 months ago

Nope it keeps failing. Tried both text/plain and application/octet-stream.

Ok, thanks for reporting. I'll keep this open as a bug.

billykater commented 6 months ago

Came across the same issue today.

Investigating this I found that the most likely culprit is how the file is read (still new to rust)

Internally all BodyRead impls seem to read the whole file content into a string. As strings in rust are only allowed to contain UTF-8 content this will fail horribly if you try to upload a binary file. Changing the content of my file to a UTF-8 string allowed me to get a response from the server instead of a preflight error.

samvv commented 6 months ago

Oh thanks for investigating this! I'm wondering if the binary data is then still valid? Uploading binary as UTF-8, even with the right header, does not seem valid.

billykater commented 6 months ago

Sorry for being unclear in my response: I just tried to upload a file containing a test string "test" instead of my actual content to validate my suspicion. Of course you simply can't convert any random [u8] to a valid string (hence the error we are both experiencing).

I am still digging to see if there is an option to bypass the BodyRead struct to upload binary files.

buhaytza2005 commented 6 months ago

I am achieving something similar using an upload_session:

pub async fn upload_xlsx_async(access_token: &str, local_file_path: &str, remote_file_path: &str) -> GraphResult<()> {
    let client = Graph::new(access_token);
    println!("Uploading file {} to {}", local_file_path, remote_file_path);
    // Get site id
    let try_group = client.me().followed_sites().list_followed_sites().send().await?;
    let grp: serde_json::Value = try_group.json().await?;
    let site_id = get_site_id_by_name(grp, "automation".to_string()).unwrap();

    // JSON specifying the conflict behavior when uploading, more details:
    // https://learn.microsoft.com/en-us/graph/api/resources/driveitem?view=graph-rest-1.0
    let upload = serde_json::json!({
        "@microsoft.graph.conflictBehavior": Some("rename".to_string())
    });

    let response = client
        .site(&site_id)
        .drive()
        .item_by_path(format!(":{}:", remote_file_path))
        .create_upload_session(&upload)
        .send()
        .await?;

    let file = tokio::fs::File::open(local_file_path).await?;

    let mut iter = response.into_upload_session_async_read(file).await?;
    while let Some(result) = iter.next().await {
        let response = result?;
        println!("{response:#?}");  // Print the response for each chunk uploaded
    }

    Ok(())
}

And then called:

use anyhow::Result;
use graph_rs_sdk::Graph;
use lib_o365::utils::file_upload::upload_xlsx_async;

#[tokio::main]
async fn main() -> Result<()> {
    env_logger::init();

    let access_token = lib_o365::authentication::log_me_in().await.unwrap();
    let client = Graph::new(&access_token);

    let f_name = "tst.pdf";
    let remote_path = format!("/Data categorisation/Trading figures/{}", f_name);
    upload_xlsx_async(
        &access_token,
        format!(".temp_files/{}",f_name).as_str(),
        &remote_path)
        .await?;
    Ok(())
}

For more of an update approach, @microsoft.graph.conflictBehavior can be set to replace.

There might be differences in terms of efficiency but an async upload does the trick for my use case.

sreeise commented 6 months ago

@samvv @buhaytza2005 @billykater Thanks yall for continuing to look into this.

For context, the upload sessions using multipart upload have to know the size of the content before hand because each request must specify what range of bytes is being sent out of the total size of the content which is why all of the data is read in.

Just off the top of my head I do wonder how would binary files be uploaded via an http api if they cannot be converted to valid UTF-8 because that is actually required for http requests. But there probably is some solution to this that i'm just not aware of or forgot about.

sreeise commented 3 months ago

Came across the same issue today.

Investigating this I found that the most likely culprit is how the file is read (still new to rust)

Internally all BodyRead impls seem to read the whole file content into a string. As strings in rust are only allowed to contain UTF-8 content this will fail horribly if you try to upload a binary file. Changing the content of my file to a UTF-8 string allowed me to get a response from the server instead of a preflight error.

I should have looked closer at this. You are right. https://github.com/sreeise/graph-rs-sdk/blob/7aa756af45fafec43156d798fff1c47c3fb04b9b/graph-http/src/core/body_read.rs#L30-L34

The impl needs to change to handle this properly.

I don't know why I was thinking http required to be UTF-8. It does not.

sreeise commented 3 months ago

PR in progress. Still need to do some testing. The PR changes what readers and other byte types are converted to and removes the string conversion of course. Each of these types will be read into a Vec<u8> and then passed to the reqwest::Body or reqwest::blocking::Body. Its a quick solution for now and we can optimize later.

@billykater @samvv @buhaytza2005 Apologies that I didn't read through this well enough to understand what the issue was originally. This is completely valid. Hopefully this will help.

If any of you get a chance to try things out and test a bit I would appreciate the feedback. Here is the branch: https://github.com/sreeise/graph-rs-sdk/tree/465-fix

sreeise commented 3 months ago

Thank you @billykater, @samvv, and @buhaytza2005 for continuing to report on this issue and for finding out what was causing it.

I have merged a fix for this issue and added examples as well as updated the README.

sreeise commented 3 months ago

Updated in crate version 2.0.1 https://crates.io/crates/graph-rs-sdk/2.0.1