Allow writing to request stream for uploading content

NickDarvey commented 1 year ago

Is your feature request related to a problem? Please describe. Developers should be able to write to the request stream for uploads so they can generate content 'on the fly' without needing to buffer it in memory.

Describe the solution you'd like

An overload for ContentRequestBuilder.PutAsync which accepts a Stream -> () delegate . That is:

public async Task PutAsync(Action<Stream> write, Action<ContentRequestBuilderPutRequestConfiguration>? requestConfiguration = default) { }

Describe alternatives you've considered

Buffering content in-memory by using a MemoryStream and using the existing overload which accepts a Stream.

Additional context

There is an overload for PutAsync which accepts an existing stream which is useful for pull-based streaming where the content already exists. For example, from a file.

https://github.com/microsoftgraph/msgraph-sdk-dotnet/blob/7a2be45d2cf37f18a32cc9a60d0edf441fd23a08/src/Microsoft.Graph/Generated/Drives/Item/Items/Item/Content/ContentRequestBuilder.cs#L67-L79

The overload I'm proposing would be better for push-based streaming where you're generating content because it doesn't require you to hold it in-memory. For example:

constructing a zip archive from many files[^1]
serializing an object graph[^2]

I understand this SDK is generated. I'm guessing this kind of overload will either need to be supported by Kiota or added as an extension manually (if Kiota's interfaces actually allow access to the request stream.)

[^1]: Streaming Zip on ASP.NET Core https://blog.stephencleary.com/2016/11/streaming-zip-on-aspnet-core.html An article which demonstrates the push-pull distinction with zip archives and ASP.NET Core. [^2]: Uploading data with HttpClient using a "push" model https://thomaslevesque.com/2013/11/30/uploading-data-with-httpclient-using-a-push-model/ An article which demonstrates the push-pull distinction with serialization ASP.NET.

andrueastman commented 1 year ago

Transferred to https://github.com/microsoft/kiota/issues/2413 as this would require the change from Kiota.

NickDarvey commented 1 year ago

@andrueastman, I don’t think this should be closed just because it requires a change in Kiota because:

making that change in Kiota is but a step in delivering this enhancement
I’d still want this enhancement if this library is rewritten yet again using some other generator

andrueastman commented 1 year ago

Hey @NickDarvey.

I hear you on this. I don't there are plans to move from Kiota anytime soon though. Will re-open for now and keep it dependent on https://github.com/microsoft/kiota/issues/2413

baywet commented 1 year ago

Hi @NickDarvey, Thanks for bringing this up and for using the dotnet SDK. I'd be curious to know more about your scenario.

You'd effectively like to have an inversion of control for the stream argument so in case your application "is handling something large, that's not structured and that's not coming from network/storage/... it doesn't need to be buffered in memory".

structured payloads (JSON/text/form/and soon multipart in case of Graph) will have a modeled representation, which means the put method accepts a model instead.
unstructured large payloads are likely to come from storage/network which will provide a stream and avoids to load the whole thing in memory.
unstructured payloads that are the results of another lib (let's say your app is calling a lib generating PDFs or CSVs) will likely already provide you a stream that's in memory.

Can you come up with such an example please?

Additionally, I'm curious about the choice of an Action and not Action<Task> or Task can you expand on that topic too please?

NickDarvey commented 1 year ago

Hi @baywet,

Can you come up with such an example please?

Here's an example, in code, paraphrased from actual code in our application.

open System.IO
open System.IO.Compression

let forExample (client : Microsoft.Graph.GraphServiceClient) = task {

  // A function which generates content on-the-fly.
  // Here it takes strings as input, but really it might be taking stream
  // which, in turn, are also generated on-the-fly.
  let zip (texts : (string * string) seq) (stream : Stream) =
    use archive = new ZipArchive (stream, ZipArchiveMode.Create, leaveOpen = true)

    for (name, text) in texts do
      let entry = archive.CreateEntry name
      use stream = entry.Open ()
      use writer = new StreamWriter (stream)
      writer.Write text

  // Some example content.
  let examples = [
    "a", "The Coulibiac of Salmon from Guy Savoy"
    "b", "Vissa D'arte from Tosca"
    "c", "Cote du Rhone Chateauneuf de Pape '47"
  ]
  let item = client.Drives["abc"].Items["xyz"].Content

  // What I have to do now.
  use stream = new MemoryStream ()
  do zip examples stream
  stream.Position <- 0
  do! item.PutAsync stream

  // What I want to do (this enhancement request).
  do! item.PutAsync (fun stream -> zip examples stream)
}

These two articles describe the value of the push model of streaming and give further examples in more detail:

Streaming Zip on ASP.NET Core https://blog.stephencleary.com/2016/11/streaming-zip-on-aspnet-core.html An article which demonstrates the push-pull distinction with zip archives and ASP.NET Core.
Uploading data with HttpClient using a "push" model https://thomaslevesque.com/2013/11/30/uploading-data-with-httpclient-using-a-push-model/ An article which demonstrates the push-pull distinction with serialization ASP.NET.

Additionally, I'm curious about the choice of an Action and not Action or Task can you expand on that topic too please?

Accepting a Stream -> Task would make more sense actually so one could use the async methods on Stream. (I believe that's a Func<Stream,Task> delegate C# world.)

baywet commented 1 year ago

Thanks for providing the additional context here. It does feel like an edge case because in the vast majority of cases the application would already have a stream coming from the outside (network/storage) or from another library (document generation, etc...) which could be piped in the compression stream. And the compression stream reference would be passed to the generated API surface.

If we wanted to enable an inversion of control for that scenario, we'd need to revamp the whole API surface, from the generation, but including the request information property and the serialization writer interface which would represent a breaking change. Additionally the request adapter implementation would need to be updated to use a PushContentStream instead. While this would save memory pressure for all scenarios, it'd make things like retry handling more complex as the content length would not be known in advance anymore

This is something we might do for kiota v2 and a later major version of this SDK, but for the time being this scenario will be limited to the current implementation you've provided.

@andrueastman when you have a couple of minutes, would you mind exploring the buffering aspect for retry handling an put together a POC with the PushStream trying to access the content multiple times please?

andrueastman commented 1 year ago

Just to follow up @baywet

The retry handler will not retry on requests that are not buffered. This check is used to validate if a clone of a request can be made so that a replay of the request is made.

https://github.com/microsoft/kiota-http-dotnet/blob/58fb5f4ad6edbaf54dfe184fa2d3b8538bc558f2/src/Extensions/HttpRequestMessageExtensions.cs#L81

baywet commented 1 year ago

Thanks for confirming. This means we'd effectively loose all retry capabilities. And it'd only benefit the case of aggregating streams as in any other case a stream is either already directly coming from the source or already in memory.

After more réflexion, we're unlikely to do this at all at this point.

NickDarvey commented 1 year ago

It doesn't have to be implemented in a way that would make you lose retry capabilities everywhere, it could just mean that you can't have retry handling when you're using push-based streams. It's a reasonable trade-off if you care about memory usage. If you care about retry, you have to buffer it first and provide the buffered stream.

It would benefit the case of [push] streams yes, but it doesn't have to come at the cost of pull-based streams. (The existing pull-based stream overload should still exist, and work as-is).

baywet commented 1 year ago

The challenge with implementing both is that we'd be duplicating a lot of the infrastructure:

the Body property on the request information
the request adapter implementation
some of the generated code, which would come at a binary size trade-off

I'll let @maisarissi our PM talk to that, but it's unlikely we'd invest for this edge case at this point. Not unless we see significant customer demand for it.

NickDarvey commented 1 year ago

Maybe there ought to be a general “escape hatch” so that when a developer encounters a limitation like this they can always fallback to adding content to the HttpRequestMessage themselves.

baywet commented 1 year ago

This is already possible today with the core part of the SDK. See the readme of this repository. https://github.com/microsoftgraph/msgraph-sdk-dotnet-core

NickDarvey commented 1 year ago

This is already possible today with the core part of the SDK

Ah, but that doesn't have the RequestInformation-request-building bits.

microsoftgraph / msgraph-sdk-dotnet

Allow writing to request stream for uploading content #1686