justcoding121 / titanium-web-proxy

A cross-platform asynchronous HTTP(S) proxy server in C#.
MIT License

Chunked stream responses #823

Open alfsb opened 3 years ago

alfsb commented 3 years ago

Hi, I'm trying to read a long chunked response which never ends. It gets blocked forever, because the proxy waits to assemble all chunks into one response body.

Is there a way to inspect the chunked parts while delivering the responses in parallel?

If not, I suggest implementing a "Stream" flag or a "StreamSize" int, somewhere in SessionEventArgs, to make it possible to disable this caching mechanism, so that partial responses get delivered instantly.

A never-ending response also calls for a "NoCache" flag, to avoid an OutOfMemoryException in long-running connections.

justcoding121 commented 3 years ago

If you don't read the response body (for example, with await e.GetResponseBody()), the proxy should relay the chunked stream instead of waiting to read it into memory.

@honfika In the future, we may want to add a new method that reads the response body piece by piece. The problem is that in that case the user will have to write the response on his own, because the proxy won't be caching the body. So the user would have to call a new write-to-stream method. Also, once read-from-stream has been called, the user will not be allowed to call SetBodyBytes(). When write-to-stream is called for the first time, we need to fetch the connection to the server and write the headers. Then we write the bytes provided by the user back to back. Something like below.

// Proposed API only: ReadResponseBodyStream and WriteResponseBodyStream do not exist yet.
// bytesToRead (int) and isChunked (bool) are values chosen by the user.
int readBytes = int.MaxValue;
while (readBytes > 0)
{
    var bytesRead = await e.ReadResponseBodyStream(bytesToRead, isChunked);
    readBytes = bytesRead.Length;

    // User processes the bytes read on his own (for example, writes them to disk as a file).
    ...

    await e.WriteResponseBodyStream(bytesRead, isChunked);
}

If someone could create a PR, that would be great.

alfsb commented 3 years ago

I think there are separate questions here.

  1. A way to inspect all bytes received in the body, chunked or otherwise. Say, a drop-copy interface that receives a callback whose parameters contain all headers and the body or chunk bytes.

  2. A way to change the received bytes. This could be accomplished on the same callback interface, where the bytes parameter, changed or not, will be the bytes pushed to ResponseBody.

The problem above occurs only with chunked responses, but a clearer way to express these concepts is to have both interfaces at the same time:

a) an HttpMessageReceived callback, which contains all parsed headers and all body bytes, which are cached anyway.

b) an HttpChunkReceived callback, which contains only the bytes received in the chunk, with no caching by the proxy.

Both callbacks can have the same parameters; only the name denotes whether this is a complete message or a single chunk.
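
To make the two-callback idea concrete, here is a minimal sketch of what such an interface might look like. Everything in it (HttpBodyReceivedArgs, BodyCallbacks, RaiseMessage, RaiseChunk) is a hypothetical stand-in invented for illustration; none of it exists in Titanium.Web.Proxy, and only the two event names come from the comment above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical argument type shared by both callbacks (not part of Titanium.Web.Proxy).
public sealed class HttpBodyReceivedArgs
{
    public HttpBodyReceivedArgs(IReadOnlyDictionary<string, string> headers, byte[] bytes)
    {
        Headers = headers;
        Bytes = bytes;
    }

    // Parsed headers of the message the bytes belong to.
    public IReadOnlyDictionary<string, string> Headers { get; }

    // Whole body for HttpMessageReceived, one chunk for HttpChunkReceived.
    // A handler may replace this; whatever it holds afterwards is forwarded.
    public byte[] Bytes { get; set; }
}

// Hypothetical event source standing in for the proxy.
public sealed class BodyCallbacks
{
    public event Action<HttpBodyReceivedArgs> HttpMessageReceived;
    public event Action<HttpBodyReceivedArgs> HttpChunkReceived;

    // Raised once with the complete, cached body; returns the bytes to forward.
    public byte[] RaiseMessage(IReadOnlyDictionary<string, string> headers, byte[] body)
    {
        var args = new HttpBodyReceivedArgs(headers, body);
        HttpMessageReceived?.Invoke(args);
        return args.Bytes;
    }

    // Raised once per chunk; nothing is cached between calls.
    public byte[] RaiseChunk(IReadOnlyDictionary<string, string> headers, byte[] chunk)
    {
        var args = new HttpBodyReceivedArgs(headers, chunk);
        HttpChunkReceived?.Invoke(args);
        return args.Bytes;
    }
}
```

A subscriber that only inspects leaves Bytes alone; one that modifies assigns a new array, which covers both questions 1 and 2 with a single parameter type.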

justcoding121 commented 3 years ago

Based on the discussion above, I just added two new event handlers as a prototype to the develop branch. It would be great if someone could implement them in the future.

https://github.com/justcoding121/titanium-web-proxy/commit/8a32fa99116afa82fad55476eef3ce161efad355

There are two scenarios that we need to handle.

  1. Read Only: The user does not modify the body bytes (chunked or otherwise). In that case, the user can subscribe to the event handler shown below and peek at the body bytes. The handler will be called once for a regular response with a fixed Content-Length header. It will be called multiple times for a chunked response, i.e. each time a chunk is read and is about to be written. IsFinalChunk will be true when the chunk read from the client or server stream is the last chunk.

  2. Read & Modify: The user wants to modify the body bytes. In that case, the user would set the byte[] BodyBytes { get; set; } property. If IsChunked is false, the user will have to set the full body bytes only once. If IsChunked is true, the user would set the chunk body and set IsFinalChunk to true when it is the final chunk. In all cases, the handler will be called again and again until IsFinalChunk is true. We would need to ensure that any unread chunked bytes are discarded from the client or server stream by siphoning them out before finalizing the request or response. Also, if the body is a compressed stream, that needs to be handled properly: we need to decompress for the handler call, and compress upon the write chunk call.

Note that the IsChunked property is read-only, i.e. the user will not be allowed to modify the Transfer-Encoding or any other headers at this stage, because they have already been sent to the client or server.

Similarly, in the case of a fixed-length body with the Content-Length header already sent, the user is allowed to change the body bytes, but their length must match the Content-Length header already sent exactly. For a fixed-length body, the user can use the existing BeforeRequest or BeforeResponse event handlers to first modify the headers with a different Content-Length, and then set the body bytes inside these handlers with a length that matches the Content-Length header already sent exactly.


public class BeforeBodyWriteEventArgs : ProxyEventArgsBase
{
    internal BeforeBodyWriteEventArgs(SessionEventArgs session, byte[] bodyBytes, bool isChunked, bool isFinalChunk)
        : base(session.ClientConnection)
    {
        Session = session;
        BodyBytes = bodyBytes;
        IsChunked = isChunked;
        IsFinalChunk = isFinalChunk;
    }

    /// <summary>
    ///     The session arguments.
    /// </summary>
    public SessionEventArgs Session { get; }

    /// <summary>
    ///     Indicates whether the body is written as a chunked stream.
    ///     If this is true, BeforeRequestBodySend or BeforeResponseBodySend will be called until IsFinalChunk is true.
    /// </summary>
    public bool IsChunked { get; }

    /// <summary>
    ///     Indicates whether this is the last chunk from the client or server stream, when the body is chunked.
    ///     Set this to false if there are more bytes to write, and to true once the final bytes have been provided.
    /// </summary>
    public bool IsFinalChunk { get; set; }

    /// <summary>
    ///     The bytes about to be written. If IsChunked is true, this will be one chunk of the bytes to be written.
    ///     Override this property with custom bytes if needed, and adjust IsFinalChunk accordingly.
    /// </summary>
    public byte[] BodyBytes { get; set; }
}

justcoding121 commented 3 years ago

I think this approach is relatively simple to implement and shouldn't break existing functionality. All we need to do is pass the new handlers all the way down to the write-bytes call in the HTTP stream, and apply the callback logic described above.

justcoding121 commented 3 years ago

Actually, the new handlers suggested above can also be used to read a fixed-Content-Length body piece by piece. In that case, the handler will be called each time we fill our read buffer, until Content-Length is reached. The user can also set the bytes, and the handler will be called again and again until Content-Length bytes have been handled.

That would be helpful to keep the proxy memory footprint low, when the fixed content-length is large, say in tens of megabytes.
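
As a rough illustration of that buffer-by-buffer behaviour, the sketch below relays a fixed-length body in buffer-sized pieces and calls a handler before each write, so memory use stays bounded by the buffer size. BodyPieceArgs and FixedLengthRelay are hypothetical stand-ins written for this example (a reduced imitation of the prototype's event args), not code from the proxy, and the rule that a null or empty BodyBytes means "send nothing" is an assumption taken from the discussion.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical stand-in for the prototype's BeforeBodyWriteEventArgs.
public sealed class BodyPieceArgs
{
    public BodyPieceArgs(byte[] bodyBytes, bool isFinalChunk)
    {
        BodyBytes = bodyBytes;
        IsFinalChunk = isFinalChunk;
    }

    public byte[] BodyBytes { get; set; }
    public bool IsFinalChunk { get; }
}

public static class FixedLengthRelay
{
    // Copies exactly contentLength bytes from source to destination in pieces of
    // at most bufferSize bytes, calling the handler before each write.
    public static async Task RelayAsync(
        Stream source, Stream destination, long contentLength, int bufferSize,
        Func<BodyPieceArgs, Task> handler)
    {
        var buffer = new byte[bufferSize];
        long remaining = contentLength;

        while (remaining > 0)
        {
            int wanted = (int)Math.Min(buffer.Length, remaining);
            int read = await source.ReadAsync(buffer, 0, wanted);
            if (read == 0)
            {
                throw new EndOfStreamException("Body ended before Content-Length was reached.");
            }

            remaining -= read;

            // Give the handler its own copy so it cannot alter the read buffer.
            var piece = new byte[read];
            Buffer.BlockCopy(buffer, 0, piece, 0, read);

            var args = new BodyPieceArgs(piece, isFinalChunk: remaining == 0);
            await handler(args);

            // Assumption: null or empty BodyBytes means "send nothing for this piece".
            if (args.BodyBytes != null && args.BodyBytes.Length > 0)
            {
                await destination.WriteAsync(args.BodyBytes, 0, args.BodyBytes.Length);
            }
        }
    }
}
```

With a 10-byte body and a 4-byte buffer read from a MemoryStream, the handler runs three times (4, 4 and 2 bytes), and only the last call sees IsFinalChunk set. A network stream may return fewer bytes per read, so the number of calls is not fixed in general.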

justcoding121 commented 3 years ago

I've also done some prep work so that both ProxyServer and SessionEventArgs are available inside HttpStream, so that everything needed for the event handlers will be in HttpStream.

alfsb commented 3 years ago

I'm suggesting changing the protocol a little, to make these new handlers more akin to a filter than to batch processing. First I will comment inline, and then give the rationale below.

On Thu, Apr 22, 2021 at 11:16 PM Jehonathan Thomas @.***> wrote:

> Read & Modify: User wants to modify the body bytes. In that case, user would set the byte[] BodyBytes { get; set; } property.

In that case, the user would set or clear the byte[] BodyBytes { get; set; } property. (With null? With an empty byte[]? A .ClearBytes() method to make this unambiguous?)

A cleared chunk is not sent downstream or upstream if the protocol does not allow an empty intermediary chunk, or is sent without any data if the protocol allows it.

> If IsChunked is false, user will have to set the full body bytes only once.

If IsChunked is false, the user will have a chance to modify the body bytes only once.

> If IsChunked is true, user would set chunk body and mark IsFinalChunk to true when it is the final chunk.

The user probably doesn't want to mess with IsFinalChunk, as this may cause severe disruptions. Say, an intermediary chunk marked as final followed by another chunk.

> In all cases, the handlers will be called again and again until IsFinalChunk is false. We would need to ensure that any unread chunked bytes are discarded from client or server stream by siphoning them out before finalizing the request or response. Also, if body is a compressed stream, that need to be handled properly, we need to decompress for the handler call, and compress upon write chunk call.

Is it guaranteed that every chunk sits on a compression boundary or frame? If not, this may not be possible without a ton of bookkeeping, up to the original situation where a chunked message is cached indefinitely until RAM runs out or a timeout fires.

In this situation, a RawBytes/DecodedBytes pair of properties may be necessary in lieu of the BodyBytes property, which may be impossible to offer.

> Note that IsChunked property is readonly, i.e user will not be allowed to modify the Transfer-encoding header at this stage, because its already send to client or server.

By the same reasoning, IsFinalChunk shouldn't be touched unless the user really, really knows what he is doing, but I can foresee some situations where the user needs to mark an outbound chunk stream as completed but keep collecting the inbound chunk stream.

So the protocol I suggest is this:

The user wants to modify the body bytes. In that case, the user would set the byte[] BodyBytes { get; set; } property in each callback, or set it to null / call .ClearBody() to prevent these bytes from being sent. IsChunked only indicates whether the original HTTP message is chunked or not; the callback is called for every block of body bytes received by the proxy, chunked or not. The handlers will be called again and again until IsFinalChunk is true. Also, if the body is a compressed stream, RawBytes will contain the original bytes, and ~BodyBytes~ DecodedBytes is filled with the inflated data on a best-effort basis, but may be null in case of errors. The IsChunked property is read-only, i.e. the user is not allowed to modify the Transfer-Encoding header at this stage, because it has already been sent to the client or server.
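
One possible shape for the event arguments of this filter-style protocol is sketched below. It is only an illustration: BodyBlockEventArgs is a hypothetical name, and the choice that BodyBytes starts out equal to RawBytes (so an untouched block is forwarded unchanged) is an assumption the comment above does not spell out.

```csharp
using System;

// Hypothetical event arguments for the proposed protocol (not part of Titanium.Web.Proxy).
public sealed class BodyBlockEventArgs
{
    public BodyBlockEventArgs(byte[] rawBytes, byte[] decodedBytes, bool isChunked, bool isFinalChunk)
    {
        RawBytes = rawBytes ?? throw new ArgumentNullException(nameof(rawBytes));
        DecodedBytes = decodedBytes;
        IsChunked = isChunked;
        IsFinalChunk = isFinalChunk;

        // Assumption: by default the block is forwarded exactly as received.
        BodyBytes = rawBytes;
    }

    // True if the original HTTP message uses chunked transfer encoding. Read-only.
    public bool IsChunked { get; }

    // True for the last block of the body.
    public bool IsFinalChunk { get; }

    // The bytes exactly as received, possibly still compressed.
    public byte[] RawBytes { get; }

    // Best-effort inflated data; null if the block could not be decoded.
    public byte[] DecodedBytes { get; }

    // The bytes that will be sent on; null means "send nothing for this block".
    public byte[] BodyBytes { get; set; }

    // Unambiguous way to suppress this block.
    public void ClearBody() => BodyBytes = null;
}
```

A handler that only records traffic would read RawBytes or DecodedBytes and return; one that drops a block would call ClearBody().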

André L F S Bacci

alfsb commented 3 years ago

On Thu, Apr 22, 2021 at 11:51 PM Jehonathan Thomas @.***> wrote:

> Actually, the above suggested new handlers can also be used for fixed content length body to be read bytes by bytes. In that case, the handler will be called again and again each time we fill our read buffer, until content-length is reached. User can also set the bytes and the handler will be called again and again until content-length amount of bytes is reached.
>
> That would be helpful to keep the proxy memory footprint low, when the fixed content-length is large, say in tens of megabytes.

Yes, there are some situations where this would be necessary. Say, a proxy capable of recompressing or decompressing any request that passes through it, to fix some upstream or downstream incompatibility. Some big files appear in this situation, and no amount of RAM will solve the problem if it's necessary to wait for the whole file. Timeouts are a problem too.

justcoding121 commented 3 years ago

To clarify some of the things you mentioned:

These new handlers will NOT be called if the body was already read through GetRequestBodyAsString() or GetRequestBody() from within the existing before-request handlers. In that case, the user will have to call SetRequestBody or SetRequestBodyString. The same is true for the response (i.e. GetResponseBodyAsString() or GetResponseBody()).

Sending Empty Body

1. For a chunked body, if you don't want to send any response body, this new handler will still be called again and again with the original chunks read from the client or server (inside BodyBytes), until all chunks are read. The user sets BodyBytes to null or empty on each call.

private async Task OnResponseBodyWrite(object sender, BeforeBodyWriteEventArgs e)
{
    //append original bytes to disk file 
    appendAndSaveOriginalBytes(e.BodyBytes);

    e.BodyBytes = null;
}

2. For a fixed Content-Length, if you don't want to send any response body, you would set Content-Length: 0 inside the OnBeforeRequest or OnBeforeResponse handlers. This handler will still be called again and again with the original bytes read from the client or server (inside BodyBytes), until all of the original Content-Length bytes are read. The user always sets BodyBytes to null or empty.

private async Task OnResponseBodyWrite(object sender, BeforeBodyWriteEventArgs e)
{
    //append original bytes to disk file 
    appendAndSaveOriginalBytes(e.BodyBytes);

    e.BodyBytes = null;
}

Sending non-empty Body

1. For a chunked body, HTTP compression is applied to the entire stream. So when this handler is called, BodyBytes will be uncompressed bytes read from the underlying decompression stream of the client or server. When the body is written back to the server or client, it will go through a compression stream automatically. The type of compression stream (gzip, deflate, etc.) is based on the Content-Encoding header, so users don't need to worry about the compression or decompression process. The new handler will be called again and again until IsFinalChunk is true.

2. For a fixed Content-Length, if you modify the body bytes, you need to specify the Content-Length of the compressed modified body ahead of time in the OnBeforeRequest or OnBeforeResponse handlers. This new handler will be called again and again, at a minimum until the original Content-Length number of bytes has been read. If the new Content-Length is larger than the original, this new handler will be called again and again with empty BodyBytes until the new Content-Length has been sent. When the new Content-Length is smaller than the original Content-Length, the user would set BodyBytes to null for the remaining callbacks. Calculating the compressed body length ahead of time may be difficult. To avoid that, one can mark the request or response as uncompressed inside OnBeforeRequest or OnBeforeResponse (use Content-Encoding: identity), then set Content-Length to the actual uncompressed byte length and provide uncompressed BodyBytes in this new handler.
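
The decompress, call handler, recompress flow described above can be sketched with the standard System.IO.Compression classes. This is a minimal illustration for gzip only, not the proxy's implementation; GzipBodyFilter, FilterAsync and the transform delegate are hypothetical names. Because the decompressing stream keeps its own state between reads, the blocks handed to the transform do not have to line up with transfer chunk boundaries.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

public static class GzipBodyFilter
{
    // Reads a gzip-compressed body from compressedSource, passes each decoded block
    // to transform, and writes the result gzip-compressed to compressedDestination.
    public static async Task FilterAsync(
        Stream compressedSource, Stream compressedDestination, int bufferSize,
        Func<byte[], byte[]> transform)
    {
        using (var inflate = new GZipStream(compressedSource, CompressionMode.Decompress, leaveOpen: true))
        using (var deflate = new GZipStream(compressedDestination, CompressionMode.Compress, leaveOpen: true))
        {
            var buffer = new byte[bufferSize];
            int read;
            while ((read = await inflate.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                // Copy the decoded block so the handler gets an array of the exact size.
                var decoded = new byte[read];
                Buffer.BlockCopy(buffer, 0, decoded, 0, read);

                // A null or empty result means "send nothing for this block".
                byte[] replacement = transform(decoded);
                if (replacement != null && replacement.Length > 0)
                {
                    await deflate.WriteAsync(replacement, 0, replacement.Length);
                }
            }
        } // Disposing the compressing stream finishes the gzip output.
    }
}
```

For a never-ending response, the compressing side would likely also need to be flushed after each block so that data reaches the client promptly; the sketch leaves that out.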

justcoding121 commented 3 years ago

Sorry for deleting my responses multiple times. 😅 I think I was thinking out loud while typing.

One more thing that we need to handle, apart from this new handler, is e.Ok() calls inside the "before request" or "before response" handlers. In that case, we would still need to give users an option to read the body as a stream before canceling the request. That is out of scope for this work.

zeltrax00 commented 1 year ago

Did anyone implement this feature? Can I get a sample?