rexyai / RestRserve

R web API framework for building high-performance microservices and app backends
https://restrserve.org
275 stars 32 forks source link

Multipart requests with .Net HttpClient failing #160

Closed rplati closed 3 years ago

rplati commented 4 years ago

We were able to get RestRserve working with .Net RestSharp client but multipart requests with the .Net core HttpClient are failing. A minimal example follows.

restrserve.r

library(RestRserve)

app = Application$new(content_type = "text/plain")
app$logger$set_log_level("trace")

app$add_post(
path = "/echo",
FUN = function(request, response) {
  part1 <- request$get_file("part1")
  part2 <- request$get_file("part2")
  response$set_body(part1)
}
)

backend = BackendRserve$new()
backend$start(app, http_port = 80)

This is started within a Docker container using the rexyai/restrserve:dev image.

Dockerfile

FROM rexyai/restrserve:dev
COPY / /
EXPOSE 80
ENTRYPOINT ["Rscript", "restrserve.r"]

The following C# code uses .Net core 3.1

Program.cs

using RestSharp;
using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

internal class Program
{
    private static async Task Main(string[] args)
    {
        var requestUrl = "http://127.0.0.1:6027/echo";

        try
        {
            Console.WriteLine("Submitting multipart with RestSharp");
            Console.WriteLine("RestSharpResult: " + await TestRestSharp(requestUrl));
        }
        catch (Exception e)
        {
            Console.WriteLine($"Failed to process request (RestSharp): {e}");
        }

        try
        {
            Console.WriteLine("Submitting multipart with HttpClient");
            Console.WriteLine("HttpClientResult: " + await TestHttpClient(requestUrl));
        }
        catch (Exception e)
        {
            Console.WriteLine($"Failed to process request (HttpClient): {e}");
        }

        Console.ReadLine();
    }

    private static async Task<string> TestHttpClient(string requestUrl)
    {
        using var client = new HttpClient { Timeout = TimeSpan.FromSeconds(20) };
        using var content = new MultipartFormDataContent
            {
                { new StreamContent(ToMemoryStream("This is the first part.")), "part1", "part1.csv" },
                { new StreamContent(ToMemoryStream("This is the second part.")), "part2", "part2.csv" },
            };

        Console.WriteLine($"Submit request: {requestUrl}");
        var response = await client.PostAsync(requestUrl, content);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }

    private static async Task<string> TestRestSharp(string requestUrl)
    {
        var client = new RestClient(requestUrl);
        var streams = new Stream[] { ToMemoryStream("This is the first part."), ToMemoryStream("This is the second part.") };
        var clientRequest = new RestRequest(Method.POST);
        clientRequest.AddFile("part1", async writer => await streams[0].CopyToAsync(writer), "part1.csv", streams[0].Length);
        clientRequest.AddFile("part2", async writer => await streams[1].CopyToAsync(writer), "part2.csv", streams[1].Length);
        Console.WriteLine($"Submit request: {requestUrl}");
        var response = await client.ExecuteAsync(clientRequest);

        if (response.IsSuccessful == false)
            throw new InvalidOperationException($"{response.Content} {response.ErrorException}");

        return response.Content;
    }

    private static MemoryStream ToMemoryStream(string text) => new MemoryStream(Encoding.UTF8.GetBytes(text));
}

Requests submitted with the RestSharp client work fine and produce expected RestRserve log entries. Requests submitted with the HttpClient fail with the below error message and produce no RestRserve log entries.

Console output

Submitting multipart with RestSharp
Submit request: http://127.0.0.1:6027/echo
RestSharpResult: This is the first part.
Submitting multipart with HttpClient
Submit request: http://127.0.0.1:6027/echo
Failed to process request (HttpClient): System.Net.Http.HttpRequestException: Response status code does not indicate success: 500 (Evaluation error).
   at System.Net.Http.HttpResponseMessage.EnsureSuccessStatusCode()
   at Program.TestHttpClient(String requestUrl) in D:\temp\HttpClientMultipartBug\HttpClientMultipartBug\Program.cs:line 48
   at Program.Main(String[] args) in D:\temp\HttpClientMultipartBug\HttpClientMultipartBug\Program.cs:line 27

Something about the way the .Net core HttpClient forms the multipart request causes problems for RestRserve. The HttpClient request should be ok as it works with the corresponding API implemented with plumber.

We would prefer to use the .Net core HttpClient over RestSharp because RestSharp uses unreasonable amounts of memory when the multipart body is large.

dselivanov commented 4 years ago

We will be happy to fix if you can find the root cause of the problem. May by adding more verbose debugging into RestRserve app will help. Or if that doesn't work try to build Rserve with RSERV_DEBUG defined ( adding PKG_CFLAGS += -DRSERV_DEBUG).

rplati commented 4 years ago

I built Rserve with PKG_CFLAGS += -DRSERV_DEBUG like you suggested. This produces the following log for the failed request with HttpClient.

process request for 0x000001c99668b1e0
 [attr = 2]
 [attr = 42]
 [attr = 4a]
process request for 0x000001c994b8b740
eval(try(.http.request('/echo'),silent=TRUE))
respond with 500 and content: Error in cpp_parse_multipart_body(body, boundary) : 
  Boundary string not found.

Rserve for some reason can’t use the boundary string used by HttpClient. I'll investigate this more next week and try to find examples of HttpClient’s raw requests that Rserve fails to process.

artemklevtsov commented 4 years ago

@rplati simply store the post body as rds from the app and try to pass it to RestRserve:::cpp_parse_multipart_body (with boundary string). Also you can post the body here.

rplati commented 4 years ago

I saved the requests with fiddler then tried parsing them with RestRserve:::cpp_parse_multipart_body using the corresponding boundary from the header. It seems there are several differences between requests formed by RestSharp and httpClient. httpClient submits requests like the following

POST http://127.0.0.1:6027/echo HTTP/1.1
Host: 127.0.0.1:6027
Content-Type: multipart/form-data; boundary="182e92b9-ebbb-4522-9205-a05c9e35191f"
Content-Length: 361

--182e92b9-ebbb-4522-9205-a05c9e35191f
Content-Disposition: form-data; name=part1; filename=part1.csv; filename*=utf-8''part1.csv

This is the first part.
--182e92b9-ebbb-4522-9205-a05c9e35191f
Content-Disposition: form-data; name=part2; filename=part2.csv; filename*=utf-8''part2.csv

This is the second part.
--182e92b9-ebbb-4522-9205-a05c9e35191f--

Characteristics of requests formed by httpClient:

  1. the boundary string is enclosed in double quotes in the header;
  2. the name and filename are not enclosed in double quotes in the Content-Disposition rows;
  3. a utf-8 variant of the filename is added to the Content-Disposition rows;
  4. no Content-Type row is included (between the Content-Disposition row and the content).

If I understand correctly, the error Boundary string not found is caused by characteristic 1 - RestRserve:::cpp_parse_multipart_body is supplied with the boundary in quotes of which it finds no instances in the body. According to my testing, characteristics 2 through 4, while not resulting in an error, will result in RestRserve:::cpp_parse_multipart_body returning an empty list.

artemklevtsov commented 4 years ago

It would be helpful if you post examples of the requests here. So we can debug a C++ code with this inputs.

artemklevtsov commented 4 years ago

Can you test on the boundary-parser-fix branch?

remotes::install_github("rexyai/RestRserve@boundary-parser-fix")
artemklevtsov commented 4 years ago

Seems related SO question.

dselivanov commented 4 years ago

According to rfc7578 it seems reasonable to assume that boundary might come in quotes.

rplati commented 4 years ago

Can you test on the boundary-parser-fix branch?

remotes::install_github("rexyai/RestRserve@boundary-parser-fix")

When testing the package from the fix branch, I get the error below for all requests. Googling the error message suggests I need to reinstall something. Do you know what package the object closeFD might relate to?

eval(try(.http.request('/echo'),silent=TRUE))
respond with 500 and content: Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
  object 'closeFD' not found
dselivanov commented 4 years ago

closeFD likely comes from here https://github.com/rexyai/RestRserve/blob/f8e8f7def89a8e1a264735725a5184da7fc8d851/R/BackendRserve.R#L98. But I'm not sure why you are getting this error.

artemklevtsov commented 4 years ago

Windows issue may be?

dselivanov commented 4 years ago

Windows issue may be?

Seems so! and likely CI also fails because of the same issue. Need to call closeFD conditionally on system type (call only on UNIX).

rplati commented 4 years ago

Indeed, this seems to be a Windows issue. The install from the fix branch works in a linux docker container.

The issue with the boundary string appearing in quotes seems to be resolved!

httpClient’s multipart body is still not being parsed correctly, however, because of the other peculiarities of how httpClient forms the requests. This is the example of the request body:

--182e92b9-ebbb-4522-9205-a05c9e35191f
Content-Disposition: form-data; name=part1; filename=part1.csv; filename*=utf-8''part1.csv

This is the first part.
--182e92b9-ebbb-4522-9205-a05c9e35191f
Content-Disposition: form-data; name=part2; filename=part2.csv; filename*=utf-8''part2.csv

This is the second part.
--182e92b9-ebbb-4522-9205-a05c9e35191f--

RestRserve:::cpp_parse_multipart_body returns an empty list when trying to parse the above body. The following things seem to be causing the problem:

When testing, I had to change all of these to get cpp_parse_multipart_body to correctly parse the body: I added double quotes to the name and filename, removed the utf-8 variant of the filename, and added a Content-Type row. In all other cases when I left one of these peculiarities uncorrected, cpp_parse_multipart_body returned an empty list. It seems, therefore, that cpp_parse_multipart_body would need to take into account the possibility of all of these peculiarities for httpClient’s requests to parse correctly.

The omission of the Content-Type is the most problematic of these, IMO. Would you say it would be reasonable to assume the file's Content-Type is application/octet-stream when the client does not provide one explicitly?

dselivanov commented 4 years ago

@rplati please checkout latest code from #162 - I believe I've fixed everything we've discussed here

rplati commented 4 years ago

Many thanks to both of you! httpClient’s requests are now parsed correctly – tested on Windows and Linux container.

rplati commented 4 years ago

Could you say approx. when this will be available on cran?

dselivanov commented 3 years ago

Not sure. We need at least to add some basic tests to the PR. I'm quite busy these days, but can review PR if someone wants to do that.

Other than that there are no reasons to not push new version to cran.