Open vnayar opened 5 months ago
One thing that is sub optimal about the way of defining the whole multipart entity up-front is that it inevitably requires a number of dynamic memory allocations. In high-throughput scenarios, this has repeatedly turned out to be a serious bottleneck or resulting in pseudo memory leaks with the GC implementation.
A way to mitigate this would be to change the entity/part types from class
to template structs with appropriate factory functions:
auto multiPartEntity(HEADER_MAP, PARTS)(HEADER_MAP headers, PARTS parts) {
return MultipartEntity!(HEADER_MAP, PARTS)(headers, parts);
}
auto multiPartEntity(PARTS)(string content_type, PARTS parts) {
auto headers = only(tuple("Content-Type", content_type));
return MultipartEntity!(typeof(headers), PARTS)(headers, parts);
}
auto formPart(string name, string value) {
auto headers = only(...);
return MultipartEntityPart!(typeof(headers), string)(headers, value);
}
auto formPart(string name, NativePath file_path) {
// ...
}
struct MultipartEntity(HEADER_MAP, PARTS) {
HEADER_MAP headers;
PARTS parts;
string boundary;
}
struct MultipartEntityPart(HEADER_MAP, VALUE) {
HEADER_MAP headers;
VALUE value;
}
One thing that is sub optimal about the way of defining the whole multipart entity up-front is that it inevitably requires a number of dynamic memory allocations. In high-throughput scenarios, this has repeatedly turned out to be a serious bottleneck or resulting in pseudo memory leaks with the GC implementation.
I wasn't fully aware of that. Originally I was hoping the use of classes would help make it easier to pass the object by reference and avoid copies, but wasn't aware that dynamic allocations are so much more expensive. What is it about the GC that makes them so costly?
Also, why use templates for the attribute types?
struct MultipartEntityPart(HEADER_MAP, VALUE) { HEADER_MAP headers; VALUE value; }
Don't concrete types make it easier to utilize the values when writing them to a connection, such as in vibe-http, client.d? https://github.com/vnayar/vibe-http/blob/multipart-form-data-entities-rfc-2388/source/vibe/http/client.d#L909
Maybe I'm confused about the actual status of types like InetHeaders
and InputStream
. These are the types utilized within vibe, right?
The MultipartEntityPart
body type of SumType!(string, InterfaceProxy!InputStream)
was selected to permit a body to be chosen either by giving a fixed value (a common case with <input>
and <select>
tags in HTML forms, but also to chose an available file, or to choose data that a service already has in memory (e.g. a server calling an API endpoint, and submitting data that it got from elsewhere, such as by extracting an email or generating a file in memory using a library.)
One thing that is sub optimal about the way of defining the whole multipart entity up-front is that it inevitably requires a number of dynamic memory allocations. In high-throughput scenarios, this has repeatedly turned out to be a serious bottleneck or resulting in pseudo memory leaks with the GC implementation.
I wasn't fully aware of that. Originally I was hoping the use of classes would help make it easier to pass the object by reference and avoid copies, but wasn't aware that dynamic allocations are so much more expensive. What is it about the GC that makes them so costly?
There are two escalation stages with the current GC. The first one happens because the collection stage uses a stop-the-world approach and allocations require taking a global GC lock. In situations with high GC pressure or usage frequency, this can nullify any multi-threading performance gains.
The second stage happens when for some reason (probably memory pool fragmentation, but maybe also due to the way the GC selects a pool for a new allocation, I haven't looked close enough into the implementation) the GC cannot consistently reuse memory for new allocations. This then leads to infinite grown of allocated pools, which is a de-facto memory leak. I've observed this being triggered both in vibe.d web application contexts and in our photo application as soon as rapid allocations for more than a few bytes happen from multiple threads at once. This happens rather rarely, but when it does it's hard to track down.
Also, why use templates for the attribute types?
The idea is that it allows the user to avoid any heap/GC allocations whenever possible. So for example a static array or an only()
range can be passed instead of a dynamic array. This can also be used on combinations, such as allocating default headers once in a dynamic array and then doing .join(only("Some-Header", value))
to dynamically append a header without having to allocate and without passing a lot of by-value memory.
Don't concrete types make it easier to utilize the values when writing them to a connection, such as in vibe-http, client.d? https://github.com/vnayar/vibe-http/blob/multipart-form-data-entities-rfc-2388/source/vibe/http/client.d#L909
Maybe I'm confused about the actual status of types like
InetHeaders
andInputStream
. These are the types utilized within vibe, right?The
MultipartEntityPart
body type ofSumType!(string, InterfaceProxy!InputStream)
was selected to permit a body to be chosen either by giving a fixed value (a common case with<input>
and<select>
tags in HTML forms, but also to chose an available file, or to choose data that a service already has in memory (e.g. a server calling an API endpoint, and submitting data that it got from elsewhere, such as by extracting an email or generating a file in memory using a library.)
I forgot to add the proper template constraints in my example. For HEADER_MAP
this would be something like isStringMap!HEADER_MAP
and for PARTS
it could be isInputRange!PARTS && isMultipartBodyType(ElementType!PARTS)
. isMultipartBodyType
could then accept string
, const(ubyte)[]
, an isInputStream!T
type, or a SumType!(...)
of such types (possibly also Algebraic
or TaggedUnion
, if there is a common interface). Since both have an input range interface, the code to handle them shouldn't be much different from the code with concrete types.
Without an example this does become a bit more complicated to understand from the user's point of view, unfortunately, but having the high-level overloads for the common cases should mitigate that hopefully.
I forgot to add the proper template constraints in my example. For
HEADER_MAP
this would be something likeisStringMap!HEADER_MAP
and forPARTS
it could beisInputRange!PARTS && isMultipartBodyType(ElementType!PARTS)
.isMultipartBodyType
could then acceptstring
,const(ubyte)[]
, anisInputStream!T
type, or aSumType!(...)
of such types (possibly alsoAlgebraic
orTaggedUnion
, if there is a common interface). Since both have an input range interface, the code to handle them shouldn't be much different from the code with concrete types.
In this particular case, I believe we won't be able to treat the parts as a range interface, because they lack a common element type. For example, a form might consist of 3 fields, a text input, a checkbox, and a multi-file upload. Thus a statement like isMultipartBodyType(ElementType!PARTS)
wouldn't work, because there is no single element type for PARTS
. I don't know if it's possible to create at compile-time a new SumType
from the original part types. Is that what is needed? Perhaps as a stop-gap, we can simply make the returned type a SumType
of all the types that can be returned by a helper function like auto formPart(string name, NativePath file_path)
or auto formPart(string name, string value)
.
According to the spec, the default Content-Type
of a multipart/form-data field is text/plain
. So checkboxes, options, as well as text fields should all use text/plain
(although technically they don't have to) https://datatracker.ietf.org/doc/html/rfc2388#section-3 File upload seems to be the only mentioned case where the Content-Type
is explicitly set, and it can be something like application/pdf
or multipart/mixed
and represent a set of files.
It's a bit of a shame that there's no MIME RFC2046 library that already exists that could be used for this purpose. (Hmm, perhaps I should begin there?)
Perhaps as a stop-gap, we can simply make the returned type a SumType of all the types that can be returned by a helper function like auto formPart(string name, NativePath file_path) or auto formPart(string name, string value)
Yeah, I'd say it makes sense to go from there. For the internal representation we should have that anyway, as Variant
should really be a last-resort solution due to its less than ideal overall characteristics (GC allocations, virtual function calls, no nothrow
/@safe
).
I'd say that with changing to SumType
and making the tests pass we can merge this and make further API refinements as separate steps.
Perhaps as a stop-gap, we can simply make the returned type a SumType of all the types that can be returned by a helper function like auto formPart(string name, NativePath file_path) or auto formPart(string name, string value)
Yeah, I'd say it makes sense to go from there. For the internal representation we should have that anyway, as
Variant
should really be a last-resort solution due to its less than ideal overall characteristics (GC allocations, virtual function calls, nonothrow
/@safe
).I'd say that with changing to
SumType
and making the tests pass we can merge this and make further API refinements as separate steps.
I ran into a problem with this approach. Because the Header type is templated, and the stream type that a file can be in is also templated (inside FormFile(InputStreamT)
), there's no way to make a single SumType
, because to do so would require knowing the HeaderT
and InputStreamT
values in advance.
The only way I can think of to make this work is to provide the HeaderT
and InputStreamT
types as template-parameters to MultipartEntity
, and this would force all the different parts of the multipart form to use the same header/stream types. Only Variant
permits mixing and matching of header/stream types.
However, because the HeaderT
and InputStreamT
is sometimes determined by factory methods, and mixing would no longer be allowed, it would essentially mean only a single type is permitted, and there's no point offering it as a template-parameter. At this point, one might as well hard-code InetHeaderMap
for HeaderT
and InterfaceProxy!InputStream
for InputStreamT
.
How important is it to offer arbitrary range types in the interface? If it's very important, then maybe Variant
should be kept. If it's not important, then the header and file stream types should probably be locked in and the body could then be SumType
.
@s-ludwig So what do you think about the problem of using SumType?
In a nutshell, if I have a list/range of Multiparts (which can be the body of a multipart entity), then I need a type for that variable. However, if I provide a type, then it has to have a concrete header and body type. But this means it cannot support ranges, which themselves have no concrete type. E.g. you can't have a concrete range of different types of ranges. You would either have to lock-down supported range types, like InterfaceProxy!InputStream, or use Variant. Which is better?
Sorry for replying late, I actually started to write a reply two weeks ago but then got sidetracked. But now actually, after re-reading everything, I think that what I meant in my first comment was actually to have the PARTS
template parameter to be variadic and just forgot about it when you brought up the range type issue and didn't see it in the code.
So it would look like this: auto multiPartEntity(HEADER_MAP, PARTS...)(HEADER_MAP headers, PARTS parts)
Parts can then be given as multiple individually typed arguments. Each argument could be a range of parts, or a single part. The user could choose the range element type to be whatever is needed. Although it may not actually be necessary at all, we could also support using SumType
as range elements and match accordingly, so that the user has maximum flexibility here to represent the parts list.
Sorry for replying late, I actually started to write a reply two weeks ago but then got sidetracked. But now actually, after re-reading everything, I think that what I meant in my first comment was actually to have the
PARTS
template parameter to be variadic and just forgot about it when you brought up the range type issue and didn't see it in the code.So it would look like this:
auto multiPartEntity(HEADER_MAP, PARTS...)(HEADER_MAP headers, PARTS parts)
Parts can then be given as multiple individually typed arguments. Each argument could be a range of parts, or a single part. The user could choose the range element type to be whatever is needed. Although it may not actually be necessary at all, we could also support using
SumType
as range elements and match accordingly, so that the user has maximum flexibility here to represent the parts list.
I originally had tried something like this, and the code is still commented out, but it would look something like this:
/// Returns a MultipartEntity with Content-Type "multipart/form-data" from parts as a compile-time sequence.
auto multipartFormData(MultipartR...)(MultipartR parts) {
string boundaryStr = createBoundaryString();
auto headers = only(
tuple("Content-Type", "multipart/form-data; boundary=" ~ boundaryStr));
return MultipartEntity!(typeof(headers))(headers: headers, bodyValue: parts, boundary: boundaryStr);
}
In this case, the compiler expands the variadic parameter parts
, and you end up with an error like this:
source/vibe/inet/webform.d(794,42): Error: template `vibe.inet.webform.MultipartEntity!(OnlyResult!(Tuple!(string, string))).MultipartEntity.__ctor` is not callable using argument types `!()(OnlyResult!(Tuple!(string, string)), MultipartEntity!(OnlyResult!(Tuple!(string, string))), MultipartEntity!(OnlyResult!(Tuple!(string, string), Tuple!(string, string), Tuple!(string, string))), MultipartEntity!(OnlyResult!(Tuple!(string, string), Tuple!(string, string))), string)`
Shortening the header types as H and body type as B, the error is:
source/vibe/inet/webform.d(794,42): Error: template `vibe.inet.webform.MultipartEntity!(H).MultipartEntity.__ctor` is not callable using argument types `!()(H, B1, B2, B3, string)`
In essence, the same problem remains. A concrete type that can fit all the diverse body types is still needed. And those body types might wrap a file or other arbitrary range. Thus, the body variable, if it does not wish to use Variant, would still need a SumType that encompasses all the possible body content types that are permitted: string, ranges, inputstreams, etc.
Even Variant doesn't solve the problem, because when sending the actual form on an HTTP connection, you need a way to write the contents of the multipart parts, and without a type, that's not really an option. The only ideas I have right now would be to support InputProxy!InputStream
and InputRangeObject
, these are at least concrete types one can compile against.
I added a new commit that hopefully makes things easier to reason about this. I got rid of Variant
and replaced it with SumType
, and the code compiles with the minimal tests running as well. To do this, I simply got rid of all the template parameters.
Starting from something that's working, maybe it would be easier to specify, one at a time, what template parameters should exist.
return MultipartEntity!(typeof(headers))(headers: headers, bodyValue: parts, boundary: boundaryStr);
This should be something like this:
return MultipartEntity!(typeof(headers), MultipartR)(headers: headers, bodyValue: parts, boundary: boundaryStr);
The named argument initialization probably cannot be used, yet, due to backwards compatibility and I'm not sure how well it plays with tuple typed fields, but in any case, the MultipartR
types need to be explicitly passed to the MultipartEntity
template type.
edit: The MultiPartEntity
declaration would look like this:
struct MultipartEntity(HEADER_MAP, PARTS...) {
HEADER_MAP headers;
PARTS parts;
string boundary;
}
edit: The
MultiPartEntity
declaration would look like this:struct MultipartEntity(HEADER_MAP, PARTS...) { HEADER_MAP headers; PARTS parts; string boundary; }
Wouldn't that make it impossible to declare an object in practice? Imagine trying to a assign to parts
. I could try something like parts = [ MultiPartEntity!(InetHeaderMap, string), MultiPartEntity!(InetHeaderMap, File) ]
, but these two items contain no common type, there could be no type implied for parts
or even the literal value being assigned to it.
I don't think this level of desired flexibility can be accomplished without Variant. One either needs to sharply limit the types that a MultiPartEntity body can have, or use Variant. Variant is also only a half-solution, because one still needs to know how to write the contents of the body to a Stream. So it does matter whether it's a File, Stream, etc.
The nice thing is that the user can prepare the parts however it fits best:
// functional style:
auto entity = multiPartEntity(header,
formPart("user", "peter"),
formPart("file", NativePath("foo.dat"));
auto entity = multiPartEntity(header,
formPart("user", "peter"),
paths.map!(p => formPart("file", p)));
// imperative style:
typeof(formPart("", NativePart.init))[] files;
foreach (path; paths)
files ~= formPart("file", path);
auto entity = multiPartEntity(header,
formPart("user", "peter"),
files);
alias MyPartTypes = SumType!(typeof(formPart("", "")), typeof(formPart("", NativePath.init));
MyPartTypes[] parts;
parts ~= MyPartTypes(formPart("foo", "bar"));
parts ~= MyPartTypes(formPart("foo", NativePath("bar.dat")));
auto entity = multiPartEntity(header, parts);
The necessity for typeof
when going the imperative route isn't very nice, but we could pre-define some type aliases for the two simplest cases, if that style of initialization is deemed important enough. However, I think that just using separate parameters for different types will work for almost all practical use cases, so that the necessity for SumType
(or Variant
) should be quite a niche case in practice.
This is an initial attempt to implement a standards-compliant multipart/form-data request.
See https://github.com/vibe-d/vibe-http/issues/32
The initial commit here is preliminary and will require iteration, due to a lack of familiarity on the author's part of Vibe's internal conventions and patterns.